Month: September 2017

  • Forte Specialization: Machine learning

    Abstract
    This article discusses a potential method to increase the efficiency of a neural network in processing information. More specifically, it proposes using synaptic plasticity to optimize data flows between neurons in a convolutional neural network.

    Introduction
    When building a convolutional neural network for machine learning, it is common to hard-code the linkages between neurons. Data is passed through each layer of neurons, and backpropagation is used to help each layer in the network learn the weights that should be assigned to each input feature so as to predict a more accurate outcome. [1]

    Choosing the number of neurons and layers in a neural network could have some effect. However, it might be more effective to allow each neuron to independently determine the priority of each input. More specifically, if more than one input provides the same information, the neuron could choose the input that provides the information first (becoming more sensitive to that input from the previous layer) and give a lower priority to the slower input (using neural fatigue, in a sense). The priority given to each input can determine the strength of the linkage between the neuron and its input. If the strength is below a certain threshold, the linkage can be severed, so the individual neuron has less data to process. In fact, this could lead to “specialization”, where certain neurons that are closer to specific stimuli (types of data) become better at “knowing” about that data. This is similar to the way parts of our brains can be trained to interpret audio or visual information.
    This concept may also benefit parallel processing: each process can be fine-tuned to get excited by specific “frequencies” of data.

    Methods
    Each neuron would keep track of each input. When an event occurs (for example, data from an image is sent for processing), the neuron would record each input value against the input that supplied it, along with a timestamp. It could then compare inputs and assign a priority to each one based on how unique its information is and how quickly the information was received.
    Should an input’s priority fall below a specified threshold, the neuron could send a “slow down” or “terminate” signal to the input, telling it to send data at a lower frequency or to stop sending data altogether.
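
    The bookkeeping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the multiplicative `decay` used to model fatigue, the two thresholds, and the rule that "same information" means "same value within a tolerance" are all hypothetical choices, not part of the proposal itself.

```python
from dataclasses import dataclass


@dataclass
class InputLink:
    """One incoming connection: last value seen, when it arrived, and its priority."""
    source: str
    value: float = 0.0
    arrived_at: float = 0.0
    priority: float = 1.0


class PriorityNeuron:
    """Tracks inputs and lowers the priority of slower, redundant ones (assumed rule)."""

    def __init__(self, sources, slow_below=0.5, terminate_below=0.2, decay=0.5):
        self.links = {s: InputLink(s) for s in sources}
        self.slow_below = slow_below            # hypothetical "slow down" threshold
        self.terminate_below = terminate_below  # hypothetical "terminate" threshold
        self.decay = decay                      # hypothetical fatigue factor

    def receive(self, source, value, timestamp):
        """Record a value against the input that supplied it, with a timestamp."""
        link = self.links[source]
        link.value = value
        link.arrived_at = timestamp

    def update_priorities(self, tolerance=1e-6):
        """Fatigue inputs that repeat what a faster input already delivered.

        Returns a dict mapping source name to "slow down" or "terminate".
        Terminated links are removed from the neuron.
        """
        signals = {}
        seen = []
        # Visit inputs in order of arrival so the fastest copy is never penalized.
        for link in sorted(self.links.values(), key=lambda item: item.arrived_at):
            if any(abs(link.value - earlier) <= tolerance for earlier in seen):
                link.priority *= self.decay
            seen.append(link.value)
            if link.priority < self.terminate_below:
                signals[link.source] = "terminate"
            elif link.priority < self.slow_below:
                signals[link.source] = "slow down"
        for source, signal in signals.items():
            if signal == "terminate":
                del self.links[source]
        return signals
```

    With these assumed settings, an input that keeps arriving late with a duplicate value is first asked to slow down and is then severed, while the faster input carrying the same value keeps its full priority.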

    If an input neuron no longer has any link to a neuron in the next layer, it can effectively die and send a terminate signal back to all of its own inputs.
    On the other hand, should the priority of many inputs be high, the neuron might consider making new linkages with the next layer of neurons, since it is receiving a lot of high priority signals.
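
    The termination cascade described above (a neuron with nobody left to send to dies and terminates its own inputs) might look something like the sketch below. The graph representation, using two dictionaries of sets, and the function names `build` and `sever` are assumptions made for illustration only. Forming new linkages is not covered.

```python
def build(edges):
    """Build consumer and input lookup tables from (source, target) pairs."""
    consumers, inputs = {}, {}
    for source, target in edges:
        for node in (source, target):
            consumers.setdefault(node, set())
            inputs.setdefault(node, set())
        consumers[source].add(target)
        inputs[target].add(source)
    return consumers, inputs


def sever(consumers, inputs, source, target):
    """Cut the link source -> target, then let orphaned neurons die.

    A neuron dies when nothing consumes its output any more. A dying neuron
    terminates the links to its own inputs, which may orphan them in turn.
    Returns the list of neurons that died.
    """
    consumers[source].discard(target)
    inputs[target].discard(source)
    dead = []
    pending = [source]
    while pending:
        node = pending.pop()
        if consumers[node] or node in dead:
            continue  # still feeding someone, or already handled
        dead.append(node)
        for upstream in list(inputs[node]):
            consumers[upstream].discard(node)
            inputs[node].discard(upstream)
            pending.append(upstream)
    return dead


# Hypothetical network: x1 feeds only h1, while x2 feeds both h1 and h2.
consumers, inputs = build([
    ("x1", "h1"), ("x2", "h1"), ("x2", "h2"),
    ("h1", "out"), ("h2", "out"),
])
dead = sever(consumers, inputs, "h1", "out")
```

    In this example, severing the link from `h1` to `out` removes `h1` and then `x1`, because `x1` fed nothing else. `x2` survives because `h2` still consumes its output.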

    Keep in mind that the priority is independent of the weights of features that are being learned. The point of “Forte Specialization” is not to determine which features are important, but rather, which inputs are important.

    Results and Discussion
    If and when I get around to it, I intend to implement the above logic in code and run some tests and benchmarks to see what we might learn. If you are interested in collaborating with me, please get in touch.

    Works Cited

    Aside from the concepts related to neural plasticity and input priority, credit goes to Geoffrey Hinton of the University of Toronto and Andrew Ng of Stanford who both published excellent machine learning courses online, describing neural networks and their implementations.

    Copyright 2017 Frank Forte. All rights reserved. Unauthorized reproduction of this work is prohibited.

  • Un-hackable customer data: how Equifax can prevent future hacks

    Image courtesy of Pexels

    Keep your customers’ data secure. It’s your responsibility. You ask for that data; you specifically require it from your customers to process credit checks. So, keep their data safe.

    Equifax could have done many, many things to protect customer data. This post is about one of those things: storing data in an un-hackable way.

    Anyone who writes code will at some point learn about hashes. There are ways to use hashes of customer data instead of the customer data itself to meet business requirements.

    A hash is a scrambled string of characters, for example, 3c1fd1b2c915eacf, that is created by taking a piece of data such as a social security number, SIN or date of birth, and using a cryptographic algorithm to scramble that data in a repeatable way.

    The great part about hashes is that they are searchable. You can store a hashed version of someone’s name, SIN, or other sensitive or personally identifiable information (PII). To find that user’s record later, type in the original data, let a program hash it, then search the database for a matching hash. Even better? Done properly (see the note on salts at the end of this post), they are effectively un-hackable. Even if a hacker steals this data, it is meaningless and has no value. What are they going to do with “3c1fd1b2c915eacf”?
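
    As a rough sketch of the store-then-search flow just described: the example below swaps the plain hash for a keyed hash (HMAC with SHA-256), because a plain hash of a short value such as a nine-digit number can be guessed by trying every possibility. That substitution, the `SECRET_KEY` placeholder, the `normalize` step and the in-memory `records` dictionary standing in for a database table are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical key. In practice it would be kept outside the database,
# for example in a secrets manager, so a stolen table is not enough to guess values.
SECRET_KEY = b"example-key-do-not-use-in-production"

# Stands in for a database table with an index on the hash column.
records = {}


def normalize(value: str) -> str:
    """Strip spaces and punctuation so '046 454 286' and '046-454-286' match."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


def search_hash(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a repeatable keyed hash of the value as 64 hex characters."""
    message = normalize(value).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def store(sensitive_value: str, record_id: int) -> None:
    """Save only the hash; the original value is never written."""
    records[search_hash(sensitive_value)] = record_id


def find(sensitive_value: str):
    """Hash the value typed in by the user and look for a matching hash."""
    return records.get(search_hash(sensitive_value))


store("046 454 286", 17)
match = find("046-454-286")   # same digits, different formatting
miss = find("000 000 000")    # no such record
```

    Here `match` is the stored record id and `miss` is `None`. Only the hex digests ever reach the `records` table.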

    What about data that you need to send to third parties? You need the original data in some cases. You can link the hashed records with an encrypted record that contains only the bare minimum information needed to conduct business. The hash search can pull up that encrypted data, then the data can be decrypted and sent back to authorized employees or third parties over an encrypted connection, such as HTTPS.

    Now that you have an idea of what is possible, how should we apply this knowledge?

    Organizations need a policy for classifying data.

    More specifically, determine how each field is used and how each should be stored:

    A) required for search (hash)

    B) required for validating correct information supplied (hash)

    C) required by an outside party (encrypt)

    D) required for analytics or statistics? Ideally, do not include PII and do not link the data to the original records. The data still provides insight, but it is anonymous, so hackers find very little value in stealing it, and liability is very limited.

    E) how long is each field required? Delete data once its useful life is over. You would be surprised how much data can be deleted almost immediately with minimal impact on operations. This minimizes the amount of information exposed if and when a hack does happen.
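
    One way to make a classification like A) to E) concrete is to write it down as data that code can check. The sketch below is only an illustration: the field names, the handling assigned to each field and the retention periods are all made up, and a real policy would come from the organization’s own requirements.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Handling(Enum):
    HASH = "hash"            # A and B: search or validation only
    ENCRYPT = "encrypt"      # C: the original is needed by an outside party
    ANONYMIZE = "anonymize"  # D: analytics, not linked to the original records


@dataclass(frozen=True)
class FieldPolicy:
    handling: Handling
    retention_days: int      # E: how long the field is kept


# Hypothetical policy table. Every name and number here is an assumption.
POLICY = {
    "sin": FieldPolicy(Handling.HASH, 2555),
    "date_of_birth": FieldPolicy(Handling.HASH, 2555),
    "mailing_address": FieldPolicy(Handling.ENCRYPT, 365),
    "postal_region": FieldPolicy(Handling.ANONYMIZE, 30),
}


def fields_to_delete(collected_on: date, today: date, policy=POLICY):
    """Return the fields whose retention period has passed, in alphabetical order."""
    age_in_days = (today - collected_on).days
    return sorted(
        name for name, rule in policy.items()
        if age_in_days > rule.retention_days
    )


expired = fields_to_delete(date(2017, 1, 1), date(2018, 6, 1))
```

    A scheduled job could call something like `fields_to_delete` for each record and clear whatever it returns, so that deletion happens by default rather than by memory.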

    Once customers give you their data, they don’t have control of that data anymore, so they rely on you to keep it safe! Be smart, invest in better data storage and retention policies.

    If you need a consultant on preventing hacks in your organization, contact me. If you are a software developer just stopping by to learn more, please make sure you research why you need to add a “salt” to your hashes, and use up-to-date hashing algorithms like SHA-256.

    Thanks for your interest!

    Copyright 2017. Frank Forte. Unauthorized reproduction of this work is prohibited.

  • 502 / 400 Error in Apache when using Firebug or ChromePHP

    It turns out that Apache will return an HTTP error (usually 502) when headers are larger than 8 KB. This is much like the White Page Of Death, since there is no indication of why the page failed to load.

    I developed a fork of ChromePHP that lets you shrink the ChromePHP debug log so that the headers stay small enough to avoid the Apache error:
    https://github.com/frankforte/chromephp

    It shrinks the log by removing notices first, then warnings or errors.
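
    The trimming order described above can be illustrated with a short sketch. This is not the code from the fork (which is written in PHP). It is a Python illustration that assumes the log is a list of entries with a `level` and a `message`, that the header carries the log as base64-encoded JSON, and that a budget of 8000 bytes is a safe margin. All of those details are assumptions.

```python
import base64
import json

MAX_HEADER_BYTES = 8000                      # assumed budget, just under 8 KB
DROP_ORDER = ("notice", "warning", "error")  # least important entries go first


def encoded_size(entries):
    """Size in bytes of the log once serialized to JSON and base64-encoded."""
    payload = json.dumps(entries).encode("utf-8")
    return len(base64.b64encode(payload))


def shrink_log(entries, max_bytes=MAX_HEADER_BYTES):
    """Drop the oldest notices, then warnings, then errors until the log fits."""
    kept = list(entries)
    for level in DROP_ORDER:
        while encoded_size(kept) > max_bytes:
            index = next(
                (i for i, entry in enumerate(kept) if entry["level"] == level),
                None,
            )
            if index is None:
                break  # nothing left at this level, move on to the next one
            del kept[index]
    return kept


# Hypothetical log: many notices plus a few more important entries.
log = [{"level": "notice", "message": "n" * 100} for _ in range(100)]
log += [{"level": "warning", "message": "w" * 100} for _ in range(2)]
log += [{"level": "error", "message": "e" * 100}]
trimmed = shrink_log(log)
```

    In this example only notices are removed. The warnings and the error survive because dropping notices alone is enough to bring the encoded log under the budget.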