Starting from the basics, it is convenient to ask ourselves: what is Deep Learning? The answer is simple: Deep Learning is a Machine Learning technique. In computer science, Machine Learning is the area of artificial intelligence whose objective is to develop techniques that allow computation-based systems (machines) to learn the hidden patterns in structured and unstructured data. Robotics, natural language processing and image recognition are some of the wonders derived from machine learning.

We then need to ask: are neural networks and Deep Learning the same thing? No. On the one hand, when we talk about neural networks we refer to the generic name given to any scheme that connects multiple nodes (neurons). On the other hand, Deep Learning refers to neural networks characterized by having more than one layer of interconnected neurons. Neural networks are a powerful modeling tool that accounts for the interactions between our inputs.

To understand what neural networks are, it is best to start with the simple perceptron (*perceptron*). Perceptrons began to be developed in the 1950s (yes, in the 50s!) by Frank Rosenblatt. Rosenblatt built on the work of Warren McCulloch and Walter Pitts from the 1940s and on a learning rule based on error correction. This shows that the technology began to take shape more than 70 years ago. Today the perceptron is not a widely used neural model, due in part to the development of more advanced networks. However, to understand how the more complex networks work, it is best to start with the simplest perceptrons.

A perceptron can be used in classification problems. It can receive multiple inputs {X1, X2 … Xn} of binary values {0, 1}. In our example the perceptron will receive two inputs (X1 and X2). Rosenblatt proposed a simple rule to compute the output: he introduced the concept of weights {W1, W2 … Wn}. These weights are real numbers that indicate the importance of each input with respect to the output. The perceptron output, also zero or one, is determined by whether the weighted sum (the sum of each Wi·Xi) is less than or greater than a certain threshold value θ (a real number), this value being a parameter specific to the neuron.

The output will take a value of zero (deactivated) if the weighted sum is less than the threshold value and a value of one (activated) if the weighted sum is greater than or equal to that value. We can express it mathematically through the following expression:
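In symbols, with weights $w_j$, inputs $x_j$, and threshold $\theta$:

```latex
\text{output} =
\begin{cases}
0 & \text{if } \sum_j w_j x_j < \theta \\[4pt]
1 & \text{if } \sum_j w_j x_j \ge \theta
\end{cases}
```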

We will make a change in the expression, replacing the negative of the threshold with the bias (b = −θ). The bias reflects how easy it is for the neuron to output a one. With this change, our mathematical expression looks like:
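With the bias $b = -\theta$ substituted in, the same rule reads:

```latex
\text{output} =
\begin{cases}
0 & \text{if } \sum_j w_j x_j + b < 0 \\[4pt]
1 & \text{if } \sum_j w_j x_j + b \ge 0
\end{cases}
```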

Let’s see an example to finish understanding how networks based on simple perceptrons work.

Let’s imagine that we have gone back to university. The professor explains that the final grade takes into account two tests that we will take throughout the course: an oral presentation (X1) and a written exam (X2). It would be a bit strange, but imagine that there are only two possible results for each test: we can fail (value of zero) or pass (value of one). In addition, the professor explains that each test has a weight with respect to the final grade. For him, the most important part of the course is the written exam, an essential condition to pass, so he gives it a high weight. Let’s say a weight of W2=5. The other test is important but complementary to the written exam, so he gives it a weight of W1=2.

Once the weights have been chosen, we decide on a bias of −4 (a threshold of 4) for the perceptron. With this configuration the perceptron will output a value of 1 if we pass the written exam and 0 if we fail it, regardless of the result of the oral test. Therefore, if we have failed the oral but passed the written exam, the output would be:
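Plugging in X1 = 0 and X2 = 1: W1·X1 + W2·X2 + b = 2·0 + 5·1 − 4 = 1 ≥ 0, so the output is 1 and we pass. The same scenario can be sketched in a few lines of Python (a minimal illustration; the function and variable names are ours):

```python
def perceptron(x1, x2, w1=2, w2=5, bias=-4):
    """Simple perceptron: output 1 if the weighted sum plus bias is >= 0."""
    weighted_sum = w1 * x1 + w2 * x2 + bias
    return 1 if weighted_sum >= 0 else 0

# Fail the oral (x1=0) but pass the written exam (x2=1):
print(perceptron(0, 1))  # 2*0 + 5*1 - 4 = 1 >= 0, so the output is 1
```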

We can change the behavior of the model simply by varying the weights and the threshold value. If we increase the threshold, we move to a situation in which passing the written exam alone is no longer enough. So, by raising the threshold we make it harder to pass. If we lower the threshold enough, it becomes possible to pass with the oral test alone.
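A quick sketch of this effect, using the same weights as before (the specific threshold values are ours, chosen for illustration):

```python
def perceptron(x1, x2, w1=2, w2=5, threshold=4):
    """Output 1 if the weighted sum reaches the threshold."""
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0

# Threshold 6: the written exam alone (5) is no longer enough;
# both tests are needed (2 + 5 = 7 >= 6).
print(perceptron(0, 1, threshold=6))  # 0
print(perceptron(1, 1, threshold=6))  # 1

# Threshold 2: the oral test alone (2 >= 2) is now enough to pass.
print(perceptron(1, 0, threshold=2))  # 1
```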

In order to “choose” the weights and the threshold value, we follow an error-correction process. It consists of starting with random initial values and modifying them iteratively whenever the obtained output does not coincide with the observed output (remember, we are in supervised learning). This learning is therefore a method of error detection and correction: weights are only modified when the results are wrong.
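This error-correction loop is usually called the perceptron learning rule. A minimal sketch (the learning rate, epoch count, and training data are illustrative, and we start from zero weights instead of random ones for reproducibility):

```python
def train_perceptron(samples, targets, lr=1.0, epochs=10):
    """Perceptron learning rule: adjust weights only on misclassified samples."""
    w = [0.0, 0.0]  # initial weights (random in practice; zeros here for simplicity)
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(samples, targets):
            output = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            error = target - output  # nonzero only when the prediction is wrong
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

# Learn an AND-like rule: output 1 only when both inputs are 1.
w, b = train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1])
```

After a few passes over the data the updates stop, because every sample is classified correctly and the error term is zero.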

Years later, Minsky (1969) pointed out the limitations of networks based on simple perceptrons. He showed that they can only solve classification problems whose classes are linearly separable by a decision boundary. Such problems were already being solved by statistical methods that required less data. Partly because of this, the field evolved toward more complex neural networks.
