Let’s start with a simple question: if knowing how to cook does not make you a chef, and knowing how to drive does not make you a racing driver, do you think that working with data is enough to make someone a Data Scientist?
Deep Learning is the latest buzzword in Data Science, driven by the proliferation of libraries (TensorFlow, Caffe, Keras, H2O, …) that put these powerful algorithms at our fingertips.
To avoid falling for the illusion that these models can create, we can understand Deep Learning through the following analogy:
Deep Learning is elegant and powerful, like a Ferrari. But what happens when we hand a Ferrari to a 17-year-old boy? He will very likely crash at the first corner, however impressive the feeling of speed is in the first few minutes. In my opinion, Ferraris (Deep Learning models) should be driven by two types of people: professional racing drivers in the most demanding races, or veteran drivers who will use them to show off at 80 km/h.
A Ferrari is not always the best means of transport, and its elegance and power do not usually justify its high price. In the same way, Deep Learning algorithms are not the most appropriate choice for most problems, and their complexity and appetite for data do not justify their use in most cases.
But what is Deep Learning? Deep Learning is a subset of Neural Network models that tries to take advantage of distributed environments to train very complex Neural Networks, loosely imitating the human brain. In this sense, neural networks are made up of neurons that connect to each other in order to reach a conclusion. That conclusion can be whether a cat appears in a photo, which digit a handwritten character corresponds to, or whether a client will leave the company.
To understand what a simple neural network is, we have to understand its elements: the neurons. A neuron is nothing more than a decision-maker that receives several inputs and transforms them into an output. In the cat example, we can have decision-makers that answer yes or no to questions such as: do eyes appear in the photo? Are the ears pointy? The combination of these decision-makers, along with other more complex ones, allows us to predict whether or not a cat appears in the photo.
In theory this is very powerful, but training a network of neurons requires an enormous amount of data, which is not normally available. This is usually the biggest challenge when using neural networks.
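To make the idea of a neuron as a decision-maker concrete, here is a minimal sketch in Python. It assumes a sigmoid activation, and the feature names, weights and bias are made up for illustration; in a real network they would be learned from data.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed by a sigmoid into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical yes/no answers from simpler decision-makers:
# eyes visible? pointy ears? whiskers visible?
features = [1.0, 1.0, 0.0]

# Illustrative (not learned) weights and bias.
weights = [2.0, 1.5, 1.0]
bias = -2.5

score = neuron(features, weights, bias)  # value between 0 and 1
is_cat = score > 0.5                     # the neuron's yes/no decision
print(round(score, 3), is_cat)
```

A network is just many of these decision-makers chained together, with the outputs of one layer becoming the inputs of the next.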
In one of the Master’s programs in which I teach, I ask the students to build a model that predicts whether a client will take out a financial product. Normally I recommend that they use simple models, since the objective is to understand how the model is used and its value to the business. One student came to me with a problem: he had wanted to explore more complex neural network models to try to get a more reliable result. When he trained the model, it found unimaginable patterns that allowed it to classify practically all the training cases correctly; however, when he put it to work on new data, it could not get the predictions right. In other words, the model had overfitted the training data. The student did not know why, since he had followed all the instructions he had found on a web page about neural networks.
We did the following exercise. I asked him how many variables he had: 40. I asked him about the size of the neural network he had used: three hidden layers of 32, 32 and 16 neurons, plus a final decision-maker. It was practically a toy neural network. We began to count how many coefficients the network had. Each neuron in the first layer has 40 inputs plus the independent (bias) term, so the first layer has 41 × 32 = 1,312 coefficients, the second layer 33 × 32 = 1,056, the third layer 33 × 16 = 528 and the last layer 17, for a total of 2,913 coefficients. The key question here is how many records you need to train a network of this size so that it is stable and, above all, generalizable. If we use the rule of thumb that the number of coefficients should be at most the square root of the number of records (which seems very optimistic to me), we would need 2,913² = 8,485,569 training records (again, being very optimistic). Our 900,000 records in total fell far short of that. Nine million records is not much in absolute terms, since companies have much larger databases, but for a modeling table, where each record represents a client, a machine, a telephone line, a product and so on, it is an enormous number that few companies ever reach. Who has 9 million clients (or machines, or products) to build an analytical model for? In Spain, you may find only a handful.
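The count above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name is my own, it assumes fully connected layers with one bias term per neuron, and the square-root rule is the optimistic rule of thumb used in the text, not a formal guarantee.

```python
def count_coefficients(n_inputs, layer_sizes):
    """Return the number of coefficients (weights + bias) in each dense layer."""
    per_layer = []
    fan_in = n_inputs
    for size in layer_sizes:
        # Each neuron has one weight per input plus one bias term.
        per_layer.append((fan_in + 1) * size)
        fan_in = size  # this layer's outputs feed the next layer
    return per_layer

# 40 variables, hidden layers of 32, 32 and 16 neurons, one output neuron.
per_layer = count_coefficients(40, [32, 32, 16, 1])
total = sum(per_layer)

# Rule of thumb from the text: coefficients <= sqrt(records),
# so records >= coefficients squared.
records_needed = total ** 2

print(per_layer)       # [1312, 1056, 528, 17]
print(total)           # 2913
print(records_needed)  # 8485569
```

Changing the layer sizes in the call shows how quickly the required number of records grows: it scales with the square of the coefficient count.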
This example shows how complex a neural network can be and how much data it needs, even a very small one. It is true that techniques have been developed to deal with these cases, but they require a very advanced knowledge of neural networks that is not acquired by reading two websites and copying code from GitHub. With copied code it is very simple to “hit a key” and get a trained neural network. Whether it will generalize, whether it will extract knowledge that helps us improve our business, and whether we will know how to use it is another matter. But we leave that to the professional racing drivers.
To understand neural networks, the key is the neuron, which we have to know perfectly, just as the professional racing driver is concerned not only with driving but also with understanding the car’s mechanics and the parts it is made of. To do this, it is necessary to understand the neuron from several points of view. From the statistical point of view, the neuron is a statistical distribution that is a linear combination of other distributions through a link function. From the geometric point of view, the neuron is an affine hyperplane that separates points in space or measures their distance from it. From the algebraic point of view, the neuron is a surjective map represented by a transition matrix. From the analytical/topological point of view, it is a continuous map that transforms distances. Thanks to the iterative algorithms that computing provides, we can teach these neurons how to make decisions.
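As an illustration of the geometric point of view only, the sketch below treats a neuron’s weights and bias as the hyperplane w·x + b = 0 and computes the signed distance of a point to it. The function name and the numbers are assumptions chosen to keep the arithmetic easy to check by hand.

```python
import math

def signed_distance(point, weights, bias):
    """Signed distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, point)) + bias) / norm

# Illustrative neuron in two dimensions: 3*x1 + 4*x2 - 5 = 0.
weights = [3.0, 4.0]
bias = -5.0

print(signed_distance([3.0, 4.0], weights, bias))  # 4.0  (positive side)
print(signed_distance([0.0, 0.0], weights, bias))  # -1.0 (negative side)
print(signed_distance([1.0, 0.5], weights, bias))  # 0.0  (on the hyperplane)
```

The sign says which side of the hyperplane the point falls on (the neuron’s decision), and the magnitude says how far it is from the boundary.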
Each approach requires a deep knowledge of the underlying science, so I strongly recommend that before driving a Ferrari, we learn how to crawl, walk, ride a bike and drive, which is not the same as racing. Later on, with the right experience and consolidated knowledge, you will be able to compete in races.