Deep Learning and Artificial Intelligence (AI) are buzzwords in the industry right now, and AI is undoubtedly one of the trendiest technologies of recent years. It has been integrated into the products of almost every tech company in the world and shapes the direction of their future development. AI is being used everywhere, from complex applications (Apple introduced a facial recognition system that uses AI to enhance its performance, and Google brought out real-time translation earphones that can translate 40 languages thanks to machine learning) to simpler ones (YouTube recommends videos that we may like, and Amazon suggests deals that interest us). In the past, ever-changing market trends made it challenging for businesses to understand consumer behavior accurately. With deep learning, however, companies study our habits, examine the things we are interested in, and then use the results to enhance our user experience. They can also find patterns more easily and use them to create value, cut costs, and drive ROI, or even discover new ways of making advertising and branding campaigns effective. Altogether, we can clearly see that AI has a huge impact on both consumers and enterprises.
How, then, can we build such capable AI to serve our purposes? Neural Networks are one answer to this question.
I – What is a Neural Network?
A Neural Network is a powerful learning algorithm inspired by how the brain works. It is a state-of-the-art technique for many applications nowadays. To understand how a Neural Network works, let’s first examine a model of our brain.
Our brain is made up of roughly a hundred billion “neurons”. Neurons continuously gather the information that we perceive and propagate it through the entire brain. Each neuron communicates with others via axons and synapses in order to process the received data and then perform appropriate actions.
To mimic this process, the concept of a “perceptron” was introduced. A perceptron is an artificial neuron that likewise receives several inputs and produces a single output. Figure 1 shows an example of a perceptron that takes several inputs and produces one output (sent in different directions).
Obviously, one neuron cannot form a complete model of human decision-making. Likewise, one perceptron cannot form a proper Neural Network; it is only the basic unit. A Neural Network therefore consists of many perceptrons connected together, where the output of one perceptron becomes the input of the next one it is connected to. Once received, information is propagated through these perceptrons along edges that carry “weights”. Weights describe the relationship between the input and the output: they quantitatively measure the influence of each input on the outcome.
Figure 2 shows a simple Neural Network that has 3 fully-connected “layers” (the input layer is not counted). A layer in a Neural Network is a column of neurons that are independent of each other. Each perceptron in a layer is connected to every perceptron in the previous layer and in the next layer. A perceptron generates only one output, so it sends the same value along each of its outgoing connections, but each connection has its own weight, which is what lets the network build complex decisions. The more layers the system has, the more complex it is. However, more layers do not always result in a higher-performing system. We often refer to a Neural Network with many layers as a “Deep Neural Network”.
There are many types of Neural Network. Figure 3 gives an overview of some popular ones.
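As a minimal sketch of how values flow through fully-connected layers, the Python snippet below passes an input through two small layers. The layer sizes, the weight values, and the helper name `dense_layer` are made up for illustration and are not taken from Figure 2. Only the weighted connections are shown here.

```python
def dense_layer(inputs, weights):
    """Return one output per perceptron: the weighted sum of all inputs.

    `weights` holds one row per perceptron and one weight per incoming connection.
    """
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs))
        for neuron_weights in weights
    ]

# Hypothetical network: 3 inputs -> 2 hidden perceptrons -> 1 output perceptron.
inputs = [1.0, 2.0, 3.0]
hidden_weights = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5]]
output_weights = [[2.0, -1.0]]

# The outputs of one layer become the inputs of the next layer.
hidden = dense_layer(inputs, hidden_weights)
output = dense_layer(hidden, output_weights)
print(hidden, output)  # [-0.75, -0.5] [-1.0]
```

Each hidden value is reused by every perceptron in the next layer, but it is multiplied by a different weight on each connection.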
II – Activation Functions
To process incoming information, a neuron must perform some calculations on the data and generate an output. A perceptron does the same. It first applies a basic linear function to the data (a weighted sum of its inputs), and then the perceptron is “fired” by passing the result through an “activation function”. Figure 4 shows what happens inside a perceptron.
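The two steps inside a perceptron can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `perceptron`, the example numbers, the added bias term, and the choice of Sigmoid as the activation are not taken from Figure 4.

```python
import math

def sigmoid(z):
    """One possible activation function; it squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias, activation):
    # Step 1: linear function, a weighted sum of the inputs plus a bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: "fire" the perceptron by applying the activation function.
    return activation(z)

# Hypothetical inputs and weights, chosen so that the weighted sum is exactly 0.
output = perceptron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
print(output)  # 0.5
```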
An activation function is usually non-linear. Otherwise stacking layers would make no sense, because the result of a stack of linear functions can be achieved by a single linear function. The activation function is applied to the combination of each perceptron’s inputs and weights. As a result, when the weights change, the same input produces a different output. This is what makes learning possible: based on the difference between the expected output and the perceptron’s actual output, the weights are modified in a way that depends on the activation function used. Some popular activation functions are Sigmoid, Tanh, and the Rectified Linear Unit (ReLU).
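The claim about stacked linear functions can be checked with a small one-dimensional example. The coefficients and the helper name `linear` below are arbitrary choices for illustration.

```python
def linear(a, b):
    """Return the linear function x -> a * x + b."""
    return lambda x: a * x + b

first = linear(2.0, 1.0)    # f1(x) = 2x + 1
second = linear(3.0, -4.0)  # f2(x) = 3x - 4

# f2(f1(x)) = 3 * (2x + 1) - 4 = 6x - 1, which is again a single linear function.
combined = linear(3.0 * 2.0, 3.0 * 1.0 - 4.0)

for x in (-2.0, 0.0, 1.5):
    # The stacked result and the single-function result match for every x.
    print(x, second(first(x)), combined(x))
```

Without a non-linear function in between, two layers therefore have no more expressive power than one.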
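Figure 5 shows some common activation functions. Different applications call for different functions. In a binary classification problem, for example, the Sigmoid function can be helpful because it maps any value into the range between 0 and 1. ReLU is probably the most widely used function in recent years because, in deep networks, it mitigates the “Vanishing Gradient” problem that Sigmoid suffers from. Note that Sigmoid’s output is also not zero-centered (“zero-mean”); Tanh addresses that particular issue, whereas ReLU’s output is never negative and so is not zero-centered either.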
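To make these functions concrete, here is a minimal Python sketch of Sigmoid, Tanh, and ReLU using their standard textbook definitions. The gradient helpers `sigmoid_grad` and `relu_grad` are added for illustration: they show how the slope of Sigmoid shrinks for large inputs while the slope of ReLU stays at 1 for positive inputs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # output in (0, 1)

def tanh(z):
    return math.tanh(z)                # output in (-1, 1), centered on 0

def relu(z):
    return max(0.0, z)                 # 0 for negative z, z otherwise

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)               # never larger than 0.25

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z), tanh(z), relu(z), sigmoid_grad(z), relu_grad(z))
```

At z = 10 the Sigmoid gradient is already smaller than 0.0001, so multiplying many such factors across layers drives the overall gradient toward zero.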
III – Training Neural Networks
Training a Neural Network means making it produce outputs that are close to what is expected from the data. For example, when given a picture of a cat, the Neural Network should be able to tell that it is a cat. To decide whether a given picture shows a cat or not, the system needs to “learn” (just as a baby learns to recognize things). What it “learns” are the connections between the inputs and the perceptrons, i.e. the weights.
The weights are generated randomly at first. To provide the best outcome, the system tries to minimize the error between the predicted output and the label, which we call the “loss”. The loss function calculates the loss of one example. The cost function is the average loss over the whole training data. To minimize the cost (we want it to be as small as possible), we need to modify the weights. A popular technique for this is “Gradient Descent”. The process can be imagined as a ball rolling down a hill, where the height of the ball is the cost and the bottom of the hill is its minimum value. As the ball goes down the hill, it may oscillate around the bottom several times before it stops. The cost behaves in the same manner as it approaches its global minimum.
Note that there may be local minima where we do not want the cost to converge. To prevent this from happening, a good weight initializer and optimizer are needed. We will discuss this problem in upcoming posts.
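The difference between loss and cost, and the way Gradient Descent lowers the cost, can be sketched with a model that has a single weight. Everything specific here is an assumption made for illustration: the toy data, the model `prediction = w * x`, the squared-error loss, the fixed starting weight, and the step size `learning_rate`.

```python
# Toy data (assumed for illustration): the label is always 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def loss(w, x, y):
    """Squared error of one example for the model prediction = w * x."""
    return (w * x - y) ** 2

def cost(w):
    """Average loss over the whole training set."""
    return sum(loss(w, x, y) for x, y in zip(xs, ys)) / len(xs)

def cost_gradient(w):
    """Derivative of the cost with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0               # fixed start for reproducibility; normally random
learning_rate = 0.05  # assumed step size
for step in range(100):
    # Move the weight a small step against the slope of the cost.
    w -= learning_rate * cost_gradient(w)

print(w, cost(w))  # w ends up very close to 2, and the cost very close to 0
```

With a much larger step size the weight would overshoot the minimum on every update, which corresponds to the ball oscillating around the bottom of the hill.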
Gradient Descent performs the following steps to update the weights. “Forward Propagation” is when data goes all the way from the first layer of perceptrons to the last layer, the output layer. This step generates the output of our Neural Network based on the current weights. The output is then compared with the label to calculate the cost. Next, the derivatives (gradients) of the cost with respect to the parameters (weights and biases) of the network are calculated, passing backward through the activation functions from the last layer to the first. This step is called “Backward Propagation”. Finally, the system updates the parameters based on the gradients in order to minimize the cost. Gradually, we arrive at optimized weights, which produce the best result in terms of low error.
One complete pass over the training set is called an “epoch”. Forward and Backward Propagation are repeated over several epochs.
We have now had a glimpse of what a Neural Network is and how it works. Upcoming posts will go into detail on how to build one and achieve a high-performance system.
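Putting the pieces together, the following Python sketch trains a single Sigmoid perceptron over many epochs using Forward Propagation, Backward Propagation, and a Gradient Descent update. It is a toy illustration, not a full Neural Network: the logical OR data set, the squared-error loss, the fixed starting weights, `learning_rate`, and the number of epochs are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy training set (assumed for illustration): the logical OR of two inputs.
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0.0, 1.0, 1.0, 1.0]

weights = [0.0, 0.0]  # fixed start for reproducibility; normally random
bias = 0.0
learning_rate = 1.0   # assumed step size
epochs = 5000         # assumed number of passes over the training set

def predict(x1, x2):
    return sigmoid(weights[0] * x1 + weights[1] * x2 + bias)

def cost():
    """Average squared error over the whole training set."""
    errors = [(predict(x1, x2) - y) ** 2 for (x1, x2), y in zip(inputs, labels)]
    return sum(errors) / len(errors)

initial_cost = cost()

for epoch in range(epochs):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for (x1, x2), y in zip(inputs, labels):
        # Forward Propagation: compute the output with the current parameters.
        a = predict(x1, x2)
        # Backward Propagation: chain rule from the loss back to the parameters.
        dloss_da = 2.0 * (a - y)       # derivative of the squared error
        da_dz = a * (1.0 - a)          # derivative of Sigmoid
        dloss_dz = dloss_da * da_dz
        grad_w[0] += dloss_dz * x1
        grad_w[1] += dloss_dz * x2
        grad_b += dloss_dz
    # Gradient Descent update with the average gradient of this epoch.
    n = len(inputs)
    weights[0] -= learning_rate * grad_w[0] / n
    weights[1] -= learning_rate * grad_w[1] / n
    bias -= learning_rate * grad_b / n

predictions = [round(predict(x1, x2)) for x1, x2 in inputs]
final_cost = cost()
print(predictions, initial_cost, final_cost)
```

Here the parameters are updated once per epoch using the whole training set. Real systems often update after smaller batches of examples, but the forward, backward, and update steps stay the same.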