If you are reading this, chances are you are familiar with one or more of the following terms: artificial intelligence, machine learning, neural networks, or deep learning. Or at least I am assuming you are. This is a post about deep learning; more specifically, it is about what I am learning while enrolled in the Udacity Deep Learning Nanodegree. I registered in late March and have been going through the classes for about a month and a half. So far it has been a great experience: I am learning a lot and getting to practice what I learn. I also like the way the material is delivered, through a combination of short videos, a weekly one-hour live coding session with Siraj Raval, and a wealth of external material, along with the ability to discuss things with instructors and other students over Slack. Hopefully this will be the first in a series of posts about what I learn, and how I learn it, while going through the nanodegree.
Weeks one and two
The first two weeks were packed! They started out with introductory lessons on tools and libraries that I would rely on to complete lessons and projects, such as conda, numpy, and Jupyter notebooks. I had never used conda before and loved what it lets you do in terms of environment and package management. If you have ever gone through the pain of compiling code from source on multiple machines, or of sharing code with someone else, you will appreciate it too.
What I also liked about the first two weeks was a lesson that let me use deep learning techniques (like style transfer) before diving into any details. This was cool because it served as motivation, illustrating what I would be able to do with what I learn. After that came some basic machine learning theory and concepts, like classification and regression, which are essential for anyone who wants to use or contribute to machine learning. The nanodegree requires knowledge of linear algebra and multivariable calculus; the concepts needed are discussed briefly, but many good resources are provided. This has actually been the most time-consuming aspect of the lessons for me: I check out almost every resource mentioned and read a lot of the papers that are cited. It takes a lot of time, but it helps me better understand the theory and mathematics involved in deep learning.
After the introduction came lessons that dive straight into the fundamentals of an artificial neural network; concepts like the neuron, the multilayer perceptron, weights, training, gradient descent, and backpropagation are discussed in detail. The coolest thing I learned more about was the softmax activation function: a generalization of the logistic function that “squashes” a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values in the range (0, 1). It is very helpful in classification problems and is mathematically expressed as σ(z)_j = e^(z_j) / Σ_k e^(z_k), for j = 1, …, K.
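As a minimal sketch (not the course's code), the softmax function above can be written in a few lines of numpy:

```python
import numpy as np

def softmax(z):
    """Squash a K-dimensional vector of real values into probabilities in (0, 1).

    Subtracting the max before exponentiating is a standard numerical-stability
    trick; it cancels out in the ratio and does not change the result.
    """
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)         # three values in (0, 1), largest for the largest score
print(probs.sum())   # sums to 1 (up to floating point)
```

Note that the outputs sum to one, which is what lets you read them as class probabilities.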
One of the lessons in the first two weeks dealt with building a program (from scratch) called MiniFlow. This served as an introduction to how Google’s TensorFlow works: by using concepts from computational graphs and leveraging them to efficiently build a neural network and compute the gradients needed for backpropagation. So by the end of the first two weeks I had learned a bunch of new tools and tricks in Python for deep learning, gained an understanding of some basic theoretical concepts that I had misconceptions about, and written code from scratch for stochastic gradient descent and backpropagation. By the third week I had also written all the code needed to build and train a neural network to predict daily bike rental ridership from actual data.
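To give a feel for the stochastic gradient descent and backpropagation code mentioned above, here is a toy sketch, mine, not MiniFlow's, that trains a single linear neuron with a mean squared error loss; the "backpropagation" here is just the chain rule through one node:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + 2 plus a little noise.
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

w, b = 0.0, 0.0   # parameters of a single linear neuron
lr = 0.1          # learning rate

def mse(w, b):
    pred = w * X[:, 0] + b
    return np.mean((pred - y) ** 2)

loss_before = mse(w, b)
for epoch in range(50):
    # "Stochastic": update after each example, in shuffled order.
    for i in rng.permutation(len(X)):
        pred = w * X[i, 0] + b
        err = pred - y[i]
        # Gradients of (pred - y)^2 with respect to w and b.
        w -= lr * 2 * err * X[i, 0]
        b -= lr * 2 * err
loss_after = mse(w, b)
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}, w ~ {w:.2f}, b ~ {b:.2f}")
```

After training, w and b end up close to the true values 3 and 2; a real network just repeats this gradient step through many layers.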
Week three kicked off with model evaluation and validation. Knowing how to split your data for testing and how to evaluate your models is very important in deep learning, because the performance of your deployed solution depends on it. Accuracy, mean squared error, and the confusion matrix were among the many metrics discussed. I would say the two most important concepts detailed in those lessons were underfitting and overfitting: the former is when a model is oversimplified, does not learn the important details of the data, and therefore makes errors due to bias; the latter is when you overcomplicate your model, end up memorizing features of the data, and make errors due to variance. Here is a nice read on those.
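Underfitting and overfitting are easy to see in a toy experiment of my own (not from the course): fit polynomials of increasing degree to noisy samples of a curve and compare training error against validation error on held-out points:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a smooth curve, split into training and validation sets.
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 40)
x_tr, y_tr = x[:30], y[:30]
x_va, y_va = x[30:], y[30:]

def fit_mse(degree):
    """Fit a polynomial on the training set; return (train MSE, validation MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    return tr, va

for degree in (1, 3, 15):
    tr, va = fit_mse(degree)
    print(f"degree {degree:2d}: train MSE {tr:.3f}, validation MSE {va:.3f}")
```

The degree-1 fit underfits (high error everywhere, a bias problem), while the degree-15 fit drives training error down by memorizing the noise, which is exactly the variance problem described above.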
Methods to prepare, or preprocess, your data were also discussed in the week three lessons. Practical approaches such as how to deal with missing data points in a data set, as well as padding, were covered, but so were theoretically grounded concepts such as data reduction through principal component analysis. At this point a recurring theme of the nanodegree had emerged: content is presented in a practical, “get things done” kind of way while remaining theoretically grounded. That is an approach I like and agree with, and here is why: if you are only given the steps to implement something, you never fully understand the concept and it will be difficult to debug problems; on the other hand, if you are only bombarded with abstract theory, you can lose your way when it comes to implementation. Being theoretically grounded while focusing on practical implementation is, in my opinion, always the way to go!
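As a sketch of that preprocessing-plus-PCA idea (my own toy example, assuming centered features and PCA computed via the singular value decomposition):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data set: 200 samples, 5 features, generated so that most of the
# variance lies along just two latent directions.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + rng.normal(0, 0.05, (200, 5))

# Preprocessing: center each feature (scaling would follow the same pattern).
X_centered = X - X.mean(axis=0)

# PCA via SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained = S**2 / np.sum(S**2)       # fraction of variance per component
X_reduced = X_centered @ Vt[:2].T     # project onto the top 2 components

print("explained variance ratios:", np.round(explained, 3))
print("reduced shape:", X_reduced.shape)
```

Because the data really only varies in two directions, the first two components capture almost all the variance, and the 5-feature data compresses to 2 features with little loss.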
The final part of the third week was a six-mini-project tutorial prepared by Andrew Trask on sentiment analysis with neural networks. The projects dealt with taking a data set of IMDB movie reviews and training a neural network to classify a review as positive or negative. This session was highly practical and introduced some concepts for dealing with text input, such as the bag-of-words model from the field of natural language processing. The tutorial also discussed noise propagation in a neural network and how it can be reduced, as well as some notes on increasing the computational efficiency of the network by leveraging the structure of the problem. You can check out part of my solutions to this tutorial here. Another session on sentiment analysis showed how to build sentiment classification networks using TFLearn, a modular and transparent deep learning library built on top of TensorFlow.
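The bag-of-words model is simple enough to sketch in a few lines; this is my own toy version with two made-up "reviews", not the tutorial's code, but it shows the idea of turning text into fixed-length count vectors a network can consume:

```python
from collections import Counter

# Two toy "reviews" with labels (1 = positive, 0 = negative); the real
# tutorial used the IMDB movie review data set.
reviews = [
    ("this movie was great great fun", 1),
    ("this movie was terrible", 0),
]

# Build the vocabulary from all the reviews.
vocab = sorted({word for text, _ in reviews for word in text.split()})

def bag_of_words(text):
    """Turn a review into a vector of word counts over the vocabulary."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]

print(vocab)
for text, label in reviews:
    print(label, bag_of_words(text))
```

Word order is discarded, which is exactly the trade-off the model makes: a review becomes a histogram of its words.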
After building this foundation of knowledge and tools, in the fourth week I was introduced to TensorFlow. An intro to TensorFlow course explained how to use its basic commands to build any kind of neural network, train it, and run it in a TensorFlow session. Among other things, I learned how to use tf.Variable() to create tensors whose values can be modified during training, and how to use tf.truncated_normal() to generate values from a truncated normal distribution, which is handy for initializing weights. The course also introduced the concepts of one-hot encoding, regularization, dropout, and cross entropy. At the end there was a lab to train a simple neural network with TensorFlow to recognize the letters A-J in images taken from different fonts, using the notMNIST data set. This simple network produced relatively good classification accuracy in training and validation (76%), and on a test set the network had not seen before it achieved an accuracy of 84%.
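Two of those concepts, one-hot encoding and cross entropy, fit together nicely; here is a small numpy illustration of my own (not the course's TensorFlow code):

```python
import numpy as np

# One-hot encoding: class k becomes a vector with a 1 in position k.
labels = np.array([0, 2, 1])          # e.g. three examples of classes A, C, B
num_classes = 3
one_hot = np.eye(num_classes)[labels]
print(one_hot)

def cross_entropy(one_hot_label, predicted_probs):
    """Cross entropy between a one-hot label and a softmax-style output.

    Only the log-probability the network assigns to the true class survives
    the sum, so confident correct predictions give a small loss.
    """
    return -np.sum(one_hot_label * np.log(predicted_probs))

confident_right = np.array([0.9, 0.05, 0.05])
confident_wrong = np.array([0.05, 0.05, 0.9])
print(cross_entropy(one_hot[0], confident_right))  # small loss
print(cross_entropy(one_hot[0], confident_wrong))  # large loss
```

Minimizing this loss during training pushes the network's softmax output toward the one-hot target for each example.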
So far I have had a great first month with the Udacity Deep Learning Foundation Nanodegree. There was so much information presented in the first four weeks that I sometimes feel I cannot wrap my head around all of it. At the same time, this makes me excited about learning these new concepts, and I find myself constantly going back to look things up in the material when working on class projects or reading machine learning papers. I also find it really helpful to discuss topics and solutions with other students in the nanodegree Slack channels; many of the students are very creative, and bouncing ideas off of them helps me solidify my understanding of a concept. So despite the fact that I have been spending every weekend for the last month working on it (I have a full-time job, you know), this nanodegree has been very rewarding from a learning standpoint. Hopefully I will find the time to keep posting about my experience.