Computer vision with neural nets
We're going to do a quick dive into how to get started with neural networks that can read handwritten digits and recognise things in images. There are already a ton of computer vision techniques that rely on a bunch of clever maths, but there's something alluring about an approach as simple as a neural network.
Neural networks
The idea is super super simple. You have a layer of input neurons, and they send their values along to the next layer, and this continues until you reach the end.
[Diagram of a simple feed-forward neural network - image by Glosser.ca, CC BY-SA 3.0]
The secret sauce here is the "weights" between each layer - a weight of 1.0 means we pass the value through as is, a weight of -1.0 means we pass through the opposite. We also use an "activation function", which adds some non-linearity to the numbers - ultimately this is what allows neural networks to process data in a meaningful manner.
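To make that concrete, here's a tiny sketch in plain numpy (not part of the Keras script we'll use later) of a single neuron combining its inputs with weights and squashing the result through an activation function:

import numpy as np

# One neuron with three inputs - toy numbers for illustration only.
inputs = np.array([0.5, -0.2, 0.8])
weights = np.array([1.0, -1.0, 0.3])   # one weight per input
bias = 0.1

weighted_sum = np.dot(inputs, weights) + bias

# A simple activation function (ReLU): positives pass through, negatives become 0.
output = max(0.0, weighted_sum)
print(output)  # 1.04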
When we talk about "training" a neural network, what we do is pass through some input data, look at the output, and then "backpropagate" the error. In other words, we tell the network what it should have given us as output, and it goes back and adjusts all its weights a little bit as a result. We consider a network trained when it gives us the right answer most of the time. We consider a network "overfitted" when it returns the right answer for data it has been trained on, but still gives us the wrong answer for similar data it hasn't seen before. This isn't something you need to worry about now, but it's a good thing to keep in mind if you're having trouble in the future.
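Here's a toy sketch of one training step for a single weight - the real thing does this for millions of weights at once, but the idea is the same:

# Toy example of one gradient-descent step - not the Keras code.
w = 0.5                 # current weight
x, target = 1.0, 2.0    # input, and the answer the network "should have given"
learning_rate = 0.1

prediction = w * x
error = prediction - target       # how wrong we were
gradient = error * x              # how the error changes as w changes
w -= learning_rate * gradient     # nudge the weight a little in the right direction
print(w)  # 0.65 - a bit closer to the w = 2.0 that would give a perfect answer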
The MNIST database
This is the best place to start. The MNIST database is a set of 70,000 handwritten digits split into two sets - 60,000 for training on, and 10,000 for testing on. You get a high score by training on the training set and then correctly classifying as many of the testing set as possible. This means you can't just memorise all the numbers - you need a program that can actually read digits and recognise them.
You can get a copy of the MNIST database from http://yann.lecun.com/exdb/mnist/
Building a number reader
We'll be using Keras - a Python library that lets you build and use neural networks. There's already some sample code for training on MNIST, so let's just go through how that works:
For starters, install Keras:
pip install keras
Then download this script from the keras repo: https://raw.githubusercontent.com/fchollet/keras/master/examples/mnist_mlp.py
Then fire away
python mnist_mlp.py
Leave it for a bit, and watch the output
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
60000/60000 [==============================] - 9s - loss: 0.2453 - acc: 0.9248 - val_loss: 0.1055 - val_acc: 0.9677
Epoch 2/20
60000/60000 [==============================] - 9s - loss: 0.1016 - acc: 0.9692 - val_loss: 0.0994 - val_acc: 0.9676
Epoch 3/20
60000/60000 [==============================] - 10s - loss: 0.0753 - acc: 0.9772 - val_loss: 0.0868 - val_acc: 0.9741
Epoch 4/20
60000/60000 [==============================] - 12s - loss: 0.0598 - acc: 0.9818 - val_loss: 0.0748 - val_acc: 0.9787
Epoch 5/20
60000/60000 [==============================] - 12s - loss: 0.0515 - acc: 0.9843 - val_loss: 0.0760 - val_acc: 0.9792
Epoch 6/20
60000/60000 [==============================] - 12s - loss: 0.0433 - acc: 0.9873 - val_loss: 0.0851 - val_acc: 0.9796
Epoch 7/20
60000/60000 [==============================] - 11s - loss: 0.0382 - acc: 0.9884 - val_loss: 0.0773 - val_acc: 0.9820
Epoch 8/20
60000/60000 [==============================] - 11s - loss: 0.0342 - acc: 0.9900 - val_loss: 0.0829 - val_acc: 0.9821
Epoch 9/20
60000/60000 [==============================] - 11s - loss: 0.0333 - acc: 0.9901 - val_loss: 0.0917 - val_acc: 0.9812
Epoch 10/20
60000/60000 [==============================] - 12s - loss: 0.0297 - acc: 0.9915 - val_loss: 0.0943 - val_acc: 0.9804
Epoch 11/20
60000/60000 [==============================] - 11s - loss: 0.0262 - acc: 0.9927 - val_loss: 0.0961 - val_acc: 0.9823
Epoch 12/20
60000/60000 [==============================] - 11s - loss: 0.0244 - acc: 0.9926 - val_loss: 0.0954 - val_acc: 0.9823
Epoch 13/20
60000/60000 [==============================] - 12s - loss: 0.0248 - acc: 0.9938 - val_loss: 0.0868 - val_acc: 0.9828
Epoch 14/20
60000/60000 [==============================] - 12s - loss: 0.0235 - acc: 0.9938 - val_loss: 0.1007 - val_acc: 0.9806
Epoch 15/20
60000/60000 [==============================] - 12s - loss: 0.0198 - acc: 0.9946 - val_loss: 0.0921 - val_acc: 0.9837
Epoch 16/20
60000/60000 [==============================] - 15s - loss: 0.0195 - acc: 0.9946 - val_loss: 0.0978 - val_acc: 0.9842
Epoch 17/20
60000/60000 [==============================] - 15s - loss: 0.0208 - acc: 0.9946 - val_loss: 0.1084 - val_acc: 0.9843
Epoch 18/20
60000/60000 [==============================] - 14s - loss: 0.0206 - acc: 0.9947 - val_loss: 0.1112 - val_acc: 0.9816
Epoch 19/20
60000/60000 [==============================] - 13s - loss: 0.0195 - acc: 0.9951 - val_loss: 0.0986 - val_acc: 0.9845
Epoch 20/20
60000/60000 [==============================] - 11s - loss: 0.0177 - acc: 0.9956 - val_loss: 0.1152 - val_acc: 0.9838
Test score: 0.115194263857
Test accuracy: 0.9838
What just happened here? What do all those numbers mean? Is that good or bad?
Building a neural network
Let's have a look at the code. There's a bunch of imports and prep at the start, but the important stuff is buried right at the bottom:
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
We start off by creating a keras.Sequential() object, and then we add layers onto it. The first layer is a Dense layer that takes in a 784-element vector - the same size as a flattened handwritten digit. Each digit is a 28x28 pixel grayscale image, and 28x28 = 784 pixels. We set this with the "input_shape" parameter, and the other parameter is the number 512 - that's how many neurons are in this layer.
A Dense layer is the bread and butter of neural nets - it connects every neuron in the layer before it to every neuron in the layer after it. You'll see there are three being used here - the first two have 512 neurons each, and the third has 10 neurons: one for each digit from 0 to 9, so the final softmax layer can give us something like a probability for each digit, and we take the highest one as the network's guess.
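Under the hood a Dense layer is just a matrix multiplication plus a bias - here's a rough numpy sketch of the first layer's shapes (Keras handles all of this for you):

import numpy as np

# Conceptual sketch of what Dense(512, input_shape=(784,)) computes.
x = np.random.rand(784)          # one flattened 28x28 image
W = np.random.rand(784, 512)     # one weight per input/output neuron pair
b = np.zeros(512)                # one bias per output neuron

output = x @ W + b               # every input connected to every output
print(output.shape)              # (512,)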
After the first Dense layer we have an Activation layer. The activation function we use here is "relu" - a Rectified Linear Unit. All it does is pass through any positive numbers, and clamp any negative numbers to 0. Activation functions are an important part of neural networks - without them, a stack of Dense layers would just collapse into one big linear function.
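Relu is simple enough to write yourself - a one-line sketch:

import numpy as np

# ReLU: positive values pass through unchanged, negative values become 0.
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]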
After the Activation layer we have a Dropout layer, which randomly zeroes out 20% of the values passing through during training. This is to fix a problem called "overfitting" - when your neural network memorises the training data but doesn't actually learn to generalise. You can detect this when your training loss drops but your validation loss stays high.
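A rough sketch of the idea behind Dropout(0.2) - this isn't the Keras implementation, just the gist:

import numpy as np

# Randomly zero out 20% of the values so the network can't lean too hard
# on any single neuron; scale up the survivors to keep the average the same.
def dropout(values, rate=0.2):
    mask = np.random.rand(len(values)) >= rate
    return values * mask / (1 - rate)

print(dropout(np.ones(10)))  # some values become 0, the rest become 1.25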
So this is how we build our neural network, but where do the images come from? How do the images and labels fit into this?
Preparing your data
This is an important step. The input data is a set of images and corresponding labels, like follows:
[Image: a sample handwritten "2", paired with the label 2]
With Keras we can just import the dataset, but you'd normally load your images with a library like PIL and then convert them into numpy arrays by hand. First, let's look at the code we're using here:
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
This gives us X_train, y_train, X_test, and y_test. The X_ arrays hold the images, and the y_ arrays hold the labels - _train is what we learn from, and _test is what we score against. We use the numpy reshape method to change each image from a 2-dimensional 28x28 pixel grid into a 1-dimensional 784-pixel array, since we're feeding it straight into a Dense layer. We then convert from integers to float32, and scale from the 0-255 range to the 0.0-1.0 range, as neural networks tend to work best with numbers in this range.
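The script also one-hot encodes the labels to match the 10-neuron output layer - roughly like this (the exact location of the helper has moved around between Keras versions):

from keras.utils import to_categorical

# A label like 5 becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0] - one slot per
# digit, matching the 10-neuron softmax output layer.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)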
This is all we need to do here - the rest just works!
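If you later want to feed the trained model an image of your own, you'd do the same preparation by hand. A rough sketch with PIL, assuming a 28x28 grayscale image called digit.png (the filename is made up, and "model" is the trained network from the script):

import numpy as np
from PIL import Image

# Load a 28x28 image and prepare it the same way as the MNIST data.
# Note: MNIST digits are white-on-black, so you may need to invert yours.
img = Image.open('digit.png').convert('L')   # 'L' = single-channel grayscale
pixels = np.array(img).reshape(1, 784).astype('float32') / 255

prediction = model.predict(pixels)
print(prediction.argmax())                   # the digit the network thinks it is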
What's next?
This is a fairly basic intro to Keras and neural networks, and there's a lot more you can do from here. We lose a lot of information by flattening the image to a 1-dimensional array, so we can get some improvements by using Convolution2D layers to learn a bit more about the shape of the numbers. We can also try making the network a bit deeper by layering more Convolution2D layers, and look at different training techniques.
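As a taste of what that might look like, here's a sketch of a small convolutional model - layer names differ between Keras versions (newer ones call it Conv2D), and this is just one plausible arrangement:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Note the input keeps its 2D shape (28, 28, 1) instead of being
# flattened to 784, so the network can learn about local shapes.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))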
That's all for now though, so get out there and start training some robots!