
Going deeper into Deep Learning
By Shivan Sivakumaran • Issue #57

Kia ora e hoa,
In last week’s newsletter, we touched on deep learning basics. We got stuck on collecting a dataset for metal frames and acetate frames. It turns out metal frames don’t have to be glasses. They could be a metal frame for a house or a metal frame for a picture.
We decided to see if we could differentiate between blue and brown eyes. Now, I want to demonstrate how easy it is to develop a basic deep learning image classifier using fastai and pytorch, or at least, how easy it is to get started.
Just to warn you, there are plenty of images, so if they don’t turn up on the email, you may have to go to the web view of this newsletter.
Firstly, we collect our data, as we did before, with Bing Image Search. We need this data to train our model. Here is what we collected:
Collected data from Bing Image Search
We can already see some immediate problems with our data. There are brown eyes being labelled as blue eyes possibly due to the makeup. There is even an image with blue and brown eyes. But we will proceed anyway.
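Once downloaded, a common convention (the one fastai's `parent_label` assumes) is one folder per class, so each image is labelled by its parent folder name. A toy sketch using empty placeholder files (the folder and file names are made up for illustration):

```python
from pathlib import Path
import tempfile

# Create a tiny stand-in dataset: one folder per class.
root = Path(tempfile.mkdtemp())
for label in ("blue", "brown"):
    (root / label).mkdir()
    for i in range(3):
        (root / label / f"eye_{i}.jpg").touch()

# Label each image by its parent folder's name.
items = [(p, p.parent.name) for p in root.rglob("*.jpg")]
labels = sorted({label for _, label in items})
print(labels)       # the class vocabulary
print(len(items))   # total number of labelled images
```

With this layout, the labelling step needs no separate annotation file: the directory structure is the annotation.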
Next, we augment our images. This involves randomly manipulating each image: changing the contrast and brightness, even skewing it. This increases the effective size of the dataset, and the random variation improves the chance that the model generalises beyond its training data.
Augmentation of the same image
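Brightness and contrast jitter, two of the augmentations mentioned above, can be sketched in plain Python (fastai's `aug_transforms` does this, plus warping and flipping, in batches); the jitter ranges below are illustrative:

```python
import random

def augment(img, rng):
    """Randomly jitter brightness and contrast of a grayscale image,
    represented as a list of rows of pixel values in 0..255."""
    brightness = rng.uniform(-30, 30)   # additive shift
    contrast = rng.uniform(0.8, 1.2)    # multiplicative scale
    return [
        [min(255, max(0, round(p * contrast + brightness))) for p in row]
        for row in img
    ]

rng = random.Random(42)
img = [[100, 150], [200, 250]]
# Each call produces a different random variant of the same image.
variants = [augment(img, rng) for _ in range(3)]
```

The model sees a slightly different version of each image every epoch, which is what discourages it from memorising individual pictures.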
Now, we train the model on the data we have collected and augmented. We will be using a convolutional neural network. This is a type of neural network that is best suited for image recognition. It closely resembles the inner workings of the human visual system.
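The building block of a CNN is the convolution: a small kernel slides over the image and responds to local patterns such as edges. A minimal pure-Python sketch (real networks learn their kernel values; this one is hand-picked to detect vertical edges):

```python
def conv2d(image, kernel):
    """Slide a kernel over a 2-D image (valid padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A vertical-edge detector: responds where brightness changes left-to-right.
edge_kernel = [[1, -1], [1, -1]]
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
print(conv2d(image, edge_kernel))  # → [[0, -18, 0], [0, -18, 0]]
```

Notice the output is non-zero only at the dark-to-bright boundary; stacking many learned kernels like this is what lets a CNN build up from edges to textures to whole objects.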
Training the model
We give the model four opportunities to run through the dataset, or “epochs” (numbered zero to three). We can see the error_rate reduce with each epoch.
Under the hood, all 150 images are split into training and validation sets (usually 80% and 20%, respectively). The model trains on the training images and is then evaluated against the validation set.
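The split can be sketched in a few lines of plain Python (fastai does this for you; a shuffled 20% validation split is its default behaviour):

```python
import random

def split_indices(n, valid_pct=0.2, seed=42):
    """Shuffle indices 0..n-1 and split them into train/validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # fixed seed so the split is repeatable
    cut = int(n * valid_pct)
    return idx[cut:], idx[:cut]        # train, valid

train_idx, valid_idx = split_indices(150)
print(len(train_idx), len(valid_idx))  # 120 30
```

Keeping the validation set untouched during training is what makes the error_rate an honest measure: the model is scored on images it has never seen.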
Running over the validation examples, we can find the images the model performs worst on.
Biggest errors made by the model
Here, we have examples where the model misidentified the eye colour. Some examples have the colour blue elsewhere in the image, so the model is likely picking up on the colour of the background, or on any makeup present. And we already knew our training data wasn’t perfect.
Below we have a confusion matrix showing the actual eye colours against what the model predicted. There are some incorrect predictions, but a clear trend of correct identification (better than flipping a coin!).
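A confusion matrix is just a table of counts of (actual, predicted) pairs. A minimal sketch; the predictions below are made up for illustration, not our real results:

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs into a nested dict:
    rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in classes} for a in classes}

# Hypothetical predictions for illustration only.
actual    = ["blue", "blue", "blue", "brown", "brown", "brown", "brown"]
predicted = ["blue", "blue", "brown", "brown", "brown", "blue", "brown"]

cm = confusion_matrix(actual, predicted, ["blue", "brown"])
print(cm)
# Diagonal entries are correct predictions; accuracy is their share.
accuracy = sum(cm[c][c] for c in cm) / len(actual)
```

The diagonal holds the correct predictions, so a model “better than flipping a coin” is one whose diagonal dominates the off-diagonal cells.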
Confusion matrix
Let’s test this model on some real-world examples. I’d like to thank my friends, who gave me permission to use their handsome faces in the name of whatever I’m trying to accomplish.
90% certainty of blue eyes?
We can definitely tell that this fine gentleman has brown eyes, yet the model is pretty certain he has blue eyes, with a probability greater than 90%. It is most likely picking up on his blue clothing.
Equally as attractive male - 69%
We have a correct prediction, but with a not-very-confident probability of 69% (oh yeah!).
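These percentages come from converting the model’s raw outputs (“logits”) into probabilities, usually via softmax. A minimal sketch with made-up logits for our two classes:

```python
import math

def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the classes ["blue", "brown"].
probs = softmax([0.8, 0.0])
print(probs)   # roughly [0.69, 0.31]: correct but not very confident
```

A large gap between logits gives a confident prediction near 100%; a small gap, as here, gives a hesitant one.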
Even worse
Finally, we try it on me. Again, correct prediction but not very confident.
We can see there is some limitation to the model we have created. There are two major barriers:
  • Imperfect data to train the model.
  • Training a CNN from scratch.
The imperfect data is self-explanatory. The second issue is that our model starts fresh: it has to learn edges, colours, and patterns from scratch. Instead, we can use transfer learning, where a pre-trained model is used as a starting point. A pre-trained model already knows the basics of what appears in an image, so our data goes further towards our actual objective: eye colour.
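As a toy illustration of the idea (a real pre-trained network like ResNet learns far richer features, and fastai fine-tunes it for you): keep a frozen feature extractor and train only a small head on our labelled data. Everything below, the features, data, and update rule, is invented for illustration:

```python
def pretrained_features(pixel_row):
    """Stand-in for a frozen, 'pretrained' feature extractor:
    returns average brightness and left-right contrast."""
    avg = sum(pixel_row) / len(pixel_row)
    contrast = pixel_row[-1] - pixel_row[0]
    return [avg, contrast]

def train_head(data, lr=0.01, epochs=50):
    """Train only a tiny linear head (perceptron-style updates);
    the feature extractor above is never changed."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, label in data:        # label: 1 = bright, 0 = dark
            f = pretrained_features(pixels)
            pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
            err = label - pred            # 0 when already correct
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

def predict(pixels, w, b):
    f = pretrained_features(pixels)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Tiny invented dataset: bright rows labelled 1, dark rows labelled 0.
data = [([200, 210, 220], 1), ([190, 200, 210], 1),
        ([30, 40, 50], 0), ([20, 30, 40], 0)]
w, b = train_head(data)
```

Because the extractor already produces useful features, the head needs only a handful of examples: that is the payoff transfer learning promises for our small eye-colour dataset.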
I hope you found this interesting. Please let me know if you would like your photo tested against this model. Next, we will discuss the ethics of technology.
Thanks for reading and all the best for the week ahead.
Ngā mihi nui,
Shivan :)
My Favourite Things
  1. Course: Building HTTP APIs With Django REST Framework. I’ve completed a certificate in the Django REST framework, a great framework if you want to build a backend API.
  2. Podcast: The unpaid work that GDP ignores. Marilyn Waring, former Member of Parliament, talks about how Gross Domestic Product ignores important activities like unpaid child-rearing. GDP optimises for waste and ignores activities with a positive long-term impact on the environment.
  3. Book: A Life on Our Planet. David Attenborough talks about our future: how we are destroying the planet, as well as attempts to prevent its demise. Attenborough also mentions how New Zealand was the first country to put people and the planet above GDP; the lockdown was a no-brainer, as we had to protect people, not profit.
Kindle Highlight of the Week
How can I use this? Why must I use this? When will I use this?
Jim Kwik, Limitless