Dog Breed Classifier
There have been so many times when I come across people with their pet dogs and cannot resist playing with them for a while. The idea of adopting and living with a dog has appears several times in my mind. To take my first step, I need to know what kind of dog I would like to adopt.
There are around 300 different dog breeds and I only know no more than 10 of them. I am sure I am not alone. To help people like me know better about their interested dog breeds, I chose the ‘Dog Breed Classifier Project’ among different projects for the Data Scientist Nanodegree by Udacity.
In this project I will develop ideas for a dog identification app using deep learning concepts. The software is intended to accept any user-supplied image as input. If a dog is detected in the image, it will provide an estimate of the dog’s breed. If a human is detected, it will provide an estimate of the dog breed that is most resembling.
The dataset used in this project is provided by Udacity’s Data Scientist Nanodegree program. This includes 8351 dog images of 133 different dog breeds and 13233 human images. The dog entire set of dog images are further divided into 6680 training images, 835 validation images, and 836 test images.
As described before, the goal of this project is to make the first steps towards developing an algorithm that could be used as part of a mobile or web app for dog breed recognition/classification. At the end of this project, the code will accept any user-supplied image as input. If a dog is detected in the image, it will provide an estimate of the dog’s breed. If a human is detected, it will provide an estimate of the dog breed that is most resembling.
In order to achieve this goal, the following steps are designed and followed to work through the project:
- Step 0: Import Datasets
- Step 1: Detect Humans
- Step 2: Detect Dogs
- Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
- Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)
- Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)
- Step 6: Write your Algorithm
- Step 7: Test Your Algorithm
The accuracy of the model is expected to achieve no less than 80%. Here, the loss function is computed as the crossentropy loss between the labels and predictions. The acccuracy measures the frequency with which the predicted breed type (prediction) matches the true breed type (the label).
Here, the dataset of dog images are imported. A few variables are created to save the data:
test_files— numpy arrays containing file paths to images
test_targets— numpy arrays containing onehot-encoded classification labels
dog_names— list of string-valued dog breed names for translating labels
human_files— numpy array containing file paths of human images
There are 133 total dog categories.
There are 8351 total dog images.
There are 6680 training dog images.
There are 835 validation dog images.
There are 836 test dog images.There are 13233 total human images.
A sample visualization of the data set is shown below:
It can be seen that most of the images clearly shows the human/dog face, which is advantageous for the breed recognition. However, the size of the images are pretty different.
The following image shows the number of images by breed in the training set. It can be seen that although the numbers for different breeds range between around 30–80, the data categories are pretty balanced among most breed types in general.
As is indicated in the sample figures, the images are of different sizes. Therefore, we want to know the range of images sizes among the data as well as its aspect ratio (i.e., height v.s. width).
It is shown that a large portion of the images have aspect ratio close to 1, i.e., the hight and width of a typical image are close. This can be seen that most of the dots are clustered around the diagonal in the above image.
Meanwhile, the height/width of the images shave similar distributions. The sizes are mostly in the range of 200–800 pixels.
When using TensorFlow as backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape
nb_samples corresponds to the total number of images (or samples), and
channels correspond to the number of rows, columns, and channels for each image, respectively.
path_to_tensor function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224×224224×224 pixels. Next, the image is converted to an array, which is then resized to a 4D tensor. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape
paths_to_tensor function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape
nb_samples is the number of samples, or number of images, in the supplied array of image paths. It is best to think of
nb_samples as the number of 3D tensors (where each 3D tensor corresponds to a different image) in the dataset.
The preprocessing step is realized in this following code cell
The implementation follows the steps designed earlier in the problem statement section. The details of each part is described in more detail as follows:
In order to detect human faces in images we use OpenCV’s implementation of Haar feature-based cascade classifiers. The approach of this classifier is based on the concept of Haar-like features, which is widely used in the field of object recognition because of its convincing calculation speed. A human face detector is realized using this method in the following code cell:
Assess the Human Face Detector
Ideally, we would like 100% of human images with a detected face and 0% of dog images with a detected face. In reality, the face detector shows that 100% human images have a detected human face, while 11% dog images also have a detected human face.
A pre-trained ResNet-50 model is utilized to detect dogs in images. Our first line of code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.
A dog detector is created via:
Assess the Dog Detector
This dog detector shows that 0% human images show that a dog is detected, while 100% dog images show a dog is detected.
Create a CNN to Classify Dog Breeds (from scratch)
Now that we have functions for detecting humans and dogs in images, we need a way to predict breed from images. Convolutional neural networks (CNNs) are a class of deep neural networks primarily used in the analysis of images.
The source code for creating our model with the Keras library looks like this:
This is the resultant CNN model:
Let’s now train and test our model:
The accuracy history of this first model is shown below. We can see that the accuracy of the training data is increasing as epoch number increases. However, the training accuracy in general is not very high (max = 0.65). Meanwhile, the validation accuracy stagnates after around 40 epochs. The overall accuracy is not super encouraging…
This results in a test accuracy of 24.76%. This is not bad, but certainly not accurate enough for a serious application, so let’s see if we can do any better.
Use a CNN to Classify Dog Breeds (using transfer learning)
The general idea behind transfer learning is the fact that it is much easier to teach specialized skills to a subject that already has basic knowledge in the specific domain. There are a lot of neural network models out there that already specialize in image recognition and have been trained on a huge amount of data. In this step, a CNN is trained using VGG-16 bottleneck features.
The following source code shows the process of building, training, and testing a CNN using pre-trained VGG-16 network:
Again, we show the accuracy history of this model:
It can be seen that even with less training accuracy, the validation accuracy is increasing with the increased epoch number.
This results in a test accuracy of 36.48%. Still not very impressive and far-away from our goal (test accuracy > 60%). Let’s move on and see how to further improve the performance.
Create a CNN to Classify Dog Breeds (using transfer learning)
In the previous step, a CNN is built using VGG-16 bottleneck features. In this step, bottleneck features are obtained from a different pre-trained model — ResNet-50.
The source code looks similar to that is used in Step 4:
And again, we show the accuracy history of this model:
It can be seen that with the ResNet-50 model, the training accuracy reached over 95% after 50 epochs. The validation accuracy reached about 80%, indicating the model is overfitting the training set.
The overfitting is likely a result of a relatively small dataset. There are only 8351 dog images from 133 breeds. The image number of each breed is less than 80. Augmentation of the dataset would improve the accuracy and the overfitting.
Overall, the test accuracy of this model is 84.21%, which is quite impressive. Now we are ready to use this model for dog breed prediction.
Model evaluation and validation
In this step, we write an algorithm that takes an image of a dog or a human and outputs a dog breed that closely resembles the dog or human. The algorithm is as follows:
Finally, let’s test our algorithm with a few test images! The dog images that were downloaded from Google Images to check the algorithm.
The humans were also identified as human, and a dog breed is predicted.
The dogs were correctly identified as dogs. The breed looks a little bit off — it is possible that including more training data or augmenting the dataset would increase the test accuracy thus increase the prediction. But in general the prediction is pretty close. The human face were all detected as human faces. The predicted dog breed type is also pretty close. Again, there are still space for the improvement of the model, which will be discussed next.
In this project, several approaches are developed for the development of an app for the identification of dog breeds. We achieved our best results with the application of a transfer learning model. An accuracy of 84.21% is achieved in our tests.
In this project, we learned how to build convolution networks from scratch, which was a very educational undertaking. This project has a very smooth design for me to follow each step. The tools and algorithms (keras ResNet, VGG-16, etc) embedded in this project give me great insights into usage of a broader range of built-in machine learning models.
The challenges that I faced in this project is not having enough background knowledge of all the newly developed the machine learning algorithms and tools. Therefore, building the model from scratch took me some time.
Despite the decent test accuracy, there are several options to further improve the model/algorithm:
- employ data augmentation to prevent overfitting.
- add more layers to make our model more complex and more powerful.
- build a simple web app to leverage the model and predict breeds through user-input images.