How to Use PIL to Load Images for Flower Recognition

This topic is important for understanding how to prepare image data for a machine learning model, and I have tried to make the process easy to follow. I use Pillow (PIL, the Python Imaging Library) to load images in a Jupyter Notebook and train a simple machine learning model, instead of loading the images with Keras's ImageDataGenerator and training a CNN (Convolutional Neural Network). Let us get started step by step.


Step 1: Check whether Pillow is already installed on your system

Run the following in a Jupyter Notebook cell:

import PIL

If no error appears, Pillow is already installed. If you get a ModuleNotFoundError, install Pillow by running the following command in any Jupyter Notebook cell:

!pip install pillow

If you are using Anaconda3, run the following from the Anaconda Prompt or a terminal:

conda install pillow

On Linux, run the following in a terminal:

pip3 install pillow

You should now have Pillow installed. The NumPy library must also be installed.
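If you prefer a check that reports a missing package instead of raising an error, the standard library's importlib.util.find_spec can tell you whether a module is importable without actually importing it. A small sketch (the helper name is_installed is my own, not part of any library):

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can find `module_name` (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


# Pillow is imported under the name "PIL"; NumPy under "numpy".
for name in ("PIL", "numpy"):
    print(f"{name}: {'installed' if is_installed(name) else 'missing'}")
```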

Step 2: Loading flowers data

Assume we have a folder named flowers that contains five sub-folders: daisy, dandelion, rose, sunflower and tulip.

I put the flowers folder in the same directory as my Jupyter Notebook file. Run the following code to load the image data into the notebook:

 # import all necessary libraries
 import os
 import numpy as np
 from PIL import Image

 # a function that builds a dataset of images and their labels
 def flower_dataset(image_folder):
     image_data = []
     image_label = []
     for folder in sorted(os.listdir(image_folder)):
         folder_path = os.path.join(image_folder, folder)
         if not os.path.isdir(folder_path):  # skip stray files such as .DS_Store
             continue
         for photo in sorted(os.listdir(folder_path)):
             image_path = os.path.join(folder_path, photo)
             with Image.open(image_path) as img:  # open each file once and close it afterwards
                 image = np.asarray(img).astype('float32')
             image_data.append(image)
             image_label.append(folder)  # the sub-folder name is the label
     return image_data, image_label

Now call flower_dataset with the path to the flowers folder to build the lists of images and labels:

image_data, image_label = flower_dataset(r'flowers')

To inspect one sample, look at index 0 (or any other valid index):

image_data[0], image_label[0]
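Beyond looking at one sample, it helps to confirm that every sub-folder was picked up and to see how many images each class contributed. A minimal sketch using collections.Counter from the standard library (label_counts is a hypothetical helper name):

```python
from collections import Counter


def label_counts(image_label):
    """Count how many images were loaded for each label (sub-folder name)."""
    return Counter(image_label)


# Usage after running flower_dataset:
# print(label_counts(image_label))
# print(image_data[0].shape)  # (height, width, 3) for an RGB image
```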

Step 3: Resizing and converting to grayscale

Here the resizing and grayscale conversion are done outside the flower_dataset function, one sub-folder at a time. I use the sub-folder “daisy” as the example; for the other sub-folders (dandelion, rose, sunflower and tulip), replace the folder name and run the code again. Note that image.save() overwrites the original files, so keep a backup copy of the dataset.

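import os
from PIL import Image

folder = r'flowers/daisy'  # replace with dandelion, rose, sunflower, tulip in turn
for file in os.listdir(folder):
    file_path = os.path.join(folder, file)
    with Image.open(file_path) as original:
        image = original.resize((200, 200))  # resize to 200x200 pixels
        image = image.convert('L')  # convert to grayscale
    image.save(file_path)  # overwrites the original file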
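Instead of editing the folder name and re-running the loop five times, the same in-place processing can loop over every sub-folder in one pass. A sketch under the assumptions that every file inside a sub-folder is an image Pillow can open and that overwriting the originals is acceptable (preprocess_all is a hypothetical name):

```python
import os

from PIL import Image


def preprocess_all(root, size=(200, 200)):
    """Resize and grayscale every image in every sub-folder of `root`, in place.

    Returns the number of files processed. WARNING: the original files are overwritten.
    """
    count = 0
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):  # skip stray files at the top level
            continue
        for file in sorted(os.listdir(folder_path)):
            file_path = os.path.join(folder_path, file)
            with Image.open(file_path) as original:
                image = original.resize(size).convert('L')
            image.save(file_path)  # format is inferred from the file extension
            count += 1
    return count


# Usage: preprocess_all(r'flowers')
```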

To confirm the resizing and grayscale conversion, display three random images from the “daisy” sub-folder with matplotlib:

 import os
 import random
 import matplotlib.pyplot as plt
 import matplotlib.image as mpimg
 %matplotlib inline

 image_folder = r'flowers/daisy'  # choose the sub-folder "daisy"
 plt.figure(figsize=(20, 20))
 for i, file in enumerate(random.sample(os.listdir(image_folder), 3)):  # 3 different random images
     image = mpimg.imread(os.path.join(image_folder, file))
     ax = plt.subplot(1, 3, i + 1)
     ax.set_title(f'{file} {image.shape}')  # shape should be (200, 200) after Step 3
     plt.imshow(image, cmap='gray')  # single-channel images need an explicit gray colormap
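Looking at three random images only samples the folder. A non-visual check can list every file that is not yet 200x200 grayscale, so a skipped sub-folder is easy to spot. A sketch assuming each file in the folder is an image Pillow can open (find_unprocessed is a hypothetical helper):

```python
import os

from PIL import Image


def find_unprocessed(folder, size=(200, 200), mode='L'):
    """Return the names of files whose size or colour mode differs from the target."""
    unprocessed = []
    for file in sorted(os.listdir(folder)):
        with Image.open(os.path.join(folder, file)) as img:
            if img.size != size or img.mode != mode:
                unprocessed.append(file)
    return unprocessed


# Usage: print(find_unprocessed(r'flowers/daisy'))  # an empty list means all files are done
```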

Step 4: Label encoding with a mapping

The following code encodes the labels with a dictionary that maps each class name to an integer, instead of using scikit-learn's LabelEncoder:

label_dict={k: v for v, k in enumerate(np.unique(image_label))}
label_val = [label_dict[label] for label in image_label]
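To see what the mapping produces, here is the same idea on a small made-up list of labels, using sorted(set(...)) from plain Python, which gives the same sorted order as np.unique for strings. The inverse dictionary (index_to_label, my own addition) is handy later for turning predicted class numbers back into flower names:

```python
def build_label_mapping(labels):
    """Map each distinct label to an integer, in sorted order (like np.unique)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


labels = ['rose', 'daisy', 'tulip', 'daisy']
label_dict = build_label_mapping(labels)      # {'daisy': 0, 'rose': 1, 'tulip': 2}
label_val = [label_dict[label] for label in labels]  # [1, 0, 2, 0]
index_to_label = {index: label for label, index in label_dict.items()}  # decode predictions
```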

Step 5: Converting the image data and labels to NumPy arrays

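This step needs image_data to hold the resized grayscale images from Step 3. If you loaded the data in Step 2 before resizing, run Steps 2 and 4 again first; otherwise the images have different shapes and np.array cannot stack them into a single float32 array. Let X hold the image data and y the labels: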

X = np.array(image_data, np.float32)
y = np.array(label_val, np.float32)  # label_val already contains integers

At this stage, both the images and the encoded labels are float32 NumPy arrays.
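Stacking only works when every image has the same shape. A small sketch that checks this explicitly and reports the offending shapes, instead of failing with a less obvious NumPy error (to_arrays is a hypothetical helper name):

```python
import numpy as np


def to_arrays(image_data, label_val):
    """Stack equally shaped images into X (float32) and labels into y (float32)."""
    shapes = {img.shape for img in image_data}
    if len(shapes) != 1:
        raise ValueError(f"images have different shapes: {sorted(shapes)}")
    X = np.stack(image_data).astype(np.float32)  # shape: (number of images, 200, 200)
    y = np.asarray(label_val, dtype=np.float32)
    return X, y


# Usage: X, y = to_arrays(image_data, label_val)
```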

Step 6: Splitting data into train and test using train_test_split function

The train_test_split() function from scikit-learn (sklearn), an open-source machine learning library, is widely used to split data into training and test sets (and, with a second call, a validation set) while keeping each image paired with its label.

To split the images into 60% training data and 40% test data:

 from sklearn.model_selection import train_test_split
 train_ratio = .6 # given ratio is 60% as training
 test_ratio = .4 # given 40 % as test
 x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = test_ratio)
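If you also want a validation set, one option with scikit-learn is to call train_test_split twice. The idea behind it can be sketched with NumPy alone: shuffle the indices once, then cut them into three ranges. The function name, the 60/20/20 default ratios and the seed are my own assumptions, not part of the original code:

```python
import numpy as np


def train_val_test_split(X, y, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle once, then cut the indices into test, validation and train ranges."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))  # same shuffled order for images and labels
    n_test = int(round(test_ratio * len(X)))
    n_val = int(round(val_ratio * len(X)))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], X[val_idx], X[test_idx],
            y[train_idx], y[val_idx], y[test_idx])


# Usage: x_tr, x_va, x_te, y_tr, y_va, y_te = train_val_test_split(X, y)
```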

Step 7: Preparing the train and test images and labels

Before preparing the images and labels, let us briefly look at how pixel values are stored.

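In an 8-bit RGB image, each of the three channels (Red, Green, Blue) uses 8 bits, so each channel value lies in the range 0-255 (256 possible values). Combined, a pixel can take at most 256 x 256 x 256 = 16,777,216 colors. After the grayscale conversion in Step 3, each pixel has a single 8-bit value, also in the range 0-255.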

Dividing by 255 rescales the 0-255 range to 0.0-1.0, where 0.0 corresponds to 0 and 1.0 corresponds to 255.
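A quick NumPy check of this arithmetic on a few sample 8-bit values (the values are arbitrary examples). Casting to float32 before dividing keeps the result in float32 rather than relying on NumPy's default float64:

```python
import numpy as np

print(256 ** 3)  # number of possible 24-bit RGB colours: 16777216

pixels = np.array([0, 51, 255], dtype=np.uint8)  # sample 8-bit pixel values
scaled = pixels.astype(np.float32) / 255          # 0 -> 0.0, 51 -> 0.2, 255 -> 1.0
print(scaled)
```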

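Now that the data is split into training and test sets (images and labels), we prepare it further. Each 200 x 200 image is flattened into a single row of 200 * 200 = 40,000 values, because the Dense layers in Step 8 expect one-dimensional inputs, and the pixel values are scaled to 0.0-1.0: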

x_train = np.array(x_train).reshape((2593, 200 * 200))
x_train = x_train.astype('float32') / 255
x_test = np.array(x_test).reshape((1729, 200 * 200))
x_test = x_test.astype('float32') / 255

Where do the figures 2593 and 1729 come from? The flowers dataset contains 4322 images in total, and the 60%/40% split in Step 6 gives 2593 training images and 1729 test images. If your copy of the dataset has a different number of images, use x_train.shape[0] and x_test.shape[0] instead of hard-coding these numbers.
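The sketch below reproduces the two figures and shows a flatten-and-scale helper that uses the actual number of images rather than a hard-coded count. The rounding rule in split_sizes (test count rounded up, the rest for training) is my understanding of how scikit-learn sizes a split when test_size is a float; both function names are hypothetical:

```python
import math

import numpy as np


def split_sizes(n_samples, test_size):
    """(n_train, n_test) for a float test_size, rounding the test count up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def flatten_and_scale(images):
    """Flatten (N, H, W) images to (N, H*W) float32 rows scaled to 0.0-1.0."""
    images = np.asarray(images, dtype=np.float32)
    return images.reshape(images.shape[0], -1) / 255


print(split_sizes(4322, 0.4))  # (2593, 1729)
# Usage: x_train = flatten_and_scale(x_train); x_test = flatten_and_scale(x_test)
```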

Next, one-hot encode the labels with the Keras utility to_categorical:

 from tensorflow.keras.utils import to_categorical
 y_train = to_categorical(y_train)
 y_test = to_categorical(y_test)
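One-hot encoding turns each class number into a row with a single 1.0 in that class's column. A NumPy-only sketch of the idea (an illustration, not the Keras implementation); like to_categorical, it infers the number of classes as the largest label plus one when num_classes is not given. The cast to int is needed because y was stored as float32 in Step 5:

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Return a float32 one-hot matrix: row i has 1.0 in column labels[i]."""
    labels = np.asarray(labels).astype(int)  # float labels such as 2.0 become 2
    if num_classes is None:
        num_classes = labels.max() + 1
    return np.eye(num_classes, dtype=np.float32)[labels]


print(one_hot([0, 2, 1]))
```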

Step 8: Creating a simple machine learning model

We are not going to use a convolutional neural network (CNN); instead we write a very basic fully connected model:

 import tensorflow as tf
 from tensorflow.keras.models import Sequential
 from tensorflow.keras.layers import Dense

 model = Sequential()
 model.add(tf.keras.Input(shape=(200 * 200,)))  # each sample is a flattened 200x200 image
 model.add(Dense(512, activation='relu'))
 model.add(Dense(5, activation='softmax'))  # one output per flower class (5 labels)
 # categorical_crossentropy matches a softmax output with one-hot labels
 model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
 model.fit(x_train, y_train, epochs=180)
 test_loss, test_acc = model.evaluate(x_test, y_test)  # accuracy on the held-out test data
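Why categorical cross-entropy: the softmax layer outputs one probability per class, and with one-hot labels the loss is simply minus the log of the probability given to the true class. A small NumPy illustration of the math (my own sketch, not how Keras implements it):

```python
import numpy as np


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy(y_true, probs, eps=1e-7):
    """-sum(y_true * log(probs)); with one-hot y_true this is -log(p_true_class)."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(probs), axis=-1)


# An undecided model (equal scores for all 5 classes) has loss log(5), about 1.609.
uniform = softmax(np.zeros(5))
print(categorical_crossentropy(np.eye(5)[0], uniform))
```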

Note that all the essential machine learning libraries (such as TensorFlow and scikit-learn) must be installed before you start creating the model. If you need guidance, please read my blog or watch my video tutorial (Click Here). If you have Anaconda3 installed on your system, please read my blog or watch my video tutorial about setting up Anaconda3 for machine learning (Click Here). Alternatively, you can use Google Colab (Click Here).

Conclusion

The purpose of this blog post and video tutorial is to give a simple, basic understanding of how to use the Python Imaging Library (PIL), through a few essential steps, to prepare images and create a basic machine learning model. This model could be improved further.