Classify Flowers in TensorFlow

PUBLISHED ON DEC 21, 2017

I was talking with some friends who are starting a flower delivery company, which they hope becomes the world's fairest flower company. They have an app, and with it they seek to increase transparency between growers and the end consumer, making sure that everyone involved in the process is treated with love and respect, as if they were family.

I know from personal experience that learning about flowers and the story behind them is fun, but good luck finding an answer with a Google search for "gardenia" using its common, yet beautiful, visual characteristics. This got me thinking: why not build them their own image classification algorithm, leveraging TensorFlow's retraining scripts? Sounds reasonable. Problem is, we don't have data to train on - so let's go get some and see how we do.

The first thing we need is a list of the flowers we'd like to be able to identify. I'll get that from my friends so that no matter which flower their customers are curious about, we should be able to identify it. Here's the list:

flowers = ['stephanotis', 'pansy', 'marigold', 'stock', 'lotus', 'dahlia', 'gladioli', 'chrysanthemum', 'apple blossom', 'camellia', 'sweet pea', 'lavender', 'ranunculus', 'statice', 'protea', "queen anne's lace", 'poinsettia', 'snapdragons', 'lisianthus', 'freesia', 'delphinium', 'bouvardia', 'aster', 'bird of paradise', 'heather', 'anthurium', 'anemone', 'crocus', 'amaryllis', 'alstroemeria', 'cypress', 'hibiscus', 'morning glory', 'laurel', 'dianthus', 'canna', 'snapdragon', 'oriental poppy', "parrot's beak", 'bleeding heart', 'jade vine']

Google image search is typically a good resource for finding images, so let's scrape the first 100 images for each flower we want to be able to identify and store them locally.

Create the Python file.

touch img_dl.py

Open it.

subl img_dl.py

And let's get started. We'll first write our header lines and import the libraries we'll need.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import sys
import time
import urllib2
from bs4 import BeautifulSoup

We'll need to download the HTML so that we can parse it - let's use urllib2.

def get_html(url):
  # Google serves different markup to unknown user agents, so present a
  # browser-like one; the default urllib2 agent tends to get the bare page.
  req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
  response = urllib2.urlopen(req)
  return response.read()

And a utility function to parse the HTML using Beautiful Soup.

def get_image_urls(html):
  found_img_urls = []
  soup = BeautifulSoup(html, 'lxml')
  # Image results carry the 'rg_i' class; the thumbnail URL lives in 'data-src'.
  img_urls = soup.find_all('img', {'class': 'rg_i'})
  for url in img_urls:
    if url.get('data-src'):
      found_img_urls.append(url['data-src'])
  return found_img_urls
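
As a quick sanity check - assuming Google still serves the markup this parser expects - you can confirm the two helpers work together before scraping everything:

html = get_html('https://www.google.com/search?q=pansy&tbm=isch')
print(len(get_image_urls(html)))  # expect somewhere around 100 URLs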

For TensorFlow retraining, we'll need each of our flower images to be in a folder with its respective name. We'll call the folder the image folders go in flower_images. The main function should run until we've exhausted our list of flower names, create a folder to dump the images into, and parse the HTML from Google to create a list of URLs where the images we're interested in using for our training set are available. Finally, we'll iterate through that list, downloading each one of the images to its respective folder. Here's the main function in full.

def main():
  image_folder = 'flower_images'
  if not os.path.isdir(image_folder):
    os.mkdir(image_folder)

  for flower in flowers:
    # URL-encode the spaces in multi-word flower names.
    search_string = flower.replace(' ', '%20')

    if not os.path.isdir('{}/{}'.format(image_folder, flower)):
      os.mkdir('{}/{}'.format(image_folder, flower))

    url = ('https://www.google.com/search?q=' + search_string +
           '&source=lnms&tbm=isch&sa=X&ved=0ahUKEwiYiNyJ-JvYAhUNzWMKHbcdBvoQ_AUICigB&biw=1680&bih=949')
    html = get_html(url)
    img_urls = get_image_urls(html=html)

    for img_no, img_url in enumerate(img_urls, start=1):
      try:
        req = urllib2.Request(img_url)
        response = urllib2.urlopen(req, None, 15)  # 15-second timeout
        with open('{}/{}/{}.jpg'.format(image_folder, flower, img_no),
                  mode='wb') as output_file:
          output_file.write(response.read())
        response.close()
      except (urllib2.HTTPError, urllib2.URLError, IOError):
        # Skip any image that fails to download and move on to the next.
        continue

Finally, add

if __name__ == '__main__':
  main()

And run it.

python img_dl.py

cd into the flower_images folder and ls to make sure you see folders with the names of the flowers you're interested in classifying.

cd flower_images
ls
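
If you'd rather verify the downloads programmatically, here's a minimal sketch (run from the directory that contains flower_images; exact counts will vary with what Google returned):

import os

for flower in sorted(os.listdir('flower_images')):
  folder = os.path.join('flower_images', flower)
  if os.path.isdir(folder):
    print('{}: {} images'.format(flower, len(os.listdir(folder))))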

The complete script is available here.

Now we have data to train our model with! Let's get started with TensorFlow. If you don't already have it, you can download TensorFlow here. TensorFlow is an open-source library for numerical computation used for machine learning. It's managed by Google, and they have already released code that's perfect for our task. We'll simply walk through the most important functions so we know what it's doing and why.
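
If you're installing it fresh, the plain CPU build is all this task needs:

pip install tensorflow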

In order to train a machine learning model, you need a training set, a validation set, and a testing set. Here's a function that splits our images into those sets. Note that it assigns each file to a set by hashing its name, so a given image always lands in the same set across runs, even as you add more images later.

def create_image_lists(image_dir, testing_percentage, validation_percentage):
  if not gfile.Exists(image_dir):
    tf.logging.error("Image directory '" + image_dir + "' not found.")
    return None
  result = {}
  sub_dirs = [x[0] for x in gfile.Walk(image_dir)]
  is_root_dir = True
  for sub_dir in sub_dirs:
    if is_root_dir:
      is_root_dir = False
      continue
    extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
    file_list = []
    dir_name = os.path.basename(sub_dir)
    if dir_name == image_dir:
      continue
    tf.logging.info("Looking for images in '" + dir_name + "'")
    for extension in extensions:
      file_glob = os.path.join(image_dir, dir_name, '*.' + extension)
      file_list.extend(gfile.Glob(file_glob))
    if not file_list:
      tf.logging.warning('No files found')
      continue
    if len(file_list) < 20:
      tf.logging.warning(
          'WARNING: Folder has less than 20 images, which may cause issues.')
    elif len(file_list) > MAX_NUM_IMAGES_PER_CLASS:
      tf.logging.warning(
          'WARNING: Folder {} has more than {} images. Some images will '
          'never be selected.'.format(dir_name, MAX_NUM_IMAGES_PER_CLASS))
    label_name = re.sub(r'[^a-z0-9]+', ' ', dir_name.lower())
    training_images = []
    testing_images = []
    validation_images = []
    for file_name in file_list:
      base_name = os.path.basename(file_name)
      hash_name = re.sub(r'_nohash_.*$', '', file_name)
      hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
      percentage_hash = ((int(hash_name_hashed, 16) %
                          (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                         (100.0 / MAX_NUM_IMAGES_PER_CLASS))
      if percentage_hash < validation_percentage:
        validation_images.append(base_name)
      elif percentage_hash < (testing_percentage + validation_percentage):
        testing_images.append(base_name)
      else:
        training_images.append(base_name)
    result[label_name] = {
        'dir': dir_name,
        'training': training_images,
        'testing': testing_images,
        'validation': validation_images,
    }
  return result

Since TensorFlow is really just a super fast way of computing matrix operations, you can think of our statistical model as one big graph. Here's the function for creating the model graph.

def create_model_graph(model_info):
  with tf.Graph().as_default() as graph:
    model_path = os.path.join(FLAGS.model_dir, model_info['model_file_name'])
    with gfile.FastGFile(model_path, 'rb') as f:
      graph_def = tf.GraphDef()
      graph_def.ParseFromString(f.read())
      bottleneck_tensor, resized_input_tensor = (tf.import_graph_def(
          graph_def,
          name='',
          return_elements=[
              model_info['bottleneck_tensor_name'],
              model_info['resized_input_tensor_name'],
          ]))
  return graph, bottleneck_tensor, resized_input_tensor

In machine learning, a bottleneck is a representation of the input with reduced dimensionality. We extract it by decoding the JPEG image, resizing and rescaling the pixel values, and finally running it through the recognition network.

def run_bottleneck_on_image(sess, image_data, image_data_tensor,
                            decoded_image_tensor, resized_input_tensor,
                            bottleneck_tensor):
  resized_input_values = sess.run(decoded_image_tensor,
                                  {image_data_tensor: image_data})
  bottleneck_values = sess.run(bottleneck_tensor,
                               {resized_input_tensor: resized_input_values})
  bottleneck_values = np.squeeze(bottleneck_values)
  return bottleneck_values

Inception v3 is a deep convolutional neural network and was designed for the ImageNet Large Scale Visual Recognition Challenge.

We want to retrain the last layer of the inception net to identify the type of flower we are looking at. Here’s what the function looks like.

def add_final_training_ops(class_count, final_tensor_name, bottleneck_tensor,
                           bottleneck_tensor_size):
  with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor,
        shape=[None, bottleneck_tensor_size],
        name='BottleneckInputPlaceholder')

    ground_truth_input = tf.placeholder(tf.float32,
                                        [None, class_count],
                                        name='GroundTruthInput')

  layer_name = 'final_training_ops'
  with tf.name_scope(layer_name):
    with tf.name_scope('weights'):
      initial_value = tf.truncated_normal(
          [bottleneck_tensor_size, class_count], stddev=0.001)

      layer_weights = tf.Variable(initial_value, name='final_weights')

      variable_summaries(layer_weights)
    with tf.name_scope('biases'):
      layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
      variable_summaries(layer_biases)
    with tf.name_scope('Wx_plus_b'):
      logits = tf.matmul(bottleneck_input, layer_weights) + layer_biases
      tf.summary.histogram('pre_activations', logits)

  final_tensor = tf.nn.softmax(logits, name=final_tensor_name)
  tf.summary.histogram('activations', final_tensor)

  with tf.name_scope('cross_entropy'):
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
        labels=ground_truth_input, logits=logits)
    with tf.name_scope('total'):
      cross_entropy_mean = tf.reduce_mean(cross_entropy)
  tf.summary.scalar('cross_entropy', cross_entropy_mean)

  with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
    train_step = optimizer.minimize(cross_entropy_mean)

  return (train_step, cross_entropy_mean, bottleneck_input, ground_truth_input,
          final_tensor)

Now for our main function. It’s a lot, but we’ll go through the most important steps.

Neural networks can be a bit of a black box, so any information we can get out of them, we should.

tf.logging.set_verbosity(tf.logging.INFO)

Prepare the directories to be used during training.

prepare_file_system()

There are a lot of great model architectures to work with. Here, we’re using Inception v3.

model_info = create_model_info(FLAGS.architecture)
if not model_info:
  tf.logging.error('Did not recognize architecture flag')
  return -1

The pretrained model can be large, so only download it if you need to. Then set up the pre-trained graph.

maybe_download_and_extract(model_info['data_url'])
graph, bottleneck_tensor, resized_image_tensor = (
    create_model_graph(model_info))

We need a way of quickly accessing the image data, so we look at the folder structure and build lists of the images in each class.

image_lists = create_image_lists(FLAGS.image_dir, FLAGS.testing_percentage,
                                 FLAGS.validation_percentage)
class_count = len(image_lists.keys())
if class_count == 0:
  tf.logging.error('No valid folders of images found at ' + FLAGS.image_dir)
  return -1
if class_count == 1:
  tf.logging.error('Only one valid folder of images found at ' +
                   FLAGS.image_dir +
                   ' - multiple classes are needed for classification.')
  return -1

With image distortion, we can avoid overfitting and augment our data, effectively increasing the power of our training set. I happen to know that flowers aren't hard to classify, so this isn't necessary here.

do_distort_images = should_distort_images(
    FLAGS.flip_left_right, FLAGS.random_crop, FLAGS.random_scale,
    FLAGS.random_brightness)

TensorFlow builds a static computation graph, so we define the graph first and then execute it inside a session.

with tf.Session(graph=graph) as sess:

And within the session we will set up the image decoding subgraph,

jpeg_data_tensor, decoded_image_tensor = add_jpeg_decoding(
    model_info['input_width'], model_info['input_height'],
    model_info['input_depth'], model_info['input_mean'],
    model_info['input_std'])

Distort images if necessary.

if do_distort_images:
  (distorted_jpeg_data_tensor,
   distorted_image_tensor) = add_input_distortions(
       FLAGS.flip_left_right, FLAGS.random_crop, FLAGS.random_scale,
       FLAGS.random_brightness, model_info['input_width'],
       model_info['input_height'], model_info['input_depth'],
       model_info['input_mean'], model_info['input_std'])
else:
  cache_bottlenecks(sess, image_lists, FLAGS.image_dir,
                    FLAGS.bottleneck_dir, jpeg_data_tensor,
                    decoded_image_tensor, resized_image_tensor,
                    bottleneck_tensor, FLAGS.architecture)

Add the new layer that we’ll be training.

(train_step, cross_entropy, bottleneck_input, ground_truth_input,
 final_tensor) = add_final_training_ops(
     len(image_lists.keys()), FLAGS.final_tensor_name, bottleneck_tensor,
     model_info['bottleneck_tensor_size'])

Create the operations we need to evaluate the accuracy of our new layer.

evaluation_step, prediction = add_evaluation_step(
    final_tensor, ground_truth_input)
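
add_evaluation_step isn't shown above; in Google's retraining script it looks roughly like this - it takes the argmax of the softmax output, compares it against the argmax of the one-hot ground truth, and averages the matches into an accuracy scalar:

def add_evaluation_step(result_tensor, ground_truth_tensor):
  with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
      # Index of the highest-scoring class for each example.
      prediction = tf.argmax(result_tensor, 1)
      correct_prediction = tf.equal(
          prediction, tf.argmax(ground_truth_tensor, 1))
    with tf.name_scope('accuracy'):
      # Fraction of examples where the prediction matches the label.
      evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  tf.summary.scalar('accuracy', evaluation_step)
  return evaluation_step, prediction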

Merge all the summaries and write them out to the summaries_dir.

merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
                                     sess.graph)

validation_writer = tf.summary.FileWriter(
    FLAGS.summaries_dir + '/validation')

Set up all our weights to their initial default values.

init = tf.global_variables_initializer()
sess.run(init)

And run the training for as many cycles as requested on the command line.

  for i in range(FLAGS.how_many_training_steps):
    if do_distort_images:
      (train_bottlenecks,
       train_ground_truth) = get_random_distorted_bottlenecks(
           sess, image_lists, FLAGS.train_batch_size, 'training',
           FLAGS.image_dir, distorted_jpeg_data_tensor,
           distorted_image_tensor, resized_image_tensor, bottleneck_tensor)
    else:
      (train_bottlenecks,
       train_ground_truth, _) = get_random_cached_bottlenecks(
           sess, image_lists, FLAGS.train_batch_size, 'training',
           FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
           decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
           FLAGS.architecture)
    train_summary, _ = sess.run(
        [merged, train_step],
        feed_dict={bottleneck_input: train_bottlenecks,
                   ground_truth_input: train_ground_truth})
    train_writer.add_summary(train_summary, i)

    is_last_step = (i + 1 == FLAGS.how_many_training_steps)
    if (i % FLAGS.eval_step_interval) == 0 or is_last_step:
      train_accuracy, cross_entropy_value = sess.run(
          [evaluation_step, cross_entropy],
          feed_dict={bottleneck_input: train_bottlenecks,
                     ground_truth_input: train_ground_truth})
      tf.logging.info('%s: Step %d: Train accuracy = %.1f%%' %
                      (datetime.now(), i, train_accuracy * 100))
      tf.logging.info('%s: Step %d: Cross entropy = %f' %
                      (datetime.now(), i, cross_entropy_value))
      validation_bottlenecks, validation_ground_truth, _ = (
          get_random_cached_bottlenecks(
              sess, image_lists, FLAGS.validation_batch_size, 'validation',
              FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
              decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
              FLAGS.architecture))
      validation_summary, validation_accuracy = sess.run(
          [merged, evaluation_step],
          feed_dict={bottleneck_input: validation_bottlenecks,
                     ground_truth_input: validation_ground_truth})
      validation_writer.add_summary(validation_summary, i)
      tf.logging.info('%s: Step %d: Validation accuracy = %.1f%% (N=%d)' %
                      (datetime.now(), i, validation_accuracy * 100,
                       len(validation_bottlenecks)))

    intermediate_frequency = FLAGS.intermediate_store_frequency

    if (intermediate_frequency > 0 and (i % intermediate_frequency == 0)
        and i > 0):
      intermediate_file_name = (FLAGS.intermediate_output_graphs_dir +
                                'intermediate_' + str(i) + '.pb')
      tf.logging.info('Save intermediate result to : ' +
                      intermediate_file_name)
      save_graph_to_file(sess, graph, intermediate_file_name)

  test_bottlenecks, test_ground_truth, test_filenames = (
      get_random_cached_bottlenecks(
          sess, image_lists, FLAGS.test_batch_size, 'testing',
          FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
          decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
          FLAGS.architecture))
  test_accuracy, predictions = sess.run(
      [evaluation_step, prediction],
      feed_dict={bottleneck_input: test_bottlenecks,
                 ground_truth_input: test_ground_truth})
  tf.logging.info('Final test accuracy = %.1f%% (N=%d)' %
                  (test_accuracy * 100, len(test_bottlenecks)))

  if FLAGS.print_misclassified_test_images:
    tf.logging.info('=== MISCLASSIFIED TEST IMAGES ===')
    for i, test_filename in enumerate(test_filenames):
      if predictions[i] != test_ground_truth[i].argmax():
        tf.logging.info('%70s  %s' %
                        (test_filename,
                         list(image_lists.keys())[predictions[i]]))

  save_graph_to_file(sess, graph, FLAGS.output_graph)
  with gfile.FastGFile(FLAGS.output_labels, 'w') as f:
    f.write('\n'.join(image_lists.keys()) + '\n')

Now we need to call the function, passing in where we stored the downloaded flower images, where we'd like to save the output graph and labels, how many steps we'd like to run (5,000), the learning rate (0.01), and which architecture we'd like to use.

python retrain_classifier.py --image_dir flower_images --output_graph output_graph.pb --output_labels output_labels.txt --how_many_training_steps 5000 --learning_rate 0.01 --architecture inception_v3

While it's running, I suggest running TensorBoard to monitor your results. TensorBoard is a suite of web applications for inspecting and understanding TensorFlow graphs.

tensorboard --logdir /tmp/retrain_logs

It looks like this:

[TensorBoard screenshot]

Since we're simply retraining the last layer and our task is not all that computationally complex, I'll save time by skipping the GPU setup and running it locally on my CPU. After just 15 minutes, we've managed to retrain the model and achieve 66.1% accuracy on our test set. Not perfect, but good enough to start with. Now it's time to test it out in the real world!

Google also has a script for testing a retrained Inception net, so let's look at a couple of the more important elements in that code.

First you have to load the image, graph, and labels. Then simply open a new session and return a softmax tensor with predictions.
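
Those loading helpers aren't reproduced in full here; in Google's label_image script they look roughly like this:

def load_image(filename):
  # Read in the raw image bytes to be classified.
  return tf.gfile.FastGFile(filename, 'rb').read()

def load_labels(filename):
  # Read in the labels file, one label per line.
  return [line.rstrip() for line in tf.gfile.GFile(filename)]

def load_graph(filename):
  # Unpersist the retrained graph from disk as the default graph.
  with tf.gfile.FastGFile(filename, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')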

The graph function looks like this.

def run_graph(image_data, labels, input_layer_name, output_layer_name,
              num_top_predictions):
  with tf.Session() as sess:
    # Feed the image bytes into the graph and get the softmax predictions.
    softmax_tensor = sess.graph.get_tensor_by_name(output_layer_name)
    predictions, = sess.run(softmax_tensor, {input_layer_name: image_data})
    # Sort to show the labels of the top predictions in order of confidence.
    top_k = predictions.argsort()[-num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = labels[node_id]
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))
    return 0

Call the python script like this.

python label.py --graph output_graph.pb --labels output_labels.txt --image gardenia.jpg

Here’s what it returns.

gardenia (score = 0.65286)
stephanotis (score = 0.09459)
hydrangea (score = 0.04508)
magnolia (score = 0.01834)
orchid (score = 0.01795)

And voila! We have trained an image classifier to solve our original problem, starting with absolutely no data. Pretty cool, huh?

The original scripts can be found on Google's TensorFlow GitHub. My script is available on my GitHub.