Posted to commits@systemml.apache.org by ac...@apache.org on 2017/08/09 20:50:30 UTC
systemml git commit: [SYSTEMML-1703] Image Classification using Caffe VGG-19 model sample notebook
Repository: systemml
Updated Branches:
refs/heads/master 5906682b0 -> c2124544d
[SYSTEMML-1703] Image Classification using Caffe VGG-19 model sample notebook
Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/c2124544
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/c2124544
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/c2124544
Branch: refs/heads/master
Commit: c2124544d2ddf8afc081670ea120ac148ef1bf12
Parents: 5906682
Author: Arvind Surve <ac...@yahoo.com>
Authored: Wed Aug 9 13:50:02 2017 -0700
Committer: Arvind Surve <ac...@yahoo.com>
Committed: Wed Aug 9 13:50:02 2017 -0700
----------------------------------------------------------------------
.../Image_Classify_Using_VGG_19.ipynb | 344 +++++++++++++++++++
1 file changed, 344 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/systemml/blob/c2124544/samples/jupyter-notebooks/Image_Classify_Using_VGG_19.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Image_Classify_Using_VGG_19.ipynb b/samples/jupyter-notebooks/Image_Classify_Using_VGG_19.ipynb
new file mode 100644
index 0000000..71d87d8
--- /dev/null
+++ b/samples/jupyter-notebooks/Image_Classify_Using_VGG_19.ipynb
@@ -0,0 +1,344 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Image Classification using Caffe VGG-19 model\n",
+ "\n",
+ "This notebook demonstrates how to import the VGG-19 model from Caffe into SystemML and use it to classify images. The VGG-19 model was trained on the ImageNet dataset (1000 classes with ~ 14M images). Predictions are more accurate when the image belongs to one of the classes VGG-19 was trained on.\n",
+ "We expect the prediction for any image made through SystemML with the VGG-19 model to be similar to the prediction made with the same model directly through Caffe."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Prerequisites:\n",
+ "1. SystemML Python Package\n",
+ "To run this notebook you need to install the systemml 1.0 Python package (master branch code as of 07/26/2017 or later).\n",
+ "2. Caffe \n",
+ "If you want to verify the results through Caffe, you need to have either the Caffe Python package or Caffe itself installed.\n",
+ "For this verification I installed Caffe on the local system instead of using the Caffe Python package."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "##### SystemML Python Package information"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip show systemml"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### SystemML Build information\n",
+ "The following code shows information about the SystemML build installed in the environment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "from systemml import MLContext\n",
+ "ml = MLContext(sc)\n",
+ "print (\"SystemML Built-Time:\"+ ml.buildTime())\n",
+ "print(ml.info())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true,
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "# Workaround for Python 2.7.13 to avoid certificate validation issue while downloading any file.\n",
+ "\n",
+ "import ssl\n",
+ "\n",
+ "try:\n",
+ "    _create_unverified_https_context = ssl._create_unverified_context\n",
+ "except AttributeError:\n",
+ "    # Legacy Python that doesn't verify HTTPS certificates by default\n",
+ "    pass\n",
+ "else:\n",
+ "    # Handle target environment that doesn't support HTTPS verification\n",
+ "    ssl._create_default_https_context = _create_unverified_https_context"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Download model, proto files and convert them to SystemML format.\n",
+ "\n",
+ "1. Download Caffe Model (VGG-19), proto files (deployer, network and solver) and label file.\n",
+ "2. Convert the Caffe model into SystemML input format.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "# Download caffemodel and proto files \n",
+ "\n",
+ "\n",
+ "def downloadAndConvertModel(downloadDir='.', trained_vgg_weights='trained_vgg_weights'):\n",
+ "\n",
+ "    # Step 1: Download the VGG-19 model and other files.\n",
+ "    import errno\n",
+ "    import os\n",
+ "    import urllib\n",
+ "\n",
+ "    # Create the weights directory; do not raise an error if it already exists\n",
+ "    try:\n",
+ "        os.makedirs(os.path.join(downloadDir,trained_vgg_weights))\n",
+ "    except OSError as exc: # Python >2.5\n",
+ "        if exc.errno == errno.EEXIST and os.path.isdir(os.path.join(downloadDir,trained_vgg_weights)):\n",
+ "            pass\n",
+ "        else:\n",
+ "            raise\n",
+ "\n",
+ "    # Download deployer, network, solver proto and label files.\n",
+ "    urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_deploy.proto', os.path.join(downloadDir,'VGG_ILSVRC_19_layers_deploy.proto'))\n",
+ "    urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_network.proto',os.path.join(downloadDir,'VGG_ILSVRC_19_layers_network.proto'))\n",
+ "    urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_solver.proto',os.path.join(downloadDir,'VGG_ILSVRC_19_layers_solver.proto'))\n",
+ "\n",
+ "    # Get labels for data\n",
+ "    urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/labels.txt', os.path.join(downloadDir, trained_vgg_weights, 'labels.txt'))\n",
+ "\n",
+ "    # The following call downloads a model file of about 500 MB, so depending on your network it may take a while.\n",
+ "    urllib.urlretrieve('http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel', os.path.join(downloadDir,'VGG_ILSVRC_19_layers.caffemodel'))\n",
+ "\n",
+ "    # Step 2: Convert the caffemodel and write the weights to the trained_vgg_weights directory\n",
+ "    import systemml as sml\n",
+ "    sml.convert_caffemodel(sc, os.path.join(downloadDir,'VGG_ILSVRC_19_layers_deploy.proto'), os.path.join(downloadDir,'VGG_ILSVRC_19_layers.caffemodel'), os.path.join(downloadDir,trained_vgg_weights))\n",
+ "\n",
+ "    return"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "##### PrintTopK\n",
+ "This function prints the top K probabilities and their indices from the result."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "# Print top K indices and probability\n",
+ "\n",
+ "def printTopK(prob, label, k):\n",
+ "    print(label, 'Top ', k, ' Index : ', np.argsort(-prob)[0, :k])\n",
+ "    print(label, 'Top ', k, ' Probability : ', prob[0,np.argsort(-prob)[0, :k]])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Classify image using Caffe\n",
+ "Prerequisite: You need to have Caffe (or the Caffe Python package) installed on the system to run this code.\n",
+ "\n",
+ "This classifies an image using Caffe directly. \n",
+ "The result can be used to verify that classification through SystemML matches classification through Caffe."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "def getCaffeLabel(url, printTopKData, topK, size=(224,224), modelDir='trained_vgg_weights'):\n",
+ "    import caffe\n",
+ "    import urllib\n",
+ "    import numpy as np\n",
+ "    urllib.urlretrieve(url, 'test.jpg')\n",
+ "    image = caffe.io.resize_image(caffe.io.load_image('test.jpg'), size)\n",
+ "    # caffe.io.load_image returns values in [0, 1]; rescale them to [0, 255]\n",
+ "    image = [(image * 255).astype(np.float64)]\n",
+ "\n",
+ "    deploy_file = 'VGG_ILSVRC_19_layers_deploy.proto'\n",
+ "    caffemodel_file = 'VGG_ILSVRC_19_layers.caffemodel'\n",
+ "\n",
+ "    net = caffe.Classifier(deploy_file, caffemodel_file)\n",
+ "    caffe_prob = net.predict(image)\n",
+ "    caffe_prediction = caffe_prob.argmax(axis=1)\n",
+ "\n",
+ "    if(printTopKData):\n",
+ "        printTopK(caffe_prob, 'Caffe', topK)\n",
+ "\n",
+ "    import pandas as pd\n",
+ "    labels = pd.read_csv(os.path.join(modelDir,'labels.txt'), names=['index', 'label'])\n",
+ "    caffe_prediction_labels = [ labels[labels.index == x][['label']].values[0][0] for x in caffe_prediction ]\n",
+ "\n",
+ "    return net, caffe_prediction_labels\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Classify images\n",
+ "\n",
+ "This function classifies the images specified through urls.\n",
+ "\n",
+ "###### Input Parameters: \n",
+ " urls: List of urls\n",
+ " printTopKData (default False): Whether to print top K indices and probabilities\n",
+ " topK: Top K elements to be displayed.\n",
+ " caffeInstalled (default False): Whether Caffe is installed. If it is, the image is also classified through Caffe (with top K probabilities and indices printed when printTopKData is set). "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import urllib\n",
+ "from systemml.mllearn import Caffe2DML\n",
+ "import systemml as sml\n",
+ "\n",
+ "# Setting downloadDir to anything other than the current directory causes a \"network file not found\" issue:\n",
+ "# the solver file refers to the network file without a path, so it is searched for in the current directory.\n",
+ "downloadDir = '.' # '/home/asurve/caffe_models'\n",
+ "trained_vgg_weights = 'trained_vgg_weights'\n",
+ "\n",
+ "img_shape = (3, 224, 224)\n",
+ "size = (img_shape[1], img_shape[2])\n",
+ "\n",
+ "\n",
+ "def classifyImages(urls, printTopKData=False, topK=5, caffeInstalled=False):\n",
+ "\n",
+ "    downloadAndConvertModel(downloadDir, trained_vgg_weights)\n",
+ "\n",
+ "    vgg = Caffe2DML(sqlCtx, solver=os.path.join(downloadDir,'VGG_ILSVRC_19_layers_solver.proto'), input_shape=img_shape)\n",
+ "    vgg.load(trained_vgg_weights)\n",
+ "\n",
+ "    for url in urls:\n",
+ "        outFile = 'inputTest.jpg'\n",
+ "        urllib.urlretrieve(url, outFile)\n",
+ "\n",
+ "        from IPython.display import Image, display\n",
+ "        display(Image(filename=outFile))\n",
+ "\n",
+ "        print (\"Prediction of above image to ImageNet Class using\")\n",
+ "\n",
+ "        ## Do image classification through SystemML processing\n",
+ "        from PIL import Image as PILImage\n",
+ "        input_image = sml.convertImageToNumPyArr(PILImage.open(outFile), img_shape=img_shape\n",
+ "            , color_mode='BGR', mean=sml.getDatasetMean('VGG_ILSVRC_19_2014'))\n",
+ "        print (\"Image preprocessed through SystemML :: \", vgg.predict(input_image)[0])\n",
+ "        if(printTopKData == True):\n",
+ "            sysml_proba = vgg.predict_proba(input_image)\n",
+ "            printTopK(sysml_proba, 'SystemML BGR', topK)\n",
+ "\n",
+ "        if(caffeInstalled == True):\n",
+ "            net, caffeLabel = getCaffeLabel(url, printTopKData, topK, size, os.path.join(downloadDir, trained_vgg_weights))\n",
+ "            print (\"Image classification through Caffe :: \", caffeLabel[0])\n",
+ "\n",
+ "            print (\"Caffe input data through SystemML :: \", vgg.predict(np.matrix(net.blobs['data'].data.flatten()))[0])\n",
+ "\n",
+ "            if(printTopKData == True):\n",
+ "                sysml_proba = vgg.predict_proba(np.matrix(net.blobs['data'].data.flatten()))\n",
+ "                printTopK(sysml_proba, 'With Caffe input data', topK)\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Sample API call to classify image\n",
+ "\n",
+ "There are a few parameters to set based on what you are looking for.\n",
+ "1. printTopKData (default False): If this parameter is set to True, the top K results (probabilities and indices) are displayed. \n",
+ "2. topK (default 5): How many entries (K) to display.\n",
+ "3. caffeInstalled (default False): Whether Caffe is installed. If it is not, verification through Caffe is skipped."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "printTopKData=False\n",
+ "topK=5\n",
+ "caffeInstalled=False\n",
+ "\n",
+ "\n",
+ "\n",
+ "urls = ['https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/MountainLion.jpg/312px-MountainLion.jpg', 'https://s-media-cache-ak0.pinimg.com/originals/f2/56/59/f2565989f455984f206411089d6b1b82.jpg', 'http://i2.cdn.cnn.com/cnnnext/dam/assets/161207140243-vanishing-elephant-closeup-exlarge-169.jpg', 'http://wallpaper-gallery.net/images/pictures-of-lilies/pictures-of-lilies-7.jpg', 'https://cdn.pixabay.com/photo/2012/01/07/21/56/sunflower-11574_960_720.jpg', 'https://image.shutterstock.com/z/stock-photo-bird-nest-on-tree-branch-with-five-blue-eggs-inside-108094613.jpg', 'https://i.ytimg.com/vi/6jQDbIv0tDI/maxresdefault.jpg','https://cdn.pixabay.com/photo/2016/11/01/23/53/cat-1790093_1280.jpg']\n",
+ "\n",
+ "\n",
+ "classifyImages(urls,printTopKData, topK, caffeInstalled)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 2",
+ "language": "python",
+ "name": "python2"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 2
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython2",
+ "version": "2.7.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
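
The printTopK helper in the notebook above ranks class probabilities with np.argsort on the negated array. The following is a minimal standalone sketch of that ranking step, assuming a probability array of shape (1, num_classes). The function name top_k and the sample values are illustrative only and are not part of this commit.

```python
import numpy as np

def top_k(prob, k):
    """Return (indices, probabilities) of the k largest entries in row 0 of prob."""
    prob = np.asarray(prob)
    # Sorting the negated values yields indices in descending order of probability.
    order = np.argsort(-prob, axis=1)[0, :k]
    return order, prob[0, order]

# Hypothetical probabilities for a 5-class problem (not real model output)
sample = np.array([[0.05, 0.60, 0.10, 0.20, 0.05]])
indices, values = top_k(sample, 3)
print(indices, values)
```

Computing the sorted order once and reusing it for both the indices and the probabilities avoids the second argsort call that printTopK performs.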