Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/12 20:24:52 UTC

[GitHub] szha closed pull request #9036: Update for MXNet 1.0 and generalize to Python 2.7/3.6

URL: https://github.com/apache/incubator-mxnet/pull/9036

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/docs/tutorials/nlp/cnn.md b/docs/tutorials/nlp/cnn.md
index 23f74c4efb..7f56b76531 100644
--- a/docs/tutorials/nlp/cnn.md
+++ b/docs/tutorials/nlp/cnn.md
@@ -1,6 +1,6 @@
 # Text Classification Using a Convolutional Neural Network on MXNet
 
-This tutorial is based of Yoon Kim's [paper](https://arxiv.org/abs/1408.5882) on using convolutional neural networks for sentence sentiment classification.
+This tutorial is based on Yoon Kim's [paper](https://arxiv.org/abs/1408.5882) on using convolutional neural networks for sentence sentiment classification. The tutorial has been tested on MXNet 1.0 running under Python 2.7 and Python 3.6.
 
 For this tutorial, we will train a convolutional deep network model on movie review sentences from Rotten Tomatoes labeled with their sentiment. The result will be a model that can classify a sentence based on its sentiment (with 1 being a purely positive sentiment, 0 being a purely negative sentiment and 0.5 being neutral).
 
@@ -8,16 +8,24 @@ Our first step will be to fetch the labeled training data of positive and negati
 
 
 ```python
-import urllib2
+from __future__ import print_function
+
+from collections import Counter
+import itertools
 import numpy as np
 import re
-import itertools
-from collections import Counter
 
+try:
+    # For Python 3.0 and later
+    from urllib.request import urlopen
+except ImportError:
+    # Fall back to Python 2's urllib2
+    from urllib2 import urlopen
+    
 def clean_str(string):
     """
-    Tokenization/string cleaning for all datasets except for SST.
-    Original taken from https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py
+    Tokenization/string cleaning.
+    Original from https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py
     """
     string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string)
     string = re.sub(r"\'s", " \'s", string)
@@ -32,38 +40,42 @@ def clean_str(string):
     string = re.sub(r"\)", " \) ", string)
     string = re.sub(r"\?", " \? ", string)
     string = re.sub(r"\s{2,}", " ", string)
+    
     return string.strip().lower()
 
+def download_sentences(url):
+    """
+    Download sentences from specified URL. 
+    
+    Strip trailing newline, convert to Unicode.
+    """
+    
+    remote_file = urlopen(url)
+    return [line.decode('Latin1').strip() for line in remote_file.readlines()]
+    
 def load_data_and_labels():
     """
-    Loads MR polarity data from files, splits the data into words and generates labels.
+    Loads polarity data from files, splits the data into words and generates labels.
     Returns split sentences and labels.
     """
-    # Pull sentences with positive sentiment
-    pos_file = urllib2.urlopen('https://raw.githubusercontent.com/yoonkim/CNN_sentence/master/rt-polarity.pos')
-
-    # Pull sentences with negative sentiment
-    neg_file = urllib2.urlopen('https://raw.githubusercontent.com/yoonkim/CNN_sentence/master/rt-polarity.neg')
-
-    # Load data from files
-    positive_examples = list(pos_file.readlines())
-    positive_examples = [s.strip() for s in positive_examples]
-    negative_examples = list(neg_file.readlines())
-    negative_examples = [s.strip() for s in negative_examples]
-    # Split by words
+
+    positive_examples = download_sentences('https://raw.githubusercontent.com/yoonkim/CNN_sentence/master/rt-polarity.pos')
+    negative_examples = download_sentences('https://raw.githubusercontent.com/yoonkim/CNN_sentence/master/rt-polarity.neg')
+    
+    # Tokenize
     x_text = positive_examples + negative_examples
-    x_text = [clean_str(sent) for sent in x_text]
-    x_text = [s.split(" ") for s in x_text]
+    x_text = [clean_str(sent).split(" ") for sent in x_text]
+
     # Generate labels
     positive_labels = [1 for _ in positive_examples]
     negative_labels = [0 for _ in negative_examples]
     y = np.concatenate([positive_labels, negative_labels], 0)
-    return [x_text, y]
+    return x_text, y
 
 
 def pad_sentences(sentences, padding_word="</s>"):
     """
-    Pads all sentences to the same length. The length is defined by the longest sentence.
+    Pads all sentences to be the length of the longest sentence.
     Returns padded sentences.
     """
     sequence_length = max(len(x) for x in sentences)
@@ -73,33 +85,40 @@ def pad_sentences(sentences, padding_word="</s>"):
         num_padding = sequence_length - len(sentence)
         new_sentence = sentence + [padding_word] * num_padding
         padded_sentences.append(new_sentence)
+        
     return padded_sentences
 
 
 def build_vocab(sentences):
     """
-    Builds a vocabulary mapping from word to index based on the sentences.
+    Builds a vocabulary mapping from token to index based on the sentences.
     Returns vocabulary mapping and inverse vocabulary mapping.
     """
     # Build vocabulary
     word_counts = Counter(itertools.chain(*sentences))
+    
     # Mapping from index to word
     vocabulary_inv = [x[0] for x in word_counts.most_common()]
+    
     # Mapping from word to index
     vocabulary = {x: i for i, x in enumerate(vocabulary_inv)}
-    return [vocabulary, vocabulary_inv]
+    
+    return vocabulary, vocabulary_inv
 
 
 def build_input_data(sentences, labels, vocabulary):
     """
     Maps sentences and labels to vectors based on a vocabulary.
     """
-    x = np.array([[vocabulary[word] for word in sentence] for sentence in sentences])
+    x = np.array([
+            [vocabulary[word] for word in sentence]
+            for sentence in sentences])
     y = np.array(labels)
-    return [x, y]
+    
+    return x, y
 
 """
-Loads and preprocessed data for the MR dataset.
+Loads and preprocesses data for the MR dataset.
 Returns input vectors, labels, vocabulary, and inverse vocabulary.
 """
 # Load and preprocess data
@@ -123,11 +142,11 @@ y_train, y_dev = y_shuffled[:-1000], y_shuffled[-1000:]
 
 sentence_size = x_train.shape[1]
 
-print 'Train/Dev split: %d/%d' % (len(y_train), len(y_dev))
-print 'train shape:', x_train.shape
-print 'dev shape:', x_dev.shape
-print 'vocab_size', vocab_size
-print 'sentence max words', sentence_size
+print('Train/Dev split: %d/%d' % (len(y_train), len(y_dev)))
+print('train shape:', x_train.shape)
+print('dev shape:', x_dev.shape)
+print('vocab_size', vocab_size)
+print('sentence max words', sentence_size)
 ```
 
     Train/Dev split: 9662/1000
@@ -150,8 +169,8 @@ import sys,os
 Define batch size and the place holders for network inputs and outputs
 '''
 
-batch_size = 50 # the size of batches to train network with
-print 'batch size', batch_size
+batch_size = 50
+print('batch size', batch_size)
 
 input_x = mx.sym.Variable('data') # placeholder for input data
 input_y = mx.sym.Variable('softmax_label') # placeholder for output label
@@ -163,7 +182,7 @@ Define the first network layer (embedding)
 
 # create embedding layer to learn representation of words in a lower dimensional subspace (much like word2vec)
 num_embed = 300 # dimensions to embed words into
-print 'embedding dimensions', num_embed
+print('embedding dimensions', num_embed)
 
 embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')
 
@@ -185,14 +204,14 @@ Because each convolution+pool filter produces tensors of different shapes we nee
 ```python
 # create convolution + (max) pooling layer for each filter operation
 filter_list=[3, 4, 5] # the size of filters to use
-print 'convolution filters', filter_list
+print('convolution filters', filter_list)
 
 num_filter=100
 pooled_outputs = []
-for i, filter_size in enumerate(filter_list):
+for filter_size in filter_list:
     convi = mx.sym.Convolution(data=conv_input, kernel=(filter_size, num_embed), num_filter=num_filter)
     relui = mx.sym.Activation(data=convi, act_type='relu')
-    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1,1))
+    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1, 1))
     pooled_outputs.append(pooli)
 
 # combine all pooled outputs
@@ -206,14 +225,14 @@ h_pool = mx.sym.Reshape(data=concat, target_shape=(batch_size, total_filters))
     convolution filters [3, 4, 5]
 
 
-Next, we add dropout regularization, which will randomly disable a fraction of neurons in the layer (set to 50% here) to ensure that that model does not overfit. This works by preventing neurons from co-adapting and forcing them to learn individually useful features.
+Next, we add dropout regularization, which will randomly disable a fraction of neurons in the layer (set to 50% here) to ensure that the model does not overfit. This prevents neurons from co-adapting and forces them to learn individually useful features.
 
 This is necessary for our model because the dataset has a vocabulary of around 20k words but only around 10k examples; with such a small dataset, a powerful model like this neural net is likely to overfit.
 
 
 ```python
 # dropout layer
-dropout=0.5
+dropout = 0.5
-print 'dropout probability', dropout
+print('dropout probability', dropout)
 
 if dropout > 0.0:
@@ -231,7 +250,7 @@ Finally, we add a fully connected layer to add non-linearity to the model. We th
 
 ```python
 # fully connected layer
-num_label=2
+num_label = 2
 
 cls_weight = mx.sym.Variable('cls_weight')
 cls_bias = mx.sym.Variable('cls_bias')
@@ -252,16 +271,16 @@ Now that we have defined our CNN model we will define the device on our machine
 
 ```python
 from collections import namedtuple
-import time
 import math
+import time
 
 # Define the structure of our CNN Model (as a named tuple)
 CNNModel = namedtuple("CNNModel", ['cnn_exec', 'symbol', 'data', 'label', 'param_blocks'])
 
 # Define what device to train/test on
-ctx=mx.gpu(0)
+ctx = mx.gpu(0)
 # If you have no GPU on your machine change this to
-# ctx=mx.cpu(0)
+# ctx = mx.cpu(0)
 
 arg_names = cnn.list_arguments()
 
@@ -280,16 +299,14 @@ cnn_exec = cnn.bind(ctx=ctx, args=arg_arrays, args_grad=args_grad, grad_req='add
 
 param_blocks = []
 arg_dict = dict(zip(arg_names, cnn_exec.arg_arrays))
-initializer=mx.initializer.Uniform(0.1)
+initializer = mx.initializer.Uniform(0.1)
 for i, name in enumerate(arg_names):
     if name in ['softmax_label', 'data']: # input, output
         continue
-    initializer(name, arg_dict[name])
+    initializer(mx.init.InitDesc(name), arg_dict[name])
 
     param_blocks.append( (i, arg_dict[name], args_grad[name], name) )
 
-out_dict = dict(zip(cnn.list_outputs(), cnn_exec.outputs))
-
 data = cnn_exec.arg_dict['data']
 label = cnn_exec.arg_dict['softmax_label']
 
@@ -304,15 +321,15 @@ We can now execute the training and testing of our network, which in-part mxnet
 Train the cnn_model using back prop
 '''
 
-optimizer='rmsprop'
-max_grad_norm=5.0
-learning_rate=0.0005
-epoch=50
+optimizer = 'rmsprop'
+max_grad_norm = 5.0
+learning_rate = 0.0005
+epoch = 50
 
-print 'optimizer', optimizer
-print 'maximum gradient', max_grad_norm
-print 'learning rate (step size)', learning_rate
-print 'epochs to train for', epoch
+print('optimizer', optimizer)
+print('maximum gradient', max_grad_norm)
+print('learning rate (step size)', learning_rate)
+print('epochs to train for', epoch)
 
 # create optimizer
 opt = mx.optimizer.create(optimizer)
@@ -320,9 +337,6 @@ opt.lr = learning_rate
 
 updater = mx.optimizer.get_updater(opt)
 
-# create logging output
-logs = sys.stderr
-
 # For each training epoch
 for iteration in range(epoch):
     tic = time.time()
@@ -369,7 +383,7 @@ for iteration in range(epoch):
     # Decay learning rate for this epoch to ensure we are not "overshooting" optima
     if iteration % 50 == 0 and iteration > 0:
         opt.lr *= 0.5
-        print >> logs, 'reset learning rate to %g' % opt.lr
+        print('reset learning rate to %g' % opt.lr)
 
     # End of training loop for this epoch
     toc = time.time()
@@ -380,11 +394,11 @@ for iteration in range(epoch):
     if (iteration + 1) % 10 == 0:
         prefix = 'cnn'
         cnn_model.symbol.save('./%s-symbol.json' % prefix)
-        save_dict = {('arg:%s' % k) :v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
+        save_dict = {('arg:%s' % k) : v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
         save_dict.update({('aux:%s' % k) : v for k, v in cnn_model.cnn_exec.aux_dict.items()})
         param_name = './%s-%04d.params' % (prefix, iteration)
         mx.nd.save(param_name, save_dict)
-        print >> logs, 'Saved checkpoint to %s' % param_name
+        print('Saved checkpoint to %s' % param_name)
 
 
     # Evaluate model after this epoch on dev (test) set
@@ -406,10 +420,28 @@ for iteration in range(epoch):
         num_total += len(batchY)
 
     dev_acc = num_correct * 100 / float(num_total)
-    print >> logs, 'Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
-            --- Dev Accuracy thus far: %.3f' % (iteration, train_time, train_acc, dev_acc)
+    print('Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
+            --- Dev Accuracy thus far: %.3f' % (iteration, train_time, train_acc, dev_acc))
 ```
 
+
+    optimizer rmsprop
+    maximum gradient 5.0
+    learning rate (step size) 0.0005
+    epochs to train for 50
+    Iter [0] Train: Time: 3.903s, Training Accuracy: 56.290             --- Dev Accuracy thus far: 63.300
+    Iter [1] Train: Time: 3.142s, Training Accuracy: 71.917             --- Dev Accuracy thus far: 69.400
+    Iter [2] Train: Time: 3.146s, Training Accuracy: 80.508             --- Dev Accuracy thus far: 73.900
+    Iter [3] Train: Time: 3.142s, Training Accuracy: 87.233             --- Dev Accuracy thus far: 76.300
+    Iter [4] Train: Time: 3.145s, Training Accuracy: 91.057             --- Dev Accuracy thus far: 77.100
+    Iter [5] Train: Time: 3.145s, Training Accuracy: 94.073             --- Dev Accuracy thus far: 77.700
+    Iter [6] Train: Time: 3.147s, Training Accuracy: 96.000             --- Dev Accuracy thus far: 77.400
+    Iter [7] Train: Time: 3.150s, Training Accuracy: 97.399             --- Dev Accuracy thus far: 77.100
+    Iter [8] Train: Time: 3.144s, Training Accuracy: 98.425             --- Dev Accuracy thus far: 78.000
+    Saved checkpoint to ./cnn-0009.params
+    Iter [9] Train: Time: 3.151s, Training Accuracy: 99.192             --- Dev Accuracy thus far: 77.100
+    ...
+
 Now that we have gone through the trouble of training the model, we have stored the learned parameters in the .params file in our local directory. We can now load this file whenever we want and predict the sentiment of new sentences by running them through a forward pass of the trained model.
 
 ## References

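The closing paragraph of the tutorial mentions loading the saved .params file and classifying new sentences with a forward pass, but the diff above stops short of showing that step. Below is a minimal sketch of how it could look with the MXNet 1.0 symbolic API; it is not code from the PR itself. It assumes the cnn-symbol.json and parameter files written by the training loop, the batch_size, sentence_size, and x_dev variables defined earlier in the tutorial, and a hypothetical checkpoint name cnn-0049.params (the actual name depends on the last epoch saved).

```python
import mxnet as mx

# Rebuild the network from the saved symbol file and load the checkpointed weights.
# 'cnn-0049.params' is a placeholder; use whichever checkpoint the training loop wrote last.
sym = mx.sym.load('./cnn-symbol.json')
save_dict = mx.nd.load('./cnn-0049.params')
arg_params = {k.split(':', 1)[1]: v for k, v in save_dict.items() if k.startswith('arg:')}

# Bind for inference only (no gradients), reusing the training input shape.
exe = sym.simple_bind(ctx=mx.cpu(), grad_req='null',
                      data=(batch_size, sentence_size),
                      softmax_label=(batch_size,))

# Copy the learned parameters into the bound executor.
for name, arr in arg_params.items():
    if name in exe.arg_dict and name not in ('data', 'softmax_label'):
        exe.arg_dict[name][:] = arr

# Forward pass on one batch of index-mapped, padded sentences (here: dev examples).
exe.arg_dict['data'][:] = x_dev[:batch_size]
probs = exe.forward(is_train=False)[0].asnumpy()
print('predicted positive-sentiment probability:', probs[:, 1])
```

Genuinely new sentences would first need the same clean_str, padding, and vocabulary-lookup preprocessing applied to the training data before being written into the data array.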

 
