Posted to commits@mxnet.apache.org by jx...@apache.org on 2018/01/25 00:00:19 UTC
[incubator-mxnet] branch master updated: Data-iterator tutorial made python3 compatible. (#9460)
This is an automated email from the ASF dual-hosted git repository.
jxie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git
The following commit(s) were added to refs/heads/master by this push:
new 5166e57 Data-iterator tutorial made python3 compatible. (#9460)
5166e57 is described below
commit 5166e57dc0c76e844a9cda70550ea98a91c225af
Author: Pracheer Gupta <pr...@hotmail.com>
AuthorDate: Wed Jan 24 16:00:15 2018 -0800
Data-iterator tutorial made python3 compatible. (#9460)
* Data-iterator tutorial made python3 compatible.
Faced 2 main issues while running the tutorial at
http://mxnet.incubator.apache.org/tutorials/basic/data.html on python3:
1. The zip function has changed in python3. It now returns an iterator, which is
exhausted after it has been iterated over once. More info:
https://stackoverflow.com/questions/31683959/the-zip-function-in-python-3/31684038#31684038
2. Some MXNet methods expect a parameter of type string in python2
but of type bytes in python3.
* Create list of zipped elements to simplify SimpleIter.
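
The two issues above can be reproduced in plain python3 without MXNet. The
sketch below is illustrative only: the variable names are hypothetical and the
values merely resemble those used in the tutorial.

```python
# Issue 1: in Python 3, zip() returns a one-shot iterator.
names = ['data']          # hypothetical stand-in for data_names
shapes = [(2, 3)]         # hypothetical stand-in for data_shapes

lazy = zip(names, shapes)
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # nothing is left to yield

# Wrapping zip() in list() materializes the pairs, so they can be read repeatedly.
eager = list(zip(names, shapes))
eager_first = list(eager)
eager_second = list(eager)

# Issue 2: str and bytes are distinct types in Python 3,
# so text has to be encoded explicitly before it is passed where bytes are expected.
text = 'record_%d' % 0
encoded = text.encode('utf-8')
```

After the first pass the lazy iterator yields nothing, while the list version
returns the same pairs every time. This is why the commit wraps both zip calls
in list().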
---
docs/tutorials/basic/data.md | 40 +++++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/docs/tutorials/basic/data.md b/docs/tutorials/basic/data.md
index 60a7ec1..b60626a 100644
--- a/docs/tutorials/basic/data.md
+++ b/docs/tutorials/basic/data.md
@@ -44,6 +44,7 @@ Before diving into the details let's setup the environment by importing some req
import mxnet as mx
%matplotlib inline
import os
+import sys
import subprocess
import numpy as np
import matplotlib.pyplot as plt
@@ -100,12 +101,11 @@ Thus we can create a new iterator by:
The example below shows how to create a Simple iterator.
```python
-
 class SimpleIter(mx.io.DataIter):
     def __init__(self, data_names, data_shapes, data_gen,
                  label_names, label_shapes, label_gen, num_batches=10):
-        self._provide_data = zip(data_names, data_shapes)
-        self._provide_label = zip(label_names, label_shapes)
+        self._provide_data = list(zip(data_names, data_shapes))
+        self._provide_label = list(zip(label_names, label_shapes))
         self.num_batches = num_batches
         self.data_gen = data_gen
         self.label_gen = label_gen
@@ -180,6 +180,30 @@ mod = mx.mod.Module(symbol=net)
mod.fit(data_iter, num_epoch=5)
```
+A note on python 3 usage: Lot of the methods in mxnet use string for python2 and bytes for python3.
+In order to keep this tutorial readable, we are going to define a utility function that converts
+string to bytes in python 3 environment
+
+```python
+def str_or_bytes(str):
+    """
+    A utility function for this tutorial that helps us convert string
+    to bytes if we are using python3.
+
+    Parameters
+    ----------
+    str : string
+
+    Returns
+    -------
+    string (python2) or bytes (python3)
+    """
+    if sys.version_info[0] < 3:
+        return str
+    else:
+        return bytes(str, 'utf-8')
+```
+
## Record IO
Record IO is a file format used by MXNet for data IO.
It compactly packs the data for efficient read and writes from distributed file system like Hadoop HDFS and AWS S3.
@@ -197,7 +221,8 @@ using `MXRecordIO`. The files are named with a `.rec` extension.
```python
record = mx.recordio.MXRecordIO('tmp.rec', 'w')
for i in range(5):
-    record.write('record_%d'%i)
+    record.write(str_or_bytes('record_%d'%i))
+
record.close()
```
@@ -221,7 +246,8 @@ We will create an indexed record file and a corresponding index file as below:
```python
record = mx.recordio.MXIndexedRecordIO('tmp.idx', 'tmp.rec', 'w')
for i in range(5):
-    record.write_idx(i, 'record_%d'%i)
+    record.write_idx(i, str_or_bytes('record_%d'%i))
+
record.close()
```
@@ -255,11 +281,11 @@ The `mx.recordio` package provides a few utility functions for such operations,
data = 'data'
label1 = 1.0
header1 = mx.recordio.IRHeader(flag=0, label=label1, id=1, id2=0)
-s1 = mx.recordio.pack(header1, data)
+s1 = mx.recordio.pack(header1, str_or_bytes(data))
label2 = [1.0, 2.0, 3.0]
header2 = mx.recordio.IRHeader(flag=3, label=label2, id=2, id2=0)
-s2 = mx.recordio.pack(header2, data)
+s2 = mx.recordio.pack(header2, str_or_bytes(data))
```
```python