Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/14 19:15:49 UTC
[GitHub] piiswrong closed pull request #9061: improve image io example
piiswrong closed pull request #9061: improve image io example
URL: https://github.com/apache/incubator-mxnet/pull/9061
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):
diff --git a/docs/tutorials/basic/image_io.md b/docs/tutorials/basic/image_io.md
index b017c9fb14..e6434257b7 100644
--- a/docs/tutorials/basic/image_io.md
+++ b/docs/tutorials/basic/image_io.md
@@ -7,9 +7,9 @@ iterators to process image data.
There are mainly three ways of loading image data in MXNet:
-- [NEW] `mx.img.ImageIter`: implemented in python, easily customizable, can load
+- [NEW] [mx.img.ImageIter](https://mxnet.incubator.apache.org/versions/master/api/python/image/image.html#mxnet.image.ImageIter): implemented in python, easily customizable, can load
from both .rec files and raw image files.
-- [OLD] `mx.io.ImageRecordIter`: implemented in backend (C++), less customizable
+- [OLD] [mx.io.ImageRecordIter](https://mxnet.incubator.apache.org/versions/master/api/python/io.html#mxnet.io.ImageRecordIter): implemented in backend (C++), less customizable
but can be used in all language bindings, load from .rec files
- Custom iterator by inheriting mx.io.DataIter
@@ -17,7 +17,7 @@ First, we explain the record io file format used by mxnet:
## RecordIO
-Record IO is the main file format used by MXNet for data IO. It supports reading
+[Record IO](https://mxnet.incubator.apache.org/architecture/note_data_loading.html#data-format) is the main file format used by MXNet for data IO. It supports reading
and writing on various file systems including distributed file systems like
Hadoop HDFS and AWS S3. First, we download the Caltech 101 dataset that
contains 101 classes of objects and convert them into record io format:
@@ -34,7 +34,7 @@ import matplotlib.pyplot as plt
MXNET_HOME = '/scratch/mxnet'
```
-Download and unzip:
+Download and unzip the dataset. The dataset is about ~126MB and may take some time:
```python
os.system('wget http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz -P data/')
@@ -43,15 +43,18 @@ os.system('tar -xf 101_ObjectCategories.tar.gz')
os.chdir('../')
```
-Let's take a look at the data. As you can see, under the
-[root folder](./data/101_ObjectCategories) every category has a
-[subfolder](./data/101_ObjectCategories/yin_yang).
+Let's take a look at the data.
+
+As you can see, under the
+root folder (data/101_ObjectCategories) every category has a
+subfolder (e.g. data/101_ObjectCategories/yin_yang).
Now let's convert them into record io format. First we need to make a list that
contains all the image files and their categories:
```python
+assert(MXNET_HOME != '/scratch/mxnet'), "Please update your MXNet location"
os.system('python %s/tools/im2rec.py --list=1 --recursive=1 --shuffle=1 --test-ratio=0.2 data/caltech data/101_ObjectCategories'%MXNET_HOME)
```
@@ -66,7 +69,7 @@ Then we can use this list to create our record io file:
os.system("python %s/tools/im2rec.py --num-thread=4 --pass-through=1 data/caltech data/101_ObjectCategories"%MXNET_HOME)
```
-The record io files are now saved at [here](./data)
+The record io files are now saved in the "data" directory.
## ImageRecordIter
diff --git a/tools/im2rec.py b/tools/im2rec.py
index ec6de19694..8f14543f35 100644
--- a/tools/im2rec.py
+++ b/tools/im2rec.py
@@ -28,7 +28,6 @@
import cv2
import time
import traceback
-from builtins import range
try:
import multiprocessing
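
Editorial sketch, not part of the merged diff: the tutorial change above guards the im2rec.py call with an assert on MXNET_HOME and then runs the command through os.system with string formatting. One possible alternative, shown here only as an illustration, is to build the command as an argument list and fail early with an explicit error. The helper name build_im2rec_list_cmd and the constant PLACEHOLDER_HOME are hypothetical; the flags and paths are copied from the diff.

```python
import os
import sys

# Placeholder value used in the tutorial; readers are expected to change it.
PLACEHOLDER_HOME = '/scratch/mxnet'


def build_im2rec_list_cmd(mxnet_home, prefix='data/caltech',
                          root='data/101_ObjectCategories'):
    """Return the argv list for the tutorial's list-generation step.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    # Same check as the assert added in the diff, raised as an exception
    # so it still fires when Python runs with assertions disabled (-O).
    if mxnet_home == PLACEHOLDER_HOME:
        raise ValueError("Please update your MXNet location")
    script = os.path.join(mxnet_home, 'tools', 'im2rec.py')
    # Flags are the ones passed to im2rec.py in the tutorial.
    return [sys.executable, script,
            '--list=1', '--recursive=1', '--shuffle=1', '--test-ratio=0.2',
            prefix, root]
```

Assuming MXNET_HOME points at a real checkout, the list could be passed to subprocess.check_call(build_im2rec_list_cmd(MXNET_HOME)), which raises if the script exits with a non-zero status, whereas os.system only returns the status for the caller to inspect.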
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services