Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/20 03:05:51 UTC

[GitHub] rahul003 closed pull request #9135: tutorial for distributed training

rahul003 closed pull request #9135: tutorial for distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9135
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/docs/faq/index.md b/docs/faq/index.md
index e5807f42fc..e714d44d10 100644
--- a/docs/faq/index.md
+++ b/docs/faq/index.md
@@ -17,6 +17,8 @@ and full working examples, visit the [tutorials section](../tutorials/index.md).
 ## Scale
 * [How can I train with multiple CPU/GPUs with data parallelism?](http://mxnet.io/how_to/multi_devices.html)
 
+* [How can I train using multiple machines?](http://mxnet.io/how_to/distributed_training.html)
+
 * [How can I train with multiple GPUs with model parallelism?](http://mxnet.io/how_to/model_parallel_lstm.html)
 
 
diff --git a/docs/faq/multi_devices.md b/docs/faq/multi_devices.md
index c79d1f80be..38ec7ed429 100644
--- a/docs/faq/multi_devices.md
+++ b/docs/faq/multi_devices.md
@@ -82,124 +82,7 @@ Note that this option may result in higher GPU memory usage.
 
 When using a large number of GPUs, e.g. >=4, we suggest using `device` for better performance.
 
-## Distributed Training with Multiple Machines
+## Distributed training with multiple devices across machines
 
-`KVStore` also supports a number of options for running on multiple machines.
-
-- `dist_sync` behaves similarly to `local` but exhibits one major difference.
-  With `dist_sync`, `batch-size` now means the batch size used on each machine.
-  So if there are *n* machines and we use batch size *b*,
-  then `dist_sync` behaves like `local` with batch size *n\*b*.
-- `dist_device_sync` is similar to `dist_sync`. The difference between them is that
-  `dist_device_sync` aggregates gradients and updates weight on GPUs
-  while `dist_sync` does so on CPU memory.
-- `dist_async`  performs asynchronous updates.
-  The weight is updated whenever gradients are received from any machine.
-  The update is atomic, i.e., no two updates happen on the same weight at the same time.
-  However, the order is not guaranteed.
-
-### How to Launch a Job
-
-> To use distributed training, we need to compile with `USE_DIST_KVSTORE=1`
-> (see [MXNet installation guide](http://mxnet.io/get_started/install.html) for more options).
-
-Launching a distributed job is a bit different from running on a single
-machine. MXNet provides
-[tools/launch.py](https://github.com/dmlc/mxnet/blob/master/tools/launch.py) to
-start a job by using `ssh`, `mpi`, `sge`, or `yarn`.
-
-An easy way to set up a cluster of EC2 instances for distributed deep learning
-is using an [AWS CloudFormation template](https://github.com/awslabs/deeplearning-cfn).
-If you do not have a cluster, you can check the repository before you continue.
-
-Assume we are at the directory `mxnet/example/image-classification`
-and want to train LeNet to classify MNIST images, as demonstrated here:
-[train_mnist.py](https://github.com/dmlc/mxnet/blob/master/example/image-classification/train_mnist.py).
-
-On a single machine, we can run:
-
-```bash
-python train_mnist.py --network lenet
-```
-
-Now, say we are given two ssh-able machines and _MXNet_ is installed on both machines.
-We want to train LeNet on these two machines.
-First, we save the IPs (or hostname) of these two machines in file `hosts`, e.g.
-
-```bash
-$ cat hosts
-172.30.0.172
-172.30.0.171
-```
-
-Next, if the mxnet folder is accessible from both machines, e.g. on a
-[network filesystem](https://help.ubuntu.com/lts/serverguide/network-file-system.html),
-then we can run:
-
-```bash
-python ../../tools/launch.py -n 2 --launcher ssh -H hosts python train_mnist.py --network lenet --kv-store dist_sync
-```
-
-Note that here we
-
-- use `launch.py` to submit the job.
-- provide launcher, `ssh` if all machines are ssh-able, `mpi` if `mpirun` is
-  available, `sge` for Sun Grid Engine, and `yarn` for Apache Yarn.
-- `-n` number of worker nodes to run on
-- `-H` the host file which is required by `ssh` and `mpi`
-- `--kv-store` use either `dist_sync` or `dist_async`
-
-
-### Synchronize Directory
-
-Now consider if the mxnet folder is not accessible.
-We can first copy the `MXNet` library to this folder by
-```bash
-cp -r ../../python/mxnet .
-cp -r ../../lib/libmxnet.so mxnet
-```
-
-then ask `launch.py` to synchronize the current directory to all machines'
- `/tmp/mxnet` directory with `--sync-dst-dir`
-
-```bash
-python ../../tools/launch.py -n 2 -H hosts --sync-dst-dir /tmp/mxnet \
-   python train_mnist.py --network lenet --kv-store dist_sync
-```
-
-
-### Gradient compression
-
-If your model has fully connected components or recurrent neural networks, you may achieve increased training speed using gradient compression with potentially slight loss of accuracy. Please see [Gradient Compression](https://mxnet.incubator.apache.org/versions/master/faq/gradient_compression.html) for more details on when and how to use it. For the above example, gradient compression can be enabled by running the following:
-
-```bash
-python ../../tools/launch.py -n 2 --launcher ssh -H hosts python train_mnist.py --network lenet \
-    --kv-store dist_sync --gc-type 2bit
-```
-
-In this example, `gc-type` has been set to `2bit`, to enable two bit gradient compression.
-
-
-### Use a Particular Network Interface
-
-_MXNet_ often chooses the first available network interface.
-But for machines that have multiple interfaces,
-we can specify which network interface to use for data
-communication by the environment variable `DMLC_INTERFACE`.
-For example, to use the interface `eth0`, we can
-
-```
-export DMLC_INTERFACE=eth0; python ../../tools/launch.py ...
-```
-
-### Debug Connection
-
-Set`PS_VERBOSE=1` to see the debug logging, e.g
-```
-export PS_VERBOSE=1; python ../../tools/launch.py ...
-```
-
-### More
-
-- See more launch options by `python ../../tools/launch.py -h`
-- See more options of [ps-lite](http://ps-lite.readthedocs.org/en/latest/how_to.html)
+Refer to [Distributed training](https://mxnet.incubator.apache.org/versions/master/how_to/distributed_training.html)
+for information on how distributed training works and how to use it.
diff --git a/docs/how_to/distributed_training.md b/docs/how_to/distributed_training.md
new file mode 100644
index 0000000000..ae65af189a
--- /dev/null
+++ b/docs/how_to/distributed_training.md
@@ -0,0 +1,286 @@
+# Distributed training
+MXNet supports distributed training, enabling us to leverage multiple machines for faster training.
+In this document, we describe how it works, how to launch a distributed training job, and
+the environment variables that provide finer control.
+
+## Types of parallelism
+There are two ways in which we can distribute the workload of training a neural network across multiple devices (either GPUs or CPUs).
+The first way is *data parallelism*, which refers to the case where each device stores a complete copy of the model.
+Each device works with a different part of the dataset, and the devices collectively update a shared model.
+These devices can be located on a single machine or across multiple machines.
+In this document, we describe how to train a model with devices distributed across machines in a data parallel way.
+
+When models are so large that they don't fit into device memory, a second way called *model parallelism* is useful.
+Here, different devices are assigned the task of learning different parts of the model.
+Currently, MXNet supports model parallelism on a single machine only. Refer to [Training with multiple GPUs using model parallelism](https://mxnet.incubator.apache.org/versions/master/how_to/model_parallel_lstm.html) for more on this.
+
+## How does distributed training work?
+The architecture of distributed training in MXNet is as follows:
+#### Types of processes
+MXNet has three types of processes which communicate with each other to accomplish training of a model.
+- Worker: A worker node actually performs training on a batch of training samples.
+Before processing each batch, the workers pull weights from servers.
+The workers also send gradients to the servers after each batch.
+Depending on the workload for training a model, it might not be a good idea to run multiple worker processes on the same machine.
+- Server: There can be multiple servers which store the model's parameters, and communicate with workers.
+A server may or may not be co-located with the worker processes.
+- Scheduler: There is only one scheduler.
+The role of the scheduler is to set up the cluster.
+This includes waiting for each node to report that it has come up and which port it is listening on.
+The scheduler then lets all processes know about every other node in the cluster, so that they can communicate with each other.
+
+#### KV Store
+MXNet provides a key-value store, which is a critical component used for multi-device and distributed training.
+It provides a push and pull API for workers to communicate the parameters of the model. It stores a parameter value for each key.
+Workers `push` gradients after processing a batch, and `pull` updated weights before processing a new batch.
+We can also pass an optimizer for the KVStore to use while updating each weight. Optimizers like Stochastic Gradient Descent define an update rule,
+essentially a mathematical formula to compute the new weight from the old weight, the gradient, and some hyperparameters.
+
+If you are using a Gluon Trainer object or the Module API,
+it uses a kvstore object internally to aggregate gradients from multiple devices on the same machine as well as to communicate across different machines.
+
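+For example, mirroring what [example/gluon/image_classification.py](https://github.com/apache/incubator-mxnet/blob/master/example/gluon/image_classification.py) does, a Gluon `Trainer` can be handed a kvstore object directly. The sketch below is illustrative only and assumes the script is started through `launch.py` (described later), so that the `dist_sync` kvstore can reach its servers and scheduler:
+
+```python
+import mxnet as mx
+from mxnet import gluon
+
+kv = mx.kv.create('dist_sync')                  # distributed kvstore
+net = gluon.nn.Dense(10)                        # a toy model, just for illustration
+net.initialize(mx.init.Xavier(), ctx=mx.cpu())
+trainer = gluon.Trainer(net.collect_params(), 'sgd',
+                        {'learning_rate': 0.1},
+                        kvstore=kv)             # gradients are aggregated through kv
+```
+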
+Although the API remains the same whether or not multiple machines are being used,
+the notion of a kvstore server exists only during distributed training.
+In this case, each `push` and `pull` involves communication with the kvstore servers.
+Note that we need to compile MXNet with the build flag `USE_DIST_KVSTORE=1` to use distributed training.
+
+The distributed mode of KVStore is enabled by calling the `mxnet.kvstore.create` function
+with a string argument that contains the word `dist`, as follows:
+> kv = mxnet.kvstore.create('dist_sync')
+
+Refer to [KVStore API](https://mxnet.incubator.apache.org/versions/master/api/python/kvstore/kvstore.html) for more information about KVStore.
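+
+To make the push/pull pattern concrete, here is a minimal sketch using the generic KVStore API with a single integer key. It uses the `local` kvstore so that it runs on one machine; swapping in `dist_sync` requires the process to be started through `launch.py` (or with the `DMLC_*` variables described later) so that servers and a scheduler exist.
+
+```python
+import mxnet as mx
+
+kv = mx.kv.create('local')         # use 'dist_sync' when launched across machines
+shape = (2, 3)
+kv.init(3, mx.nd.ones(shape))      # initialize key 3 with a weight array
+kv.push(3, mx.nd.ones(shape) * 2)  # push a new value; with an optimizer set, pushed
+                                   # values are treated as gradients instead
+out = mx.nd.zeros(shape)
+kv.pull(3, out=out)                # pull the current value of key 3
+print(out.asnumpy())
+```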
+
+#### Data iterators
+When running distributed training in data parallel mode,
+we want the data iterators on each machine to be working on different parts of the dataset.
+
+For data parallel training on a single worker,
+we can use `mxnet.gluon.utils.split_and_load` to split a batch of samples provided by the data iterator, and then load each part of the batch on the device which will process it.
+In the case of distributed training, one way to ensure that different workers
+process different samples is to divide the dataset into `n` parts at the beginning, one for each worker.
+Within the part of the dataset each worker has, we can continue to split as before for each device on that worker.
+
+Typically, this split of data for each worker happens through the data iterator,
+by passing it the number of parts and the index of the part to iterate over.
+Some iterators in MXNet that support this feature are [mxnet.io.MNISTIter](https://mxnet.incubator.apache.org/versions/master/api/python/io/io.html#mxnet.io.MNISTIter) and [mxnet.io.ImageRecordIter](https://mxnet.incubator.apache.org/versions/master/api/python/io/io.html#mxnet.io.ImageRecordIter).
+If you are using a different iterator, you can look at how the above iterators implement this.
+We can use the kvstore object to get the number of workers (`kv.num_workers`) and rank of the current worker (`kv.rank`).
+These can be passed as arguments to the iterator.
+You can look at [example/gluon/image_classification.py](https://github.com/apache/incubator-mxnet/blob/master/example/gluon/image_classification.py)
+to see an example usage.
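+
+As an illustrative sketch, a worker can shard the MNIST training data by its rank as shown below. This assumes the MNIST ubyte files have already been downloaded into `./data` (for example via `mxnet.test_utils.get_mnist_ubyte`) and that the process was started in distributed mode:
+
+```python
+import mxnet as mx
+
+kv = mx.kv.create('dist_sync')
+# each worker reads only its own 1/num_workers slice of the training data
+train_iter = mx.io.MNISTIter(
+    image="data/train-images-idx3-ubyte",
+    label="data/train-labels-idx1-ubyte",
+    batch_size=100,
+    shuffle=True,
+    num_parts=kv.num_workers,   # total number of partitions
+    part_index=kv.rank)         # the partition this worker reads
+```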
+
+#### Different modes of distributed training
+Different modes of distributed training can be enabled by using different types of kvstore.
+Distributed training itself is enabled when the kvstore creation string contains the word `dist`.
+
+- `dist_sync`: In synchronous distributed training, all workers use the same synchronized set of model parameters at the start of every batch.
+This means that after each batch, the server waits to receive gradients from each worker before it updates the model parameters.
+This synchronization comes at a cost because a worker pulling parameters has to wait until the server finishes this update.
+In this mode, if a worker crashes, then it halts the progress of all workers.
+
+- `dist_async`: In asynchronous distributed training, the server receives gradients from one worker and immediately updates its store, which it uses to respond to any future pulls.
+This means that a worker who finishes processing a batch can pull the current parameters from server and start the next batch,
+even if other workers haven't finished processing the earlier batch.
+This is faster than `dist_sync` but can take more epochs to converge.
+In `async` mode, you are required to pass an optimizer to the kvstore, because in the absence of one the kvstore would simply replace the stored weights with the received weights, which does not make sense for asynchronous training (see the sketch after this list).
+The update of weights is atomic, meaning no two updates happen on the same weight at the same time. However, the order of updates is not guaranteed.
+
+- `dist_sync_device`: Same as `dist_sync` except that when there are multiple GPUs being used on each node,
+this mode aggregates gradients and updates weights on the GPU, while `dist_sync` does so in CPU memory.
+This is faster than `dist_sync` because it reduces expensive communication between the GPU and CPU, but it increases memory usage on the GPU.
+
+- `dist_async_device`: The analogue of `dist_sync_device`, but in asynchronous mode.
+
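+As a concrete illustration of the optimizer requirement in `dist_async` mode, here is a minimal sketch. It assumes the process was started through `launch.py` (or with the `DMLC_*` environment variables described below); otherwise creating a `dist` kvstore will block waiting for the scheduler. When training through the Module API or a Gluon `Trainer`, this registration is normally handled for you.
+
+```python
+import mxnet as mx
+
+kv = mx.kv.create('dist_async')
+# register an optimizer so that the servers apply an update rule
+# (rather than overwriting weights) whenever a gradient arrives
+kv.set_optimizer(mx.optimizer.SGD(learning_rate=0.1))
+```
+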
+#### Distribution of parameter arrays
+Each server doesn't necessarily store all the parameter arrays.
+Arrays are distributed across different servers. The decision of which server stores a particular array is made at random.
+The worker processes are unaware of this distribution because the kvstore ensures that when a particular key is pulled, the request is sent to the server holding the corresponding value.
+If the value of some key is very large, it may be sharded across different servers.
+Again, this is handled internally, so that the worker does not have to do anything differently.
+The threshold for this sharding can be controlled with the environment variable `MXNET_KVSTORE_BIGARRAY_BOUND`.
+See [environment variables](#environment-variables) for more details.
+
+#### Gradient compression
+When communication cost is expensive, and the ratio of computation time to communication time is low, communication can become a bottleneck.
+In such cases, gradient compression can be used to reduce the cost of communication, thereby speeding up training.
+Refer to [Gradient compression](https://mxnet.incubator.apache.org/versions/master/how_to/gradient_compression.html) for more details.
+
+Note: For small models, where the cost of computation is much lower than the cost of communication,
+distributed training might actually be slower than training on a single machine because of the overhead of communication and synchronization.
+
+## How to start distributed training?
+MXNet provides a script `tools/launch.py` to make it easy to launch a distributed training job. It supports various cluster resource managers like `ssh`, `mpirun`, `yarn` and `sge`.
+If you already have one of these clusters set up, you can skip the next section on setting up a cluster.
+If you want to use a type of cluster not mentioned above, skip ahead to the section on manually launching jobs.
+
+### Setting up the cluster
+An easy way to set up a cluster of EC2 instances for distributed deep learning is by using the [AWS CloudFormation template](https://github.com/awslabs/deeplearning-cfn).
+If you cannot use the above, this section will help you manually set up a cluster of instances
+to enable you to use `ssh` for launching a distributed training job.
+Let us denote one machine as the `master` of the cluster, through which we will launch and monitor the distributed training on all machines.
+
+If the machines in your cluster are a part of a cloud computing platform like AWS EC2, then your instances should be using key-based authentication already.
+Ensure that you create all instances using the same key, say `mxnet-key`, and in the same security group.
+Next, we need to ensure that the master has access to all other machines in the cluster through `ssh`, by
+adding this key to [ssh-agent](https://en.wikipedia.org/wiki/Ssh-agent) and forwarding it to the master when we log in. This will make `mxnet-key` the default key on the master.
+
+```
+ssh-add .ssh/mxnet-key
+ssh -A user@MASTER_IP_ADDRESS
+```
+
+
+If your machines use passwords for authentication, see [here](https://help.ubuntu.com/community/SSH/OpenSSH/Keys) for instructions on setting up password-less authentication between machines.
+
+
+It is easier if all these machines have a shared file system so that they can access the training script. One way is to use Amazon Elastic File System (EFS) as your network file system.
+The following command mounts an AWS Elastic File System with the recommended options.
+
+```
+sudo mkdir efs && sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 NETWORK_FILE_SYSTEM_IP:/ efs
+```
+
+Tip: You might find it helpful to store large datasets on S3 for easy access from all machines in the cluster. Refer to [Using data from S3 for training](https://mxnet.incubator.apache.org/versions/master/how_to/s3_integration.html) for more information.
+
+### Using launch.py
+MXNet provides a script [tools/launch.py](https://github.com/apache/incubator-mxnet/blob/master/tools/launch.py) to make it easy to launch distributed training on a cluster with `ssh`, `mpi`, `sge` or `yarn`.
+You can fetch this script by cloning the mxnet repository.
+
+```
+git clone --recursive https://github.com/apache/incubator-mxnet
+```
+
+##### Example
+Let us consider training a VGG11 model on the CIFAR10 dataset using [example/gluon/image_classification.py](https://github.com/apache/incubator-mxnet/blob/master/example/gluon/image_classification.py).
+```
+cd example/gluon/
+```
+On a single machine, we can run this script as follows:
+```
+python image_classification.py --dataset cifar10 --model vgg11 --num-epochs 1
+```
+
+For distributed training of this example, we would do the following:
+
+If the mxnet directory, which contains the script `image_classification.py`, is accessible to all machines in the cluster (for example, if it is on a network file system), we can run:
+```
+../../tools/launch.py -n 3 -H hosts --launcher ssh python image_classification.py --dataset cifar10 --model vgg11 --num-epochs 1 --kvstore dist_sync
+```
+
+If the directory with the script is not accessible from the other machines in the cluster, then we can synchronize the current directory to all machines.
+```
+../../tools/launch.py -n 3 -H hosts --launcher ssh --sync-dst-dir /tmp/mxnet_job/ python image_classification.py --dataset cifar10 --model vgg11 --num-epochs 1 --kvstore dist_sync
+```
+
+> Tip: If you don't have a cluster ready and still want to try this out, pass the option `--launcher local` instead of `ssh`.
+
+#### Options
+Here, `launch.py` is used to submit the distributed training job. It takes the following options:
+- `-n` denotes the number of worker nodes to be launched.
+- `-s` denotes the number of server nodes to be launched.
+If it is not specified, it is taken to be equal to the number of worker nodes.
+The script cycles through the hosts file to launch the servers and workers.
+For example, if you have 5 hosts in the hosts file and you pass `-n 3` (and nothing for `-s`),
+the script will launch a total of 3 server processes,
+one each on the first three hosts, and a total of 3 worker processes, one each on the fourth, fifth, and first host.
+If the hosts file contains exactly `n` hosts, it will launch a server process and a worker process on each of the `n` hosts.
+- `--launcher` denotes the mode of communication. The options are:
+    - `ssh` if machines can communicate through ssh without passwords. This is the default launcher mode.
+    - `mpi` if Open MPI is available
+    - `sge` for Sun Grid Engine
+    - `yarn` for Apache Yarn
+    - `local` for launching all processes on the same local machine. This can be used for debugging purposes.
+- `-H` takes the path of the hosts file.
+  This file contains the IPs of the machines in the cluster. These machines should be able to communicate with each other without using passwords.
+  This file is applicable and required only when the launcher mode is `ssh` or `mpi`.
+  An example of the contents of the hosts file would be:
+  ```
+  172.30.0.172
+  172.31.0.173
+  172.30.1.174
+  ```
+- `--sync-dst-dir` takes the path of a directory on all hosts to which the current working directory will be synchronized. This option is supported only in `ssh` launcher mode.
+It is necessary when the working directory is not accessible to all machines in the cluster. Setting this option synchronizes the current directory to all hosts using rsync before the job is launched.
+If you have not installed MXNet system-wide,
+then you have to copy the folder `python/mxnet` and the file `lib/libmxnet.so` into the current directory before running `launch.py`.
+For example, if you are in `example/gluon`, you can do this with `cp -r ../../python/mxnet ../../lib/libmxnet.so .`. This works if your `lib` folder contains `libmxnet.so`, as is the case when you build with make. If you build with CMake, this file will be in your `build` directory.
+
+- `python image_classification.py --dataset cifar10 --model vgg11 --num-epochs 1 --kvstore dist_sync`
+is the command for the training job on each machine. Note the use of `dist_sync` for the kvstore used in the script.
+
+#### Terminating jobs
+If the training job crashes due to an error or if we try to terminate the launch script while training is running,
+jobs on all machines might not have terminated. In such a case, we would need to terminate them manually.
+If we are using the `ssh` launcher, this can be done by running the following command, where `hosts` is the path of the hosts file.
+```
+while read -u 10 host; do ssh -o "StrictHostKeyChecking no" $host "pkill -f python" ; done 10<hosts
+```
+
+### Manually launching jobs
+If, for some reason, you do not want to use the script above to start distributed training, then this section will be helpful.
+MXNet uses environment variables to assign roles to different processes and to let different processes find the scheduler.
+The following environment variables must be set correctly for training to start:
+- `DMLC_ROLE`: Specifies the role of the process. This can be `server`, `worker` or `scheduler`. Note that there should only be one `scheduler`.
+When `DMLC_ROLE` is set to `server` or `scheduler`, these processes start when mxnet is imported.
+- `DMLC_PS_ROOT_URI`: Specifies the IP of the scheduler
+- `DMLC_PS_ROOT_PORT`: Specifies the port that the scheduler listens to
+- `DMLC_NUM_SERVER`: Specifies how many server nodes are in the cluster
+- `DMLC_NUM_WORKER`: Specifies how many worker nodes are in the cluster
+
+Below is an example of starting all jobs locally on Linux or Mac. Note that starting all jobs on the same machine is not a good idea;
+this is only to make the usage clear.
+```
+export COMMAND="python example/gluon/image_classification.py --dataset cifar10 --model vgg11 --num-epochs 1 --kvstore dist_async"
+DMLC_ROLE=server DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND &
+DMLC_ROLE=server DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND &
+DMLC_ROLE=scheduler DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND &
+DMLC_ROLE=worker DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND &
+DMLC_ROLE=worker DMLC_PS_ROOT_URI=127.0.0.1 DMLC_PS_ROOT_PORT=9092 DMLC_NUM_SERVER=2 DMLC_NUM_WORKER=2 $COMMAND
+```
+For an in-depth discussion of how the scheduler sets up the cluster, see [this blog post](https://blog.kovalevskyi.com/mxnet-distributed-training-explained-in-depth-part-1-b90c84bda725).
+
+## Environment variables
+#### For tuning performance
+- `MXNET_KVSTORE_REDUCTION_NTHREADS`
+  Value type: Integer
+  Default value: 4
+  Specifies the number of CPU threads used for summing up big arrays.
+
+- `MXNET_KVSTORE_BIGARRAY_BOUND`
+  Value type: Integer
+  Default value: 1000000
+  The minimum size of a *big array*.
+  When the array size is bigger than this threshold, `MXNET_KVSTORE_REDUCTION_NTHREADS` threads are used for reduction.
+  This parameter is also used as a load balancer in kvstore.
+  It controls when to partition a single weight to all the servers.
+  If the size of a single weight is less than this bound, then it is sent to a single randomly picked server; otherwise, it is partitioned to all the servers.
+
+- `MXNET_ENABLE_GPU_P2P` GPU Peer-to-Peer communication
+  Value type: 0(false) or 1(true)
+  Default value: 1
+  If true, MXNet tries to use GPU peer-to-peer communication, if available on your device. This is used only when kvstore has the type `device` in it.
+
+#### Communication
+- `DMLC_INTERFACE` Using a particular network interface
+  Value type: Name of interface
+  Example: `eth0`
+  MXNet often chooses the first available network interface.
+  But for machines with multiple interfaces, we can specify which network interface to use for data communication using this environment variable.
+
+- `PS_VERBOSE` Logging communication
+  Value type: 1 or 2
+  Default value: (empty)
+    - `PS_VERBOSE=1` logs connection information like the IPs and ports of all nodes
+    - `PS_VERBOSE=2` logs all data communication information
+
+When the network is unreliable, messages being sent from one node to another might get lost.
+The training process can hang when a critical message is not successfully delivered.
+In such cases, an additional ACK can be sent for each message to track its delivery.
+This can be done by setting `PS_RESEND` and `PS_RESEND_TIMEOUT`:
+- `PS_RESEND` Retransmission for unreliable network
+Value type: 0(false) or 1(true)
+Default value: 0
+Whether or not to enable retransmission of messages
+- `PS_RESEND_TIMEOUT` Timeout for ACK to be received
+Value type: Integer (in milliseconds)
+Default value: 1000
+If an ACK is not received within `PS_RESEND_TIMEOUT` milliseconds, the message will be resent.
diff --git a/example/adversary/adversary_generation.ipynb b/example/adversary/adversary_generation.ipynb
index 8adae3d3b7..b8804bd813 100644
--- a/example/adversary/adversary_generation.ipynb
+++ b/example/adversary/adversary_generation.ipynb
@@ -28,10 +28,7 @@
     "import matplotlib.pyplot as plt\n",
     "import matplotlib.cm as cm\n",
     "\n",
-    "import os\n",
-    "import sys\n",
-    "sys.path.append(os.path.join(os.getcwd(), \"../../tests/python/common\"))\n",
-    "from get_data import MNISTIterator"
+    "from mxnet.test_utils import get_mnist_iterator"
    ]
   },
   {
@@ -53,7 +50,7 @@
    "source": [
     "dev = mx.cpu()\n",
     "batch_size = 100\n",
-    "train_iter, val_iter = mnist_iterator(batch_size=batch_size, input_shape = (1,28,28))"
+    "train_iter, val_iter = get_mnist_iterator(batch_size=batch_size, input_shape = (1,28,28))"
    ]
   },
   {
diff --git a/example/caffe/data.py b/example/caffe/data.py
index fac8e11989..15276c4236 100644
--- a/example/caffe/data.py
+++ b/example/caffe/data.py
@@ -15,19 +15,14 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import sys
-import os
-# code to automatically download dataset
-curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
-sys.path.append(os.path.join(curr_path, "../../tests/python/common"))
-import get_data
 import mxnet as mx
+from mxnet.test_utils import get_mnist_ubyte
 
 def get_iterator(data_shape, use_caffe_data):
     def get_iterator_impl_mnist(args, kv):
         """return train and val iterators for mnist"""
         # download data
-        get_data.GetMNIST_ubyte()
+        get_mnist_ubyte()
         flat = False if len(data_shape) != 1 else True
 
         train           = mx.io.MNISTIter(
diff --git a/example/gluon/data.py b/example/gluon/data.py
index 67519e6a20..dc8f12e81f 100644
--- a/example/gluon/data.py
+++ b/example/gluon/data.py
@@ -19,39 +19,11 @@
 """ data iterator for mnist """
 import os
 import random
-import sys
-# code to automatically download dataset
-curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
-sys.path.append(os.path.join(curr_path, "../../tests/python/common"))
-import get_data
 import mxnet as mx
+from mxnet.test_utils import get_cifar10
 
-def mnist_iterator(batch_size, input_shape):
-    """return train and val iterators for mnist"""
-    # download data
-    get_data.GetMNIST_ubyte()
-    flat = False if len(input_shape) == 3 else True
-
-    train_dataiter = mx.io.MNISTIter(
-        image="data/train-images-idx3-ubyte",
-        label="data/train-labels-idx1-ubyte",
-        input_shape=input_shape,
-        batch_size=batch_size,
-        shuffle=True,
-        flat=flat)
-
-    val_dataiter = mx.io.MNISTIter(
-        image="data/t10k-images-idx3-ubyte",
-        label="data/t10k-labels-idx1-ubyte",
-        input_shape=input_shape,
-        batch_size=batch_size,
-        flat=flat)
-
-    return (train_dataiter, val_dataiter)
-
-
-def cifar10_iterator(batch_size, data_shape, resize=-1):
-    get_data.GetCifar10()
+def get_cifar10_iterator(batch_size, data_shape, resize=-1, num_parts=1, part_index=0):
+    get_cifar10()
 
     train = mx.io.ImageRecordIter(
         path_imgrec = "data/cifar/train.rec",
@@ -60,7 +32,9 @@ def cifar10_iterator(batch_size, data_shape, resize=-1):
         data_shape  = data_shape,
         batch_size  = batch_size,
         rand_crop   = True,
-        rand_mirror = True)
+        rand_mirror = True,
+        num_parts=num_parts,
+        part_index=part_index)
 
     val = mx.io.ImageRecordIter(
         path_imgrec = "data/cifar/test.rec",
@@ -69,11 +43,14 @@ def cifar10_iterator(batch_size, data_shape, resize=-1):
         rand_crop   = False,
         rand_mirror = False,
         data_shape  = data_shape,
-        batch_size  = batch_size)
+        batch_size  = batch_size,
+        num_parts=num_parts,
+        part_index=part_index)
 
     return train, val
 
-def imagenet_iterator(train_data, val_data, batch_size, data_shape, resize=-1):
+
+def get_imagenet_iterator(train_data, val_data, batch_size, data_shape, resize=-1, num_parts=1, part_index=0):
     train = mx.io.ImageRecordIter(
         path_imgrec             = train_data,
         data_shape              = data_shape,
@@ -96,7 +73,9 @@ def imagenet_iterator(train_data, val_data, batch_size, data_shape, resize=-1):
         max_random_shear_ratio  = 0.1,
         max_random_aspect_ratio = 0.25,
         fill_value              = 127,
-        min_random_scale        = 0.533)
+        min_random_scale        = 0.533,
+        num_parts               = num_parts,
+        part_index              = part_index)
 
     val = mx.io.ImageRecordIter(
         path_imgrec        = val_data,
@@ -109,7 +88,9 @@ def imagenet_iterator(train_data, val_data, batch_size, data_shape, resize=-1):
         std_b              = 57.375,
         preprocess_threads = 32,
         batch_size         = batch_size,
-        resize             = resize)
+        resize             = resize,
+        num_parts          = num_parts,
+        part_index         = part_index)
 
     return train, val
 
diff --git a/example/gluon/image_classification.py b/example/gluon/image_classification.py
index a2fb757683..529b977a79 100644
--- a/example/gluon/image_classification.py
+++ b/example/gluon/image_classification.py
@@ -26,12 +26,13 @@
 from mxnet.gluon import nn
 from mxnet.gluon.model_zoo import vision as models
 from mxnet import autograd as ag
+from mxnet.test_utils import get_mnist_iterator
 
 from data import *
 
 # CLI
 parser = argparse.ArgumentParser(description='Train a model for image classification.')
-parser.add_argument('--dataset', type=str, default='mnist',
+parser.add_argument('--dataset', type=str, default='cifar10',
                     help='dataset to use. options are mnist, cifar10, and dummy.')
 parser.add_argument('--train-data', type=str, default='',
                     help='training record file to use, required for imagenet.')
@@ -92,25 +93,31 @@
 
 net = models.get_model(opt.model, **kwargs)
 
-# get dataset iterators
-if dataset == 'mnist':
-    train_data, val_data = mnist_iterator(batch_size, (1, 32, 32))
-elif dataset == 'cifar10':
-    train_data, val_data = cifar10_iterator(batch_size, (3, 32, 32))
-elif dataset == 'imagenet':
-    if model_name == 'inceptionv3':
-        train_data, val_data = imagenet_iterator(opt.train_data, opt.val_data,
-                                              batch_size, (3, 299, 299))
-    else:
-        train_data, val_data = imagenet_iterator(opt.train_data, opt.val_data,
-                                                 batch_size, (3, 224, 224))
-elif dataset == 'dummy':
-    if model_name == 'inceptionv3':
-        train_data, val_data = dummy_iterator(batch_size, (3, 299, 299))
-    else:
-        train_data, val_data = dummy_iterator(batch_size, (3, 224, 224))
-
-def test(ctx):
+def get_data_iters(dataset, batch_size, num_workers=1, rank=0):
+    # get dataset iterators
+    if dataset == 'mnist':
+        train_data, val_data = get_mnist_iterator(batch_size, (1, 28, 28),
+                                                  num_parts=num_workers, part_index=rank)
+    elif dataset == 'cifar10':
+        train_data, val_data = get_cifar10_iterator(batch_size, (3, 32, 32),
+                                                    num_parts=num_workers, part_index=rank)
+    elif dataset == 'imagenet':
+        if model_name == 'inceptionv3':
+            train_data, val_data = get_imagenet_iterator(opt.train_data, opt.val_data,
+                                                         batch_size, (3, 299, 299),
+                                                         num_parts=num_workers, part_index=rank)
+        else:
+            train_data, val_data = get_imagenet_iterator(opt.train_data, opt.val_data,
+                                                         batch_size, (3, 224, 224),
+                                                         num_parts=num_workers, part_index=rank)
+    elif dataset == 'dummy':
+        if model_name == 'inceptionv3':
+            train_data, val_data = dummy_iterator(batch_size, (3, 299, 299))
+        else:
+            train_data, val_data = dummy_iterator(batch_size, (3, 224, 224))
+    return train_data, val_data
+
+def test(ctx, val_data):
     metric = mx.metric.Accuracy()
     val_data.reset()
     for batch in val_data:
@@ -127,9 +134,11 @@ def train(epochs, ctx):
     if isinstance(ctx, mx.Context):
         ctx = [ctx]
     net.initialize(mx.init.Xavier(magnitude=2), ctx=ctx)
+    kv = mx.kv.create(opt.kvstore)
+    train_data, val_data = get_data_iters(dataset, batch_size, kv.num_workers, kv.rank)
     trainer = gluon.Trainer(net.collect_params(), 'sgd',
                             {'learning_rate': opt.lr, 'wd': opt.wd, 'momentum': opt.momentum},
-                            kvstore = opt.kvstore)
+                            kvstore = kv)
     metric = mx.metric.Accuracy()
     loss = gluon.loss.SoftmaxCrossEntropyLoss()
 
@@ -164,7 +173,7 @@ def train(epochs, ctx):
         name, acc = metric.get()
         logging.info('[Epoch %d] training: %s=%f'%(epoch, name, acc))
         logging.info('[Epoch %d] time cost: %f'%(epoch, time.time()-tic))
-        name, val_acc = test(ctx)
+        name, val_acc = test(ctx, val_data)
         logging.info('[Epoch %d] validation: %s=%f'%(epoch, name, val_acc))
 
     net.save_params('image-classifier-%s-%d.params'%(opt.model, epochs))
@@ -175,10 +184,12 @@ def main():
         out = net(data)
         softmax = mx.sym.SoftmaxOutput(out, name='softmax')
         mod = mx.mod.Module(softmax, context=[mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()])
+        kv = mx.kv.create(opt.kvstore)
+        train_data, val_data = get_data_iters(dataset, batch_size, kv.num_workers, kv.rank)
         mod.fit(train_data,
                 eval_data = val_data,
                 num_epoch=opt.epochs,
-                kvstore=opt.kvstore,
+                kvstore=kv,
                 batch_end_callback = mx.callback.Speedometer(batch_size, max(1, opt.log_interval)),
                 epoch_end_callback = mx.callback.do_checkpoint('image-classifier-%s'% opt.model),
                 optimizer = 'sgd',
diff --git a/example/multi-task/example_multi_task.py b/example/multi-task/example_multi_task.py
index 9ea9ad0173..9e898494a1 100644
--- a/example/multi-task/example_multi_task.py
+++ b/example/multi-task/example_multi_task.py
@@ -16,12 +16,8 @@
 # under the License.
 
 # pylint: skip-file
-import sys
-import os
-curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
-sys.path.append(os.path.join(curr_path, "../../tests/python/common"))
-from get_data import MNISTIterator
 import mxnet as mx
+from mxnet.test_utils import get_mnist_iterator
 import numpy as np
 import logging
 import time
@@ -142,7 +138,7 @@ def get_name_value(self):
 lr = 0.01
 
 network = build_network()
-train, val = MNISTIterator(batch_size=batch_size, input_shape = (784,))
+train, val = get_mnist_iterator(batch_size=batch_size, input_shape = (784,))
 train = Multi_mnist_iterator(train)
 val = Multi_mnist_iterator(val)
 
diff --git a/example/numpy-ops/custom_softmax.py b/example/numpy-ops/custom_softmax.py
index a2ec5d54b7..ab94401185 100644
--- a/example/numpy-ops/custom_softmax.py
+++ b/example/numpy-ops/custom_softmax.py
@@ -16,12 +16,8 @@
 # under the License.
 
 # pylint: skip-file
-import sys
-import os
-curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
-sys.path.append(os.path.join(curr_path, "../../tests/python/common"))
-from get_data import MNISTIterator
 import mxnet as mx
+from mxnet.test_utils import get_mnist_iterator
 import numpy as np
 import logging
 
@@ -75,7 +71,7 @@ def create_operator(self, ctx, shapes, dtypes):
 
 # data
 
-train, val = MNISTIterator(batch_size=100, input_shape = (784,))
+train, val = get_mnist_iterator(batch_size=100, input_shape = (784,))
 
 # train
 
diff --git a/example/numpy-ops/ndarray_softmax.py b/example/numpy-ops/ndarray_softmax.py
index 4ced2c5cd8..58eab3d538 100644
--- a/example/numpy-ops/ndarray_softmax.py
+++ b/example/numpy-ops/ndarray_softmax.py
@@ -16,16 +16,11 @@
 # under the License.
 
 # pylint: skip-file
-import os
-import sys
-curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
-sys.path.append(os.path.join(curr_path, "../../tests/python/common"))
-from get_data import MNISTIterator
 import mxnet as mx
+from mxnet.test_utils import get_mnist_iterator
 import numpy as np
 import logging
 
-
 class NDArraySoftmax(mx.operator.NDArrayOp):
     def __init__(self):
         super(NDArraySoftmax, self).__init__(False)
@@ -97,7 +92,7 @@ def backward(self, out_grad, in_data, out_data, in_grad):
 
 # data
 
-train, val = MNISTIterator(batch_size=100, input_shape = (784,))
+train, val = get_mnist_iterator(batch_size=100, input_shape = (784,))
 
 # train
 
diff --git a/example/numpy-ops/numpy_softmax.py b/example/numpy-ops/numpy_softmax.py
index c10dfe3779..88d2473492 100644
--- a/example/numpy-ops/numpy_softmax.py
+++ b/example/numpy-ops/numpy_softmax.py
@@ -16,12 +16,8 @@
 # under the License.
 
 # pylint: skip-file
-import sys
-import os
-curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
-sys.path.append(os.path.join(curr_path, "../../tests/python/common"))
-from get_data import MNISTIterator
 import mxnet as mx
+from mxnet.test_utils import get_mnist_iterator
 import numpy as np
 import logging
 
@@ -70,7 +66,7 @@ def backward(self, out_grad, in_data, out_data, in_grad):
 
 # data
 
-train, val = MNISTIterator(batch_size=100, input_shape = (784,))
+train, val = get_mnist_iterator(batch_size=100, input_shape = (784,))
 
 # train
 
diff --git a/example/python-howto/monitor_weights.py b/example/python-howto/monitor_weights.py
index ab77b4908b..929b0e7bf7 100644
--- a/example/python-howto/monitor_weights.py
+++ b/example/python-howto/monitor_weights.py
@@ -16,12 +16,8 @@
 # under the License.
 
 # pylint: skip-file
-import sys
-import os
-curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
-sys.path.append(os.path.join(curr_path, "../../tests/python/common"))
-from get_data import MNISTIterator
 import mxnet as mx
+from mxnet.test_utils import get_mnist_iterator
 import numpy as np
 import logging
 
@@ -35,7 +31,7 @@
 mlp = mx.symbol.SoftmaxOutput(data = fc3, name = 'softmax')
 
 # data
-train, val = MNISTIterator(batch_size=100, input_shape = (784,))
+train, val = get_mnist_iterator(batch_size=100, input_shape = (784,))
 
 # monitor
 def norm_stat(d):
diff --git a/python/mxnet/test_utils.py b/python/mxnet/test_utils.py
index 53814b766f..1b41011fd9 100644
--- a/python/mxnet/test_utils.py
+++ b/python/mxnet/test_utils.py
@@ -1441,6 +1441,74 @@ def read_data(label_url, image_url):
     return {'train_data':train_img, 'train_label':train_lbl,
             'test_data':test_img, 'test_label':test_lbl}
 
+def get_mnist_pkl():
+    """Downloads MNIST dataset as a pkl.gz into a directory in the current directory
+    with the name `data`
+    """
+    if not os.path.isdir("data"):
+        os.makedirs('data')
+    if not os.path.exists('data/mnist.pkl.gz'):
+        download('http://deeplearning.net/data/mnist/mnist.pkl.gz',
+                 dirname='data')
+
+def get_mnist_ubyte():
+    """Downloads ubyte version of the MNIST dataset into a directory in the current directory
+    with the name `data` and extracts all files in the zip archive to this directory.
+    """
+    if not os.path.isdir("data"):
+        os.makedirs('data')
+    if (not os.path.exists('data/train-images-idx3-ubyte')) or \
+            (not os.path.exists('data/train-labels-idx1-ubyte')) or \
+            (not os.path.exists('data/t10k-images-idx3-ubyte')) or \
+            (not os.path.exists('data/t10k-labels-idx1-ubyte')):
+        zip_file_path = download('http://data.mxnet.io/mxnet/data/mnist.zip',
+                                 dirname='data')
+        with zipfile.ZipFile(zip_file_path) as zf:
+            zf.extractall('data')
+
+def get_cifar10():
+    """Downloads CIFAR10 dataset into a directory in the current directory with the name `data`,
+    and then extracts all files into the directory `data/cifar`.
+    """
+    if not os.path.isdir("data"):
+        os.makedirs('data')
+    if (not os.path.exists('data/cifar/train.rec')) or \
+            (not os.path.exists('data/cifar/test.rec')) or \
+            (not os.path.exists('data/cifar/train.lst')) or \
+            (not os.path.exists('data/cifar/test.lst')):
+        zip_file_path = download('http://data.mxnet.io/mxnet/data/cifar10.zip',
+                                 dirname='data')
+        with zipfile.ZipFile(zip_file_path) as zf:
+            zf.extractall('data')
+
+def get_mnist_iterator(batch_size, input_shape, num_parts=1, part_index=0):
+    """Returns training and validation iterators for MNIST dataset
+    """
+
+    get_mnist_ubyte()
+    flat = False if len(input_shape) == 3 else True
+
+    train_dataiter = mx.io.MNISTIter(
+        image="data/train-images-idx3-ubyte",
+        label="data/train-labels-idx1-ubyte",
+        input_shape=input_shape,
+        batch_size=batch_size,
+        shuffle=True,
+        flat=flat,
+        num_parts=num_parts,
+        part_index=part_index)
+
+    val_dataiter = mx.io.MNISTIter(
+        image="data/t10k-images-idx3-ubyte",
+        label="data/t10k-labels-idx1-ubyte",
+        input_shape=input_shape,
+        batch_size=batch_size,
+        flat=flat,
+        num_parts=num_parts,
+        part_index=part_index)
+
+    return (train_dataiter, val_dataiter)
+
 def get_zip_data(data_dir, url, data_origin_name):
     """Download and extract zip data.
 
diff --git a/tests/python/common/get_data.py b/tests/python/common/get_data.py
deleted file mode 100644
index 5802a06919..0000000000
--- a/tests/python/common/get_data.py
+++ /dev/null
@@ -1,81 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# pylint: skip-file
-import os, gzip
-import pickle as pickle
-import sys
-from mxnet.test_utils import download
-import zipfile
-import mxnet as mx
-
-# download mnist.pkl.gz
-def GetMNIST_pkl():
-    if not os.path.isdir("data"):
-        os.makedirs('data')
-    if not os.path.exists('data/mnist.pkl.gz'):
-        download('http://deeplearning.net/data/mnist/mnist.pkl.gz',
-                 dirname='data')
-
-# download ubyte version of mnist and untar
-def GetMNIST_ubyte():
-    if not os.path.isdir("data"):
-        os.makedirs('data')
-    if (not os.path.exists('data/train-images-idx3-ubyte')) or \
-       (not os.path.exists('data/train-labels-idx1-ubyte')) or \
-       (not os.path.exists('data/t10k-images-idx3-ubyte')) or \
-       (not os.path.exists('data/t10k-labels-idx1-ubyte')):
-        zip_file_path = download('http://data.mxnet.io/mxnet/data/mnist.zip',
-                                 dirname='data')
-        with zipfile.ZipFile(zip_file_path) as zf:
-            zf.extractall('data')
-
-# download cifar
-def GetCifar10():
-    if not os.path.isdir("data"):
-        os.makedirs('data')
-    if (not os.path.exists('data/cifar/train.rec')) or \
-       (not os.path.exists('data/cifar/test.rec')) or \
-       (not os.path.exists('data/cifar/train.lst')) or \
-       (not os.path.exists('data/cifar/test.lst')):
-        zip_file_path = download('http://data.mxnet.io/mxnet/data/cifar10.zip',
-                                 dirname='data')
-        with zipfile.ZipFile(zip_file_path) as zf:
-            zf.extractall('data')
-
-def MNISTIterator(batch_size, input_shape):
-    """return train and val iterators for mnist"""
-    # download data
-    GetMNIST_ubyte()
-    flat = False if len(input_shape) == 3 else True
-
-    train_dataiter = mx.io.MNISTIter(
-        image="data/train-images-idx3-ubyte",
-        label="data/train-labels-idx1-ubyte",
-        input_shape=input_shape,
-        batch_size=batch_size,
-        shuffle=True,
-        flat=flat)
-
-    val_dataiter = mx.io.MNISTIter(
-        image="data/t10k-images-idx3-ubyte",
-        label="data/t10k-labels-idx1-ubyte",
-        input_shape=input_shape,
-        batch_size=batch_size,
-        flat=flat)
-
-    return (train_dataiter, val_dataiter)
diff --git a/tests/python/train/test_autograd.py b/tests/python/train/test_autograd.py
index c9921ecf4f..712672cd0a 100644
--- a/tests/python/train/test_autograd.py
+++ b/tests/python/train/test_autograd.py
@@ -21,9 +21,9 @@
 import mxnet as mx
 from mxnet import gluon
 from mxnet.gluon import nn
+from mxnet.test_utils import get_mnist_ubyte
 import numpy as np
 import logging
-from common import get_data
 from mxnet import autograd
 logging.basicConfig(level=logging.DEBUG)
 
@@ -36,7 +36,7 @@ def get_net():
     net.add(nn.Dense(10, prefix='fc3_'))
     return net
 
-get_data.GetMNIST_ubyte()
+get_mnist_ubyte()
 
 batch_size = 100
 train_data = mx.io.MNISTIter(
diff --git a/tests/python/train/test_conv.py b/tests/python/train/test_conv.py
index 46e06848f8..adceb3ebc1 100644
--- a/tests/python/train/test_conv.py
+++ b/tests/python/train/test_conv.py
@@ -19,10 +19,10 @@
 import sys
 sys.path.insert(0, '../../python')
 import mxnet as mx
+from mxnet.test_utils import get_mnist_ubyte
 import numpy as np
 import os, pickle, gzip, argparse
 import logging
-from common import get_data
 
 def get_model(use_gpu):
     # symbol net
@@ -52,7 +52,7 @@ def get_model(use_gpu):
 
 def get_iters():
     # check data
-    get_data.GetMNIST_ubyte()
+    get_mnist_ubyte()
 
     batch_size = 100
     train_dataiter = mx.io.MNISTIter(
diff --git a/tests/python/train/test_dtype.py b/tests/python/train/test_dtype.py
index 52f04bf9a1..2e3ff06d2e 100644
--- a/tests/python/train/test_dtype.py
+++ b/tests/python/train/test_dtype.py
@@ -22,7 +22,7 @@
 import numpy as np
 import os, pickle, gzip
 import logging
-from common import get_data
+from mxnet.test_utils import get_cifar10
 
 batch_size = 128
 
@@ -39,7 +39,7 @@ def get_net():
     return softmax
 
 # check data
-get_data.GetCifar10()
+get_cifar10()
 
 def get_iterator_uint8(kv):
     data_shape = (3, 28, 28)
diff --git a/tests/python/train/test_mlp.py b/tests/python/train/test_mlp.py
index a0a45b41e1..1b8e06f530 100644
--- a/tests/python/train/test_mlp.py
+++ b/tests/python/train/test_mlp.py
@@ -21,7 +21,7 @@
 import os, sys
 import pickle as pickle
 import logging
-from common import get_data
+from mxnet.test_utils import get_mnist_ubyte
 
 # symbol net
 batch_size = 100
@@ -41,7 +41,7 @@ def accuracy(label, pred):
 prefix = './mlp'
 
 #check data
-get_data.GetMNIST_ubyte()
+get_mnist_ubyte()
 
 train_dataiter = mx.io.MNISTIter(
         image="data/train-images-idx3-ubyte",
diff --git a/tests/python/unittest/common.py b/tests/python/unittest/common.py
index 12ed60d2bc..782534bf80 100644
--- a/tests/python/unittest/common.py
+++ b/tests/python/unittest/common.py
@@ -21,8 +21,6 @@
 sys.path.insert(0, os.path.join(curr_path, '../../../python'))
 
 import models
-import get_data
-
 
 def assertRaises(expected_exception, func, *args, **kwargs):
     try:
diff --git a/tests/python/unittest/test_io.py b/tests/python/unittest/test_io.py
index fa314e0f8b..03d829efd7 100644
--- a/tests/python/unittest/test_io.py
+++ b/tests/python/unittest/test_io.py
@@ -27,13 +27,12 @@
 except ImportError:
     h5py = None
 import sys
-from common import get_data, assertRaises
+from common import assertRaises
 import unittest
 
-
 def test_MNISTIter():
     # prepare data
-    get_data.GetMNIST_ubyte()
+    get_mnist_ubyte()
 
     batch_size = 100
     train_dataiter = mx.io.MNISTIter(
@@ -61,7 +60,7 @@ def test_MNISTIter():
     assert(sum(label_0 - label_1) == 0)
 
 def test_Cifar10Rec():
-    get_data.GetCifar10()
+    get_cifar10()
     dataiter = mx.io.ImageRecordIter(
             path_imgrec="data/cifar/train.rec",
             mean_img="data/cifar/cifar10_mean.bin",


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services