Posted to common-commits@hadoop.apache.org by zt...@apache.org on 2019/04/30 02:56:15 UTC

[hadoop] branch trunk updated: SUBMARINE-64. Improve TonY runtime's document. Contributed by Keqiu Hu.

This is an automated email from the ASF dual-hosted git repository.

ztang pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/hadoop.git


The following commit(s) were added to refs/heads/trunk by this push:
     new 24f218a  SUBMARINE-64. Improve TonY runtime's document. Contributed by Keqiu Hu.
24f218a is described below

commit 24f218aef8e2eb4bf72e39fc030c2d8be0b9ac92
Author: Zhankun Tang <zt...@apache.org>
AuthorDate: Tue Apr 30 10:51:39 2019 +0800

    SUBMARINE-64. Improve TonY runtime's document. Contributed by Keqiu Hu.
---
 .../src/site/markdown/QuickStart.md                | 105 ++++++++++++++++++++-
 1 file changed, 102 insertions(+), 3 deletions(-)

diff --git a/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/markdown/QuickStart.md b/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/markdown/QuickStart.md
index b6503e8..45eeea3 100644
--- a/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/markdown/QuickStart.md
+++ b/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/markdown/QuickStart.md
@@ -19,6 +19,8 @@
 Must:
 
 - Apache Hadoop 2.7 or above.
+- TonY library 0.3.2 or above. You can download the latest TonY jar from
+https://github.com/linkedin/TonY/releases.
 
 Optional:
 
@@ -149,9 +151,106 @@ java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
  --worker_resources memory=3G,vcores=2 \
  --num_ps 2 \
  --ps_resources memory=3G,vcores=2 \
- --worker_launch_cmd "venv.zip/venv/bin/python --steps 1000 --data_dir /tmp/data --working_dir /tmp/mode" \
- --ps_launch_cmd "venv.zip/venv/bin/python --steps 1000 --data_dir /tmp/data --working_dir /tmp/mode" \
- --container_resources /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+ --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py --steps 1000 --data_dir /tmp/data --working_dir /tmp/mode" \
+ --ps_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py --steps 1000 --data_dir /tmp/data --working_dir /tmp/mode" \
+ --insecure \
+ --conf tony.containers.resources=PATH_TO_VENV_YOU_CREATED/venv.zip#archive,PATH_TO_MNIST_EXAMPLE/mnist_distributed.py,\
+PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar
+
+```
+You should then see links to the logs and the status of each task on the command line:
+
+```
+2019-04-22 20:30:42,611 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi status: RUNNING
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 1 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi status: RUNNING
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: ps index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi status: RUNNING
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for ps 0 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for worker 0 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi
+2019-04-22 20:30:42,612 INFO tony.TonyClient: Logs for worker 1 at: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi
+2019-04-22 20:30:44,625 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: ps index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000002/pi status: FINISHED
+2019-04-22 20:30:44,625 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 0 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000003/pi status: FINISHED
+2019-04-22 20:30:44,626 INFO tony.TonyClient: Tasks Status Updated: [TaskInfo] name: worker index: 1 url: http://pi-aw:8042/node/containerlogs/container_1555916523933_0030_01_000004/pi status: FINISHED
+
+```
+
+### With Docker
+
+```
+export CLASSPATH=$(hadoop classpath --glob):\
+./hadoop-submarine-core/target/hadoop-submarine-core-0.2.0-SNAPSHOT.jar:\
+./hadoop-submarine-yarnservice-runtime/target/hadoop-submarine-score-yarnservice-runtime-0.2.0-SNAPSHOT.jar:\
+./hadoop-submarine-tony-runtime/target/hadoop-submarine-tony-runtime-0.2.0-SNAPSHOT.jar:\
+/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+
+java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
+ --docker_image hadoopsubmarine/tf-1.8.0-cpu:0.0.3 \
+ --input_path hdfs://pi-aw:9000/dataset/cifar-10-data \
+ --worker_resources memory=3G,vcores=2 \
+ --worker_launch_cmd "export CLASSPATH=\$(/hadoop-3.1.0/bin/hadoop classpath --glob) && cd /test/models/tutorials/image/cifar10_estimator && python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --train-steps=10000 --eval-batch-size=16 --train-batch-size=16 --variable-strategy=CPU --num-gpus=0 --sync" \
+ --env JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \
+ --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \
+ --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \
+ --env HADOOP_HOME=/hadoop-3.1.0 \
+ --env HADOOP_YARN_HOME=/hadoop-3.1.0 \
+ --env HADOOP_COMMON_HOME=/hadoop-3.1.0 \
+ --env HADOOP_HDFS_HOME=/hadoop-3.1.0 \
+ --env HADOOP_CONF_DIR=/hadoop-3.1.0/etc/hadoop \
+ --conf tony.containers.resources=/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+```
+
+
+### Launch PyTorch Application:
+
+#### Commandline
+
+### Without Docker
+
+You need:
+* A Python virtual environment with PyTorch 0.4.* installed
+* A cluster with Hadoop 2.7 or above.
+
+### Building a Python virtual environment with PyTorch
+
+TonY requires a Python virtual environment zip with PyTorch and any needed Python libraries already installed.
+
+```
+wget https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz
+tar xf virtualenv-16.0.0.tar.gz
+
+python virtualenv-16.0.0/virtualenv.py venv
+. venv/bin/activate
+pip install torch==0.4.0
+zip -r venv.zip venv
+```
+
+### PyTorch version
+
+ - Version 0.4.0+
+
+
+### Installing Hadoop
+
+TonY only requires YARN, not HDFS. Please see the [open-source documentation](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) on how to set YARN up.
+
+### Get the training examples
+
+Get mnist_distributed.py from https://github.com/linkedin/TonY/tree/master/tony-examples/mnist-pytorch
+
+
+```
+export CLASSPATH=$(hadoop classpath --glob):\
+./hadoop-submarine-core/target/hadoop-submarine-core-0.2.0-SNAPSHOT.jar:\
+./hadoop-submarine-yarnservice-runtime/target/hadoop-submarine-score-yarnservice-runtime-0.2.0-SNAPSHOT.jar:\
+./hadoop-submarine-tony-runtime/target/hadoop-submarine-tony-runtime-0.2.0-SNAPSHOT.jar:\
+/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+
+java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
+ --num_workers 2 \
+ --worker_resources memory=3G,vcores=2 \
+ --num_ps 2 \
+ --ps_resources memory=3G,vcores=2 \
+ --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
+ --ps_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
  --insecure \
  --conf tony.containers.resources=PATH_TO_VENV_YOU_CREATED/venv.zip#archive,PATH_TO_MNIST_EXAMPLE/mnist_distributed.py,\
 PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar

