Posted to dev@submarine.apache.org by li...@apache.org on 2020/03/08 09:53:15 UTC
[submarine] branch master updated: SUBMARINE-375. Ensure TF example code is supported in TensorFlow 2
This is an automated email from the ASF dual-hosted git repository.
liuxun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git
The following commit(s) were added to refs/heads/master by this push:
new ae9e8b3 SUBMARINE-375. Ensure TF example code is supported in TensorFlow 2
ae9e8b3 is described below
commit ae9e8b310967f9ea38ae2cc84cce1d5bc381fab6
Author: Ryan Lo <lo...@gmail.com>
AuthorDate: Sun Mar 8 16:34:52 2020 +0800
SUBMARINE-375. Ensure TF example code is supported in TensorFlow 2
### What is this PR for?
Because TensorFlow 2 removes many APIs, we should add a new TF 2 example and keep the existing TF 1 example code.
### What type of PR is it?
[Improvement]
### Todos
* [ ] - Task
### What is the Jira issue?
[SUBMARINE-375](https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-375)
### How should this be tested?
[passed CI](https://travis-ci.org/lowc1012/submarine/builds/659106327)
### Screenshots (if appropriate)
<img width="689" alt="Screenshot 2020-03-06 at 7.19.03 PM" src="https://user-images.githubusercontent.com/52355146/76079987-e22af300-5fe0-11ea-9e94-a14e895d3bf8.png">
### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need documentation? No
Author: Ryan Lo <lo...@gmail.com>
Closes #203 from lowc1012/SUBMARINE-375 and squashes the following commits:
7430164 [Ryan Lo] SUBMARINE-375. Update the tf 2 job name
270909b [Ryan Lo] SUBMARINE-375. Add tf 2 example into mini-submarine
---
dev-support/mini-submarine/README.md | 4 +
.../submarine/build_python_virtual_env.sh | 7 ++
.../submarine/mnist_distributed_tf2.py | 98 ++++++++++++++++++++++
.../submarine/run_submarine_mnist_tf2_tony.sh | 59 +++++++++++++
4 files changed, 168 insertions(+)
diff --git a/dev-support/mini-submarine/README.md b/dev-support/mini-submarine/README.md
index baca223..564e6f1 100644
--- a/dev-support/mini-submarine/README.md
+++ b/dev-support/mini-submarine/README.md
@@ -146,7 +146,11 @@ cd /home/yarn/submarine/
### Run a mnist TF job with submarine + TonY runtime
```
+# run TF 1 distributed training job
./run_submarine_mnist_tony.sh
+
+# run TF 2 distributed training job
+./run_submarine_mnist_tf2_tony.sh
```
When run_submarine_mnist_tony.sh is executed, the mnist data is downloaded from the URL [google mnist](https://storage.googleapis.com/cvdf-datasets/mnist/) by default. If the URL is inaccessible, you can use the "-d" parameter to specify a custom URL.
For example, if you are in mainland China, you can use the following command
diff --git a/dev-support/mini-submarine/submarine/build_python_virtual_env.sh b/dev-support/mini-submarine/submarine/build_python_virtual_env.sh
index 0596603..d441c0e 100755
--- a/dev-support/mini-submarine/submarine/build_python_virtual_env.sh
+++ b/dev-support/mini-submarine/submarine/build_python_virtual_env.sh
@@ -28,3 +28,10 @@ pip3 install /opt/pysubmarine/.
zip -r myvenv.zip venv
deactivate
+# Building a Python virtual environment with TensorFlow 2
+python3 virtualenv-16.0.0/virtualenv.py tf2-venv
+. tf2-venv/bin/activate
+pip3 install tensorflow==2.1.0
+pip3 install tensorflow-datasets==2.1.0
+zip -r tf2-venv.zip tf2-venv
+deactivate
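A note on the archive layout: `zip -r tf2-venv.zip tf2-venv` stores every file under a top-level `tf2-venv/` prefix, and the launch command in run_submarine_mnist_tf2_tony.sh addresses the interpreter as `tf2-venv.zip/tf2-venv/bin/python`. The sketch below reproduces that layout with Python's `zipfile`, assuming (inferred from that launch path, not confirmed by this commit) that TonY unpacks a `#archive` resource into a directory named after the zip file. The helper names are hypothetical.

```python
import os
import tempfile
import zipfile


def zip_dir_like_zip_r(src_dir, zip_path):
    """Roughly mimic `zip -r <zip_path> <dir>` run from the parent directory:
    every file entry keeps the directory name as its top-level prefix.
    (Files only; real `zip -r` also stores directory entries.)"""
    parent = os.path.dirname(os.path.abspath(src_dir))
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the parent, e.g. "tf2-venv/bin/python"
                zf.write(full, os.path.relpath(full, parent))
    return zip_path


def interpreter_path(archive_name, zip_path):
    """Path of bin/python once the archive is unpacked into a directory
    named archive_name (assumed localization layout)."""
    with zipfile.ZipFile(zip_path) as zf:
        for entry in zf.namelist():
            if entry.endswith("bin/python"):
                return archive_name + "/" + entry
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = os.path.join(tmp, "tf2-venv", "bin")
        os.makedirs(bin_dir)
        open(os.path.join(bin_dir, "python"), "w").close()
        zp = zip_dir_like_zip_r(os.path.join(tmp, "tf2-venv"),
                                os.path.join(tmp, "tf2-venv.zip"))
        print(interpreter_path("tf2-venv.zip", zp))
```

Running it prints `tf2-venv.zip/tf2-venv/bin/python`, matching the path used in `--worker_launch_cmd` and `--ps_launch_cmd`.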
diff --git a/dev-support/mini-submarine/submarine/mnist_distributed_tf2.py b/dev-support/mini-submarine/submarine/mnist_distributed_tf2.py
new file mode 100644
index 0000000..2f880e6
--- /dev/null
+++ b/dev-support/mini-submarine/submarine/mnist_distributed_tf2.py
@@ -0,0 +1,98 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import json
+import tensorflow as tf
+import tensorflow_datasets as tfds
+
+BATCH_SIZE = 64
+BUFFER_SIZE = 10000
+LEARNING_RATE = 1e-4
+
+def get_task_name():
+    cluster_spec = os.environ.get("CLUSTER_SPEC", None)
+    task_name = ''
+    if cluster_spec:
+        cluster_spec = json.loads(cluster_spec)
+        job_index = os.environ["TASK_INDEX"]
+        job_name = os.environ["JOB_NAME"]
+        task_name = job_name + '_' + job_index
+
+    return task_name
+
+def input_fn(mode, input_context=None):
+    datasets, info = tfds.load(name='mnist',
+                               data_dir='/tmp/' + get_task_name() + '/data',
+                               with_info=True,
+                               as_supervised=True)
+
+    mnist_dataset = (datasets['train'] if mode == tf.estimator.ModeKeys.TRAIN
+                     else datasets['test'])
+
+    def scale(image, label):
+        image = tf.cast(image, tf.float32)
+        image /= 255
+        return image, label
+
+    if input_context:
+        mnist_dataset = mnist_dataset.shard(
+            input_context.num_input_pipelines,
+            input_context.input_pipeline_id)
+
+    return mnist_dataset.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
+
+def model_fn(features, labels, mode):
+    model = tf.keras.Sequential([
+        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
+        tf.keras.layers.MaxPooling2D(),
+        tf.keras.layers.Flatten(),
+        tf.keras.layers.Dense(64, activation='relu'),
+        tf.keras.layers.Dense(10)
+    ])
+
+    logits = model(features, training=False)
+
+    if mode == tf.estimator.ModeKeys.PREDICT:
+        predictions = {'logits': logits}
+        return tf.estimator.EstimatorSpec(mode, predictions=predictions)
+
+    optimizer = tf.compat.v1.train.GradientDescentOptimizer(
+        learning_rate=LEARNING_RATE)
+    loss = tf.keras.losses.SparseCategoricalCrossentropy(
+        from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(labels, logits)
+    loss = tf.reduce_sum(loss) * (1. / BATCH_SIZE)
+    if mode == tf.estimator.ModeKeys.EVAL:
+        return tf.estimator.EstimatorSpec(mode, loss=loss)
+
+    return tf.estimator.EstimatorSpec(
+        mode=mode,
+        loss=loss,
+        train_op=optimizer.minimize(
+            loss, tf.compat.v1.train.get_or_create_global_step()))
+
+if __name__ == '__main__':
+    strategy = tf.distribute.experimental.ParameterServerStrategy()
+    config = tf.estimator.RunConfig(train_distribute=strategy)
+    estimator = tf.estimator.Estimator(
+        model_fn=model_fn,
+        model_dir='/tmp/model',
+        config=config)
+    train_spec = tf.estimator.TrainSpec(input_fn)
+    eval_spec = tf.estimator.EvalSpec(input_fn)
+    tf.estimator.train_and_evaluate(
+        estimator,
+        train_spec,
+        eval_spec)
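Two details of mnist_distributed_tf2.py can be checked without TensorFlow: how `Dataset.shard(num_shards, index)` splits the input across pipelines (it keeps the elements whose position modulo `num_shards` equals `index`), and how the per-example loss from `Reduction.NONE` is summed and then scaled by the fixed `BATCH_SIZE`. The pure-Python sketch below models both; it is an illustration, not the TensorFlow implementation, and the function names are hypothetical.

```python
import math

BATCH_SIZE = 64  # same constant as in mnist_distributed_tf2.py


def shard(elements, num_shards, index):
    """Model of tf.data.Dataset.shard(num_shards, index): keep the
    elements whose position modulo num_shards equals index."""
    return [e for i, e in enumerate(elements) if i % num_shards == index]


def sparse_ce_from_logits(logits, label):
    """Per-example sparse categorical cross-entropy on raw logits,
    -log(softmax(logits)[label]), using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def scaled_loss(batch_logits, batch_labels, global_batch_size=BATCH_SIZE):
    """Reduction.NONE, then reduce_sum(loss) * (1. / BATCH_SIZE), as in model_fn."""
    per_example = [sparse_ce_from_logits(l, y)
                   for l, y in zip(batch_logits, batch_labels)]
    return sum(per_example) * (1.0 / global_batch_size)


if __name__ == "__main__":
    print(shard(list(range(10)), 3, 1))
    print(scaled_loss([[0.0, 0.0], [0.0, 0.0]], [0, 1]))
```

For example, `shard(list(range(10)), 3, 1)` returns `[1, 4, 7]`, and two examples with logits `[0.0, 0.0]` give `2 * ln 2 / 64`. Because the sum is divided by the fixed `BATCH_SIZE` rather than the actual batch length, a final partial batch contributes a proportionally smaller loss.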
diff --git a/dev-support/mini-submarine/submarine/run_submarine_mnist_tf2_tony.sh b/dev-support/mini-submarine/submarine/run_submarine_mnist_tf2_tony.sh
new file mode 100755
index 0000000..2f63c66
--- /dev/null
+++ b/dev-support/mini-submarine/submarine/run_submarine_mnist_tf2_tony.sh
@@ -0,0 +1,59 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --debug*)
+      DEBUG=$1
+      if [ -n "$2" ]; then
+        DEBUG_PORT=$2
+        shift
+      fi
+      shift
+      ;;
+    *)
+      break
+      ;;
+  esac
+done
+
+if [ "$DEBUG" ]; then
+  if [ -z "$DEBUG_PORT" ]; then
+    DEBUG_PORT=8000
+  fi
+  JAVA_CMD="java -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"
+else
+  JAVA_CMD="java"
+fi
+
+SUBMARINE_VERSION=0.4.0-SNAPSHOT
+HADOOP_VERSION=2.9
+SUBMARINE_PATH=/opt/submarine-current
+HADOOP_CONF_PATH=/usr/local/hadoop/etc/hadoop
+
+${JAVA_CMD} -cp "$("${HADOOP_COMMON_HOME}"/bin/hadoop classpath --glob)":${SUBMARINE_PATH}/submarine-all-${SUBMARINE_VERSION}-hadoop-"${HADOOP_VERSION}".jar:${HADOOP_CONF_PATH} \
+ org.apache.submarine.client.cli.Cli job run --name tf2-job-001 \
+ --framework tensorflow \
+ --input_path "" \
+ --num_workers 2 \
+ --worker_resources memory=1G,vcores=1 \
+ --num_ps 1 \
+ --ps_resources memory=1G,vcores=1 \
+ --worker_launch_cmd "tf2-venv.zip/tf2-venv/bin/python mnist_distributed_tf2.py" \
+ --ps_launch_cmd "tf2-venv.zip/tf2-venv/bin/python mnist_distributed_tf2.py" \
+ --insecure \
+ --verbose \
+ --conf tony.containers.resources=/home/yarn/submarine/tf2-venv.zip#archive,/home/yarn/submarine/mnist_distributed_tf2.py,${SUBMARINE_PATH}/submarine-all-${SUBMARINE_VERSION}-hadoop-"${HADOOP_VERSION}".jar
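The `tony.containers.resources` value above is what makes the relative paths in `--worker_launch_cmd` and `--ps_launch_cmd` resolve inside the containers. The sketch below parses that value and flags launch-command tokens that do not start with a shipped resource name. It assumes (inferred from the launch commands, not confirmed here) that each resource is localized under its basename and that a `#archive` entry is unpacked into a directory of that name; the helper names are hypothetical, and it treats every command token as a relative path, which holds for these launch commands but not in general.

```python
import posixpath

# tony.containers.resources from run_submarine_mnist_tf2_tony.sh, with
# SUBMARINE_PATH, SUBMARINE_VERSION and HADOOP_VERSION expanded.
RESOURCES = ("/home/yarn/submarine/tf2-venv.zip#archive,"
             "/home/yarn/submarine/mnist_distributed_tf2.py,"
             "/opt/submarine-current/submarine-all-0.4.0-SNAPSHOT-hadoop-2.9.jar")


def parse_tony_resources(value):
    """Split a comma-separated resources value into
    (local_name, source_path, is_archive) tuples (assumed semantics)."""
    resources = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        path, _, fragment = entry.partition("#")
        resources.append((posixpath.basename(path), path, fragment == "archive"))
    return resources


def unresolved_tokens(launch_cmd, resources):
    """Return launch-command tokens whose first path component is not a
    localized resource name (every token treated as a relative path)."""
    names = {name for name, _, _ in resources}
    return [tok for tok in launch_cmd.split() if tok.split("/")[0] not in names]


if __name__ == "__main__":
    res = parse_tony_resources(RESOURCES)
    for name, path, is_archive in res:
        print(name, "(archive)" if is_archive else "(file)", "<-", path)
    cmd = "tf2-venv.zip/tf2-venv/bin/python mnist_distributed_tf2.py"
    print("unresolved:", unresolved_tokens(cmd, res))
```

For the TF 2 launch command both tokens resolve, so the unresolved list is empty; a command that still pointed at a differently named venv archive would be flagged.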