Posted to dev@submarine.apache.org by li...@apache.org on 2020/03/08 09:53:15 UTC
[submarine] branch master updated: SUBMARINE-375. Ensure TF example code is supported in TensorFlow 2

This is an automated email from the ASF dual-hosted git repository.

liuxun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git


The following commit(s) were added to refs/heads/master by this push:
     new ae9e8b3  SUBMARINE-375. Ensure TF example code is supported in TensorFlow 2
ae9e8b3 is described below

commit ae9e8b310967f9ea38ae2cc84cce1d5bc381fab6
Author: Ryan Lo <lo...@gmail.com>
AuthorDate: Sun Mar 8 16:34:52 2020 +0800

    SUBMARINE-375. Ensure TF example code is supported in TensorFlow 2
    
    ### What is this PR for?
    Because TensorFlow 2 removes many TensorFlow 1 APIs, this PR adds a new TF 2 example and keeps the existing TF 1 example code.
    
    ### What type of PR is it?
    [Improvement]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    [SUBMARINE-375](https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-375)
    
    ### How should this be tested?
    [passed CI](https://travis-ci.org/lowc1012/submarine/builds/659106327)
    
    ### Screenshots (if appropriate)
    <img width="689" alt="Screenshot 2020-03-06 7.19.03 PM" src="https://user-images.githubusercontent.com/52355146/76079987-e22af300-5fe0-11ea-9e94-a14e895d3bf8.png">
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No
    
    Author: Ryan Lo <lo...@gmail.com>
    
    Closes #203 from lowc1012/SUBMARINE-375 and squashes the following commits:
    
    7430164 [Ryan Lo] SUBMARINE-375. Update the tf 2 job name
    270909b [Ryan Lo] SUBMARINE-375. Add tf 2 example into mini-submarine
---
 dev-support/mini-submarine/README.md               |  4 +
 .../submarine/build_python_virtual_env.sh          |  7 ++
 .../submarine/mnist_distributed_tf2.py             | 98 ++++++++++++++++++++++
 .../submarine/run_submarine_mnist_tf2_tony.sh      | 59 +++++++++++++
 4 files changed, 168 insertions(+)

diff --git a/dev-support/mini-submarine/README.md b/dev-support/mini-submarine/README.md
index baca223..564e6f1 100644
--- a/dev-support/mini-submarine/README.md
+++ b/dev-support/mini-submarine/README.md
@@ -146,7 +146,11 @@ cd /home/yarn/submarine/
 
 ### Run a mnist TF job with submarine + TonY runtime
 ```
+# run TF 1 distributed training job 
 ./run_submarine_mnist_tony.sh
+
+# run TF 2 distributed training job
+./run_submarine_mnist_tf2_tony.sh
 ```
 When run_submarine_mnist_tony.sh is executed, mnist data is download from the url, [google mnist](https://storage.googleapis.com/cvdf-datasets/mnist/), by default. If the url is unaccessible, you can use parameter "-d" to specify a customized url.
 For example, if you are in mainland China, you can use the following command
diff --git a/dev-support/mini-submarine/submarine/build_python_virtual_env.sh b/dev-support/mini-submarine/submarine/build_python_virtual_env.sh
index 0596603..d441c0e 100755
--- a/dev-support/mini-submarine/submarine/build_python_virtual_env.sh
+++ b/dev-support/mini-submarine/submarine/build_python_virtual_env.sh
@@ -28,3 +28,10 @@ pip3 install /opt/pysubmarine/.
 zip -r myvenv.zip venv
 deactivate
 
+# Building a Python virtual environment with TensorFlow 2
+python3 virtualenv-16.0.0/virtualenv.py tf2-venv
+. tf2-venv/bin/activate
+pip3 install tensorflow==2.1.0
+pip3 install tensorflow-datasets==2.1.0
+zip -r tf2-venv.zip tf2-venv
+deactivate
diff --git a/dev-support/mini-submarine/submarine/mnist_distributed_tf2.py b/dev-support/mini-submarine/submarine/mnist_distributed_tf2.py
new file mode 100644
index 0000000..2f880e6
--- /dev/null
+++ b/dev-support/mini-submarine/submarine/mnist_distributed_tf2.py
@@ -0,0 +1,98 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import json
+import tensorflow as tf
+import tensorflow_datasets as tfds
+
+BATCH_SIZE = 64
+BUFFER_SIZE = 10000
+LEARNING_RATE = 1e-4
+
+def get_task_name():
+    cluster_spec = os.environ.get("CLUSTER_SPEC", None)
+    task_name = ''
+    if cluster_spec:
+        cluster_spec = json.loads(cluster_spec)
+        job_index = os.environ["TASK_INDEX"]
+        job_name = os.environ["JOB_NAME"]
+        task_name = job_name + '_' + job_index
+
+    return task_name
+
+def input_fn(mode, input_context=None):
+    datasets, info = tfds.load(name='mnist',
+                                data_dir='/tmp/' + get_task_name() + '/data',
+                                with_info=True,
+                                as_supervised=True)
+
+    mnist_dataset = (datasets['train'] if mode == tf.estimator.ModeKeys.TRAIN
+                                        else datasets['test'])
+
+    def scale(image, label):
+        image = tf.cast(image, tf.float32)
+        image /= 255
+        return image, label
+
+    if input_context:
+        mnist_dataset = mnist_dataset.shard(
+            input_context.num_input_pipelines,
+            input_context.input_pipeline_id)
+
+    return mnist_dataset.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
+
+def model_fn(features, labels, mode):
+    model = tf.keras.Sequential([
+        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
+        tf.keras.layers.MaxPooling2D(),
+        tf.keras.layers.Flatten(),
+        tf.keras.layers.Dense(64, activation='relu'),
+        tf.keras.layers.Dense(10)
+    ])
+
+    logits = model(features, training=False)
+
+    if mode == tf.estimator.ModeKeys.PREDICT:
+        predictions = {'logits': logits}
+        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
+
+    optimizer = tf.compat.v1.train.GradientDescentOptimizer(
+        learning_rate=LEARNING_RATE)
+    loss = tf.keras.losses.SparseCategoricalCrossentropy(
+        from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(labels, logits)
+    loss = tf.reduce_sum(loss) * (1. / BATCH_SIZE)
+    if mode == tf.estimator.ModeKeys.EVAL:
+        return tf.estimator.EstimatorSpec(mode, loss=loss)
+
+    return tf.estimator.EstimatorSpec(
+        mode=mode,
+        loss=loss,
+        train_op=optimizer.minimize(
+            loss, tf.compat.v1.train.get_or_create_global_step()))
+
+if __name__ == '__main__':
+    strategy = tf.distribute.experimental.ParameterServerStrategy()
+    config = tf.estimator.RunConfig(train_distribute=strategy)
+    estimator = tf.estimator.Estimator(
+        model_fn=model_fn,
+        model_dir='/tmp/model',
+        config=config)
+    train_spec = tf.estimator.TrainSpec(input_fn)
+    eval_spec = tf.estimator.EvalSpec(input_fn)
+    tf.estimator.train_and_evaluate(
+        estimator,
+        train_spec,
+        eval_spec)
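
Note on the loss in model_fn above: it sums the per-example losses and multiplies by 1 / BATCH_SIZE instead of taking a mean over the batch. The two agree for a full batch but differ when a batch holds fewer than BATCH_SIZE examples, where the fixed divisor keeps each example's weight the same. The following is a minimal pure-Python sketch of that arithmetic, not code from the commit; the helper names and the loss values are invented for illustration.

```python
BATCH_SIZE = 64  # same constant as in mnist_distributed_tf2.py


def scaled_loss(per_example_losses, batch_size=BATCH_SIZE):
    """Sum of per-example losses divided by the fixed batch size."""
    return sum(per_example_losses) * (1.0 / batch_size)


def mean_loss(per_example_losses):
    """Ordinary mean over however many examples the batch holds."""
    return sum(per_example_losses) / len(per_example_losses)


# Hypothetical per-example losses: one full batch and one half-full batch.
full_batch = [0.5] * 64
partial_batch = [0.5] * 32

full_scaled = scaled_loss(full_batch)
partial_scaled = scaled_loss(partial_batch)
partial_mean = mean_loss(partial_batch)

print("full batch, fixed divisor:   ", full_scaled)
print("partial batch, fixed divisor:", partial_scaled)
print("partial batch, plain mean:   ", partial_mean)
```

With the fixed divisor, the half-full batch contributes half as much as the full one, so every example carries the same weight regardless of which batch it falls in.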
diff --git a/dev-support/mini-submarine/submarine/run_submarine_mnist_tf2_tony.sh b/dev-support/mini-submarine/submarine/run_submarine_mnist_tf2_tony.sh
new file mode 100755
index 0000000..2f63c66
--- /dev/null
+++ b/dev-support/mini-submarine/submarine/run_submarine_mnist_tf2_tony.sh
@@ -0,0 +1,59 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --debug*)
+      DEBUG=$1
+      if [ -n "$2" ]; then
+        DEBUG_PORT=$2
+        shift
+      fi
+      shift
+      ;;
+    *)
+      break
+      ;;
+  esac
+done
+
+if [ "$DEBUG" ]; then
+  if [ -z "$DEBUG_PORT" ]; then
+    DEBUG_PORT=8000
+  fi
+  JAVA_CMD="java -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"
+else
+  JAVA_CMD="java"
+fi
+
+SUBMARINE_VERSION=0.4.0-SNAPSHOT
+HADOOP_VERSION=2.9
+SUBMARINE_PATH=/opt/submarine-current
+HADOOP_CONF_PATH=/usr/local/hadoop/etc/hadoop
+
+${JAVA_CMD} -cp "$("${HADOOP_COMMON_HOME}"/bin/hadoop classpath --glob)":${SUBMARINE_PATH}/submarine-all-${SUBMARINE_VERSION}-hadoop-"${HADOOP_VERSION}".jar:${HADOOP_CONF_PATH} \
+ org.apache.submarine.client.cli.Cli job run --name tf2-job-001 \
+ --framework tensorflow \
+ --input_path "" \
+ --num_workers 2 \
+ --worker_resources memory=1G,vcores=1 \
+ --num_ps 1 \
+ --ps_resources memory=1G,vcores=1 \
+ --worker_launch_cmd "tf2-venv.zip/tf2-venv/bin/python mnist_distributed_tf2.py" \
+ --ps_launch_cmd "tf2-venv.zip/tf2-venv/bin/python mnist_distributed_tf2.py" \
+ --insecure \
+ --verbose \
+ --conf tony.containers.resources=/home/yarn/submarine/tf2-venv.zip#archive,/home/yarn/submarine/mnist_distributed_tf2.py,${SUBMARINE_PATH}/submarine-all-${SUBMARINE_VERSION}-hadoop-"${HADOOP_VERSION}".jar
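
Note on the launch command above: it starts two workers and one parameter server, and all three run the same mnist_distributed_tf2.py. The script tells the containers apart through the CLUSTER_SPEC, JOB_NAME, and TASK_INDEX environment variables, which this sketch assumes are set by the runtime for each container. Below is a minimal, TensorFlow-free illustration of that lookup; the function name describe_task, the host names, and the port are hypothetical and not part of the commit.

```python
import json
import os


def describe_task(environ=os.environ):
    """Return (task_name, cluster) from the variables get_task_name() reads.

    Falls back to an empty name and empty cluster when CLUSTER_SPEC is
    absent, for example when the script runs outside a cluster.
    """
    raw = environ.get("CLUSTER_SPEC")
    if not raw:
        return "", {}
    cluster = json.loads(raw)
    job_name = environ["JOB_NAME"]
    task_index = environ["TASK_INDEX"]
    return job_name + "_" + task_index, cluster


# Hypothetical environment for the second of two workers (assumed values).
example_env = {
    "CLUSTER_SPEC": json.dumps({
        "worker": ["host-a:2222", "host-b:2222"],
        "ps": ["host-c:2222"],
    }),
    "JOB_NAME": "worker",
    "TASK_INDEX": "1",
}

task_name, cluster = describe_task(example_env)
print(task_name)
print("/tmp/" + task_name + "/data")
```

Because the task name is part of the data directory, each container downloads the MNIST data into its own directory under /tmp rather than sharing one.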

