Posted to dev@submarine.apache.org by pi...@apache.org on 2021/08/05 05:37:55 UTC

[submarine] branch master updated: SUBMARINE-928. [Quickstart] Rewrite quickstart guide

This is an automated email from the ASF dual-hosted git repository.

pingsutw pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git


The following commit(s) were added to refs/heads/master by this push:
     new a94494b  SUBMARINE-928. [Quickstart] Rewrite quickstart guide
a94494b is described below

commit a94494bf4ba89d05b3e8680b3139736204781c35
Author: ByronHsu <by...@gmail.com>
AuthorDate: Thu Jul 29 20:54:32 2021 +0800

    SUBMARINE-928. [Quickstart] Rewrite quickstart guide
    
    ### What is this PR for?
    Write an example that walks users through the end-to-end usage of Submarine.
    
    ### What type of PR is it?
    [Documentation]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    https://issues.apache.org/jira/browse/SUBMARINE-928
    
    ### How should this be tested?
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need updating? No
    * Are there breaking changes for older versions? No
    * Does this need new documentation? No
    
    Author: ByronHsu <by...@gmail.com>
    
    Signed-off-by: Kevin <pi...@apache.org>
    
    Closes #664 from ByronHsu/quickstart and squashes the following commits:
    
    673b2ab0 [ByronHsu] fix conflict
    dcad3bbc [ByronHsu] push quickstart to dockerhub
    d8be6fcd [ByronHsu] add workbench connection and mlflow demo
    b87b04b2 [ByronHsu] add example
    bcb01d79 [ByronHsu] version 1
---
 .github/workflows/deploy_docker_images.yml         |   5 +
 .../examples/quickstart/{post.sh => Dockerfile}    |  31 +---
 .../examples/quickstart/{post.sh => build.sh}      |  49 ++---
 dev-support/examples/quickstart/post.sh            |   1 -
 dev-support/examples/quickstart/train.py           |  86 +++++++++
 website/docs/assets/quickstart-mlflow-2.png        | Bin 0 -> 267330 bytes
 website/docs/assets/quickstart-mlflow.png          | Bin 0 -> 309585 bytes
 website/docs/assets/quickstart-submit-1.png        | Bin 0 -> 245302 bytes
 website/docs/assets/quickstart-submit-2.png        | Bin 0 -> 244702 bytes
 website/docs/assets/quickstart-submit-3.png        | Bin 0 -> 251717 bytes
 website/docs/assets/quickstart-submit-4.png        | Bin 0 -> 332445 bytes
 website/docs/assets/quickstart-worbench.png        | Bin 0 -> 86036 bytes
 website/docs/gettingStarted/notebook.md            |   2 +-
 website/docs/gettingStarted/quickstart.md          | 203 +++++++++++++++++++++
 website/docusaurus.config.js                       |   2 +-
 website/sidebars.js                                |   6 +-
 16 files changed, 334 insertions(+), 51 deletions(-)

diff --git a/.github/workflows/deploy_docker_images.yml b/.github/workflows/deploy_docker_images.yml
index 3afd32e..93b55d6 100644
--- a/.github/workflows/deploy_docker_images.yml
+++ b/.github/workflows/deploy_docker_images.yml
@@ -79,3 +79,8 @@ jobs:
         run: ./dev-support/docker-images/serve/build.sh
       - name: Push submarine-serve docker image
         run: docker push apache/submarine:serve-$SUBMARINE_VERSION
+
+      - name: Build submarine quickstart
+        run: ./dev-support/examples/quickstart/build.sh
+      - name: Push submarine quickstart docker image
+        run: docker push apache/submarine:quickstart-$SUBMARINE_VERSION
diff --git a/dev-support/examples/quickstart/post.sh b/dev-support/examples/quickstart/Dockerfile
similarity index 62%
copy from dev-support/examples/quickstart/post.sh
copy to dev-support/examples/quickstart/Dockerfile
index 39336bc..ee6d66d 100644
--- a/dev-support/examples/quickstart/post.sh
+++ b/dev-support/examples/quickstart/Dockerfile
@@ -1,4 +1,3 @@
-#!/usr/bin/env bash
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
@@ -14,26 +13,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+FROM continuumio/anaconda3
+MAINTAINER Apache Software Foundation <de...@submarine.apache.org>
 
-curl -X POST -H "Content-Type: application/json" -d '
-{
-  "meta": {
-    "name": "quickstart",
-    "namespace": "default",
-    "framework": "TensorFlow",
-    "cmd": "python /opt/train.py",
-    "envVars": {
-      "ENV_1": "ENV1"
-    }
-  },
-  "environment": {
-    "image": "quickstart:0.6.0-SNAPSHOT"
-  },
-  "spec": {
-    "Worker": {
-      "replicas": 3,
-      "resources": "cpu=1,memory=1024M"
-    }
-  }
-}
-' http://127.0.0.1:32080/api/v1/experiment
\ No newline at end of file
+ADD ./tmp/submarine-sdk /opt/
+# install submarine-sdk locally
+RUN pip install /opt/pysubmarine/.[tf-latest] 
+RUN pip install tensorflow_datasets
+
+ADD ./train.py /opt/
\ No newline at end of file
diff --git a/dev-support/examples/quickstart/post.sh b/dev-support/examples/quickstart/build.sh
old mode 100644
new mode 100755
similarity index 51%
copy from dev-support/examples/quickstart/post.sh
copy to dev-support/examples/quickstart/build.sh
index 39336bc..6865c39
--- a/dev-support/examples/quickstart/post.sh
+++ b/dev-support/examples/quickstart/build.sh
@@ -14,26 +14,31 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+set -euxo pipefail
 
-curl -X POST -H "Content-Type: application/json" -d '
-{
-  "meta": {
-    "name": "quickstart",
-    "namespace": "default",
-    "framework": "TensorFlow",
-    "cmd": "python /opt/train.py",
-    "envVars": {
-      "ENV_1": "ENV1"
-    }
-  },
-  "environment": {
-    "image": "quickstart:0.6.0-SNAPSHOT"
-  },
-  "spec": {
-    "Worker": {
-      "replicas": 3,
-      "resources": "cpu=1,memory=1024M"
-    }
-  }
-}
-' http://127.0.0.1:32080/api/v1/experiment
\ No newline at end of file
+SUBMARINE_VERSION=0.6.0-SNAPSHOT
+SUBMARINE_IMAGE_NAME="apache/submarine:quickstart-${SUBMARINE_VERSION}"
+
+if [ -L ${BASH_SOURCE-$0} ]; then
+  PWD=$(dirname $(readlink "${BASH_SOURCE-$0}"))
+else
+  PWD=$(dirname ${BASH_SOURCE-$0})
+fi
+export CURRENT_PATH=$(cd "${PWD}">/dev/null; pwd)
+export SUBMARINE_HOME=${CURRENT_PATH}/../../..
+
+if [ -d "${CURRENT_PATH}/tmp" ] # if old tmp folder is still there, delete it.
+then
+  rm -rf "${CURRENT_PATH}/tmp"
+fi
+
+mkdir -p "${CURRENT_PATH}/tmp"
+cp -r "${SUBMARINE_HOME}/submarine-sdk" "${CURRENT_PATH}/tmp"
+
+# build image
+cd ${CURRENT_PATH}
+echo "Start building the ${SUBMARINE_IMAGE_NAME} docker image ..."
+docker build -t ${SUBMARINE_IMAGE_NAME} .
+
+# clean temp file
+rm -rf "${CURRENT_PATH}/tmp"
diff --git a/dev-support/examples/quickstart/post.sh b/dev-support/examples/quickstart/post.sh
old mode 100644
new mode 100755
index 39336bc..8c23c52
--- a/dev-support/examples/quickstart/post.sh
+++ b/dev-support/examples/quickstart/post.sh
@@ -14,7 +14,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-
 curl -X POST -H "Content-Type: application/json" -d '
 {
   "meta": {
diff --git a/dev-support/examples/quickstart/train.py b/dev-support/examples/quickstart/train.py
new file mode 100644
index 0000000..e33de68
--- /dev/null
+++ b/dev-support/examples/quickstart/train.py
@@ -0,0 +1,86 @@
+# Copyright 2020 The Kubeflow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""
+An example of multi-worker training with Keras model using Strategy API.
+https://github.com/kubeflow/tf-operator/blob/master/examples/v1/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py
+"""
+import tensorflow_datasets as tfds
+import tensorflow as tf
+from tensorflow.keras import layers, models
+from submarine import ModelsClient
+
+def make_datasets_unbatched():
+  BUFFER_SIZE = 10000
+
+  # Scale MNIST pixel values from [0, 255] to [0., 1.]
+  def scale(image, label):
+    image = tf.cast(image, tf.float32)
+    image /= 255
+    return image, label
+
+  datasets, _ = tfds.load(name='mnist', with_info=True, as_supervised=True)
+
+  return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)
+
+
+def build_and_compile_cnn_model():
+  model = models.Sequential()
+  model.add(
+      layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
+  model.add(layers.MaxPooling2D((2, 2)))
+  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
+  model.add(layers.MaxPooling2D((2, 2)))
+  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
+  model.add(layers.Flatten())
+  model.add(layers.Dense(64, activation='relu'))
+  model.add(layers.Dense(10, activation='softmax'))
+
+  model.summary()
+
+  model.compile(optimizer='adam',
+                loss='sparse_categorical_crossentropy',
+                metrics=['accuracy'])
+
+  return model
+
+def main():
+  strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
+      communication=tf.distribute.experimental.CollectiveCommunication.AUTO)
+
+  BATCH_SIZE_PER_REPLICA = 4
+  BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
+
+  with strategy.scope():
+    ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat()
+    options = tf.data.Options()
+    options.experimental_distribute.auto_shard_policy = \
+        tf.data.experimental.AutoShardPolicy.DATA
+    ds_train = ds_train.with_options(options)
+    # Model building/compiling need to be within `strategy.scope()`.
+    multi_worker_model = build_and_compile_cnn_model()
+
+  class MyCallback(tf.keras.callbacks.Callback):
+    def on_epoch_end(self, epoch, logs=None):
+      # monitor the loss and accuracy
+      print(logs)
+      modelClient.log_metrics({"loss": logs["loss"], "accuracy": logs["accuracy"]}, epoch)
+
+  with modelClient.start() as run:
+    multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70, callbacks=[MyCallback()])
+
+
+if __name__ == '__main__':
+  modelClient = ModelsClient()
+  main()
\ No newline at end of file
diff --git a/website/docs/assets/quickstart-mlflow-2.png b/website/docs/assets/quickstart-mlflow-2.png
new file mode 100644
index 0000000..6430164
Binary files /dev/null and b/website/docs/assets/quickstart-mlflow-2.png differ
diff --git a/website/docs/assets/quickstart-mlflow.png b/website/docs/assets/quickstart-mlflow.png
new file mode 100644
index 0000000..7600663
Binary files /dev/null and b/website/docs/assets/quickstart-mlflow.png differ
diff --git a/website/docs/assets/quickstart-submit-1.png b/website/docs/assets/quickstart-submit-1.png
new file mode 100644
index 0000000..a5d095f
Binary files /dev/null and b/website/docs/assets/quickstart-submit-1.png differ
diff --git a/website/docs/assets/quickstart-submit-2.png b/website/docs/assets/quickstart-submit-2.png
new file mode 100644
index 0000000..cc368d6
Binary files /dev/null and b/website/docs/assets/quickstart-submit-2.png differ
diff --git a/website/docs/assets/quickstart-submit-3.png b/website/docs/assets/quickstart-submit-3.png
new file mode 100644
index 0000000..0ca1daa
Binary files /dev/null and b/website/docs/assets/quickstart-submit-3.png differ
diff --git a/website/docs/assets/quickstart-submit-4.png b/website/docs/assets/quickstart-submit-4.png
new file mode 100644
index 0000000..ad7c60e
Binary files /dev/null and b/website/docs/assets/quickstart-submit-4.png differ
diff --git a/website/docs/assets/quickstart-worbench.png b/website/docs/assets/quickstart-worbench.png
new file mode 100644
index 0000000..a9ca304
Binary files /dev/null and b/website/docs/assets/quickstart-worbench.png differ
diff --git a/website/docs/gettingStarted/notebook.md b/website/docs/gettingStarted/notebook.md
index 532f5bc..1b58b59 100644
--- a/website/docs/gettingStarted/notebook.md
+++ b/website/docs/gettingStarted/notebook.md
@@ -1,5 +1,5 @@
 ---
-title: Notebook Tutorial
+title: Jupyter Notebook
 ---
 
 <!--
diff --git a/website/docs/gettingStarted/quickstart.md b/website/docs/gettingStarted/quickstart.md
new file mode 100644
index 0000000..0de4a93
--- /dev/null
+++ b/website/docs/gettingStarted/quickstart.md
@@ -0,0 +1,203 @@
+---
+title: Quickstart
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This document gives a quick overview of the basic usage of the Submarine platform. You can complete each step of the ML model lifecycle on the platform without wrestling with environment setup problems.
+
+## Installation
+
+### Prepare a Kubernetes cluster
+
+1. Prerequisites
+
+- Check the [dependency page](https://github.com/apache/submarine/blob/master/website/docs/devDocs/Dependencies.md) for compatible versions
+- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
+- [helm](https://helm.sh/docs/intro/install/) (Helm v3 is the minimum requirement)
+- [minikube](https://minikube.sigs.k8s.io/docs/start/)
+
+2. Start a minikube cluster
+```
+$ minikube start --vm-driver=docker --cpus 8 --memory 4096 --kubernetes-version v1.15.11
+```
+
+### Launch submarine in the cluster
+
+1. Clone the project
+```
+$ git clone https://github.com/apache/submarine.git
+```
+
+2. Install the resources with the Helm chart
+```
+$ cd submarine
+$ helm install submarine ./helm-charts/submarine
+```
+### Ensure submarine is ready
+
+1. Use kubectl to query the status of pods
+```
+$ kubectl get pods
+```
+
+2. Make sure each pod is `Running`
+```
+NAME                                              READY   STATUS    RESTARTS   AGE
+notebook-controller-deployment-5d4f5f874c-vwds8   1/1     Running   0          3h33m
+pytorch-operator-844c866d54-q5ztd                 1/1     Running   0          3h33m
+submarine-database-674987ff7d-r8zqs               1/1     Running   0          3h33m
+submarine-minio-5fdd957785-xd987                  1/1     Running   0          3h33m
+submarine-mlflow-76bbf5c7b-g2ntd                  1/1     Running   0          3h33m
+submarine-server-66f7b8658b-sfmv8                 1/1     Running   0          3h33m
+submarine-tensorboard-6c44944dfb-tvbr9            1/1     Running   0          3h33m
+submarine-traefik-7cbcfd4bd9-4bczn                1/1     Running   0          3h33m
+tf-job-operator-6bb69fd44-mc8ww                   1/1     Running   0          3h33m
+```
+
+### Connect to workbench
+
+1. Port-forwarding
+
+```
+# using port-forwarding
+$ kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80
+```
+
+2. Open `http://127.0.0.1:32080` in your browser
+
+![](../assets/quickstart-worbench.png)
+
+## Example: Submit a distributed MNIST experiment
+
+The code for this example is [here](https://github.com/apache/submarine/tree/master/dev-support/examples/quickstart). `train.py` is the training script, and `build.sh` is the script that builds the Docker image.
+
+### 1. Write a Python script for distributed training
+
+We use a simple MNIST TensorFlow script as an example, with `MultiWorkerMirroredStrategy` as the distribution strategy.
+
+```python
+"""
+./dev-support/examples/quickstart/train.py
+Reference: https://github.com/kubeflow/tf-operator/blob/master/examples/v1/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py
+"""
+
+import tensorflow_datasets as tfds
+import tensorflow as tf
+from tensorflow.keras import layers, models
+from submarine import ModelsClient
+
+def make_datasets_unbatched():
+  BUFFER_SIZE = 10000
+
+  # Scale MNIST pixel values from [0, 255] to [0., 1.]
+  def scale(image, label):
+    image = tf.cast(image, tf.float32)
+    image /= 255
+    return image, label
+
+  datasets, _ = tfds.load(name='mnist', with_info=True, as_supervised=True)
+
+  return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)
+
+
+def build_and_compile_cnn_model():
+  model = models.Sequential()
+  model.add(
+      layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
+  model.add(layers.MaxPooling2D((2, 2)))
+  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
+  model.add(layers.MaxPooling2D((2, 2)))
+  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
+  model.add(layers.Flatten())
+  model.add(layers.Dense(64, activation='relu'))
+  model.add(layers.Dense(10, activation='softmax'))
+
+  model.summary()
+
+  model.compile(optimizer='adam',
+                loss='sparse_categorical_crossentropy',
+                metrics=['accuracy'])
+
+  return model
+
+def main():
+  strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
+      communication=tf.distribute.experimental.CollectiveCommunication.AUTO)
+
+  BATCH_SIZE_PER_REPLICA = 4
+  BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
+
+  with strategy.scope():
+    ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat()
+    options = tf.data.Options()
+    options.experimental_distribute.auto_shard_policy = \
+        tf.data.experimental.AutoShardPolicy.DATA
+    ds_train = ds_train.with_options(options)
+    # Model building/compiling need to be within `strategy.scope()`.
+    multi_worker_model = build_and_compile_cnn_model()
+
+  class MyCallback(tf.keras.callbacks.Callback):
+    def on_epoch_end(self, epoch, logs=None):
+      # monitor the loss and accuracy
+      print(logs)
+      modelClient.log_metrics({"loss": logs["loss"], "accuracy": logs["accuracy"]}, epoch)
+
+  with modelClient.start() as run:
+    multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70, callbacks=[MyCallback()])
+
+
+if __name__ == '__main__':
+  modelClient = ModelsClient()
+  main()
+```
+
+### 2. Prepare an environment compatible with the training script
+Build a Docker image that contains the dependencies the training script needs.
+
+```bash
+$ ./dev-support/examples/quickstart/build.sh 
+```
+
+### 3. Submit the experiment
+
+1. Open the Submarine workbench and click `+ New Experiment`
+2. Fill in the form. In this example we set 3 workers.
+
+    1. Step 1
+    ![](../assets/quickstart-submit-1.png)
+    2. Step 2
+    ![](../assets/quickstart-submit-2.png)
+    3. Step 3
+    ![](../assets/quickstart-submit-3.png)
+    4. The experiment is successfully submitted
+    ![](../assets/quickstart-submit-4.png)
+
+### 4. Monitor the process (modelClient)
+
+1. In our code, we use `ModelsClient` from `submarine-sdk` (instantiated as `modelClient`) to record the metrics. To see the results, click `MLflow UI` in the workbench.
+2. To compare the metrics of the workers, select all workers and then click `compare`
+
+  ![](../assets/quickstart-mlflow.png)
+
+  ![](../assets/quickstart-mlflow-2.png)
+
+
+### 5. Serve the model (In development)
diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index bf482b3..23b1839 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -37,7 +37,7 @@ module.exports = {
       items: [
         {
           type: 'doc',
-          docId: 'gettingStarted/localDeployment',
+          docId: 'gettingStarted/quickstart',
           label: 'Docs',
           position: 'left',
         },
diff --git a/website/sidebars.js b/website/sidebars.js
index 80fd24d..b10ec3a 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -22,10 +22,10 @@ module.exports = {
         {
             "Introduction": [],
             "Getting Started": [
-                "gettingStarted/localDeployment",
-                "gettingStarted/kind",
+                "gettingStarted/quickstart",
+                // "gettingStarted/localDeployment",
                 "gettingStarted/notebook",
-                "gettingStarted/python-sdk",
+                // "gettingStarted/python-sdk",
             ],
             "User Docs": [
                 {
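
Note on the experiment request: the request that post.sh sends with curl can also be assembled from Python. What follows is a minimal sketch and is not part of this commit. The payload fields and the endpoint mirror post.sh as shown in the diff above. The function names build_experiment_spec and build_request are hypothetical. The image tag used in the example is assumed to be the one build.sh produces, not the quickstart:0.6.0-SNAPSHOT value in post.sh.

```python
import json
import urllib.request

# Endpoint taken from post.sh; it assumes the port-forward to
# submarine-traefik on local port 32080 is active.
EXPERIMENT_URL = "http://127.0.0.1:32080/api/v1/experiment"


def build_experiment_spec(image, replicas=3):
    """Return the experiment spec from post.sh as a Python dict (hypothetical helper)."""
    return {
        "meta": {
            "name": "quickstart",
            "namespace": "default",
            "framework": "TensorFlow",
            "cmd": "python /opt/train.py",
            "envVars": {"ENV_1": "ENV1"},
        },
        "environment": {"image": image},
        "spec": {
            "Worker": {
                "replicas": replicas,
                "resources": "cpu=1,memory=1024M",
            }
        },
    }


def build_request(spec, url=EXPERIMENT_URL):
    """Wrap the spec in a JSON POST request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumption: this tag matches SUBMARINE_IMAGE_NAME in build.sh.
    spec = build_experiment_spec("apache/submarine:quickstart-0.6.0-SNAPSHOT")
    request = build_request(spec)
    print(request.get_method(), request.full_url)
    print(json.dumps(spec, indent=2))
    # To actually submit the experiment: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it possible to inspect the JSON body before anything reaches the server. Sending is left as a single commented-out call because it needs a running Submarine deployment.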
