Posted to commits@submarine.apache.org by GitBox <gi...@apache.org> on 2020/12/27 10:19:25 UTC

[GitHub] [submarine] ByronHsu opened a new pull request #483: SUBMARINE-701. 0.6.0 New Feature: Support Tensorboard in Experiment

ByronHsu opened a new pull request #483:
URL: https://github.com/apache/submarine/pull/483


   ### What is this PR for?
   
   Support a new feature for 0.6.0: TensorBoard integration.
   
   - Usage
       1. Create a job request that uses TensorBoard.
       2. Write TensorBoard logs to a subdirectory of the mount path, such as `/logs/mylog`. The subpath is required because of this tf-operator issue: [https://github.com/kubeflow/tf-operator/issues/1053](https://github.com/kubeflow/tf-operator/issues/1053). Logs cannot be written directly to the mountPath.
       3. Open `http://<host>:<port>/tfboard-${job-name}` to monitor the job in TensorBoard.
   - Implementation
   
       When a new job is created, the backend creates not only the experiment itself but also the Kubernetes resources that TensorBoard requires.
   
       The resources can be classified into two categories: 
   
       1. Storage
       2. Tensorboard serving
   
       **Storage**
   
       The resources required for storage are a **persistent volume** and a **persistent volume claim**.
   
       I back the persistent volume with a host path and mount it into both the ML job (so the job can write logs to the volume) and TensorBoard (so TensorBoard can read those logs).
   
       **Tensorboard Serving**
   
       The resources required here are a **Deployment, a Service, and an IngressRoute**.
   
       I run TensorBoard with the Deployment and Service, then expose it under a custom path through the IngressRoute.
   
   - Example
       - tensorboard-example.json
   
        ```json
           {
             "meta": {
               "name": "tensorflow-dist-mnist-byron-1234",
               "namespace": "default",
               "framework": "TensorFlow",
               "cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/logs/mylog --learning_rate=0.01 --batch_size=20",
               "envVars": {
                 "ENV_1": "ENV1"
               }
             },
             "environment": {
               "image": "apache/submarine:tf-mnist-with-summaries-1.0"
             },
             "spec": {
               "Worker": {
                 "replicas": 1,
                 "resources": "cpu=1,memory=1024M"
               }
             }
           }
           ```
    
           ![Kapture 2020-12-27 at 18 04 40](https://user-images.githubusercontent.com/24364830/103168607-926b3000-486f-11eb-9f73-ecfcf71625a1.gif)
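The storage setup described under **Storage** can be sketched as plain Kubernetes manifests. This is a minimal Python sketch, not the code in this PR: the resource names (`tfboard-pv-<job>`, `tfboard-pvc-<job>`), the host directory `/tmp/submarine`, the `1Gi` size, the empty `storageClassName` (to bind statically instead of using a default provisioner), and the helper functions are all hypothetical.

```python
"""Hypothetical sketch of the PV/PVC pair used for TensorBoard logs."""
from typing import Any, Dict, Tuple

Manifest = Dict[str, Any]


def storage_manifests(job_name: str, namespace: str = "default",
                      host_path_root: str = "/tmp/submarine",
                      size: str = "1Gi") -> Tuple[Manifest, Manifest]:
    """Build a hostPath-backed PersistentVolume and a claim bound to it."""
    pv_name = f"tfboard-pv-{job_name}"
    pvc_name = f"tfboard-pvc-{job_name}"
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        # PersistentVolumes are cluster-scoped, so no namespace here.
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "",
            "hostPath": {
                "path": f"{host_path_root}/tfboard-{job_name}",
                "type": "DirectoryOrCreate",  # create the host dir if missing
            },
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "",
            "volumeName": pv_name,  # bind explicitly to the PV above
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


def log_volume(claim_name: str, volume_name: str = "tfboard-logs") -> Manifest:
    """Pod-level volume entry that refers to the claim."""
    return {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}


def log_mount(volume_name: str = "tfboard-logs", mount_path: str = "/logs") -> Manifest:
    """Container-level mount; the same entry goes into the job and TensorBoard."""
    return {"name": volume_name, "mountPath": mount_path}
```

A hostPath volume lives on a single node, so this layout only works when the job pods and the TensorBoard pod are scheduled on the same node; that limitation is one reason to want shared storage instead.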
   
   
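The **Tensorboard Serving** resources can likewise be sketched as manifests. This is a hedged Python sketch under several assumptions not stated in the PR: the IngressRoute is Traefik v2's CRD (`traefik.containo.us/v1alpha1`) with a `web` entry point, the image is `tensorflow/tensorflow:2.4.0`, and the names are hypothetical. TensorBoard's `--path_prefix` flag is used so it serves under `/tfboard-<job>` directly, instead of stripping the prefix with a Traefik middleware. `claim_name` is the persistent volume claim from the Storage section.

```python
"""Hypothetical sketch of the Deployment, Service, and Traefik IngressRoute."""
from typing import Any, Dict, List

Manifest = Dict[str, Any]
TENSORBOARD_PORT = 6006  # TensorBoard's default port


def serving_manifests(job_name: str, claim_name: str,
                      namespace: str = "default",
                      image: str = "tensorflow/tensorflow:2.4.0",
                      log_dir: str = "/logs") -> List[Manifest]:
    """Return [deployment, service, ingress_route] for one job's TensorBoard."""
    name = f"tfboard-{job_name}"
    prefix = f"/{name}"  # matches the URL path from the Usage section
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "tensorboard",
                        "image": image,
                        # TensorBoard scans log_dir recursively, so /logs
                        # also picks up runs written to /logs/mylog.
                        "command": ["tensorboard", f"--logdir={log_dir}",
                                    "--bind_all", f"--path_prefix={prefix}"],
                        "ports": [{"containerPort": TENSORBOARD_PORT}],
                        "volumeMounts": [{"name": "tfboard-logs", "mountPath": "/logs"}],
                    }],
                    "volumes": [{"name": "tfboard-logs",
                                 "persistentVolumeClaim": {"claimName": claim_name}}],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"selector": labels,
                 "ports": [{"port": 80, "targetPort": TENSORBOARD_PORT}]},
    }
    ingress_route = {
        "apiVersion": "traefik.containo.us/v1alpha1",
        "kind": "IngressRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entryPoints": ["web"],
            "routes": [{
                "kind": "Rule",
                "match": f"PathPrefix(`{prefix}`)",
                "services": [{"name": name, "port": 80}],
            }],
        },
    }
    return [deployment, service, ingress_route]
```

Serving under a prefix keeps TensorBoard's own links consistent with the public URL; the alternative (a StripPrefix middleware) leaves TensorBoard unaware of the prefix.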
   ### What type of PR is it?
   [Feature]
   
   ### Todos
   - [ ] Frontend support
   - [ ] Job logs cannot be written directly to the mountPath (as described above). We should fix this.
   - [ ] Make the log path configurable (it is currently hard-coded as `/logs`)
   - [ ] Support smb-server for shared storage
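The log-path constraints from the Usage steps and these Todos (logs must go to a subdirectory of the mount path, which is currently hard-coded as `/logs`) could be checked on the client side before a job is submitted. A minimal Python sketch, assuming the job command passes its log directory through a `--log_dir` flag as the example script does; `parse_log_dir`, `is_valid_log_dir`, and `tensorboard_url` are hypothetical helpers, not part of Submarine.

```python
"""Hypothetical pre-submit check for the TensorBoard log path."""
import posixpath
import shlex
from typing import Optional

DEFAULT_MOUNT_PATH = "/logs"  # currently hard-coded, per the Todos


def parse_log_dir(cmd: str, flag: str = "--log_dir") -> Optional[str]:
    """Return the value of `--log_dir=...` or `--log_dir ...` in a job command."""
    args = shlex.split(cmd)
    for i, arg in enumerate(args):
        if arg.startswith(flag + "="):
            return arg[len(flag) + 1:]
        if arg == flag and i + 1 < len(args):
            return args[i + 1]
    return None


def is_valid_log_dir(log_dir: str, mount_path: str = DEFAULT_MOUNT_PATH) -> bool:
    """True only if log_dir is strictly below mount_path, never the mount itself."""
    log_dir = posixpath.normpath(log_dir)
    mount_path = posixpath.normpath(mount_path)
    return log_dir != mount_path and log_dir.startswith(mount_path.rstrip("/") + "/")


def tensorboard_url(host: str, port: int, job_name: str) -> str:
    """URL pattern from the Usage section: http://<host>:<port>/tfboard-<job>."""
    return f"http://{host}:{port}/tfboard-{job_name}"
```

For the example request, `parse_log_dir` returns `/logs/mylog`, which passes the check, while `--log_dir=/logs` would be rejected. Making `mount_path` a parameter also leaves room for the configurable log path listed above.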
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-701
   
   ### How should this be tested?
   https://travis-ci.org/github/ByronHsu/submarine/jobs/751658488
   
   ### Questions:
   * Do the license files need an update? No
   * Are there breaking changes for older versions? No
   * Does this need documentation? No
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [submarine] ByronHsu closed pull request #483: SUBMARINE-701. 0.6.0 New Feature: Support Tensorboard in Experiment

Posted by GitBox <gi...@apache.org>.
ByronHsu closed pull request #483:
URL: https://github.com/apache/submarine/pull/483


   

