Posted to dev@submarine.apache.org by "Wangda Tan (Jira)" <ji...@apache.org> on 2020/07/30 23:23:00 UTC

[jira] [Comment Edited] (SUBMARINE-548) [Umbrella] Predefined Experiment

    [ https://issues.apache.org/jira/browse/SUBMARINE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168293#comment-17168293 ] 

Wangda Tan edited comment on SUBMARINE-548 at 7/30/20, 11:22 PM:
-----------------------------------------------------------------

[~jotjohnting], thanks for working on this, I just reviewed [https://github.com/apache/submarine/pull/351]

I think we missed one part of the design:

The design doc: [https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment] defines the spec for submitting a pre-defined template, which is sufficient for submission from CLI/REST/UI. However, it is not enough to *register/define* a pre-defined template.

The differences between registering and submitting a pre-defined template are:
 * *Registering* an experiment template requires information about how Submarine can run the experiment. For example, it needs to include: resources required for workers; environment (docker image, conda kernel); command-line options for workers/ps, etc.
 * In contrast, *submitting* an experiment template only requires filling in the required/optional parameters.

So to register a pre-defined template, we need *not only* to include the ExperimentTemplate, but also to tell Submarine how to run it.

*So the predefined template registration should include the following:* 

*1) A template of the Experiment YAML. For example, take this experiment from our doc:* [https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/run-tensorflow-experiment.md]
{code:java}
meta:
  name: "tf-mnist-yaml"
  namespace: "default"
  framework: "TensorFlow"
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150"
  envVars:
    ENV_1: "ENV1"
environment:
  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
spec:
  Ps:
    replicas: 1
    resources: "cpu=1,memory=1024M"
  Worker:
    replicas: 1
    resources: "cpu=1,memory=1024M" {code}
We can create a template of the YAML (with placeholders) using syntax like:
{code:java}
meta:
  name: {{name}}
  namespace: "default"
  framework: "TensorFlow"
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --input {{input}} --log_dir=/train/log --learning_rate={{training.learning_rate}} --batch_size={{training.batch_size}}"
  envVars:
    ENV_1: "ENV1"
environment:
  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
spec:
  Ps:
    replicas: 1
    resources: "cpu=1,memory=1024M"
  Worker:
    replicas: 1
    resources: "cpu=1,memory=1024M" {code}
The above template defines 4 variables (placeholders):
 * name
 * input
 * training.learning_rate
 * training.batch_size

(The above YAML placeholder syntax is based on [https://stackoverflow.com/a/41620747])
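To make the placeholder idea concrete, here is a minimal sketch of how such a template could be scanned and rendered. It is not Submarine code: the helper names ({{find_placeholders}}, {{render}}), the regular expression, and the sample values (including the {{/data/mnist}} input path) are assumptions for illustration only, and the template is shortened to the two lines that contain placeholders.

```python
import re
from typing import Dict, List

# Matches {{name}} style placeholders, including dotted names such as
# {{training.batch_size}}. The exact syntax is an assumption for this sketch.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def find_placeholders(template: str) -> List[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    seen: List[str] = []
    for name in PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute every placeholder; raise KeyError if any value is missing."""
    missing = [n for n in find_placeholders(template) if n not in values]
    if missing:
        raise KeyError("missing values for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


# Shortened version of the template above (only the lines with placeholders).
TEMPLATE = (
    "meta:\n"
    "  name: {{name}}\n"
    '  cmd: "python /var/tf_mnist/mnist_with_summaries.py --input {{input}} '
    "--log_dir=/train/log --learning_rate={{training.learning_rate}} "
    '--batch_size={{training.batch_size}}"\n'
)

# Hypothetical values a user might supply at submission time.
rendered = render(
    TEMPLATE,
    {
        "name": "tf-mnist-yaml",
        "input": "/data/mnist",
        "training.learning_rate": "0.01",
        "training.batch_size": "150",
    },
)
print(rendered)
```

Substituting on the raw text before YAML parsing keeps the sketch simple; a real implementation might instead parse first and substitute per field, so that a value cannot change the structure of the document.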

*2) A list of parameters (Similar to ExperimentTemplate)*

*So I think we need the following objects:*

*a. RegisterExperimentTemplateSpec*
{code:java}
{
   template_name: Name of the template
   experiment_spec: the spec for experiment with placeholders. 
   parameters: 
      List of parameter definitions
} {code}
*b. SubmissionExperimentTemplateSpec*
{code:java}
{
   experiment_name: Name of the running experiment
   template_name: Name of the template
   parameters: 
      List of parameters (with values)
} {code}
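As a rough sketch of how the two objects could relate, the Python below models them as dataclasses and resolves a submission against a registered template. The {{ParameterDef}} fields ({{required}}, {{default}}) and the {{resolve_parameters}} helper are assumptions made for illustration; they are not taken from the design doc or from PR 351.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParameterDef:
    """One parameter definition; these fields are assumed for the sketch."""
    name: str
    required: bool = True
    default: Optional[str] = None  # only consulted for optional parameters


@dataclass
class RegisterExperimentTemplateSpec:
    template_name: str
    experiment_spec: str  # experiment YAML with {{placeholders}}
    parameters: List[ParameterDef] = field(default_factory=list)


@dataclass
class SubmissionExperimentTemplateSpec:
    experiment_name: str
    template_name: str
    parameters: Dict[str, str] = field(default_factory=dict)  # name -> value


def resolve_parameters(
    registered: RegisterExperimentTemplateSpec,
    submission: SubmissionExperimentTemplateSpec,
) -> Dict[str, str]:
    """Check a submission against its template and fill in defaults."""
    if submission.template_name != registered.template_name:
        raise ValueError("submission refers to a different template")
    known = {p.name for p in registered.parameters}
    unknown = sorted(set(submission.parameters) - known)
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))
    resolved: Dict[str, str] = {}
    for p in registered.parameters:
        if p.name in submission.parameters:
            resolved[p.name] = submission.parameters[p.name]
        elif p.required:
            raise ValueError("missing required parameter: " + p.name)
        elif p.default is not None:
            resolved[p.name] = p.default
    return resolved


# Hypothetical registration and submission for the MNIST example.
registered = RegisterExperimentTemplateSpec(
    template_name="tf-mnist",
    experiment_spec="meta:\n  name: {{name}}\n",
    parameters=[
        ParameterDef("name"),
        ParameterDef("training.batch_size", required=False, default="150"),
    ],
)
submission = SubmissionExperimentTemplateSpec(
    experiment_name="mnist-run-1",
    template_name="tf-mnist",
    parameters={"name": "mnist-run-1"},
)
print(resolve_parameters(registered, submission))
```

The point of the split is that everything Submarine needs to run the job lives in the registered spec, so a submission carries only names and values.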
Does this make sense? cc: [~pingsutw], [~ztang] for suggestions.

And if we agree with the proposal, we need to update our experiment spec design accordingly: [https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment]


> [Umbrella] Predefined Experiment
> --------------------------------
>
>                 Key: SUBMARINE-548
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-548
>             Project: Apache Submarine
>          Issue Type: New Feature
>          Components: experiment template
>            Reporter: JohnTing
>            Assignee: JohnTing
>            Priority: Major
>             Fix For: 0.5.0
>
>
> Predefined-experiment features
>  * [API] Define Experiment API for pre-defined template
>  * [SDK] Add Python SDK to support pre-defined experiment
>  * [UI] Allow running pre-defined experiments
>  * [API] Define Swagger API for pre-defined template submission
>  * [API] Define Swagger API for pre-defined template registration/delete, etc.
>  * [Server] Support submitting a pre-defined template and translating it into an actual job
> [https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#support-predefined-experiment-templates]
> [https://cwiki.apache.org/confluence/display/SUBMARINE/Roadmap]


