Posted to dev@submarine.apache.org by "Wangda Tan (Jira)" <ji...@apache.org> on 2020/07/30 23:21:00 UTC
[jira] [Commented] (SUBMARINE-548) [Umbrella] Predefined Experiment
[ https://issues.apache.org/jira/browse/SUBMARINE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168293#comment-17168293 ]
Wangda Tan commented on SUBMARINE-548:
--------------------------------------
[~jotjohnting], thanks for working on this. I just reviewed [https://github.com/apache/submarine/pull/351].
I think we missed a part of the design:
The design doc, [https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment], defines the spec for submitting a pre-defined template, which is sufficient for submission from the CLI/REST/UI. However, it is not enough to *register/define* a pre-defined template.
The differences between registering and submitting a pre-defined template are:
* *Registering* an experiment template requires information about how Submarine can run the experiment. For example, it needs to include: the resources required for workers; the environment (docker image, conda kernel); command-line options for workers/ps, etc.
* In contrast, *submitting* an experiment template only requires filling in the required/optional parameters.
So to register a pre-defined template, we need to include *not only* the ExperimentTemplate, but also a description of how Submarine can run it.
*So the predefined template registration should include the following:*
*1) A template of the Experiment YAML. For example, take an experiment from our* doc: [https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/run-tensorflow-experiment.md]
{code:java}
meta:
  name: "tf-mnist-yaml"
  namespace: "default"
  framework: "TensorFlow"
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150"
  envVars:
    ENV_1: "ENV1"
environment:
  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
spec:
  Ps:
    replicas: 1
    resources: "cpu=1,memory=1024M"
  Worker:
    replicas: 1
    resources: "cpu=1,memory=1024M"
{code}
We can create a template of the YAML (with placeholders) using syntax like:
{code:java}
meta:
  name: {{name}}
  namespace: "default"
  framework: "TensorFlow"
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --input {{input}} --log_dir=/train/log --learning_rate={{training.learning_rate}} --batch_size={{training.batch_size}}"
  envVars:
    ENV_1: "ENV1"
environment:
  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
spec:
  Ps:
    replicas: 1
    resources: "cpu=1,memory=1024M"
  Worker:
    replicas: 1
    resources: "cpu=1,memory=1024M"
{code}
The above template defines 4 variables (placeholders):
* name
* input
* training.learning_rate
* training.batch_size
(The above YAML placeholder syntax is based on [https://stackoverflow.com/a/41620747])
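As a rough sketch of how the placeholder substitution could work: the snippet below is not Submarine code, and the helper names {{find_placeholders}} and {{render}}, the regular expression, and the sample value {{/data/mnist}} are all assumptions made for illustration. It only shows that the placeholder names can be discovered from the template text and that a submission with a missing value can be rejected before anything is run.

```python
import re
from typing import Dict, List

# Matches {{name}} or {{ training.learning_rate }} style placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def find_placeholders(template: str) -> List[str]:
    """Return placeholder names in first-seen order, without duplicates."""
    seen: List[str] = []
    for name in PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute every placeholder; raise KeyError if a value is missing."""
    missing = [n for n in find_placeholders(template) if n not in values]
    if missing:
        raise KeyError("missing values for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


# A shortened version of the template from this comment.
TEMPLATE = """meta:
  name: {{name}}
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --input {{input}} --log_dir=/train/log --learning_rate={{training.learning_rate}} --batch_size={{training.batch_size}}"
"""

rendered = render(
    TEMPLATE,
    {
        "name": "tf-mnist-yaml",
        "input": "/data/mnist",  # hypothetical input path
        "training.learning_rate": "0.01",
        "training.batch_size": "150",
    },
)
print(rendered)
```

Treating the dotted names (such as {{training.learning_rate}}) as flat keys keeps the sketch simple; a real implementation might instead resolve them against a nested parameter structure.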
*2) A list of parameters (Similar to ExperimentTemplate)*
*So I think we need the following objects:*
*a. RegisterExperimentTemplateSpec*
{code:java}
{
  template_name: Name of the template
  experiment_spec: The spec for the experiment, with placeholders
  parameters:
    List of parameter definitions
}
{code}
*b. SubmissionExperimentTemplateSpec*
{code:java}
{
  experiment_name: Name of the running experiment
  template_name: Name of the template
  parameters:
    List of parameters (with values)
}
{code}
Does this make sense? cc: [~pingsutw], [~ztang] for suggestions.
> [Umbrella] Predefined Experiment
> --------------------------------
>
> Key: SUBMARINE-548
> URL: https://issues.apache.org/jira/browse/SUBMARINE-548
> Project: Apache Submarine
> Issue Type: New Feature
> Components: experiment template
> Reporter: JohnTing
> Assignee: JohnTing
> Priority: Major
> Fix For: 0.5.0
>
>
> Predefined-experiment features
> * [API] Define Experiment API for pre-defined template
> * [SDK] Add Python SDK to support pre-defined experiment
> * [UI] Allow Run pre-defined experiment
> * [API] Define Swagger API for pre-defined template submission
> * [API] Define Swagger API for pre-defined template registration/delete, etc.
> * [Server] Support submitting a pre-defined template and translating it into an actual job
> [https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#support-predefined-experiment-templates]
> [https://cwiki.apache.org/confluence/display/SUBMARINE/Roadmap]