Posted to dev@beam.apache.org by Pei He <pe...@google.com.INVALID> on 2016/08/05 19:55:25 UTC
[Proposal] Pipelines and their executions naming changes.
Hi all,
I have a proposal about how we name pipelines and their executions.
The purpose is to clarify the differences between the two, have
consensus between runners, and unify the implementation.
Current state:
* PipelineOptions.appName defaults to mainClass name
* DataflowPipelineOptions.jobName defaults to appName+user+datetime
* FlinkPipelineOptions.jobName defaults to appName+user+datetime
Proposal:
1. Replace PipelineOptions.appName with PipelineOptions.pipelineName.
* It is the user-visible name for a specific graph.
* Defaults to the mainClass name.
* Use cases: Find all executions of a pipeline
2. Add jobName to top level PipelineOptions.
* It is the unique name for an execution
* Defaults to pipelineName + user + datetime + random integer.
* Use cases:
-- Finding all executions by USER_A between TIME_X and TIME_Y
-- Naming resources created by the execution, for example:
writing temp files to the folder TMP_DIR/jobName/, writing to a default
output file jobName.output, or creating a temp /subscriptions/jobName
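To make the proposed jobName default concrete, here is a minimal sketch of how pipelineName + user + datetime + random integer could be combined and then reused to name a temp folder. The class JobNameSketch, its methods, the "-" separator, the MMddHHmmss timestamp pattern, and the lowercase/alphanumeric normalization are all assumptions for illustration; none of them is an existing Beam API or a format fixed by this proposal.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not part of the Beam SDK. */
final class JobNameSketch {
  // Assumed timestamp layout; the proposal only says "datetime".
  private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("MMddHHmmss");

  private JobNameSketch() {}

  /** Lowercases and strips non-alphanumerics so the part is safe in resource names (assumption). */
  static String normalize(String part) {
    return part.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
  }

  /** Builds pipelineName + user + datetime + random integer, joined with "-". */
  static String defaultJobName(
      String pipelineName, String user, LocalDateTime time, int randomInt) {
    return String.join(
        "-",
        normalize(pipelineName),
        normalize(user),
        time.format(STAMP),
        Integer.toHexString(randomInt));
  }

  /** Derives the per-execution temp folder TMP_DIR/jobName/ mentioned in the use cases. */
  static String tempDir(String tmpDir, String jobName) {
    return tmpDir + "/" + jobName + "/";
  }

  public static void main(String[] args) {
    String jobName =
        defaultJobName(
            "WordCount", // pipelineName, which would default to the mainClass name
            System.getProperty("user.name", "unknown"),
            LocalDateTime.now(),
            ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE));
    System.out.println(jobName);
    System.out.println(tempDir("/tmp", jobName));
  }
}
```

Passing the time and the random integer in as arguments keeps the function deterministic, so the same inputs always yield the same name; a real implementation might instead read them internally.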
Please let me know what you think.
Thanks
--
Pei
Re: [Proposal] Pipelines and their executions naming changes.
Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi
It sounds good to me. It will simplify the way for users to identify pipelines.
Regards
JB
Re: [Proposal] Pipelines and their executions naming changes.
Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
Flink itself allows the user to specify a String when creating a Job; this
will be visible in the web dashboard and maybe some other places. This
would roughly correspond to the proposed PipelineOptions.pipelineName. An
executing job does not have a human-readable name, just an ID that has to
be used when referring to the job and communicating with the master node to
manage the job.
I think the proposed changes are very good. However, it might not be
immediately possible to refer to a running pipeline by its jobName, due to
implementation specifics in the runners.
Cheers,
Aljoscha
On Tue, 9 Aug 2016 at 21:57 Amit Sela <am...@gmail.com> wrote:
> Currently, the Spark runner extends ApplicationNameOptions, PipelineOptions
> and StreamingOptions. Any unification of naming conventions is great IMO,
> and the runner will inherit them as it is.
> As for appName/pipelineName - appName is the same as Spark's app name, but
> I can live happily with pipelineName ;-)
> Considering jobName - that's usually for the resource manager (I use YARN),
> and the proposal sounds great here as well, though I'd have to see how I use
> it programmatically because usually I use the submit script.
>
> +1 and thanks Pei!
>
> Sorry for my late response,
> Amit
Re: [Proposal] Pipelines and their executions naming changes.
Posted by Amit Sela <am...@gmail.com>.
Currently, the Spark runner extends ApplicationNameOptions, PipelineOptions
and StreamingOptions. Any unification of naming conventions is great IMO,
and the runner will inherit them as it is.
As for appName/pipelineName - appName is the same as Spark's app name, but
I can live happily with pipelineName ;-)
Considering jobName - that's usually for the resource manager (I use YARN),
and the proposal sounds great here as well, though I'd have to see how I use
it programmatically because usually I use the submit script.
+1 and thanks Pei!
Sorry for my late response,
Amit