Posted to dev@beam.apache.org by Pei He <pe...@google.com.INVALID> on 2016/08/05 19:55:25 UTC

[Proposal] Pipelines and their executions naming changes.

Hi all,
I have a proposal about how we name pipelines and their executions.
The purpose is to clarify the differences between the two, have
consensus between runners, and unify the implementation.

Current state:
 * PipelineOptions.appName defaults to mainClass name
 * DataflowPipelineOptions.jobName defaults to appName+user+datetime
 * FlinkPipelineOptions.jobName defaults to appName+user+datetime

Proposal:
1. Replace PipelineOptions.appName with PipelineOptions.pipelineName.
    *  It is the user-visible name for a specific graph.
    *  Defaults to the mainClass name.
    *  Use case: finding all executions of a pipeline.
2. Add jobName to the top-level PipelineOptions.
    *  It is the unique name for an execution.
    *  Defaults to pipelineName + user + datetime + random integer.
    *  Use cases:
        -- Finding all executions by USER_A between TIME_X and TIME_Y.
        -- Naming resources created by the execution, for example:
           writing temp files to the folder TMP_DIR/jobName/, writing to
           the default output file jobName.output, or creating a temp
           /subscriptions/jobName.
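
To make the proposed jobName default concrete, here is a minimal sketch of
how pipelineName + user + datetime + random integer could be combined. It is
only an illustration: the class JobNameSketch, its methods, the '-'
separator, the MMddHHmmss timestamp pattern, the hex rendering of the random
integer, and the lowercase normalization are all assumptions of this sketch
and are not taken from any existing runner.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not part of the Beam SDK.
final class JobNameSketch {
  // Assumed timestamp layout; the proposal only says "datetime".
  private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("MMddHHmmss");

  private JobNameSketch() {}

  /** Builds pipelineName + user + datetime + random integer, joined with '-'. */
  static String defaultJobName(String pipelineName, String user, LocalDateTime now, int random) {
    return String.join(
        "-",
        normalize(pipelineName),
        normalize(user),
        now.format(STAMP),
        Integer.toHexString(random));
  }

  /** Assumed normalization: lowercase, keep only ASCII letters and digits. */
  static String normalize(String s) {
    return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "");
  }

  /** Names a resource after the execution, as in the TMP_DIR/jobName/ use case. */
  static String tempFolder(String tmpDir, String jobName) {
    return tmpDir + "/" + jobName + "/";
  }

  public static void main(String[] args) {
    String job =
        defaultJobName(
            "WordCount",
            System.getProperty("user.name", "unknown"),
            LocalDateTime.now(),
            ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE));
    System.out.println(job);
    System.out.println(tempFolder("/tmp", job));
  }
}
```

Passing the clock value and the random integer as parameters keeps the
function deterministic, so two runners could share and test the same default.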

Please let me know what you think.

Thanks
--
Pei

Re: [Proposal] Pipelines and their executions naming changes.

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

Hi

It sounds good to me. It will make it simpler for users to identify pipelines.

Regards
JB



Re: [Proposal] Pipelines and their executions naming changes.

Posted by Aljoscha Krettek <al...@apache.org>.

Hi,
Flink itself allows the user to specify a String when creating a Job; this
will be visible in the web dashboard and maybe some other places. This
would roughly correspond to the proposed PipelineOptions.pipelineName. An
executing job does not have a human-readable name, just an ID that has to
be used when referring to the job and when communicating with the master
node to manage the job.

I think the proposed changes are very good. However, it might not be
immediately possible to refer to a running pipeline by its jobName, due to
implementation specifics in the runners.
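
One way a runner could bridge that gap is to remember which runner-assigned
ID belongs to which jobName. The following is only a sketch under that
assumption: JobIdRegistry and its methods are hypothetical names, and
nothing here is an existing Beam or Flink API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry for illustration only; not part of Beam or Flink.
final class JobIdRegistry {
  private final Map<String, String> idsByJobName = new ConcurrentHashMap<>();

  /** Records the runner-assigned ID; returns false if the jobName is already taken. */
  boolean register(String jobName, String runnerJobId) {
    return idsByJobName.putIfAbsent(jobName, runnerJobId) == null;
  }

  /** Resolves a human-readable jobName to the ID the runner understands. */
  Optional<String> lookup(String jobName) {
    return Optional.ofNullable(idsByJobName.get(jobName));
  }
}
```

Rejecting a second registration under the same jobName reflects the
requirement that jobName be unique per execution.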

Cheers,
Aljoscha

Re: [Proposal] Pipelines and their executions naming changes.

Posted by Amit Sela <am...@gmail.com>.

Currently, the Spark runner extends ApplicationNameOptions, PipelineOptions
and StreamingOptions. Any unification of naming conventions is great IMO,
and the runner will inherit them as is.
As for appName/pipelineName - appName is the same as Spark's app name, but
I can live happily with pipelineName ;-)
Regarding jobName - that's usually for the resource manager (I use YARN),
and the proposal sounds great here as well, though I'd have to see how I'd
use it programmatically, because I usually use the submit script.

+1 and thanks Pei!

Sorry for my late response,
Amit
