You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@samza.apache.org by Ke Wu <ke...@gmail.com> on 2019/11/15 22:59:37 UTC

[DISCUSS] SEP-23: Simplify Job Runner

We created SEP-23: Simplify Job Runner, which simplifies job runner by
moving config retrieval and planning to AM.

Please find out the SEP wiki below:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner

Please take a look and chime in your thoughts.

Thanks,
Ke

Re: [DISCUSS] SEP-23: Simplify Job Runner

Posted by Jagadish Venkatraman <ja...@gmail.com>.

+1 Lgtm, thanks Ke for the proposal.

On Tuesday, December 10, 2019, Prateek Maheshwari <pr...@gmail.com>
wrote:

> Looks good to me as well. +1 for the overall proposal, and thanks for
> putting it together.
>
> - Prateek
>
> On Tue, Dec 10, 2019 at 1:26 PM Xinyu Liu <xi...@gmail.com> wrote:
>
> > Thanks for updating the SEP wiki. The revised design looks clear to me.
> > Some config name might be simplified, e.g.
> > job.config.loader.properties.path.
> > Overall it looks good.
> >
> > Thanks,
> > Xinyu
> >
> > On Fri, Dec 6, 2019 at 1:52 PM Ke Wu <ke...@gmail.com> wrote:
> >
> > > As we are revamping our config loading logic in SEP-23: Simplify Job
> > > Runner, we will introduce backward incompatible changes on the job
> > > submission logic, i.e. deprecating the usage of --config-factory and
> > > --config-path, which reads a full config upon job submission, instead,
> we
> > > will only provide job submission related config through --config and
> > > ConfigLoader will load the complete config on AM instead.
> > >
> > > Please take a look and chime in your thoughts.
> > >
> > > Best,
> > > Ke
> > >
> > > > On Dec 2, 2019, at 4:42 PM, Ke Wu <ke...@gmail.com> wrote:
> > > >
> > > > Hi Xinyu,
> > > >
> > > > Please see the response in line:
> > > >
> > > >>  1. After this change, seems the original config-factory and
> > config-path
> > > >>  are only used to supply parameters for submitting job. Is that the
> > > case?
> > > >>  Which configs are still needed in the submission?
> > > >
> > > > Yes, only configs related to job submission is needed.
> > > >
> > > > job.name, job.factory.class & yarn.package.path are the minimum
> three
> > > configs needed for the job submission, which may be supplied by
> --config
> > > instead.
> > > >
> > > >>  2. For backward compatibility, does it still work if the user
> doesn't
> > > >>  specify the new ConfigLoader in the command line? The
> > > >>  PropertiesConfigLoader class seems requiring the path of the config
> > > after
> > > >>  exploding the tgz.
> > > >
> > > > If the user does not specify config loader in the config, then it
> will
> > > work in the previous flow, where runner publishes configs in
> coordinator
> > > stream and job coordinator/application master will pick it up by
> reading
> > > from Kafka. So this is a backward compatible change.
> > > >
> > > >>  3. If the final plan is to remove the original config factory/path,
> > how
> > > >>  do we pass the parameters needed for Yarn submission, e.g. job
> name,
> > > id,
> > > >>  and tgz path?
> > > >
> > > > We can either pass them by --config or introduce delicate command
> line
> > > arguments for it in CommandLine.scala.
> > > >
> > > >
> > > > Let me know if you have any further questions.
> > > >
> > > > Best,
> > > > Ke
> > > >
> > > >> On Nov 27, 2019, at 11:02 AM, Xinyu Liu <xi...@gmail.com>
> > wrote:
> > > >>
> > > >> Thanks a lot for putting out the design for simplifying the job
> > > submission
> > > >> process. The motivation makes sense to me that most of the planning
> > and
> > > >> config generation should be done after submitting to the cluster,
> > > instead
> > > >> of during the submission, which can happen in a local sandbox
> without
> > > the
> > > >> access to the resources needed for planning. It also improves the
> > > process
> > > >> from the security stand of the view.
> > > >>
> > > >> A few questions regarding to the interface changes:
> > > >>
> > > >>  1. After this change, seems the original config-factory and
> > config-path
> > > >>  are only used to supply parameters for submitting job. Is that the
> > > case?
> > > >>  Which configs are still needed in the submission?
> > > >>  2. For backward compatibility, does it still work if the user
> doesn't
> > > >>  specify the new ConfigLoader in the command line? The
> > > >>  PropertiesConfigLoader class seems requiring the path of the config
> > > after
> > > >>  exploding the tgz.
> > > >>  3. If the final plan is to remove the original config factory/path,
> > how
> > > >>  do we pass the parameters needed for Yarn submission, e.g. job
> name,
> > > id,
> > > >>  and tgz path?
> > > >>
> > > >> Thanks,
> > > >> Xinyu
> > > >>
> > > >> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <ke...@gmail.com> wrote:
> > > >>
> > > >>> We created SEP-23: Simplify Job Runner, which simplifies job runner
> > by
> > > >>> moving config retrieval and planning to AM.
> > > >>>
> > > >>> Please find out the SEP wiki below:
> > > >>>
> > > >>>
> > >
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> 23%3A+Simplify+Job+Runner
> > > >>>
> > > >>> Please take a look and chime in your thoughts.
> > > >>>
> > > >>> Thanks,
> > > >>> Ke
> > > >>>
> > > >
> > >
> > >
> >
>


-- 
Jagadish

Re: [DISCUSS] SEP-23: Simplify Job Runner

Posted by Prateek Maheshwari <pr...@gmail.com>.

Looks good to me as well. +1 for the overall proposal, and thanks for
putting it together.

- Prateek

On Tue, Dec 10, 2019 at 1:26 PM Xinyu Liu <xi...@gmail.com> wrote:

> Thanks for updating the SEP wiki. The revised design looks clear to me.
> Some config name might be simplified, e.g.
> job.config.loader.properties.path.
> Overall it looks good.
>
> Thanks,
> Xinyu
>
> On Fri, Dec 6, 2019 at 1:52 PM Ke Wu <ke...@gmail.com> wrote:
>
> > As we are revamping our config loading logic in SEP-23: Simplify Job
> > Runner, we will introduce backward incompatible changes on the job
> > submission logic, i.e. deprecating the usage of --config-factory and
> > --config-path, which reads a full config upon job submission, instead, we
> > will only provide job submission related config through --config and
> > ConfigLoader will load the complete config on AM instead.
> >
> > Please take a look and chime in your thoughts.
> >
> > Best,
> > Ke
> >
> > > On Dec 2, 2019, at 4:42 PM, Ke Wu <ke...@gmail.com> wrote:
> > >
> > > Hi Xinyu,
> > >
> > > Please see the response in line:
> > >
> > >>  1. After this change, seems the original config-factory and
> config-path
> > >>  are only used to supply parameters for submitting job. Is that the
> > case?
> > >>  Which configs are still needed in the submission?
> > >
> > > Yes, only configs related to job submission is needed.
> > >
> > > job.name, job.factory.class & yarn.package.path are the minimum three
> > configs needed for the job submission, which may be supplied by --config
> > instead.
> > >
> > >>  2. For backward compatibility, does it still work if the user doesn't
> > >>  specify the new ConfigLoader in the command line? The
> > >>  PropertiesConfigLoader class seems requiring the path of the config
> > after
> > >>  exploding the tgz.
> > >
> > > If the user does not specify config loader in the config, then it will
> > work in the previous flow, where runner publishes configs in coordinator
> > stream and job coordinator/application master will pick it up by reading
> > from Kafka. So this is a backward compatible change.
> > >
> > >>  3. If the final plan is to remove the original config factory/path,
> how
> > >>  do we pass the parameters needed for Yarn submission, e.g. job name,
> > id,
> > >>  and tgz path?
> > >
> > > We can either pass them by --config or introduce delicate command line
> > arguments for it in CommandLine.scala.
> > >
> > >
> > > Let me know if you have any further questions.
> > >
> > > Best,
> > > Ke
> > >
> > >> On Nov 27, 2019, at 11:02 AM, Xinyu Liu <xi...@gmail.com>
> wrote:
> > >>
> > >> Thanks a lot for putting out the design for simplifying the job
> > submission
> > >> process. The motivation makes sense to me that most of the planning
> and
> > >> config generation should be done after submitting to the cluster,
> > instead
> > >> of during the submission, which can happen in a local sandbox without
> > the
> > >> access to the resources needed for planning. It also improves the
> > process
> > >> from the security stand of the view.
> > >>
> > >> A few questions regarding to the interface changes:
> > >>
> > >>  1. After this change, seems the original config-factory and
> config-path
> > >>  are only used to supply parameters for submitting job. Is that the
> > case?
> > >>  Which configs are still needed in the submission?
> > >>  2. For backward compatibility, does it still work if the user doesn't
> > >>  specify the new ConfigLoader in the command line? The
> > >>  PropertiesConfigLoader class seems requiring the path of the config
> > after
> > >>  exploding the tgz.
> > >>  3. If the final plan is to remove the original config factory/path,
> how
> > >>  do we pass the parameters needed for Yarn submission, e.g. job name,
> > id,
> > >>  and tgz path?
> > >>
> > >> Thanks,
> > >> Xinyu
> > >>
> > >> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <ke...@gmail.com> wrote:
> > >>
> > >>> We created SEP-23: Simplify Job Runner, which simplifies job runner
> by
> > >>> moving config retrieval and planning to AM.
> > >>>
> > >>> Please find out the SEP wiki below:
> > >>>
> > >>>
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
> > >>>
> > >>> Please take a look and chime in your thoughts.
> > >>>
> > >>> Thanks,
> > >>> Ke
> > >>>
> > >
> >
> >
>

Re: [DISCUSS] SEP-23: Simplify Job Runner

Posted by Xinyu Liu <xi...@gmail.com>.

Thanks for updating the SEP wiki. The revised design looks clear to me.
Some config name might be simplified, e.g. job.config.loader.properties.path.
Overall it looks good.

Thanks,
Xinyu

On Fri, Dec 6, 2019 at 1:52 PM Ke Wu <ke...@gmail.com> wrote:

> As we are revamping our config loading logic in SEP-23: Simplify Job
> Runner, we will introduce backward incompatible changes on the job
> submission logic, i.e. deprecating the usage of --config-factory and
> --config-path, which reads a full config upon job submission, instead, we
> will only provide job submission related config through --config and
> ConfigLoader will load the complete config on AM instead.
>
> Please take a look and chime in your thoughts.
>
> Best,
> Ke
>
> > On Dec 2, 2019, at 4:42 PM, Ke Wu <ke...@gmail.com> wrote:
> >
> > Hi Xinyu,
> >
> > Please see the response in line:
> >
> >>  1. After this change, seems the original config-factory and config-path
> >>  are only used to supply parameters for submitting job. Is that the
> case?
> >>  Which configs are still needed in the submission?
> >
> > Yes, only configs related to job submission is needed.
> >
> > job.name, job.factory.class & yarn.package.path are the minimum three
> configs needed for the job submission, which may be supplied by --config
> instead.
> >
> >>  2. For backward compatibility, does it still work if the user doesn't
> >>  specify the new ConfigLoader in the command line? The
> >>  PropertiesConfigLoader class seems requiring the path of the config
> after
> >>  exploding the tgz.
> >
> > If the user does not specify config loader in the config, then it will
> work in the previous flow, where runner publishes configs in coordinator
> stream and job coordinator/application master will pick it up by reading
> from Kafka. So this is a backward compatible change.
> >
> >>  3. If the final plan is to remove the original config factory/path, how
> >>  do we pass the parameters needed for Yarn submission, e.g. job name,
> id,
> >>  and tgz path?
> >
> > We can either pass them by --config or introduce delicate command line
> arguments for it in CommandLine.scala.
> >
> >
> > Let me know if you have any further questions.
> >
> > Best,
> > Ke
> >
> >> On Nov 27, 2019, at 11:02 AM, Xinyu Liu <xi...@gmail.com> wrote:
> >>
> >> Thanks a lot for putting out the design for simplifying the job
> submission
> >> process. The motivation makes sense to me that most of the planning and
> >> config generation should be done after submitting to the cluster,
> instead
> >> of during the submission, which can happen in a local sandbox without
> the
> >> access to the resources needed for planning. It also improves the
> process
> >> from the security stand of the view.
> >>
> >> A few questions regarding to the interface changes:
> >>
> >>  1. After this change, seems the original config-factory and config-path
> >>  are only used to supply parameters for submitting job. Is that the
> case?
> >>  Which configs are still needed in the submission?
> >>  2. For backward compatibility, does it still work if the user doesn't
> >>  specify the new ConfigLoader in the command line? The
> >>  PropertiesConfigLoader class seems requiring the path of the config
> after
> >>  exploding the tgz.
> >>  3. If the final plan is to remove the original config factory/path, how
> >>  do we pass the parameters needed for Yarn submission, e.g. job name,
> id,
> >>  and tgz path?
> >>
> >> Thanks,
> >> Xinyu
> >>
> >> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <ke...@gmail.com> wrote:
> >>
> >>> We created SEP-23: Simplify Job Runner, which simplifies job runner by
> >>> moving config retrieval and planning to AM.
> >>>
> >>> Please find out the SEP wiki below:
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
> >>>
> >>> Please take a look and chime in your thoughts.
> >>>
> >>> Thanks,
> >>> Ke
> >>>
> >
>
>

Re: [DISCUSS] SEP-23: Simplify Job Runner

Posted by Ke Wu <ke...@gmail.com>.

As we are revamping our config loading logic in SEP-23: Simplify Job Runner, we will introduce backward incompatible changes on the job submission logic, i.e. deprecating the usage of --config-factory and --config-path, which reads a full config upon job submission, instead, we will only provide job submission related config through --config and ConfigLoader will load the complete config on AM instead.

Please take a look and chime in your thoughts.

Best,
Ke

> On Dec 2, 2019, at 4:42 PM, Ke Wu <ke...@gmail.com> wrote:
> 
> Hi Xinyu,
> 
> Please see the response in line:
> 
>>  1. After this change, seems the original config-factory and config-path
>>  are only used to supply parameters for submitting job. Is that the case?
>>  Which configs are still needed in the submission?
> 
> Yes, only configs related to job submission is needed.
> 
> job.name, job.factory.class & yarn.package.path are the minimum three configs needed for the job submission, which may be supplied by --config instead. 
> 
>>  2. For backward compatibility, does it still work if the user doesn't
>>  specify the new ConfigLoader in the command line? The
>>  PropertiesConfigLoader class seems requiring the path of the config after
>>  exploding the tgz.
> 
> If the user does not specify config loader in the config, then it will work in the previous flow, where runner publishes configs in coordinator stream and job coordinator/application master will pick it up by reading from Kafka. So this is a backward compatible change.
> 
>>  3. If the final plan is to remove the original config factory/path, how
>>  do we pass the parameters needed for Yarn submission, e.g. job name, id,
>>  and tgz path?
> 
> We can either pass them by --config or introduce delicate command line arguments for it in CommandLine.scala.
> 
> 
> Let me know if you have any further questions.
> 
> Best,
> Ke
> 
>> On Nov 27, 2019, at 11:02 AM, Xinyu Liu <xi...@gmail.com> wrote:
>> 
>> Thanks a lot for putting out the design for simplifying the job submission
>> process. The motivation makes sense to me that most of the planning and
>> config generation should be done after submitting to the cluster, instead
>> of during the submission, which can happen in a local sandbox without the
>> access to the resources needed for planning. It also improves the process
>> from the security stand of the view.
>> 
>> A few questions regarding to the interface changes:
>> 
>>  1. After this change, seems the original config-factory and config-path
>>  are only used to supply parameters for submitting job. Is that the case?
>>  Which configs are still needed in the submission?
>>  2. For backward compatibility, does it still work if the user doesn't
>>  specify the new ConfigLoader in the command line? The
>>  PropertiesConfigLoader class seems requiring the path of the config after
>>  exploding the tgz.
>>  3. If the final plan is to remove the original config factory/path, how
>>  do we pass the parameters needed for Yarn submission, e.g. job name, id,
>>  and tgz path?
>> 
>> Thanks,
>> Xinyu
>> 
>> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>>> We created SEP-23: Simplify Job Runner, which simplifies job runner by
>>> moving config retrieval and planning to AM.
>>> 
>>> Please find out the SEP wiki below:
>>> 
>>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
>>> 
>>> Please take a look and chime in your thoughts.
>>> 
>>> Thanks,
>>> Ke
>>> 
>

Re: [DISCUSS] SEP-23: Simplify Job Runner

Posted by Ke Wu <ke...@gmail.com>.

Hi Xinyu,

Please see the response in line:

>   1. After this change, seems the original config-factory and config-path
>   are only used to supply parameters for submitting job. Is that the case?
>   Which configs are still needed in the submission?

Yes, only configs related to job submission is needed.

job.name, job.factory.class & yarn.package.path are the minimum three configs needed for the job submission, which may be supplied by --config instead. 

>   2. For backward compatibility, does it still work if the user doesn't
>   specify the new ConfigLoader in the command line? The
>   PropertiesConfigLoader class seems requiring the path of the config after
>   exploding the tgz.

If the user does not specify config loader in the config, then it will work in the previous flow, where runner publishes configs in coordinator stream and job coordinator/application master will pick it up by reading from Kafka. So this is a backward compatible change.

>   3. If the final plan is to remove the original config factory/path, how
>   do we pass the parameters needed for Yarn submission, e.g. job name, id,
>   and tgz path?

We can either pass them by --config or introduce delicate command line arguments for it in CommandLine.scala.


Let me know if you have any further questions.

Best,
Ke

> On Nov 27, 2019, at 11:02 AM, Xinyu Liu <xi...@gmail.com> wrote:
> 
> Thanks a lot for putting out the design for simplifying the job submission
> process. The motivation makes sense to me that most of the planning and
> config generation should be done after submitting to the cluster, instead
> of during the submission, which can happen in a local sandbox without the
> access to the resources needed for planning. It also improves the process
> from the security stand of the view.
> 
> A few questions regarding to the interface changes:
> 
>   1. After this change, seems the original config-factory and config-path
>   are only used to supply parameters for submitting job. Is that the case?
>   Which configs are still needed in the submission?
>   2. For backward compatibility, does it still work if the user doesn't
>   specify the new ConfigLoader in the command line? The
>   PropertiesConfigLoader class seems requiring the path of the config after
>   exploding the tgz.
>   3. If the final plan is to remove the original config factory/path, how
>   do we pass the parameters needed for Yarn submission, e.g. job name, id,
>   and tgz path?
> 
> Thanks,
> Xinyu
> 
> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <ke...@gmail.com> wrote:
> 
>> We created SEP-23: Simplify Job Runner, which simplifies job runner by
>> moving config retrieval and planning to AM.
>> 
>> Please find out the SEP wiki below:
>> 
>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
>> 
>> Please take a look and chime in your thoughts.
>> 
>> Thanks,
>> Ke
>>

Re: [DISCUSS] SEP-23: Simplify Job Runner

Posted by Xinyu Liu <xi...@gmail.com>.

Thanks a lot for putting out the design for simplifying the job submission
process. The motivation makes sense to me that most of the planning and
config generation should be done after submitting to the cluster, instead
of during the submission, which can happen in a local sandbox without the
access to the resources needed for planning. It also improves the process
from the security stand of the view.

A few questions regarding to the interface changes:

   1. After this change, seems the original config-factory and config-path
   are only used to supply parameters for submitting job. Is that the case?
   Which configs are still needed in the submission?
   2. For backward compatibility, does it still work if the user doesn't
   specify the new ConfigLoader in the command line? The
   PropertiesConfigLoader class seems requiring the path of the config after
   exploding the tgz.
   3. If the final plan is to remove the original config factory/path, how
   do we pass the parameters needed for Yarn submission, e.g. job name, id,
   and tgz path?

Thanks,
Xinyu

On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <ke...@gmail.com> wrote:

> We created SEP-23: Simplify Job Runner, which simplifies job runner by
> moving config retrieval and planning to AM.
>
> Please find out the SEP wiki below:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
>
> Please take a look and chime in your thoughts.
>
> Thanks,
> Ke
>