Posted to dev@griffin.apache.org by 大鹏 <18...@163.com> on 2019/03/04 03:31:37 UTC

the purpose of the two scheduling tasks in Griffin




I don't know the purpose of the two scheduling tasks in Griffin:


JobInstance and SparkSubmitJob. What is the connection between them?

Re: the purpose of the two scheduling tasks in Griffin

Posted by 大鹏 <18...@163.com>.
OK, thank you, I understand.




On 03/5/2019 10:48, Kevin Yao <ah...@gmail.com> wrote:
Hi,
JobInstance is actually the job you created, which will be scheduled
*periodically or at a specific time*, such as every four minutes with the
cron expression *0 0/4 * * * ?*.
SparkSubmitJob is triggered immediately after being created by JobInstance,
using a simple schedule with a *repeat count* and *interval*.

Their scheduling rules are different. The current design keeps the process
clear and simple. If you have a better design, you can propose it and we can
discuss whether to adopt it together.

Thanks,
Kevin


On Tue, Mar 5, 2019 at 10:20 AM 大鹏 <18...@163.com> wrote:

What are the benefits of the current design?
I think only one job is enough. Couldn't JobInstance alone do it? After
preparing the data required by Spark, JobInstance could pass the data to
the relevant methods of the SparkSubmitJob class.




On 03/5/2019 10:15, Kevin Yao <ah...@gmail.com> wrote:
Hi,
JobInstance and SparkSubmitJob both implement the Job interface.
JobInstance is mainly used to set the source and predicate partitions, that
is, to split the source data or predicate paths into several parts and get
each part's start timestamp. For example, when creating a measure, you
configure the *where* field in the *dt=#YYYYMMdd# AND hour=#HH#* format and
the predicate *path* field in the */dt=#YYYYMMdd#/hour=#HH#/_DONE* format.
After JobInstance is executed, the *where* value becomes *dt=20190305 AND
hour=01* and the *path* value becomes */dt=20190305/hour=00/_DONE* (the
values are just a sample). In short, it converts some of the configuration
in the measure into directly usable data for the SparkSubmitJob predicate
and the Spark calculation.

SparkSubmitJob is mainly used to check, via the predicate, whether the data
to be calculated is ready. If it is ready, the measure configuration
converted in JobInstance is submitted to Spark through Livy. Otherwise, the
predicate check is retried a number of times according to your configuration
(the default is 12 times).


Thanks,
Kevin


On Mon, Mar 4, 2019 at 11:31 AM 大鹏 <18...@163.com> wrote:





I don't know the purpose of the two scheduling tasks in Griffin:


JobInstance and SparkSubmitJob. What is the connection between them?


Re: the purpose of the two scheduling tasks in Griffin

Posted by Kevin Yao <ah...@gmail.com>.
Hi,
JobInstance is actually the job you created, which will be scheduled
*periodically or at a specific time*, such as every four minutes with the
cron expression *0 0/4 * * * ?*.
SparkSubmitJob is triggered immediately after being created by JobInstance,
using a simple schedule with a *repeat count* and *interval*.

Their scheduling rules are different. The current design keeps the process
clear and simple. If you have a better design, you can propose it and we can
discuss whether to adopt it together.
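
As a rough illustration of the two scheduling styles described above, here
is a minimal Quartz sketch; the trigger construction below is only an
assumption for illustration, not Griffin's actual scheduler code:

import org.quartz.Trigger;
import static org.quartz.CronScheduleBuilder.cronSchedule;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;

public class TriggerSketch {

    // JobInstance-style: fired periodically by a cron expression.
    static Trigger jobInstanceTrigger() {
        return newTrigger()
                .withSchedule(cronSchedule("0 0/4 * * * ?")) // every four minutes
                .build();
    }

    // SparkSubmitJob-style: fired immediately, then repeated a fixed
    // number of times at a fixed interval.
    static Trigger sparkSubmitTrigger() {
        return newTrigger()
                .startNow()
                .withSchedule(simpleSchedule()
                        .withIntervalInMinutes(5) // illustrative interval
                        .withRepeatCount(12))     // illustrative repeat count
                .build();
    }
}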

Thanks,
Kevin


On Tue, Mar 5, 2019 at 10:20 AM 大鹏 <18...@163.com> wrote:

> What are the benefits of the current design?
> I think only one job is enough. Couldn't JobInstance alone do it? After
> preparing the data required by Spark, JobInstance could pass the data to
> the relevant methods of the SparkSubmitJob class.
>
>
>
>
> On 03/5/2019 10:15, Kevin Yao <ah...@gmail.com> wrote:
> Hi,
> JobInstance and SparkSubmitJob both implement the Job interface.
> JobInstance is mainly used to set the source and predicate partitions, that
> is, to split the source data or predicate paths into several parts and get
> each part's start timestamp. For example, when creating a measure, you
> configure the *where* field in the *dt=#YYYYMMdd# AND hour=#HH#* format and
> the predicate *path* field in the */dt=#YYYYMMdd#/hour=#HH#/_DONE* format.
> After JobInstance is executed, the *where* value becomes *dt=20190305 AND
> hour=01* and the *path* value becomes */dt=20190305/hour=00/_DONE* (the
> values are just a sample). In short, it converts some of the configuration
> in the measure into directly usable data for the SparkSubmitJob predicate
> and the Spark calculation.
>
> SparkSubmitJob is mainly used to check, via the predicate, whether the data
> to be calculated is ready. If it is ready, the measure configuration
> converted in JobInstance is submitted to Spark through Livy. Otherwise, the
> predicate check is retried a number of times according to your configuration
> (the default is 12 times).
>
>
> Thanks,
> Kevin
>
>
> On Mon, Mar 4, 2019 at 11:31 AM 大鹏 <18...@163.com> wrote:
>
>
>
>
>
> I don't know the purpose of the two scheduling tasks in Griffin:
>
>
> JobInstance and SparkSubmitJob. What is the connection between them?
>

Re: the purpose of the two scheduling tasks in Griffin

Posted by 大鹏 <18...@163.com>.
What are the benefits of the current design?
I think only one job is enough. Couldn't JobInstance alone do it? After preparing the data required by Spark, JobInstance could pass the data to the relevant methods of the SparkSubmitJob class.




On 03/5/2019 10:15, Kevin Yao <ah...@gmail.com> wrote:
Hi,
JobInstance and SparkSubmitJob both implement the Job interface.
JobInstance is mainly used to set the source and predicate partitions, that
is, to split the source data or predicate paths into several parts and get
each part's start timestamp. For example, when creating a measure, you
configure the *where* field in the *dt=#YYYYMMdd# AND hour=#HH#* format and
the predicate *path* field in the */dt=#YYYYMMdd#/hour=#HH#/_DONE* format.
After JobInstance is executed, the *where* value becomes *dt=20190305 AND
hour=01* and the *path* value becomes */dt=20190305/hour=00/_DONE* (the
values are just a sample). In short, it converts some of the configuration
in the measure into directly usable data for the SparkSubmitJob predicate
and the Spark calculation.

SparkSubmitJob is mainly used to check, via the predicate, whether the data
to be calculated is ready. If it is ready, the measure configuration
converted in JobInstance is submitted to Spark through Livy. Otherwise, the
predicate check is retried a number of times according to your configuration
(the default is 12 times).


Thanks,
Kevin


On Mon, Mar 4, 2019 at 11:31 AM 大鹏 <18...@163.com> wrote:





I don't know the purpose of the two scheduling tasks in Griffin:


JobInstance and SparkSubmitJob. What is the connection between them?

Re: the purpose of the two scheduling tasks in Griffin

Posted by Kevin Yao <ah...@gmail.com>.
Hi,
JobInstance and SparkSubmitJob both implement the Job interface.
JobInstance is mainly used to set the source and predicate partitions, that
is, to split the source data or predicate paths into several parts and get
each part's start timestamp. For example, when creating a measure, you
configure the *where* field in the *dt=#YYYYMMdd# AND hour=#HH#* format and
the predicate *path* field in the */dt=#YYYYMMdd#/hour=#HH#/_DONE* format.
After JobInstance is executed, the *where* value becomes *dt=20190305 AND
hour=01* and the *path* value becomes */dt=20190305/hour=00/_DONE* (the
values are just a sample). In short, it converts some of the configuration
in the measure into directly usable data for the SparkSubmitJob predicate
and the Spark calculation.
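
A rough Java sketch of that placeholder substitution; the helper name and
date patterns below are assumptions for illustration, not Griffin's actual
code:

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PlaceholderSketch {

    // Fill #YYYYMMdd# and #HH# style placeholders from a partition timestamp.
    static String fillTimePlaceholders(String template, LocalDateTime ts) {
        return template
                .replace("#YYYYMMdd#", ts.format(DateTimeFormatter.ofPattern("yyyyMMdd")))
                .replace("#HH#", ts.format(DateTimeFormatter.ofPattern("HH")));
    }

    public static void main(String[] args) {
        LocalDateTime partTime = LocalDateTime.of(2019, 3, 5, 1, 0);
        // where clause from the measure configuration
        System.out.println(fillTimePlaceholders("dt=#YYYYMMdd# AND hour=#HH#", partTime));
        // -> dt=20190305 AND hour=01
        // predicate path from the measure configuration
        System.out.println(fillTimePlaceholders("/dt=#YYYYMMdd#/hour=#HH#/_DONE", partTime));
        // -> /dt=20190305/hour=01/_DONE
    }
}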

SparkSubmitJob is mainly used to check, via the predicate, whether the data
to be calculated is ready. If it is ready, the measure configuration
converted in JobInstance is submitted to Spark through Livy. Otherwise, the
predicate check is retried a number of times according to your configuration
(the default is 12 times).
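
A simplified sketch of that predicate-then-submit flow; the file-existence
check, retry loop, Livy endpoint, and payload handling below are assumptions
for illustration, not Griffin's actual implementation:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PredicateAndSubmitSketch {

    // Predicate: has the upstream _DONE marker file appeared yet?
    static boolean dataReady(String donePath) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        return fs.exists(new Path(donePath));
    }

    // Retry the predicate up to maxRetries times; once the data is ready,
    // submit the converted measure configuration to Spark through Livy.
    static void checkAndSubmit(String donePath, String batchJson, int maxRetries)
            throws Exception {
        for (int i = 0; i < maxRetries; i++) {                // e.g. 12 attempts
            if (dataReady(donePath)) {
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("http://livy-host:8998/batches")) // assumed Livy URL
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(batchJson))
                        .build();
                HttpClient.newHttpClient()
                        .send(request, HttpResponse.BodyHandlers.ofString());
                return;
            }
            Thread.sleep(5 * 60 * 1000L);                     // illustrative retry interval
        }
    }
}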


Thanks,
Kevin


On Mon, Mar 4, 2019 at 11:31 AM 大鹏 <18...@163.com> wrote:

>
>
>
>
> I don't know the purpose of the two scheduling tasks in Griffin:
>
>
> JobInstance and SparkSubmitJob. What is the connection between them?