Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2014/09/11 15:36:23 UTC

Job scheduling

Hi to all,

I'd like to know if there's an example of how to schedule a Job in Flink.
Do we still need something like Oozie or Quartz, or can we avoid them?

Best,
Flavio

Re: Job scheduling

Posted by Flavio Pompermaier <po...@okkam.it>.
I think it could be a useful feature to implement in Stockholm if I could
be there.. :)

On Thu, Sep 18, 2014 at 10:23 AM, Robert Metzger <rm...@apache.org>
wrote:

> I don't think that we have a suggested way.
>
> If I had that requirement, I would look into Oozie. I think it's
> quite easy to add additional services (=Flink) to Oozie. In addition, it
> seems to have a REST interface and some other stuff.
>
> If you want, you could also implement one yourself and contribute it back
> to Flink.

Re: Job scheduling

Posted by Robert Metzger <rm...@apache.org>.
I don't think that we have a suggested way.

If I had that requirement, I would look into Oozie. I think it's quite
easy to add additional services (=Flink) to Oozie. In addition, it seems
to have a REST interface and some other stuff.

If you want, you could also implement one yourself and contribute it back
to Flink.

On Thu, Sep 18, 2014 at 10:11 AM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Yes I was referring exactly to that, I was also involved in the Dopa
> project :)
> So, at the moment what is the suggested way to schedule jobs with Flink?

Re: Job scheduling

Posted by Flavio Pompermaier <po...@okkam.it>.
Yes I was referring exactly to that, I was also involved in the Dopa
project :)
So, at the moment what is the suggested way to schedule jobs with Flink?

On Thu, Sep 18, 2014 at 9:48 AM, Robert Metzger <rm...@apache.org> wrote:

> Are you referring to this project?
> https://github.com/TU-Berlin/dopa-scheduler
> It's not an official repository of the Flink (Stratosphere) project. I
> think a PhD student at TU Berlin created the code there.

Re: Job scheduling

Posted by Robert Metzger <rm...@apache.org>.
Are you referring to this project?
https://github.com/TU-Berlin/dopa-scheduler
It's not an official repository of the Flink (Stratosphere) project. I think
a PhD student at TU Berlin created the code there.



On Thu, Sep 11, 2014 at 4:29 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Of course with Flink I could in principle execute almost everything with a
> single Job but, in general, I could write 2 different jobs and decide from
> time to time when the second should be run.
> That's also why Meteor scripts are very useful :)
> From what I know there was a scheduler in Stratosphere that was using
> RabbitMQ, right?
>
> I would like to avoid running Linux commands and instead use some REST
> interface to trigger or schedule jobs.
>
> Best,
> Flavio

Re: Job scheduling

Posted by Flavio Pompermaier <po...@okkam.it>.
Of course with Flink I could in principle execute almost everything with a
single Job but, in general, I could write 2 different jobs and decide from
time to time when the second should be run.
That's also why Meteor scripts are very useful :)
From what I know there was a scheduler in Stratosphere that was using
RabbitMQ, right?

I would like to avoid running Linux commands and instead use some REST
interface to trigger or schedule jobs.
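
One way such a trigger could look, as a rough sketch only: a tiny HTTP endpoint
that starts a job-submission task whenever it receives a POST request. It uses
the HTTP server that ships with the JDK (com.sun.net.httpserver) and makes no
use of any Flink API. The class name JobTriggerServer, the path /jobs/run and
the way the command is passed in are assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class JobTriggerServer {

    /** Creates (but does not start) a server that runs `submission` on POST /jobs/run. */
    static HttpServer create(int port, Runnable submission) throws IOException {
        // One daemon worker thread: triggered submissions run in the background, one at a time.
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "job-runner");
            t.setDaemon(true);
            return t;
        });
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // Hypothetical endpoint path, chosen for this sketch.
        server.createContext("/jobs/run", exchange -> {
            if ("POST".equals(exchange.getRequestMethod())) {
                worker.submit(submission);
                respond(exchange, 202, "accepted\n");
            } else {
                respond(exchange, 405, "use POST\n");
            }
        });
        return server;
    }

    /** Sends a small plain-text response and finishes the exchange. */
    static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: JobTriggerServer <port> <command> [args...]");
            return;
        }
        int port = Integer.parseInt(args[0]);
        // The command is whatever is normally typed to submit the job with the CLI client.
        List<String> command = Arrays.asList(args).subList(1, args.length);
        create(port, () -> {
            try {
                int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
                System.out.println("Submission exited with code " + exitCode);
            } catch (Exception e) {
                System.err.println("Submission failed: " + e);
            }
        }).start();
    }
}
```

The endpoint answers 202 as soon as the task is queued, so the caller does not
learn whether the job itself succeeded; reporting job status would need
additional endpoints.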

Best,
Flavio

On Thu, Sep 11, 2014 at 4:07 PM, Fabian Hueske <fh...@apache.org> wrote:

> Hi Flavio,
>
> What exactly do you mean by scheduling?
> Do you want to run a job at regular intervals or execute a complex
> workflow?
>
> Oozie is primarily used to orchestrate the execution of MapReduce
> workflows. Since MR is a rather inflexible programming model, complex
> tasks need to be split up into multiple dependent jobs that are executed
> once their predecessors have finished. Oozie orchestrates this execution.
> In Flink, you can build a complex analysis flow as a single program and
> execute it. Hence, there is no need for a workflow scheduler such as Oozie.
>
> If you want to run a job at regular intervals, you can configure a cron
> job that executes the CLI client, or implement a Java or Scala program
> that submits jobs at certain points in time.
>
> Best, Fabian

Re: Job scheduling

Posted by Fabian Hueske <fh...@apache.org>.
Hi Flavio,

What exactly do you mean by scheduling?
Do you want to run a job at regular intervals or execute a complex workflow?

Oozie is primarily used to orchestrate the execution of MapReduce
workflows. Since MR is a rather inflexible programming model, complex
tasks need to be split up into multiple dependent jobs that are executed
once their predecessors have finished. Oozie orchestrates this execution.
In Flink, you can build a complex analysis flow as a single program and
execute it. Hence, there is no need for a workflow scheduler such as Oozie.

If you want to run a job at regular intervals, you can configure a cron
job that executes the CLI client, or implement a Java or Scala program
that submits jobs at certain points in time.
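
A minimal sketch of the second option, assuming the job is submitted by
launching the CLI client as an external process: a small Java program that
re-runs a given command at a fixed interval using the JDK's
ScheduledExecutorService. The class name PeriodicJobSubmitter and its
command-line arguments are made up for illustration; the command to run is
whatever is normally typed to submit the job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PeriodicJobSubmitter {

    /** Runs an external command, waits for it to finish, and returns its exit code. */
    static int runCommand(List<String> command) throws Exception {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the client's output to this console
                .start();
        return process.waitFor();
    }

    /** Runs the task immediately and then again every `period` units. */
    static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler,
                                       Runnable task, long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(task, 0, period, unit);
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: PeriodicJobSubmitter <period-minutes> <command> [args...]");
            return;
        }
        long periodMinutes = Long.parseLong(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        schedule(scheduler, () -> {
            try {
                System.out.println("Submission exited with code " + runCommand(command));
            } catch (Exception e) {
                // An exception escaping the task would cancel all later runs.
                System.err.println("Submission failed: " + e);
            }
        }, periodMinutes, TimeUnit.MINUTES);
    }
}
```

With a single scheduler thread, a run that takes longer than the period delays
the next run instead of overlapping with it. A cron entry calling the same
command achieves much the same without a long-running JVM.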

Best, Fabian

2014-09-11 15:36 GMT+02:00 Flavio Pompermaier <po...@okkam.it>:

> Hi to all,
>
> I'd like to know if there's an example of how to schedule a Job in Flink.
> Do we still need something like Oozie or Quartz, or can we avoid them?
>
> Best,
> Flavio
>