Posted to dev@flink.apache.org by "Matthias J. Sax" <mj...@informatik.hu-berlin.de> on 2015/05/27 01:17:58 UTC

[DISCUSS] Canceling Streaming Jobs

Hi,

currently, the only way to stop a streaming job is to "cancel" the job.
This has multiple disadvantages:
 1) a "clean" stop is not possible (see
https://issues.apache.org/jira/browse/FLINK-1929 -- I think a clean stop
is a prerequisite for FLINK-1929) and
 2) as a minor issue, all canceled jobs are listed as "canceled" in the
history (which is somewhat confusing for the user -- at least it was for
me when I started to work with Flink Streaming).

This issue has been raised a few times already; however, no final
conclusion was reached (if I remember correctly). I could not find a JIRA
for it either.

From my understanding of the system, there are two ways to implement a
clean way of stopping streaming jobs:

  1) "Task"s can be distinguished between "batch" and "streaming"
     -> canceling a batch jobs works as always
     -> canceling a streaming job only send a "canceling" signal to the
sources, and waits until the job finishes (ie, sources stop emitting
data and finish regularly, triggering the finishing of all operators).
For this case, streaming jobs are stopped in a "clean way" (as is the
input would have be finite) and the job will be listed as "finished" in
the history regularly.

  This approach has the advantage, that it should be simpler to
implement. However, the disadvantages are (1) a "hard canceling" of jobs
is not possible any more, and (2) Flink must be able to distinguishes
batch and streaming jobs (I don't think Flink runtime can distinguish
both right now?)

  2) A new message "terminate" (or similar) is introduced that can only
be used for streaming jobs (it would be ignored for batch jobs). It stops
the sources and waits until the job finishes regularly.

  This approach has the advantage that the current system behavior is
preserved (it only adds a new feature). The disadvantage is that all
clients need to be touched, and it must be clear to the user that
"terminate" does not work for batch jobs. If an error/warning should be
raised when a user tries to "terminate" a batch job, Flink must be able
to distinguish between batch and streaming jobs, too. As an alternative,
"terminate" on batch jobs could be interpreted as "cancel".
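
Just to make the intended semantics concrete, below is a minimal sketch of
how a user-defined source could react to such a "terminate" message. All
class and method names are made up for this illustration only and do not
exist in Flink today:

    // Minimal sketch of the proposed "terminate" semantics for a streaming
    // source. All names are invented for illustration purposes.
    public class StoppableSourceSketch {

        private volatile boolean running = true;

        // The source's main loop: emits records until it is told to stop.
        public void run() {
            while (running) {
                emit(readNextRecord());
            }
            // Returning regularly from here is what makes the job end up as
            // "finished": downstream operators see the end of the stream and
            // finish as if the input had been finite.
        }

        // Reaction to the proposed "terminate" message: stop emitting, but
        // let the job shut down through the regular "finished" path.
        public void stop() {
            running = false;
        }

        // Reaction to the existing "cancel" message: the flag is cleared as
        // well, but the runtime would additionally tear the task down without
        // waiting for the regular end-of-stream handling, so the job is
        // listed as "canceled".
        public void cancel() {
            running = false;
        }

        private String readNextRecord() {
            return "..."; // placeholder for reading from the external system
        }

        private void emit(String record) {
            // placeholder: hand the record to the downstream operators
        }
    }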


I personally think that the second approach is better. Please give
feedback. If we can come to a conclusion on how to implement it, I would
like to work on it.


-Matthias


Re: [DISCUSS] Canceling Streaming Jobs

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
Stephan, not sure what you mean by this exactly... But I guess this is an
"add-on" that can be done later. Seems to be related to
https://issues.apache.org/jira/browse/FLINK-1929

I will open a JIRA for the new "terminate" message and assign it to myself.

-Matthias


On 05/27/2015 12:36 PM, Stephan Ewen wrote:
> +1 for the second option.
> 
> How about we allow to pass a flag that indicates whether a checkpoint
> should be taken together with the canceling?


Re: [DISCUSS] Canceling Streaming Jobs

Posted by Stephan Ewen <se...@apache.org>.
+1 for the second option.

How about we allow to pass a flag that indicates whether a checkpoint
should be taken together with the canceling?
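
A rough sketch of what that could look like on the client side; the
interface, method name, and flag are invented for illustration and are not
part of the current API:

    // Hypothetical client interface, invented for illustration only.
    public interface StreamingJobClient {

        /**
         * Stops the sources of the given job and lets it finish regularly.
         *
         * @param jobId          the job to terminate
         * @param takeCheckpoint if true, a checkpoint is triggered and
         *                       committed before the sources are stopped
         */
        void terminate(String jobId, boolean takeCheckpoint);
    }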



Re: [DISCUSS] Canceling Streaming Jobs

Posted by Aljoscha Krettek <al...@apache.org>.
I would also prefer the second option. The first is rather a hack and not
really an option. :D

Re: [DISCUSS] Canceling Streaming Jobs

Posted by Márton Balassi <ba...@gmail.com>.
+1 for the second option:

It would also provide the possibility to properly commit a state checkpoint
after the terminate message has been triggered. In some cases this can be a
desirable behaviour.


Re: [DISCUSS] Canceling Streaming Jobs

Posted by Gyula Fóra <gy...@apache.org>.
Hey,

I would also strongly prefer the second option; users need to have the
option to force-cancel a program in case of unwanted behaviour.

Cheers,
Gyula
