
Posted to dev@flink.apache.org by zhijiang <wa...@aliyun.com.INVALID> on 2018/11/06 03:28:43 UTC

Re: [DISCUSS] Task speculative execution for Flink batch

Thanks, Yangyu, for launching this discussion.

I really like this proposal. In production we have frequently seen a few long-tail tasks delay the total execution time of a batch job.
We also have some thoughts on introducing this mechanism. Looking forward to your detailed design doc, so that we can discuss it further.

Best,
Zhijiang
------------------------------------------------------------------
From: Tao Yangyu <ry...@gmail.com>
Sent: Tuesday, November 6, 2018 11:01
To: dev <de...@flink.apache.org>
Subject: [DISCUSS] Task speculative execution for Flink batch

Hi everyone,

In this message we propose speculative task execution for Flink batch
jobs, as follows.

In batch mode, a job is usually divided into multiple parallel tasks that
are executed across many nodes in the cluster. It is common for some nodes
to suffer performance degradation due to hardware problems, or to
occasional I/O contention and high CPU load. Such degradation can make the
tasks running on the affected node very slow; these are the so-called
long-tail tasks. Although long-tail tasks do not fail, they can severely
affect the total job running time. The Flink task scheduler currently does
not take this long-tail problem into account.

Here we propose a speculative execution strategy to handle the problem.
The basic idea is to run a copy of a task on another node when the original
task is identified as a long-tail task. In more detail, a speculative task
is triggered when the scheduler detects that the data processing throughput
of a task is much lower than that of the others. The speculative task is
executed in parallel with the original one and shares the same failure
retry mechanism. Once either task completes, the scheduler accepts its
output as the final result and cancels the other one that is still running.
Preliminary experiments have demonstrated the effectiveness of this
approach.
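
As a rough illustration of the idea above (not taken from the design doc),
the detection and result-selection steps could be sketched in Python as
follows. The function names, the median-based threshold, and the
`slow_ratio` parameter are all assumptions made for this sketch; the actual
scheduler logic in Flink may look quite different.

```python
from statistics import median


def find_long_tail_tasks(throughputs, slow_ratio=0.5):
    """Return ids of tasks that look like long-tail tasks.

    throughputs: {task_id: records processed per second} (hypothetical metric).
    A task is flagged when its throughput is below slow_ratio times the
    median throughput of all tasks. This threshold rule is an assumption.
    """
    if len(throughputs) < 2:
        return []  # nothing to compare against
    threshold = slow_ratio * median(throughputs.values())
    return sorted(t for t, rate in throughputs.items() if rate < threshold)


def resolve_attempts(finish_times):
    """Pick the attempt that finished first and list the ones to cancel.

    finish_times: {attempt_id: finish time in seconds} for the original
    task and its speculative copy.
    """
    winner = min(finish_times, key=finish_times.get)
    to_cancel = sorted(a for a in finish_times if a != winner)
    return winner, to_cancel


# Example: task-2 runs on a degraded node and is much slower than its peers.
rates = {"task-0": 1000, "task-1": 950, "task-2": 120, "task-3": 1020}
slow = find_long_tail_tasks(rates)

# The scheduler would start a speculative copy of each slow task; whichever
# attempt finishes first provides the final result.
winner, cancelled = resolve_attempts(
    {"task-2/original": 90.0, "task-2/speculative": 40.0}
)
```

Comparing against the median rather than the mean keeps a single very slow
task from dragging the threshold down with it.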

The detailed design doc will be ready soon. Your reviews and comments will
be much appreciated.

Thanks!

Ryan

Re: [DISCUSS] Task speculative execution for Flink batch

Posted by Becket Qin <be...@gmail.com>.

+1, Thanks Yangyu for proposing this very useful feature. Looking forward
to the design doc.


Re: [DISCUSS] Task speculative execution for Flink batch

Posted by SHI Xiaogang <sh...@gmail.com>.

Hi,

+1 for the speculative execution.

It would be even better if it works well with the existing checkpointing
and pipelined execution. That way, we can move a step further towards the
unification of batch and stream processing.

Regards,
Xiaogang


Re: [DISCUSS] Task speculative execution for Flink batch

Posted by Jeff Zhang <zj...@gmail.com>.

+1 for speculative execution for Flink batch. Speculative execution is
used in many batch execution engines such as MapReduce, Tez, and Spark.
This would be a great improvement for Flink in batch scenarios.


Re: [DISCUSS] Task speculative execution for Flink batch

Posted by Tao Yangyu <ry...@gmail.com>.

Thanks, Xiaowei, for the inspiring comments!
Yes, we could increase the granularity of speculation from a single task to
a bundle of successive tasks, especially for pipelined channels.


Re: [DISCUSS] Task speculative execution for Flink batch

Posted by Xiaowei Jiang <xi...@gmail.com>.

Thanks, Yangyu, for the nice design doc! One thing to consider is the
granularity of speculation. Multiple tasks may propagate data in pipelined
mode. In such a case, fixing a single task may not be enough. But you might
be able to fix this problem by increasing the granularity of speculation.
The traditional case of a single speculative task can then be considered a
special case of this.
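
To make the granularity point concrete, here is a small Python sketch that
groups tasks connected by pipelined edges into one speculation unit, so
that the whole unit would be re-run together. The function name and the
edge representation are hypothetical and chosen only for illustration; this
is one possible reading of the suggestion, not the approach from the design
doc.

```python
def speculation_units(tasks, pipelined_edges):
    """Group tasks that are linked by pipelined edges (union-find).

    tasks: iterable of task ids.
    pipelined_edges: iterable of (producer, consumer) pairs that exchange
    data in pipelined mode. Tasks linked only by blocking edges stay in
    separate units.
    """
    parent = {t: t for t in tasks}

    def find(t):
        # Follow parent links to the root, shortening the path on the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for producer, consumer in pipelined_edges:
        parent[find(producer)] = find(consumer)

    units = {}
    for t in tasks:
        units.setdefault(find(t), []).append(t)
    return sorted(sorted(u) for u in units.values())


# map-0 streams into filter-0, so they form one unit; the other tasks
# have no pipelined neighbours and remain single-task units.
units = speculation_units(
    ["map-0", "filter-0", "sink-0", "map-1"],
    [("map-0", "filter-0")],
)
```

A unit containing exactly one task corresponds to the traditional
single-task speculation mentioned above.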

Xiaowei


Re: [DISCUSS] Task speculative execution for Flink batch

Posted by Tao Yangyu <ry...@gmail.com>.

Hi all,

After some refinement, the detailed design doc is here:
https://docs.google.com/document/d/1X_Pfo4WcO-TEZmmVTTYNn44LQg5gnFeeaeqM7ZNLQ7M/edit?usp=sharing

Your reviews and comments are much appreciated and will greatly help to
complete the feature.

Best,
Ryan



Re: [DISCUSS] Task speculative execution for Flink batch

Posted by Tao Yangyu <ry...@gmail.com>.

Thanks so much for all your feedback!

Yes, as Jin Sun mentioned above, the design currently targets batch, in
order to explore the general framework and basic modules. The strategy
could also be applied to streaming with some extended code, for example,
for result commitment.


Re: [DISCUSS] Task speculative execution for Flink batch

Posted by Jin Sun <is...@gmail.com>.

I think this targets batch at the very beginning; the idea should also work for both cases, with different algorithms/strategies.

Ryan, since you are working on this, I will assign FLINK-10644 <https://issues.apache.org/jira/browse/FLINK-10644> to you.

Jin


Re: [DISCUSS] Task speculative execution for Flink batch

Posted by Till Rohrmann <tr...@apache.org>.

Thanks for starting this discussion, Ryan. I'm looking forward to your
design document about this feature. Quick question: Will it be a batch-only
feature? If not, then it needs to take checkpointing into account as well.

Cheers,
Till
