
Posted to user@flink.apache.org by Renjie Liu <li...@gmail.com> on 2017/02/15 08:28:01 UTC

Flink batch processing fault tolerance

Hi, all:
I'm reading Flink's docs and am curious about the fault tolerance of batch
processing jobs. It seems that when one task execution fails, the whole job
is restarted. Is that true? If so, isn't it impractical to deploy large
Flink batch jobs?
-- 
Liu, Renjie
Software Engineer, MVAD

Re: Flink batch processing fault tolerance

Posted by Aljoscha Krettek <al...@apache.org>.

@Anton, the ideas in the FLIP are the ones I was mentioning, and I'm afraid
I have nothing more to add.

On Fri, 17 Feb 2017 at 06:26 wangzhijiang999 <wa...@aliyun.com>
wrote:

> Yes, it is really a critical problem for large batch jobs, because
> unexpected failures are common.
> We are already working on implementing the ideas in FLIP-1 and hope to
> contribute them to Flink in the coming months.
>
> Best,
>
> Zhijiang
>
> ------------------------------------------------------------------
> From: Si-li Liu <un...@gmail.com>
> Sent: Friday, February 17, 2017 11:22
> To: user <us...@flink.apache.org>
> Subject: Re: Flink batch processing fault tolerance
>
> Hi,
>
> It's the reason why I gave up using Flink for my current project and picked
> up the traditional Hadoop framework again.
>
> 2017-02-17 10:56 GMT+08:00 Renjie Liu <li...@gmail.com>:
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
> This FLIP may help.
>
> On Thu, Feb 16, 2017 at 7:34 PM Anton Solovev <An...@epam.com>
> wrote:
>
> Hi Aljoscha,
>
> Could you share your plans of resolving it?
>
>
>
> Best,
>
> Anton
>
>
>
>
>
> *From:* Aljoscha Krettek [mailto:aljoscha@apache.org]
> *Sent:* Thursday, February 16, 2017 2:48 PM
> *To:* user@flink.apache.org
> *Subject:* Re: Flink batch processing fault tolerance
>
>
>
> Hi,
>
> yes, this is indeed true. We had some plans for how to resolve this but
> they never materialised because of the focus on Stream Processing. We might
> unite the two in the future and then you will get fault-tolerant
> batch/stream processing in the same API.
>
>
>
> Best,
>
> Aljoscha
>
>
>
> On Wed, 15 Feb 2017 at 09:28 Renjie Liu <li...@gmail.com> wrote:
>
> Hi, all:
> I'm learning flink's doc and curious about the fault tolerance of batch
> process jobs. It seems that when one of task execution fails, the whole job
> will be restarted, is it true? If so, isn't it impractical to deploy large
> flink batch jobs?
>
> --
>
> Liu, Renjie
>
> Software Engineer, MVAD
> --
> Liu, Renjie
> Software Engineer, MVAD
>
> --
> Best regards
>
> Sili Liu
>
>

Re: Flink batch processing fault tolerance

Posted by wangzhijiang999 <wa...@aliyun.com>.

Yes, it is really a critical problem for large batch jobs, because unexpected failures are common. We are already working on implementing the ideas in FLIP-1 and hope to contribute them to Flink in the coming months.

Best,
Zhijiang

------------------------------------------------------------------
From: Si-li Liu <un...@gmail.com>
Sent: Friday, February 17, 2017 11:22
To: user <us...@flink.apache.org>
Subject: Re: Flink batch processing fault tolerance

Hi,

It's the reason why I gave up using Flink for my current project and picked up the traditional Hadoop framework again.

2017-02-17 10:56 GMT+08:00 Renjie Liu <li...@gmail.com>:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
This FLIP may help.

On Thu, Feb 16, 2017 at 7:34 PM Anton Solovev <An...@epam.com> wrote:

Hi Aljoscha,

Could you share your plans of resolving it?

Best,
Anton

From: Aljoscha Krettek [mailto:aljoscha@apache.org]
Sent: Thursday, February 16, 2017 2:48 PM
To: user@flink.apache.org
Subject: Re: Flink batch processing fault tolerance

Hi,

yes, this is indeed true. We had some plans for how to resolve this but they never materialised because of the focus on Stream Processing. We might unite the two in the future and then you will get fault-tolerant batch/stream processing in the same API.

Best,
Aljoscha

On Wed, 15 Feb 2017 at 09:28 Renjie Liu <li...@gmail.com> wrote:

Hi, all:
I'm learning flink's doc and curious about the fault tolerance of batch process jobs. It seems that when one of task execution fails, the whole job will be restarted, is it true? If so, isn't it impractical to deploy large flink batch jobs?

--
Liu, Renjie
Software Engineer, MVAD

--
Liu, Renjie
Software Engineer, MVAD

--
Best regards

Sili Liu

Re: Flink batch processing fault tolerance

Posted by Si-li Liu <un...@gmail.com>.

Hi,

It's the reason why I gave up using Flink for my current project and picked
up the traditional Hadoop framework again.

2017-02-17 10:56 GMT+08:00 Renjie Liu <li...@gmail.com>:

> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
> This FLIP may help.
>
> On Thu, Feb 16, 2017 at 7:34 PM Anton Solovev <An...@epam.com>
> wrote:
>
>> Hi Aljoscha,
>>
>> Could you share your plans of resolving it?
>>
>>
>>
>> Best,
>>
>> Anton
>>
>>
>>
>>
>>
>> *From:* Aljoscha Krettek [mailto:aljoscha@apache.org]
>> *Sent:* Thursday, February 16, 2017 2:48 PM
>> *To:* user@flink.apache.org
>> *Subject:* Re: Flink batch processing fault tolerance
>>
>>
>>
>> Hi,
>>
>> yes, this is indeed true. We had some plans for how to resolve this but
>> they never materialised because of the focus on Stream Processing. We might
>> unite the two in the future and then you will get fault-tolerant
>> batch/stream processing in the same API.
>>
>>
>>
>> Best,
>>
>> Aljoscha
>>
>>
>>
>> On Wed, 15 Feb 2017 at 09:28 Renjie Liu <li...@gmail.com> wrote:
>>
>> Hi, all:
>> I'm learning flink's doc and curious about the fault tolerance of batch
>> process jobs. It seems that when one of task execution fails, the whole job
>> will be restarted, is it true? If so, isn't it impractical to deploy large
>> flink batch jobs?
>>
>> --
>>
>> Liu, Renjie
>>
>> Software Engineer, MVAD
>>
>> --
> Liu, Renjie
> Software Engineer, MVAD
>



-- 
Best regards

Sili Liu

Re: Flink batch processing fault tolerance

Posted by Renjie Liu <li...@gmail.com>.

https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
This FLIP may help.

On Thu, Feb 16, 2017 at 7:34 PM Anton Solovev <An...@epam.com>
wrote:

> Hi Aljoscha,
>
> Could you share your plans of resolving it?
>
>
>
> Best,
>
> Anton
>
>
>
>
>
> *From:* Aljoscha Krettek [mailto:aljoscha@apache.org]
> *Sent:* Thursday, February 16, 2017 2:48 PM
> *To:* user@flink.apache.org
> *Subject:* Re: Flink batch processing fault tolerance
>
>
>
> Hi,
>
> yes, this is indeed true. We had some plans for how to resolve this but
> they never materialised because of the focus on Stream Processing. We might
> unite the two in the future and then you will get fault-tolerant
> batch/stream processing in the same API.
>
>
>
> Best,
>
> Aljoscha
>
>
>
> On Wed, 15 Feb 2017 at 09:28 Renjie Liu <li...@gmail.com> wrote:
>
> Hi, all:
> I'm learning flink's doc and curious about the fault tolerance of batch
> process jobs. It seems that when one of task execution fails, the whole job
> will be restarted, is it true? If so, isn't it impractical to deploy large
> flink batch jobs?
>
> --
>
> Liu, Renjie
>
> Software Engineer, MVAD
>
> --
Liu, Renjie
Software Engineer, MVAD

RE: Flink batch processing fault tolerance

Posted by Anton Solovev <An...@epam.com>.

Hi Aljoscha,
Could you share your plans for resolving it?

Best,
Anton

From: Aljoscha Krettek [mailto:aljoscha@apache.org]
Sent: Thursday, February 16, 2017 2:48 PM
To: user@flink.apache.org
Subject: Re: Flink batch processing fault tolerance

Hi,
yes, this is indeed true. We had some plans for how to resolve this but they never materialised because of the focus on Stream Processing. We might unite the two in the future and then you will get fault-tolerant batch/stream processing in the same API.

Best,
Aljoscha

On Wed, 15 Feb 2017 at 09:28 Renjie Liu <li...@gmail.com> wrote:
Hi, all:
I'm learning flink's doc and curious about the fault tolerance of batch process jobs. It seems that when one of task execution fails, the whole job will be restarted, is it true? If so, isn't it impractical to deploy large flink batch jobs?
--
Liu, Renjie
Software Engineer, MVAD

Re: Flink batch processing fault tolerance

Posted by Aljoscha Krettek <al...@apache.org>.

Hi,
Yes, this is indeed true. We had some plans for how to resolve this but
they never materialised because of the focus on Stream Processing. We might
unite the two in the future and then you will get fault-tolerant
batch/stream processing in the same API.

Best,
Aljoscha

On Wed, 15 Feb 2017 at 09:28 Renjie Liu <li...@gmail.com> wrote:

> Hi, all:
> I'm learning flink's doc and curious about the fault tolerance of batch
> process jobs. It seems that when one of task execution fails, the whole job
> will be restarted, is it true? If so, isn't it impractical to deploy large
> flink batch jobs?
> --
> Liu, Renjie
> Software Engineer, MVAD
>
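
Editorial note: the thread contrasts restarting the whole batch job when one task fails with the finer-grained recovery that FLIP-1 proposes. The sketch below is a toy model only, not Flink code and not the FLIP-1 design. The function names and the idea of splitting the job into independent "regions" of tasks are assumptions made for illustration. It counts how many tasks have to run again after a single failure under each approach.

```python
# Toy model (hypothetical, not Flink's implementation): compare how many
# tasks must be re-executed after one task failure.

def tasks_rerun_full_restart(num_tasks, failed_task):
    """Whole-job restart: every task runs again, whichever one failed."""
    if not 0 <= failed_task < num_tasks:
        raise ValueError("failed_task out of range")
    return num_tasks


def tasks_rerun_region_restart(regions, failed_task):
    """Assumed fine-grained recovery: only the region that contains
    the failed task runs again."""
    for region in regions:
        if failed_task in region:
            return len(region)
    raise ValueError("failed_task is not part of any region")


# A toy job of 12 tasks, assumed to split into 4 independent regions.
regions = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
num_tasks = sum(len(region) for region in regions)

full = tasks_rerun_full_restart(num_tasks, failed_task=4)
fine = tasks_rerun_region_restart(regions, failed_task=4)

print("whole-job restart reruns", full, "tasks")
print("region restart reruns", fine, "tasks")
```

In this model the cost of a whole-job restart grows with the total size of the job, which is the concern raised in the original question about large batch jobs.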