
Posted to user@flink.apache.org by Yassine MARZOUGUI <y....@mindlytix.com> on 2017/04/28 09:22:50 UTC

Behaviour of the BucketingSink when checkpoints fail

Hi all,

I have a failed job containing a BucketingSink. The last successful
checkpoint was before the source started emitting data. The following
checkpoints all failed due to the long timeout, as I mentioned here:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-very-slow-with-high-backpressure-td12762.html

The TaskManager then failed. Upon recovery, the pending files did not
move to the finished state.

Is that because the sink was not able to checkpoint the list of pending
files?
Is it possible to build the sink state just from the output folder and the
suffixes of the files?

Thanks,
Yassine

Re: Behaviour of the BucketingSink when checkpoints fail

Posted by Aljoscha Krettek <al...@apache.org>.

Hi,
Yes, basically none of the exactly-once/at-least-once guarantees hold if checkpointing does not work correctly. For example, this is also the case when reading from Kafka and writing to Kafka.

Best,
Aljoscha 
> On 28. Apr 2017, at 15:53, Yassine MARZOUGUI <y....@mindlytix.com> wrote:
> 
> Hi Aljoscha,
> 
> Thank you for your response. I guess then I will manually rename the pending files. Does this however mean that the BucketingSink is not exactly-once as it is described in the docs, since in this case (failure of the job and failure of checkpoints) there will be duplicates? Or am I missing something in the notion of exactly-once guarantees?
> 
> Best,
> Yassine

Re: Behaviour of the BucketingSink when checkpoints fail

Posted by Yassine MARZOUGUI <y....@mindlytix.com>.

Hi Aljoscha,

Thank you for your response. I guess then I will manually rename the
pending files. Does this however mean that the BucketingSink is not
exactly-once as it is described in the docs, since in this case (failure of
the job and failure of checkpoints) there will be duplicates? Or am I
missing something in the notion of exactly-once guarantees?

Best,
Yassine

2017-04-28 15:47 GMT+02:00 Aljoscha Krettek <al...@apache.org>:

> Hi,
> Yes, your analysis is correct. The pending files are not recognised as
> such because they were never in any checkpointed state that could be
> restored. I’m afraid it’s not possible to build the sink state just from
> the files existing in the output folder. The reason we have state in the
> first place is so that we can figure out what each of the files in the
> output folder is.
>
> Maybe you could manually move the pending files that you know are correct
> to “final”?
>
> Best,
> Aljoscha

Re: Behaviour of the BucketingSink when checkpoints fail

Posted by Aljoscha Krettek <al...@apache.org>.

Hi,
Yes, your analysis is correct. The pending files are not recognised as such because they were never in any checkpointed state that could be restored. I’m afraid it’s not possible to build the sink state just from the files existing in the output folder. The reason we have state in the first place is so that we can figure out what each of the files in the output folder is.
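
To make the reasoning above concrete, here is a toy model in Python. It is only a sketch of the idea, not the BucketingSink implementation: the class ToySink and all of its method names are made up for illustration. It assumes that a sink hands its pending files to a checkpoint when it snapshots, and finalizes them only once that checkpoint is reported complete.

```python
class ToySink:
    """Hypothetical toy sink: files become final only after a completed checkpoint."""

    def __init__(self, restored_state=None):
        # Pending files grouped by the checkpoint that recorded them.
        # After a restore, this is the only knowledge the sink has about old files.
        self.pending_per_checkpoint = {
            cid: list(files) for cid, files in (restored_state or {}).items()
        }
        self.pending = []    # rolled files not yet part of any snapshot
        self.finished = []   # files that were moved to their final name

    def roll_file(self, name):
        self.pending.append(name)

    def snapshot(self, checkpoint_id):
        # Attach the current pending files to this checkpoint and return
        # the state that would be persisted if the checkpoint succeeds.
        self.pending_per_checkpoint[checkpoint_id] = self.pending
        self.pending = []
        return {cid: list(f) for cid, f in self.pending_per_checkpoint.items()}

    def notify_checkpoint_complete(self, checkpoint_id):
        # Finalize everything recorded by this checkpoint or an earlier one.
        for cid in sorted(self.pending_per_checkpoint):
            if cid <= checkpoint_id:
                self.finished.extend(self.pending_per_checkpoint.pop(cid))

    def knows(self, name):
        tracked = [f for fs in self.pending_per_checkpoint.values() for f in fs]
        return name in self.pending or name in self.finished or name in tracked


# Scenario from this thread: the last good checkpoint predates any data.
sink = ToySink()
good_state = sink.snapshot(1)            # checkpoint 1 succeeds, nothing written yet
sink.notify_checkpoint_complete(1)
sink.roll_file("part-0-0")               # data arrives, a file is rolled to pending
sink.snapshot(2)                         # checkpoint 2 times out: its state is discarded
restored = ToySink(restored_state=good_state)   # recovery after the failure
print(restored.knows("part-0-0"))        # False
```

In this model the restored sink has no record of part-0-0, so it can neither finalize nor clean up that file, which matches the behaviour described in this thread.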

Maybe you could manually move the pending files that you know are correct to “final”?
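
A manual move like this could be scripted. The following is a minimal sketch, assuming the output sits on a filesystem that Python can reach directly (for HDFS you would need the corresponding filesystem tooling instead) and assuming the pending files use a "_" prefix and a ".pending" suffix. Check those two values against your own sink configuration; the helper names final_name and finalize_pending are hypothetical.

```python
from pathlib import Path

PENDING_PREFIX = "_"          # assumption: adjust to your sink's pending prefix
PENDING_SUFFIX = ".pending"   # assumption: adjust to your sink's pending suffix


def final_name(pending_name, prefix=PENDING_PREFIX, suffix=PENDING_SUFFIX):
    """Return the final name for a pending file name, or None if it is not pending."""
    if not (pending_name.startswith(prefix) and pending_name.endswith(suffix)):
        return None
    core = pending_name[len(prefix):len(pending_name) - len(suffix)]
    return core or None


def finalize_pending(bucket_dir, dry_run=True):
    """Rename pending files in one bucket directory.

    Returns the list of (old_path, new_path) pairs. With dry_run=True
    nothing is renamed, so the plan can be reviewed first.
    """
    renamed = []
    for path in sorted(Path(bucket_dir).iterdir()):
        target_name = final_name(path.name) if path.is_file() else None
        if target_name is None:
            continue  # not a pending file (for example in-progress or already final)
        target = path.with_name(target_name)
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        if not dry_run:
            path.rename(target)
        renamed.append((path, target))
    return renamed
```

Calling finalize_pending("/path/to/bucket") first with the default dry_run=True lists what would be renamed. The script cannot tell which pending files are actually correct, so that judgement still has to be made by hand before running it with dry_run=False.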

Best,
Aljoscha

> On 28. Apr 2017, at 11:22, Yassine MARZOUGUI <y....@mindlytix.com> wrote:
> 
> Hi all,
> 
> I have a failed job containing a BucketingSink. The last successful checkpoint was before the source started emitting data. The following checkpoints all failed due to the long timeout, as I mentioned here: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-very-slow-with-high-backpressure-td12762.html
> 
> The TaskManager then failed. Upon recovery, the pending files did not move to the finished state.
> 
> Is that because the sink was not able to checkpoint the list of pending files?
> Is it possible to build the sink state just from the output folder and the suffixes of the files?
> 
> Thanks,
> Yassine