Posted to user@flink.apache.org by Xin Ma <ke...@gmail.com> on 2022/06/30 15:05:51 UTC

StreamingFileSink & checkpoint tuning

Hi,

I recently encountered an issue while using StreamingFileSink.

I have a Flink job that consumes records from various sources and writes
them to S3 with StreamingFileSink. The job sometimes fails due to checkpoint
timeouts, and the root cause is a checkpoint alignment failure, as the data
is skewed between the different data sources.

I don't want to enable unaligned checkpoints yet and would prefer to do some
checkpoint tuning first.

My current checkpoint interval is 1 min and the timeout is also 1 min. I
want to increase the *tolerable checkpoint failure number* to 5, as I believe
the unaligned subtasks will definitely update their watermark within 5
minutes. My question is: will the streaming file sink still write to S3 if a
checkpoint fails, or will it wait until the next successful checkpoint? (If
we don't tolerate checkpoint failures, the job simply restarts from the last
successful checkpoint.)


Thanks.

Best,
Kevin
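
The failure budget described in the question can be sketched in plain Java. This is a simplified model, not Flink code: it assumes the budget counts consecutive failures and is reset by any successful checkpoint, which is my understanding of how Flink treats the tolerable failure number. The class and method names are hypothetical. In a real job the corresponding settings are StreamExecutionEnvironment#enableCheckpointing(long), CheckpointConfig#setCheckpointTimeout(long) and CheckpointConfig#setTolerableCheckpointFailureNumber(int).

```java
/** Simplified model of a consecutive-checkpoint-failure budget; not Flink code. */
class CheckpointFailureBudget {
    private final int tolerableFailures;
    private int consecutiveFailures = 0;

    CheckpointFailureBudget(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Returns true if the job keeps running after this checkpoint result. */
    boolean onCheckpointResult(boolean succeeded) {
        if (succeeded) {
            // Assumption: a success resets the counter.
            consecutiveFailures = 0;
            return true;
        }
        consecutiveFailures++;
        return consecutiveFailures <= tolerableFailures;
    }

    /** Index of the checkpoint that exhausts the budget, or -1 if the job survives. */
    static int firstFatalCheckpoint(int tolerableFailures, boolean[] results) {
        CheckpointFailureBudget budget = new CheckpointFailureBudget(tolerableFailures);
        for (int i = 0; i < results.length; i++) {
            if (!budget.onCheckpointResult(results[i])) {
                return i;
            }
        }
        return -1;
    }
}
```

Under these assumptions, a budget of 5 with a 1 min interval survives five timed-out checkpoints in a row as long as the sixth succeeds. A sixth consecutive failure fails the job.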

Re: StreamingFileSink & checkpoint tuning

Posted by Weihua Hu <hu...@gmail.com>.
Hi, Kevin

I have two minor tips that you can try.

1. Check how severe the data skew is and try to solve it in the job logic,
or reduce the skew by calling keyBy multiple times.
2. Increase the checkpoint timeout appropriately.

Best,
Weihua
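
One common reading of "keyBy multiple times" is two-stage aggregation with salted keys: first key by the original key plus a small salt so that a hot key is spread over several subtasks, then key by the original key again to merge the partial results. The sketch below shows only that idea in plain Java, without any Flink API. The class name, the "#" separator and the round-robin salt are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of two-stage (salted) aggregation; not Flink code. */
class SaltedKeys {
    /** Stage 1 key: the original key plus a salt in [0, buckets). */
    static String saltedKey(String key, long sequence, int buckets) {
        return key + "#" + Math.floorMod(sequence, buckets);
    }

    /** Stage 2 key: strip the salt that stage 1 appended. */
    static String originalKey(String salted) {
        return salted.substring(0, salted.lastIndexOf('#'));
    }

    /** Counts records per key via salted partial counts that are merged afterwards. */
    static Map<String, Long> twoStageCount(List<String> keys, int buckets) {
        Map<String, Long> partial = new HashMap<>();
        long sequence = 0;
        for (String key : keys) {
            // Stage 1: a hot key is split across up to `buckets` partial counters.
            partial.merge(saltedKey(key, sequence++, buckets), 1L, Long::sum);
        }
        Map<String, Long> total = new HashMap<>();
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            // Stage 2: merge the partial counters under the original key.
            total.merge(originalKey(e.getKey()), e.getValue(), Long::sum);
        }
        return total;
    }
}
```

The final counts are the same as with a single keyBy. The difference is that no single stage-1 counter has to absorb all records of a hot key.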


On Fri, Jul 1, 2022 at 9:29 AM yuxia <lu...@alumni.sjtu.edu.cn> wrote:

> The streaming file sink writes to S3 while processing elements, but these
> are only temporary files. Only after a successful checkpoint (more
> precisely, once it receives the notification of a successful checkpoint)
> will it commit the temporary files written since the last successful
> checkpoint.
>
> Best regards,
> Yuxia

Re: StreamingFileSink & checkpoint tuning

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
The streaming file sink writes to S3 while processing elements, but these are only temporary files. Only after a successful checkpoint (more precisely, once it receives the notification of a successful checkpoint) will it commit the temporary files written since the last successful checkpoint.

Best regards, 
Yuxia 
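
The behaviour described above can be illustrated with a toy model in plain Java. It is a sketch, not the StreamingFileSink implementation: records stand in for temporary files, the method names only mirror the Flink callbacks they are modelled on, and the per-checkpoint bookkeeping is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of commit-on-checkpoint-notification; not Flink code. */
class CheckpointedSinkModel {
    private final List<String> inProgress = new ArrayList<>();
    private final TreeMap<Long, List<String>> pendingByCheckpoint = new TreeMap<>();
    private final List<String> committed = new ArrayList<>();

    /** Data is written right away, but only as temporary (uncommitted) output. */
    void processElement(String record) {
        inProgress.add(record);
    }

    /** The sink takes its snapshot; the checkpoint as a whole may still time out. */
    void snapshotState(long checkpointId) {
        pendingByCheckpoint.put(checkpointId, new ArrayList<>(inProgress));
        inProgress.clear();
    }

    /** Arrives only for a successful checkpoint; commits everything pending up to it. */
    void notifyCheckpointComplete(long checkpointId) {
        NavigableMap<Long, List<String>> ready = pendingByCheckpoint.headMap(checkpointId, true);
        for (List<String> records : ready.values()) {
            committed.addAll(records);
        }
        ready.clear(); // the view is backed by the map, so this removes the entries
    }

    List<String> committed() {
        return new ArrayList<>(committed);
    }
}
```

In this model a failed or timed-out checkpoint simply never triggers notifyCheckpointComplete, so its data stays pending and becomes visible with the next successful checkpoint.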

