Posted to user@flink.apache.org by Matthias Broecheler <ma...@dataeng.ai> on 2021/08/06 20:59:21 UTC

StreamFileSink not closing file

Hey guys,

I wrote a simple DataStream job that counts up some numbers into a side output,
which I am trying to write to a StreamFileSink so that I can store the results
on disk and read them from there.

I'm running my little test locally and I can see that the data is being written
to hidden "inprogress" files, but those aren't closed when the job terminates. I
have enabled checkpointing, tried running in batch mode, and played around with
various rolling policy settings (rollover interval = 1), but none of it seems to
trigger Flink to close off the file at the end of the job.

Is there a way to trigger a checkpoint in Flink at the end of a job, which would
in turn cause the file to be closed? I tried setting the checkpointing interval
to 10 ms, but that didn't work either.

I realize that this is a total newbie question, but I couldn't find any answers
on StackOverflow or in the archive.

Thanks for your help,
Matthias

Re: StreamFileSink not closing file

Posted by Matthias Broecheler <ma...@dataeng.ai>.
Thank you, Yun, for pointing me to the related issue. I'll keep an eye on
it.

All the best,
Matthias


Re: StreamFileSink not closing file

Posted by Yun Gao <yu...@aliyun.com>.
Hi Matthias,

Sorry for the late reply. This should be a known issue: Flink loses the
last piece of data for a bounded dataset when a two-phase-commit (2PC) sink
is used. However, we expect to fix this issue in the upcoming 1.14 release [1].

Best,
Yun


[1] https://issues.apache.org/jira/browse/FLINK-2491
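
For readers wondering why a 2PC sink would leave the last file open, here is a rough toy model in plain Java. It is not Flink code: the class TwoPhaseCommitSinkModel and all of its methods are made up for illustration, and it assumes a sink that closes its open file on every checkpoint and only publishes files once the checkpoint is confirmed. Under those assumptions, anything written after the last completed checkpoint of a bounded job never gets committed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a two-phase-commit file sink. Hypothetical; not Flink's implementation. */
public class TwoPhaseCommitSinkModel {

    public enum State { IN_PROGRESS, PENDING, FINISHED }

    /** One part file together with its lifecycle state. */
    public static final class PartFile {
        final List<Integer> records = new ArrayList<>();
        State state = State.IN_PROGRESS;
    }

    private final List<PartFile> files = new ArrayList<>();
    private PartFile current; // the file currently open for writing, if any

    /** Appends a record, opening a new in-progress file when none is open. */
    public void write(int record) {
        if (current == null) {
            current = new PartFile();
            files.add(current);
        }
        current.records.add(record);
    }

    /** Phase 1 (pre-commit): a checkpoint closes the open file and marks it pending. */
    public void snapshot() {
        if (current != null) {
            current.state = State.PENDING;
            current = null;
        }
    }

    /** Phase 2 (commit): once the checkpoint is confirmed, pending files become finished. */
    public void notifyCheckpointComplete() {
        for (PartFile file : files) {
            if (file.state == State.PENDING) {
                file.state = State.FINISHED;
            }
        }
    }

    /** Counts the files that are currently in the given state. */
    public long count(State state) {
        return files.stream().filter(f -> f.state == state).count();
    }

    public static void main(String[] args) {
        TwoPhaseCommitSinkModel sink = new TwoPhaseCommitSinkModel();
        sink.write(1);
        sink.write(2);
        sink.snapshot();                 // checkpoint barrier arrives
        sink.notifyCheckpointComplete(); // checkpoint confirmed: first file is published
        sink.write(3);                   // arrives after the last checkpoint
        // The bounded input ends here and no further checkpoint is taken.
        System.out.println("finished=" + sink.count(State.FINISHED)
                + " inProgress=" + sink.count(State.IN_PROGRESS));
        // prints "finished=1 inProgress=1"
    }
}
```

In this model the record 3 stays in an in-progress file forever, which matches the symptom described in the original question. Lowering the checkpoint interval only shrinks the amount of data left behind; it does not remove the gap between the last checkpoint and the end of the job.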

