You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Matthias Broecheler <ma...@dataeng.ai> on 2021/08/06 20:59:21 UTC
StreamFileSink not closing file
Hey guys,
I wrote a simple DataStream that counts up some numbers into a SideOutput which
I am trying to sink into a StreamFileSink so that I can write the results to
disk and read them from there.
I'm running my little test locally and I can see that the data is being written
to hidden "inproress" files but those aren't closed when the job terminates. I
have enabled checkpointing, tried running in batch mode, and played around with
various rolling policy settings (rolloverinterval = 1) but none of it seems to
trigger flink close off the file at the end of the job.
Is there a way to trigger a checkpoint in Flink at the end of a job which would
trigger the file to be closed? I tried setting the checkpointing interval to 10
ms but that didn't work either.
I realize that this is a total newbie question but I couldn't find any answers
on StackOverflow or the archive.Thanks for your help,Matthias
Re: StreamFileSink not closing file
Posted by Matthias Broecheler <ma...@dataeng.ai>.
Thank you, Yun, for pointing me to the related issue. I'll keep an eye on
it.
All the best,
Matthias
On Wed, Aug 11, 2021 at 10:50 PM Yun Gao <yu...@aliyun.com> wrote:
> Hi Matthias,
>
> Sorry for the late reply, this should be a known issue that Flink would
> lost the last piece of data for bounded dataset with 2pc sink. However,
> we are expected to fix this issue in the upcoming 1.14 version [1].
>
> Best,
> Yun
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-2491
>
> ------------------Original Mail ------------------
> *Sender:*Matthias Broecheler <ma...@dataeng.ai>
> *Send Date:*Sat Aug 7 04:59:50 2021
> *Recipients:*Flink User Group <us...@flink.apache.org>
> *Subject:*StreamFileSink not closing file
>
>> Hey guys,
>>
>> I wrote a simple DataStream that counts up some numbers into a SideOutput
>> which I am trying to sink into a StreamFileSink so that I can write the
>> results to disk and read them from there.
>>
>> I'm running my little test locally and I can see that the data is being
>> written to hidden "inproress" files but those aren't closed when the job
>> terminates. I have enabled checkpointing, tried running in batch mode, and
>> played around with various rolling policy settings (rolloverinterval =
>> 1) but none of it seems to trigger flink close off the file at the end of
>> the job.
>>
>> Is there a way to trigger a checkpoint in Flink at the end of a job which
>> would trigger the file to be closed? I tried setting the checkpointing
>> interval to 10 ms but that didn't work either.
>>
>> I realize that this is a total newbie question but I couldn't find any
>> answers on StackOverflow or the archive.
>> Thanks for your help,
>> Matthias
>>
>
Re: StreamFileSink not closing file
Posted by Yun Gao <yu...@aliyun.com>.
Hi Matthias,
Sorry for the late reply, this should be a known issue that Flink would
lost the last piece of data for bounded dataset with 2pc sink. However,
we are expected to fix this issue in the upcoming 1.14 version [1].
Best,
Yun
[1] https://issues.apache.org/jira/browse/FLINK-2491
------------------Original Mail ------------------
Sender:Matthias Broecheler <ma...@dataeng.ai>
Send Date:Sat Aug 7 04:59:50 2021
Recipients:Flink User Group <us...@flink.apache.org>
Subject:StreamFileSink not closing file
Hey guys,
I wrote a simple DataStream that counts up some numbers into a SideOutput which I am trying to sink into a StreamFileSink so that I can write the results to disk and read them from there.
I'm running my little test locally and I can see that the data is being written to hidden "inproress" files but those aren't closed when the job terminates. I have enabled checkpointing, tried running in batch mode, and played around with various rolling policy settings (rolloverinterval = 1) but none of it seems to trigger flink close off the file at the end of the job.
Is there a way to trigger a checkpoint in Flink at the end of a job which would trigger the file to be closed? I tried setting the checkpointing interval to 10 ms but that didn't work either.
I realize that this is a total newbie question but I couldn't find any answers on StackOverflow or the archive.
Thanks for your help,
Matthias