Posted to issues@flink.apache.org by "Kostas Kloudas (Jira)" <ji...@apache.org> on 2019/09/02 10:43:00 UTC

[jira] [Commented] (FLINK-13609) StreamingFileSink - reset part counter on bucket change

    [ https://issues.apache.org/jira/browse/FLINK-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920767#comment-16920767 ] 

Kostas Kloudas commented on FLINK-13609:
----------------------------------------

[~eskabetxe] Regarding the proposal of using the max counter per bucket: this would mean that we have to keep a counter per bucket in memory, even for inactive buckets. The state we need to keep could then grow without bound. In addition, it would not solve the problem, as the max counter per bucket will differ from subtask to subtask.
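
To make the state concern concrete, here is a minimal sketch in plain Java that simulates the two schemes. It is not Flink code: the classes SharedCounter and PerBucketCounter and the method trackedEntries are hypothetical names invented for illustration, and the file names simply follow the partfile-<subtask>-<counter> pattern from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the two naming schemes discussed in this issue.
// None of these classes exist in Flink; names and structure are assumptions.
public class PartCounterSketch {

    /** One counter per subtask, shared by all buckets (the behaviour reported in the issue). */
    public static final class SharedCounter {
        private final int subtaskIndex;
        private long next = 0;

        public SharedCounter(int subtaskIndex) {
            this.subtaskIndex = subtaskIndex;
        }

        public String nextPartFile(String bucketId) {
            return bucketId + "/partfile-" + subtaskIndex + "-" + (next++);
        }

        /** Bookkeeping entries this scheme has to retain: always one. */
        public int trackedEntries() {
            return 1;
        }
    }

    /** One counter per bucket (the proposal): numbering restarts at 0 in every new bucket. */
    public static final class PerBucketCounter {
        private final int subtaskIndex;
        private final Map<String, Long> nextPerBucket = new HashMap<>();

        public PerBucketCounter(int subtaskIndex) {
            this.subtaskIndex = subtaskIndex;
        }

        public String nextPartFile(String bucketId) {
            // merge returns the incremented value, so subtract 1 to start at 0.
            long counter = nextPerBucket.merge(bucketId, 1L, Long::sum) - 1;
            return bucketId + "/partfile-" + subtaskIndex + "-" + counter;
        }

        /** One entry per bucket ever written to, including inactive ones. */
        public int trackedEntries() {
            return nextPerBucket.size();
        }
    }

    public static void main(String[] args) {
        String[] buckets = {"2019/08/07/00", "2019/08/07/01", "2019/08/07/02"};
        SharedCounter shared = new SharedCounter(0);
        PerBucketCounter perBucket = new PerBucketCounter(0);

        String lastShared = null;
        String lastPerBucket = null;
        for (String bucket : buckets) {
            for (int i = 0; i < 10; i++) {
                lastShared = shared.nextPartFile(bucket);
                lastPerBucket = perBucket.nextPartFile(bucket);
            }
        }

        System.out.println(lastShared + " (entries kept: " + shared.trackedEntries() + ")");
        System.out.println(lastPerBucket + " (entries kept: " + perBucket.trackedEntries() + ")");
    }
}
```

In this sketch the per-bucket variant keeps one map entry for every bucket it has ever written to, while the shared variant keeps a single number. With an hourly bucket assigner the map gains an entry every hour unless old entries are dropped, and dropping them loses the counter for late data arriving in an old bucket. Each subtask would also hold its own map, so the counters for the same bucket would still differ between subtasks.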

As for the proposal of using the creation timestamp instead of the counter, I think this would just obfuscate the problem rather than solve it, right? The guarantees would still be the same when someone checks the directory. The user would see seemingly random numbers, and the only guarantee would be that a file with a smaller counter/timestamp was created before a file with a higher counter/timestamp.

If the explanation is sufficient, and we agree that there is no action to be taken, I would like to close this issue in order to keep a semi-clean JIRA. What do you think?

If nobody replies, I will close this on Thursday. 

> StreamingFileSink - reset part counter on bucket change
> -------------------------------------------------------
>
>                 Key: FLINK-13609
>                 URL: https://issues.apache.org/jira/browse/FLINK-13609
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Joao Boto
>            Priority: Major
>
> When writing to files using StreamingFileSink, we expect the part counter to reset to 0 on bucket change.
> As an example:
>  * using DateTimeBucketAssigner with the format yyyy/MM/dd/HH
>  * and ten files per hour (for simplicity)
> this will create:
>  * bucket 2019/08/07/00 with files partfile-0-0 to partfile-0-9
>  * bucket 2019/08/07/01 with files partfile-0-10 to partfile-0-19
>  * bucket 2019/08/07/02 with files partfile-0-20 to partfile-0-29
> and we expect this:
>  * bucket 2019/08/07/00 with files partfile-0-0 to partfile-0-9
>  * bucket 2019/08/07/01 with files partfile-0-0 to partfile-0-9
>  * bucket 2019/08/07/02 with files partfile-0-0 to partfile-0-9
>  
> [~kkl0u] I don't know if this is the expected behavior (or whether it can be configured).