Posted to commits@hudi.apache.org by "vinoyang (Jira)" <ji...@apache.org> on 2021/02/17 07:26:00 UTC

[jira] [Closed] (HUDI-1598) Write as minor batches during one checkpoint interval for the new writer

     [ https://issues.apache.org/jira/browse/HUDI-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang closed HUDI-1598.
--------------------------
      Assignee: Danny Chen
    Resolution: Done

Done via master branch: 5d2491d10c70e4e5fc9b7aeb62cc64bcaaf6043f

> Write as minor batches during one checkpoint interval for the new writer
> ------------------------------------------------------------------------
>
>                 Key: HUDI-1598
>                 URL: https://issues.apache.org/jira/browse/HUDI-1598
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Flink Integration
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
>
> Buffering data for an entire checkpoint interval and then flushing the whole buffer at once is not resource-friendly for streaming writes. A better approach is to cut batches based on the actual in-memory buffer size (say, 128 MB): the writer flushes the buffer whenever its size reaches the configured threshold.
> Thus, after this change, one instant may span one (if every checkpoint succeeds) or more (if there are checkpoint failures) checkpoints. The instant only commits when there is a successful checkpoint.
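The flush-by-size, commit-on-checkpoint behaviour described above can be modelled as follows. This is a minimal sketch, not Hudi's Flink writer. The class `MiniBatchWriter`, its fields, and its methods are hypothetical. Record size is taken as raw byte length, and checkpoint success or failure is reduced to a boolean flag.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MiniBatchWriter:
    """Hypothetical model: flush by buffer size, commit only on a successful checkpoint."""
    threshold_bytes: int                 # flush threshold (e.g. 128 MB in practice)
    instant: int = 1                     # id of the current, not yet committed instant
    buffer: List[bytes] = field(default_factory=list)
    buffered_bytes: int = 0
    flushed: List[List[bytes]] = field(default_factory=list)  # mini-batches written for the open instant
    # Committed instants as (instant id, checkpoints spanned, mini-batches written).
    committed: List[Tuple[int, int, int]] = field(default_factory=list)
    checkpoints_in_instant: int = 0

    def write(self, record: bytes) -> None:
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        # Flush as soon as the buffer reaches the threshold,
        # rather than holding everything until the checkpoint.
        if self.buffered_bytes >= self.threshold_bytes:
            self._flush()

    def _flush(self) -> None:
        if self.buffer:
            self.flushed.append(self.buffer)
            self.buffer = []
            self.buffered_bytes = 0

    def on_checkpoint(self, succeeded: bool) -> None:
        # Whatever is still buffered is flushed at every checkpoint.
        self._flush()
        self.checkpoints_in_instant += 1
        if succeeded:
            # Only a successful checkpoint commits the open instant.
            self.committed.append(
                (self.instant, self.checkpoints_in_instant, len(self.flushed)))
            self.instant += 1
            self.flushed = []
            self.checkpoints_in_instant = 0
        # On failure the instant stays open and spans the next checkpoint as well.
```

For example, with `threshold_bytes=10` and five 4-byte records, the third write triggers a flush. A failed checkpoint then flushes the remaining two records but leaves instant 1 open. The next successful checkpoint commits instant 1 as `(1, 2, 2)`: two checkpoints spanned and two mini-batches written.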



--
This message was sent by Atlassian Jira
(v8.3.4#803005)