Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/09/08 02:44:00 UTC

[jira] [Closed] (HUDI-4615) Fix empty commits being made by deltastreamer with S3EventsSource when there is no data in SQS on starting a new pipeline

     [ https://issues.apache.org/jira/browse/HUDI-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan closed HUDI-4615.
-------------------------------------
    Resolution: Fixed

> Fix empty commits being made by deltastreamer with S3EventsSource when there is no data in SQS on starting a new pipeline
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4615
>                 URL: https://issues.apache.org/jira/browse/HUDI-4615
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Assignee: Vinish Reddy
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.1
>
>
> When a new deltastreamer pipeline is started with S3EventsSource, the checkpoint is Option.empty(). If the source finds no data to consume, it still returns "val=0" as the checkpoint. Deltastreamer therefore assumes the checkpoint has changed and makes an empty commit. This needs fixing.
>  
> [https://github.com/apache/hudi/blob/0d0a4152cfd362185066519ae926ac4513c7a152/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/S3EventsMetaSelector.java#L151]
>  
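The behavior described above can be illustrated with a minimal, self-contained sketch. This is not Hudi code and not necessarily how the issue was fixed: the class and method names are hypothetical, java.util.Optional stands in for Hudi's Option, and the checkpoint is modeled as a plain string holding the largest event timestamp seen. It only shows why a default "0" checkpoint looks like a change on a fresh pipeline, and how returning the incoming checkpoint unchanged when nothing was read avoids that.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Hudi.
final class CheckpointSketch {
    private CheckpointSketch() {}

    // Models the reported behavior: with no events, a default "0" checkpoint
    // is still produced, even if the previous checkpoint was empty.
    static Optional<String> nextCheckpointBuggy(Optional<String> last, List<Long> eventTimes) {
        long max = eventTimes.stream().mapToLong(Long::longValue).max().orElse(0L);
        return Optional.of(String.valueOf(max));
    }

    // One possible guard (an assumption, not the confirmed fix): when nothing
    // was read, hand back the incoming checkpoint untouched.
    static Optional<String> nextCheckpointGuarded(Optional<String> last, List<Long> eventTimes) {
        if (eventTimes.isEmpty()) {
            return last;
        }
        long max = Collections.max(eventTimes);
        return Optional.of(String.valueOf(max));
    }

    // Simplified stand-in for the "has the checkpoint changed?" decision
    // that leads to a commit.
    static boolean checkpointChanged(Optional<String> last, Optional<String> next) {
        return !last.equals(next);
    }
}
```

With an empty previous checkpoint and no events, the first variant reports a change (empty versus "0"), which corresponds to the empty commit in the report, while the guarded variant reports no change.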


