Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/09/07 15:28:00 UTC
[jira] [Updated] (HUDI-4615) Fix empty commits being made by deltastreamer with S3EventsSource when there is no data in SQS on starting a new pipeline
[ https://issues.apache.org/jira/browse/HUDI-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-4615:
-----------------------------
Sprint: 2022/08/22, 2022/09/05 (was: 2022/08/22)
> Fix empty commits being made by deltastreamer with S3EventsSource when there is no data in SQS on starting a new pipeline
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-4615
> URL: https://issues.apache.org/jira/browse/HUDI-4615
> Project: Apache Hudi
> Issue Type: Bug
> Components: deltastreamer
> Reporter: sivabalan narayanan
> Assignee: Vinish Reddy
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.12.1
>
>
> When a new deltastreamer pipeline starts with S3EventsSource, the checkpoint is Option.empty(). If the source finds no data in SQS on that first read, it still returns "val=0" as the checkpoint. Deltastreamer compares this against the empty checkpoint, concludes that the checkpoint has changed, and makes an empty commit. This needs fixing.
>
> [https://github.com/apache/hudi/blob/0d0a4152cfd362185066519ae926ac4513c7a152/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/S3EventsMetaSelector.java#L151]
>
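The behaviour described in the issue can be reproduced in isolation. The following is a minimal sketch, not the Hudi code and not the actual fix: the class `CheckpointGuard` and both methods are hypothetical, `java.util.Optional` stands in for Hudi's `Option`, and the string "0" stands in for the "val=0" checkpoint mentioned above. It only shows why a plain "did the checkpoint change?" comparison commits on an empty first read, and one possible guard against that.

```java
import java.util.Optional;

// Hypothetical illustration; none of these names exist in Hudi.
public class CheckpointGuard {

    // Naive rule: commit whenever the returned checkpoint differs from the previous one.
    static boolean shouldCommitNaive(Optional<String> previous, String next) {
        return !previous.equals(Optional.ofNullable(next));
    }

    // Guarded rule (an assumption about one possible fix): if the pipeline has no
    // previous checkpoint and the read returned no records, skip the commit even
    // though the source handed back a non-empty checkpoint value.
    static boolean shouldCommitGuarded(Optional<String> previous, String next, long recordCount) {
        if (recordCount == 0 && !previous.isPresent()) {
            return false;
        }
        return !previous.equals(Optional.ofNullable(next));
    }

    public static void main(String[] args) {
        Optional<String> previous = Optional.empty(); // brand-new pipeline
        String next = "0";                            // placeholder for the returned checkpoint
        System.out.println("naive rule commits:   " + shouldCommitNaive(previous, next));
        System.out.println("guarded rule commits: " + shouldCommitGuarded(previous, next, 0));
    }
}
```

With an empty previous checkpoint and zero records, the naive rule decides to commit while the guarded rule does not. Once records arrive, or once a previous checkpoint exists, both rules fall back to the same comparison.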
--
This message was sent by Atlassian Jira
(v8.20.10#820010)