You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "Nishith Agarwal (Jira)" <ji...@apache.org> on 2021/05/18 02:22:00 UTC

[jira] [Commented] (HUDI-1910) Supporting Kafka based checkpointing for HoodieDeltaStreamer

    [ https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346537#comment-17346537 ] 

Nishith Agarwal commented on HUDI-1910:
---------------------------------------

Some background information on current checkpointing 

HoodieDeltaStreamer takes the new source checkpoint (eg. Kafka) using [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java] and adds it to the commit metadata here -> [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L461]

 

Possible approaches to implement this
 # Keep the commit metadata checkpoint as is, but also commit this checkpoint to Kafka. 
 # Use Kafka based checkpoint instead of commit metadata based

(2) Requires refactoring since there are no pluggable checkpoint mechanisms right now in hudi.

(1) Can be done fairly easily, by adding a callback post commit operation like this -> [https://github.com/apache/hudi/blob/fcedbfcb58109f6af0b0a1e69915b40a0fcd5ccb/hudi-utilities/src/main/java/org/apache/hudi/utilities/callback/kafka/HoodieWriteCommitKafkaCallback.java#L47]

and then enhancing the [https://github.com/apache/hudi/blob/fcedbfcb58109f6af0b0a1e69915b40a0fcd5ccb/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/callback/common/HoodieWriteCommitCallbackMessage.java#L25] to contain extra information just the commit metadata. 

 

Based on the requirements, either (1) or (2) can be implemented.

> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> ------------------------------------------------------------
>
>                 Key: HUDI-1910
>                 URL: https://issues.apache.org/jira/browse/HUDI-1910
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Nishith Agarwal
>            Assignee: Nishith Agarwal
>            Priority: Major
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some users have requested support for Kafka based checkpoints for freshness auditing purposes. This ticket tracks any implementation for that. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)