Posted to commits@hudi.apache.org by "Pranoti Shanbhag (Jira)" <ji...@apache.org> on 2021/05/28 20:31:00 UTC

[jira] [Created] (HUDI-1947) Hudi Commit Callback and commit in a single transaction

Pranoti Shanbhag created HUDI-1947:
--------------------------------------

             Summary: Hudi Commit Callback and commit in a single transaction
                 Key: HUDI-1947
                 URL: https://issues.apache.org/jira/browse/HUDI-1947
             Project: Apache Hudi
          Issue Type: New Feature
            Reporter: Pranoti Shanbhag


Hello,

I am using Hudi commit callbacks to call an internal service. As I understand it, the callback is invoked after the commit on the dataset, and if the callback service fails, the commit is not rolled back.

The service we call saves the commit time in a database that multiple pipelines query to get the incremental delta. For example, when there are 4 commits in the Hudi dataset, we register 4 commit timestamps in the database. The pipelines that need the incremental delta run at different frequencies and use this database to fetch the data committed since their respective last runs.
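
To make the setup concrete, here is a minimal in-memory sketch of the registry. It is an illustration only: `CommitRegistry`, `register`, `new_commits`, the pipeline names, and the sample timestamps are all hypothetical and are not part of Hudi or of our actual service. It also assumes commit times are fixed-width timestamp strings, so that string comparison matches chronological order.

```python
from dataclasses import dataclass, field


@dataclass
class CommitRegistry:
    """Hypothetical stand-in for the database the callback service writes to."""
    commit_times: list = field(default_factory=list)  # registered commit timestamps, kept sorted
    watermarks: dict = field(default_factory=dict)    # pipeline name -> last commit time consumed

    def register(self, commit_time: str) -> None:
        # Idempotent, so a retried callback does not create a duplicate entry.
        if commit_time not in self.commit_times:
            self.commit_times.append(commit_time)
            self.commit_times.sort()

    def new_commits(self, pipeline: str) -> list:
        # Return commits registered after this pipeline's last run,
        # then advance the pipeline's watermark to the newest one.
        last = self.watermarks.get(pipeline, "")
        pending = [t for t in self.commit_times if t > last]
        if pending:
            self.watermarks[pipeline] = pending[-1]
        return pending


registry = CommitRegistry()
for t in ["20210528100000", "20210528110000", "20210528120000", "20210528130000"]:
    registry.register(t)  # one registration per Hudi commit

hourly = registry.new_commits("hourly_pipeline")       # first run: all 4 commits
registry.register("20210528140000")                    # a fifth commit arrives
hourly_next = registry.new_commits("hourly_pipeline")  # next run: only the new commit
daily = registry.new_commits("daily_pipeline")         # slower pipeline: all 5 commits
```

Each pipeline keeps its own watermark, which is why a commit that is never registered stays invisible to every consumer.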

For this to work well, we need the Hudi commit and the callback to be atomic, in a single transaction. Otherwise, on callback failures, there may be data in the Hudi dataset that is not registered in the DB.
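
The gap can be shown with a small simulation. This is a sketch under assumptions, not Hudi code: `write_batch`, `timeline`, and `registered` are hypothetical names, and the callback failure is simulated with a flag rather than a real service call.

```python
def write_batch(timeline, registered, commit_time, callback_ok):
    """Simulate one write: the commit happens first, then the callback is invoked."""
    timeline.append(commit_time)     # the commit is already durable at this point
    if callback_ok:
        registered.add(commit_time)  # the callback service records the commit time
    # If the callback fails, nothing undoes the commit above.


timeline = []       # commits visible in the dataset
registered = set()  # commit times recorded in the DB

write_batch(timeline, registered, "20210528100000", callback_ok=True)
write_batch(timeline, registered, "20210528110000", callback_ok=False)  # callback fails
write_batch(timeline, registered, "20210528120000", callback_ok=True)

# Commits that exist in the dataset but were never registered.
unregistered = [t for t in timeline if t not in registered]
```

Here `unregistered` contains the second commit: its data is in the dataset, but no pipeline reading from the DB would learn about it.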

Could you please let me know whether this can be supported, and whether there is a way to achieve it with the current implementation? We do have retries set up and are not expecting failures, but we want to keep the Hudi commits in sync with what we register in the DB.

 

Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)