Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/10/22 20:45:00 UTC

[jira] [Created] (HUDI-5077) Supporting multiple deltastreamers writing to a single hudi table

sivabalan narayanan created HUDI-5077:
-----------------------------------------

             Summary: Supporting multiple deltastreamers writing to a single hudi table
                 Key: HUDI-5077
                 URL: https://issues.apache.org/jira/browse/HUDI-5077
             Project: Apache Hudi
          Issue Type: Improvement
          Components: deltastreamer
            Reporter: sivabalan narayanan


As of now, only a single deltastreamer can write to a given hudi table. We have an ask from the community to have 2 deltastreamers write to a single table.

 

Things required to be fixed:
 # We need to change the checkpointing to hold multiple key-value pairs, where the key is a unique identifier for the deltastreamer client and the value is that client's checkpoint. We might need to introduce a new notion of identifier for each deltastreamer in this case.
 # Within delta sync, after writeClient.upsert and before calling writeClient.commit, we need to update the checkpoint value. For this, we might need to take a lock, fetch the latest checkpoint from the timeline (since there could be multiple writers), update the checkpoint, and then release the lock.
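
The two items above could be sketched roughly as follows. This is only an illustration of the idea, not Hudi code: the class MultiWriterCheckpointSketch, the in-memory list standing in for the timeline, the ReentrantLock standing in for the table lock, and the identifiers "ds-1" / "ds-2" are all hypothetical. A real change would go through the deltastreamer's commit metadata and Hudi's lock provider.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class MultiWriterCheckpointSketch {
    // Hypothetical stand-in for the timeline: one checkpoint map per completed commit.
    private final List<Map<String, String>> timeline = new ArrayList<>();
    // Hypothetical stand-in for the table-level lock shared by all writers.
    private final ReentrantLock tableLock = new ReentrantLock();

    // Returns the checkpoint map (writer id -> checkpoint) of the latest commit.
    public Map<String, String> latestCheckpoints() {
        tableLock.lock();
        try {
            if (timeline.isEmpty()) {
                return Collections.emptyMap();
            }
            return Collections.unmodifiableMap(timeline.get(timeline.size() - 1));
        } finally {
            tableLock.unlock();
        }
    }

    // Called where the commit would happen: take the lock, re-read the latest
    // checkpoints (another writer may have committed meanwhile), overwrite only
    // this writer's entry, record the commit, and release the lock.
    public void commitWithCheckpoint(String writerId, String checkpoint) {
        tableLock.lock();
        try {
            Map<String, String> merged = new HashMap<>(latestCheckpoints());
            merged.put(writerId, checkpoint);
            timeline.add(merged);
        } finally {
            tableLock.unlock();
        }
    }

    public static void main(String[] args) {
        MultiWriterCheckpointSketch table = new MultiWriterCheckpointSketch();
        table.commitWithCheckpoint("ds-1", "topicA,0:100");
        table.commitWithCheckpoint("ds-2", "topicB,0:7");
        table.commitWithCheckpoint("ds-1", "topicA,0:250");
        // Each writer resumes from its own entry, regardless of who committed last.
        System.out.println(table.latestCheckpoints());
    }
}
```

Carrying the other writers' entries forward on every commit means the latest commit alone is enough to resume any deltastreamer. The alternative of storing only the committing writer's checkpoint would require walking back through the timeline to find each writer's last commit.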

 

These are the changes I can think of. Maybe while implementing it, some more minor fixes will be required.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)