Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/10/22 20:46:00 UTC

[jira] [Updated] (HUDI-5077) Supporting multiple deltastreamers writing to a single hudi table

     [ https://issues.apache.org/jira/browse/HUDI-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5077:
--------------------------------------
    Description: 
As of now, only a single deltastreamer can write to a given Hudi table. The community has asked for support for 2 deltastreamers writing to the same table.

 

Things required to be fixed:
 # We need to change the checkpointing to store multiple key-value pairs, where the key is a unique identifier for the deltastreamer client and the value is that client's checkpoint. This might require introducing a new notion of an identifier for each deltastreamer.
 # Within delta sync, after writeClient.upsert and before calling writeClient.commit, we need to update the checkpoint value. For this, we might need to take a lock, fetch the latest checkpoint from the timeline (since there could be multiple writers), update the checkpoint, and then release the lock.
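
The two items above can be sketched roughly as follows. This is only an illustration of the idea, not Hudi code: the class name, the method names, the "id=checkpoint;id=checkpoint" encoding, and the in-memory ReentrantLock (standing in for whatever table-level lock is eventually used) are all assumptions made for the example, and the "timeline" is simulated by a single string field.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names are existing Hudi APIs.
class MultiWriterCheckpointSketch {

  // Stand-in for the table-level lock a deltastreamer would take before committing.
  private final ReentrantLock tableLock = new ReentrantLock();

  // Stand-in for the checkpoint stored with the latest commit on the timeline.
  private String latestCommittedCheckpoint = "";

  // Parses "id1=ckpt1;id2=ckpt2" into a sorted map (the encoding is an assumption).
  static Map<String, String> parse(String encoded) {
    Map<String, String> checkpoints = new TreeMap<>();
    if (encoded == null || encoded.isEmpty()) {
      return checkpoints;
    }
    for (String pair : encoded.split(";")) {
      int idx = pair.indexOf('=');
      if (idx < 0) {
        throw new IllegalArgumentException("Malformed checkpoint entry: " + pair);
      }
      checkpoints.put(pair.substring(0, idx), pair.substring(idx + 1));
    }
    return checkpoints;
  }

  // Serializes the per-writer checkpoints back into a single string.
  static String encode(Map<String, String> checkpoints) {
    return checkpoints.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(";"));
  }

  // Called after the upsert and before the commit: take the lock, re-read the
  // latest checkpoint (another writer may have committed in the meantime),
  // overwrite only this writer's entry, store the result, release the lock.
  String commitWithCheckpoint(String writerId, String newCheckpoint) {
    tableLock.lock();
    try {
      Map<String, String> merged = parse(latestCommittedCheckpoint);
      merged.put(writerId, newCheckpoint);
      latestCommittedCheckpoint = encode(merged);
      return latestCommittedCheckpoint;
    } finally {
      tableLock.unlock();
    }
  }

  public static void main(String[] args) {
    MultiWriterCheckpointSketch table = new MultiWriterCheckpointSketch();
    table.commitWithCheckpoint("ds-1", "topicA,0:100");
    table.commitWithCheckpoint("ds-2", "topicB,0:7");
    System.out.println(table.commitWithCheckpoint("ds-1", "topicA,0:250"));
  }
}
```

In this sketch each writer only ever replaces the entry under its own identifier, so a commit by one deltastreamer does not discard the checkpoint recorded by the other. The separators used here would need to be chosen so they cannot clash with real checkpoint values.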

 

These are the changes I can think of. Some more minor fixes may turn out to be required during implementation.

 

Ask from a user: https://github.com/apache/hudi/issues/6718

 

  was:
As of now, only a single deltastreamer can write to a given Hudi table. The community has asked for support for 2 deltastreamers writing to the same table.

 

Things required to be fixed:
 # We need to change the checkpointing to store multiple key-value pairs, where the key is a unique identifier for the deltastreamer client and the value is that client's checkpoint. This might require introducing a new notion of an identifier for each deltastreamer.
 # Within delta sync, after writeClient.upsert and before calling writeClient.commit, we need to update the checkpoint value. For this, we might need to take a lock, fetch the latest checkpoint from the timeline (since there could be multiple writers), update the checkpoint, and then release the lock.

 

These are the changes I can think of. Some more minor fixes may turn out to be required during implementation.

 


> Supporting multiple deltastreamers writing to a single hudi table
> -----------------------------------------------------------------
>
>                 Key: HUDI-5077
>                 URL: https://issues.apache.org/jira/browse/HUDI-5077
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: sivabalan narayanan
>            Priority: Major
>
> As of now, only a single deltastreamer can write to a given Hudi table. The community has asked for support for 2 deltastreamers writing to the same table.
>  
> Things required to be fixed:
>  # We need to change the checkpointing to store multiple key-value pairs, where the key is a unique identifier for the deltastreamer client and the value is that client's checkpoint. This might require introducing a new notion of an identifier for each deltastreamer.
>  # Within delta sync, after writeClient.upsert and before calling writeClient.commit, we need to update the checkpoint value. For this, we might need to take a lock, fetch the latest checkpoint from the timeline (since there could be multiple writers), update the checkpoint, and then release the lock.
>  
> These are the changes I can think of. Some more minor fixes may turn out to be required during implementation.
>  
> Ask from a user: https://github.com/apache/hudi/issues/6718
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)