Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/05/27 03:36:54 UTC

[GitHub] [hudi] wangxianghu commented on pull request #1665: [HUDI-910]Introduce HoodieWriteInput for hudi write client

wangxianghu commented on pull request #1665:
URL: https://github.com/apache/hudi/pull/1665#issuecomment-633776310


   > 
   > 
   > Is there an umbrella task to understand how all the follow up work will be.. on this..For e.g I am wondering what the eventual methods on `HoodieWriteInput` will be and how it will abstract away the RDD construct
   
   Yes, here it is: https://issues.apache.org/jira/browse/HUDI-909
   Currently only four subtasks are filed; together they form the foundation of the entire abstraction:
   1. Introduce HoodieWriteInput for hudi write client: https://issues.apache.org/jira/browse/HUDI-910 
   2. Introduce HoodieWriteOutput for hudi write client: https://issues.apache.org/jira/browse/HUDI-911 
   3. Introduce HoodieWriteKey for hudi write client: https://issues.apache.org/jira/browse/HUDI-912 
   4. Introduce HoodieEngineContext for hudi write client: https://issues.apache.org/jira/browse/HUDI-913 
   
   For Spark, these could look like:
   `JavaRDD<HoodieRecord<T>> records = ...; // read from source`
   `HoodieWriteInput<JavaRDD<HoodieRecord<T>>> inputRecords = new HoodieWriteInput<>(records);`
   `JavaRDD<HoodieRecord<T>> inputRdds = inputRecords.getInputs();`
   
   `JavaSparkContext jsc = ...;`
   `HoodieEngineContext<JavaSparkContext> hec = new HoodieSparkEngineContext(jsc); // HoodieSparkEngineContext<JavaSparkContext> implements HoodieEngineContext`
   `JavaSparkContext sparkContext = hec.getContext();`
   
   HoodieWriteKey and HoodieWriteOutput would follow the same wrapper pattern as HoodieWriteInput.
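
To make the engine-context idea concrete, here is a minimal sketch that runs without Spark. `LocalEngine` and `HoodieLocalEngineContext` are hypothetical stand-ins for `JavaSparkContext` and `HoodieSparkEngineContext`; only the `getContext()` accessor comes from the snippet above, and the real interface may well end up different.

```java
// Hypothetical stand-in for an engine handle such as JavaSparkContext.
final class LocalEngine {
    private final String appName;

    LocalEngine(String appName) {
        this.appName = appName;
    }

    String appName() {
        return appName;
    }
}

// Engine-agnostic view: the write client would only depend on this interface.
interface HoodieEngineContext<C> {
    C getContext();
}

// Engine-specific implementation (assumed shape, mirroring HoodieSparkEngineContext).
final class HoodieLocalEngineContext implements HoodieEngineContext<LocalEngine> {
    private final LocalEngine engine;

    HoodieLocalEngineContext(LocalEngine engine) {
        this.engine = engine;
    }

    @Override
    public LocalEngine getContext() {
        return engine;
    }
}

public class EngineContextSketch {
    public static void main(String[] args) {
        // Wrap the engine handle, then unwrap it where engine-specific code needs it.
        HoodieEngineContext<LocalEngine> hec =
                new HoodieLocalEngineContext(new LocalEngine("demo"));
        LocalEngine engine = hec.getContext();
        System.out.println(engine.appName());
    }
}
```

Code that only needs to pass the context along can take `HoodieEngineContext<C>` and stay unaware of which engine sits behind it.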
   
   The upsert API could look like this:
   
   `public HoodieWriteOutput<JavaRDD<WriteStatus>> upsert(HoodieWriteInput<JavaRDD<HoodieRecord<T>>> records, final String instantTime) {...}`
   
   The body of the method would stay almost the same as before.
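
As a rough illustration of how that signature wraps and unwraps engine-specific collections, the sketch below uses `List<String>` in place of `JavaRDD<HoodieRecord<T>>` and `JavaRDD<WriteStatus>` so it runs without Spark. `ListWriteClient`, `getOutput()`, and the string "status" format are assumptions made for this example; only `HoodieWriteInput`, `HoodieWriteOutput`, `getInputs()`, and the `upsert(records, instantTime)` shape come from the comment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Wrapper around an engine-specific input collection.
final class HoodieWriteInput<I> {
    private final I inputs;

    HoodieWriteInput(I inputs) {
        this.inputs = inputs;
    }

    I getInputs() {
        return inputs;
    }
}

// Wrapper around an engine-specific result collection; getOutput() is a hypothetical name.
final class HoodieWriteOutput<O> {
    private final O output;

    HoodieWriteOutput(O output) {
        this.output = output;
    }

    O getOutput() {
        return output;
    }
}

// Hypothetical client: List<String> plays the role of the Spark RDD types.
final class ListWriteClient {
    HoodieWriteOutput<List<String>> upsert(HoodieWriteInput<List<String>> records,
                                           final String instantTime) {
        // Unwrap, run the existing engine-specific logic, then wrap the result.
        List<String> statuses = records.getInputs().stream()
                .map(record -> instantTime + ":" + record)
                .collect(Collectors.toList());
        return new HoodieWriteOutput<>(statuses);
    }
}

public class UpsertSketch {
    public static void main(String[] args) {
        HoodieWriteInput<List<String>> input =
                new HoodieWriteInput<>(Arrays.asList("key1", "key2"));
        HoodieWriteOutput<List<String>> output =
                new ListWriteClient().upsert(input, "20200527033654");
        System.out.println(output.getOutput());
    }
}
```

Only the first and last lines of `upsert` touch the wrappers, which matches the point that the method body itself changes very little.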

