Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/02/21 14:32:00 UTC
[jira] [Updated] (HUDI-3463) Make user-defined BulkInsertPartitioner fit write path API
[ https://issues.apache.org/jira/browse/HUDI-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-3463:
-----------------------------
Component/s: writer-core
> Make user-defined BulkInsertPartitioner fit write path API
> ----------------------------------------------------------
>
> Key: HUDI-3463
> URL: https://issues.apache.org/jira/browse/HUDI-3463
> Project: Apache Hudi
> Issue Type: Improvement
> Components: writer-core
> Reporter: Raymond Xu
> Priority: Critical
>
> The existing logic is problematic: we can’t enforce that a user-defined partitioner returns a JavaRDD, so the cast below can potentially break.
> {code:java}
> BulkInsertPartitioner partitioner = userDefinedBulkInsertPartitioner.isPresent()
>     ? userDefinedBulkInsertPartitioner.get()
>     : BulkInsertInternalPartitionerFactory.get(config.getBulkInsertSortMode());
> repartitionedRecords = (JavaRDD<HoodieRecord<T>>) partitioner.repartitionRecords(dedupedRecords, parallelism);
> {code}
> For now the factory is used only in Spark, so we expect a JavaRDD or HoodieData. The API can be made explicit about this constraint.
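
One way the constraint could be made explicit is to parameterize the partitioner by its collection type, so that the call site needs no cast. The following is a minimal sketch under stated assumptions, not Hudi's actual API: TypedPartitioner, SortingPartitioner, and bulkInsert are hypothetical names, and List<String> stands in for JavaRDD<HoodieRecord<T>> so the example runs without Spark.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical contract: the type parameter I pins down which collection
// type the partitioner both accepts and returns.
interface TypedPartitioner<I> {
    I repartitionRecords(I records, int parallelism);
}

final class BulkInsertSketch {

    // Hypothetical stand-in for the built-in partitioner a factory would supply.
    // It returns a sorted copy and ignores parallelism for simplicity.
    static final class SortingPartitioner implements TypedPartitioner<List<String>> {
        @Override
        public List<String> repartitionRecords(List<String> records, int parallelism) {
            List<String> sorted = new ArrayList<>(records);
            sorted.sort(Comparator.naturalOrder());
            return sorted;
        }
    }

    // The write path only accepts partitioners typed on List<String> (the
    // stand-in for JavaRDD<HoodieRecord<T>>), so the compiler rejects a
    // user-defined partitioner that returns some other collection type.
    static List<String> bulkInsert(List<String> dedupedRecords,
                                   Optional<TypedPartitioner<List<String>>> userDefined,
                                   int parallelism) {
        TypedPartitioner<List<String>> partitioner =
            userDefined.orElseGet(SortingPartitioner::new);
        // No cast is needed: the return type follows from the type parameter.
        return partitioner.repartitionRecords(dedupedRecords, parallelism);
    }

    public static void main(String[] args) {
        List<String> records = List.of("c", "a", "b");

        // Falls back to the built-in sorting partitioner.
        System.out.println(bulkInsert(records, Optional.empty(), 2));

        // A user-defined partitioner that leaves the records untouched.
        TypedPartitioner<List<String>> identity = (recs, p) -> recs;
        System.out.println(bulkInsert(records, Optional.of(identity), 2));
    }
}
```

With this shape, a mismatch between the user's partitioner and the write path shows up as a compile error at the call site rather than as a failed cast at runtime.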
--
This message was sent by Atlassian Jira
(v8.20.1#820001)