Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/02/21 14:32:00 UTC
[jira] [Created] (HUDI-3463) Make user-defined BulkInsertPartitioner fit write path API
Raymond Xu created HUDI-3463:
--------------------------------
Summary: Make user-defined BulkInsertPartitioner fit write path API
Key: HUDI-3463
URL: https://issues.apache.org/jira/browse/HUDI-3463
Project: Apache Hudi
Issue Type: Improvement
Reporter: Raymond Xu
The existing logic is problematic: we cannot enforce that a user-defined partitioner returns a JavaRDD, so the cast below can fail at runtime.
{code:java}
BulkInsertPartitioner partitioner = userDefinedBulkInsertPartitioner.isPresent()
    ? userDefinedBulkInsertPartitioner.get()
    : BulkInsertInternalPartitionerFactory.get(config.getBulkInsertSortMode());
repartitionedRecords = (JavaRDD<HoodieRecord<T>>) partitioner.repartitionRecords(dedupedRecords, parallelism);
{code}
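To illustrate the failure mode without Spark or Hudi on the classpath, here is a minimal sketch. Every name in it (UncheckedCastSketch, LoosePartitioner, castSucceeds) is hypothetical, and ArrayList merely stands in for JavaRDD as "the one concrete type the write path casts to". It is not Hudi code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

final class UncheckedCastSketch {

    // Hypothetical stand-in: nothing in the signature constrains the returned type.
    interface LoosePartitioner {
        Object repartitionRecords(List<String> records, int parallelism);
    }

    // Mirrors the write path: cast whatever the partitioner returned to one concrete type.
    @SuppressWarnings("unchecked")
    static boolean castSucceeds(LoosePartitioner partitioner) {
        List<String> input = new ArrayList<>(Arrays.asList("b", "a"));
        try {
            ArrayList<String> out = (ArrayList<String>) partitioner.repartitionRecords(input, 2);
            return out != null;
        } catch (ClassCastException e) {
            // The partitioner returned a type the write path did not expect.
            return false;
        }
    }

    public static void main(String[] args) {
        // Built-in style partitioner: returns the expected concrete type.
        LoosePartitioner builtIn = (records, p) -> new ArrayList<>(records);
        // User-defined partitioner: a valid implementation of the interface, but a different type.
        LoosePartitioner userDefined = (records, p) -> new LinkedList<>(records);

        System.out.println(castSucceeds(builtIn));     // prints "true"
        System.out.println(castSucceeds(userDefined)); // prints "false"
    }
}
```

The user-defined partitioner compiles and satisfies the interface, yet the caller only learns about the mismatch at runtime.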
The factory is currently used only in Spark, so we expect JavaRDD or HoodieData. The API can be made explicit about this constraint.
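One possible way to make the constraint explicit is to carry the expected collection type as a type parameter, so a mismatching partitioner is rejected at compile time and the write path needs no cast. The sketch below is only an illustration under that assumption: TypedPartitionerSketch, TypedPartitioner, defaultPartitioner and repartition are hypothetical names, and List<String> stands in for JavaRDD<HoodieRecord<T>> or HoodieData so the example runs without Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class TypedPartitionerSketch {

    // Hypothetical: the type parameter I names the collection type the write path expects.
    interface TypedPartitioner<I> {
        I repartitionRecords(I records, int parallelism);
    }

    // Hypothetical default, standing in for what the factory would return.
    // It sorts a copy of the input; parallelism is ignored in this toy version.
    static TypedPartitioner<List<String>> defaultPartitioner() {
        return (records, parallelism) -> {
            List<String> sorted = new ArrayList<>(records);
            Collections.sort(sorted);
            return sorted;
        };
    }

    // Same selection logic as the snippet in this issue, but with no cast:
    // only partitioners typed for List<String> can be passed in.
    static List<String> repartition(Optional<TypedPartitioner<List<String>>> userDefined,
                                    List<String> records, int parallelism) {
        TypedPartitioner<List<String>> partitioner =
            userDefined.isPresent() ? userDefined.get() : defaultPartitioner();
        return partitioner.repartitionRecords(records, parallelism);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("b", "c", "a");
        System.out.println(repartition(Optional.empty(), records, 2)); // prints "[a, b, c]"
    }
}
```

With this shape, a user-defined partitioner that returns some other collection type simply does not type-check against the write path's parameter, which moves the error from runtime to compile time.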
--
This message was sent by Atlassian Jira
(v8.20.1#820001)