Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/02/21 14:32:00 UTC

[jira] [Created] (HUDI-3463) Make user-defined BulkInsertPartitioner fit write path API

Raymond Xu created HUDI-3463:
--------------------------------

             Summary: Make user-defined BulkInsertPartitioner fit write path API
                 Key: HUDI-3463
                 URL: https://issues.apache.org/jira/browse/HUDI-3463
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Raymond Xu


The existing logic is problematic because we cannot enforce that a user-defined partitioner returns a JavaRDD, so the cast in the following code can potentially break.


{code:java}
    BulkInsertPartitioner partitioner = userDefinedBulkInsertPartitioner.isPresent()
        ? userDefinedBulkInsertPartitioner.get()
        : BulkInsertInternalPartitionerFactory.get(config.getBulkInsertSortMode());
    repartitionedRecords = (JavaRDD<HoodieRecord<T>>) partitioner.repartitionRecords(dedupedRecords, parallelism);
{code}


The factory is currently used only in Spark, so we expect a JavaRDD or HoodieData. The API can be made explicit about this constraint.
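
One possible way to make the constraint explicit is sketched below. It is a minimal illustration only, not the actual Hudi API: TypedBulkInsertPartitioner, SortingListPartitioner and WritePathSketch are hypothetical names, and List<String> stands in for JavaRDD<HoodieRecord<T>> so the example runs without Spark. The idea is that the partitioner declares its input/output collection type as a type parameter, so the write path no longer needs an unchecked cast.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical interface: I is the concrete collection type the write path expects
// (on Spark this would be something like JavaRDD<HoodieRecord<T>> or HoodieData).
interface TypedBulkInsertPartitioner<I> {
    I repartitionRecords(I records, int outputPartitions);
}

// Hypothetical default partitioner. List<String> is only a stand-in for an RDD of records;
// outputPartitions is ignored here because a List has no partitions.
class SortingListPartitioner implements TypedBulkInsertPartitioner<List<String>> {
    @Override
    public List<String> repartitionRecords(List<String> records, int outputPartitions) {
        List<String> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }
}

class WritePathSketch {
    // The parameter type only accepts partitioners that return List<String>,
    // so a mismatched user-defined partitioner is rejected at compile time.
    static List<String> bulkInsert(
            List<String> dedupedRecords,
            Optional<TypedBulkInsertPartitioner<List<String>>> userDefinedPartitioner,
            int parallelism) {
        TypedBulkInsertPartitioner<List<String>> partitioner =
                userDefinedPartitioner.orElseGet(SortingListPartitioner::new);
        // No cast is needed: the return type is fixed by the type parameter.
        return partitioner.repartitionRecords(dedupedRecords, parallelism);
    }
}
```

With this shape, a user-defined partitioner that produces a different collection type fails to compile against the write path instead of failing at the cast.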



--
This message was sent by Atlassian Jira
(v8.20.1#820001)