Posted to commits@hudi.apache.org by "Prashant Wason (Jira)" <ji...@apache.org> on 2022/05/13 09:30:00 UTC

[jira] [Created] (HUDI-4094) Allow bulk insert partitioner to specify the fileID prefixes to use

Prashant Wason created HUDI-4094:
------------------------------------

             Summary: Allow bulk insert partitioner to specify the fileID prefixes to use
                 Key: HUDI-4094
                 URL: https://issues.apache.org/jira/browse/HUDI-4094
             Project: Apache Hudi
          Issue Type: New Feature
            Reporter: Prashant Wason
            Assignee: Prashant Wason


This is useful for using bulk insert when bootstrapping metadata table indexes.

Currently we use upsertPrepped to write to the metadata table. The upsert code path is not optimized for very large writes (1 billion+ records) because of the workload profiling and upsert partitioning overheads.

Bulk insert for the metadata table requires each partition to be written to files that have special names, so the randomly generated fileIDs used by the current implementation cannot be used.
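
A rough sketch of the idea, not the actual Hudi API: the partitioner could expose a hook that returns the fileID prefix for each output partition, with a random prefix as the fallback. All names below (FileIdPrefixProvider, RandomFileIdPrefixProvider, FixedFileIdPrefixProvider, and the "index-0000" style prefixes) are hypothetical and only illustrate the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical hook: these names are illustrative, not Hudi's real interface.
interface FileIdPrefixProvider {
    // Returns the fileID prefix to use for the given output partition index.
    String fileIdPrefix(int partitionIndex);
}

// Mirrors the behaviour described in the issue: a random prefix per partition.
class RandomFileIdPrefixProvider implements FileIdPrefixProvider {
    @Override
    public String fileIdPrefix(int partitionIndex) {
        return UUID.randomUUID().toString();
    }
}

// The requested behaviour: the caller supplies one fixed prefix per partition,
// so the written files get predictable names.
class FixedFileIdPrefixProvider implements FileIdPrefixProvider {
    private final List<String> prefixes;

    FixedFileIdPrefixProvider(List<String> prefixes) {
        // Defensive copy so later changes to the caller's list have no effect.
        this.prefixes = new ArrayList<>(prefixes);
    }

    @Override
    public String fileIdPrefix(int partitionIndex) {
        if (partitionIndex < 0 || partitionIndex >= prefixes.size()) {
            throw new IllegalArgumentException(
                "No fileID prefix configured for partition " + partitionIndex);
        }
        return prefixes.get(partitionIndex);
    }
}
```

With something like this, a metadata table bootstrap could construct FixedFileIdPrefixProvider with the file names it expects, while regular bulk inserts would keep the random default.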



--
This message was sent by Atlassian Jira
(v8.20.7#820007)