Posted to commits@hudi.apache.org by "boneanxs (via GitHub)" <gi...@apache.org> on 2023/04/03 02:39:59 UTC

[GitHub] [hudi] boneanxs commented on pull request #8076: [HUDI-5884] Support bulk_insert for insert_overwrite and insert_overwrite_table

boneanxs commented on PR #8076:
URL: https://github.com/apache/hudi/pull/8076#issuecomment-1493558572

   >  I am yet to review fully, but have taken one pass. Can you break it down into two PRs - a) don't delete the table location if using SaveMode.Overwrite for bulk_insert, insert_overwrite, b) add support for bulk_insert for insert_overwrite and insert_overwrite_table.
   
   Yea, sure, will do so
   
   > Also, I want to understand the use case when we need this. If you can elaborate a bit more on why we need this, that would be great.
   
   Currently, we want to migrate all existing Hive tables to Hudi tables. Many of these Hive tables:
      1) usually use the `insert_overwrite` operation to overwrite a partition
      2) are written by batch jobs and can contain TB-level data per day
      3) don't need the `tag` or `drop duplicates` steps
   
   `bulk_insert` mode fits this scenario well: we can use it to boost write performance and make it easier for users to migrate existing Hive tables to Hudi tables.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org