
Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/01/26 07:13:46 UTC

[GitHub] [hudi] nsivabalan commented on issue #7602: [SUPPORT] When does the Spark engine's bulk insert mode support bucket index

nsivabalan commented on issue #7602:
URL: https://github.com/apache/hudi/issues/7602#issuecomment-1404630992

   With the bucket index, what performance issue are you seeing? As far as I know, there may not be any small-file handling with the bucket index, even when the operation type is "insert", so it should already perform close to bulk_insert. In other words, even if we added bucket index support to bulk_insert, it would perform much like insert does today.
   
   Essentially, we hash the record key to find the file group to insert into. The records then go to the merge handle, where the incoming records are merged with the existing file group.
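
   As a rough illustration of that flow (not Hudi's actual code), the sketch below hashes a record key to a bucket and merges incoming records into that bucket's file group. The names `bucket_id` and `upsert`, the MD5 hash, and the in-memory dicts standing in for file groups are all assumptions made for the example.

```python
import hashlib


def bucket_id(record_key: str, num_buckets: int) -> int:
    """Map a record key to a bucket using a stable hash.

    MD5 is an assumption for this sketch; any deterministic hash would do.
    """
    digest = hashlib.md5(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def upsert(file_groups: dict, incoming: dict, num_buckets: int) -> dict:
    """Route each incoming record to its bucket and merge it with existing rows."""
    for key, value in incoming.items():
        # One dict per bucket stands in for a file group (hypothetical model).
        group = file_groups.setdefault(bucket_id(key, num_buckets), {})
        # The incoming record replaces any existing record with the same key.
        group[key] = value
    return file_groups


file_groups = {}
upsert(file_groups, {"id-1": "a", "id-2": "b"}, num_buckets=4)
upsert(file_groups, {"id-1": "a2"}, num_buckets=4)  # same key, same bucket
```

   Because the bucket is a pure function of the record key, the second write for `id-1` lands in the same file group and goes through the merge step rather than creating a new file.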
   

