Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/06 02:57:40 UTC

[GitHub] [hudi] garyli1019 commented on issue #1786: [SUPPORT] Bulk insert slow on MOR

garyli1019 commented on issue #1786:
URL: https://github.com/apache/hudi/issues/1786#issuecomment-653989028


   Hi @rvd8345, by `shuffle parallelism` do you mean `spark.shuffle.partition` or the Hudi parallelism?
   For bulk insert, the Hudi parallelism seems too large for 9.7 GB of data. With this config, it will create a lot of small files.
   A screenshot of the stage 5 details would also be helpful.
   Could you try tuning the following config:
   set `hoodie.bulkinsert.shuffle.parallelism` to `100` and leave the file size limit at its default?
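
   For reference, a minimal PySpark-style sketch of the suggested setting. The helper names (`bulk_insert_options`, `approx_mb_per_partition`, `write_bulk_insert`), the table name, and the path are hypothetical, and the per-partition figure is only a rough estimate that assumes the 9.7 GB is spread evenly across partitions. Other required write options (record key, precombine field, table type) are left to the caller.

```python
def bulk_insert_options(table_name, parallelism=100):
    """Build the Hudi write options discussed above (hypothetical helper).

    Only the operation and the bulk insert parallelism are set here; the
    file size limit is deliberately not set so it stays at its default.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.bulkinsert.shuffle.parallelism": str(parallelism),
    }


def approx_mb_per_partition(input_gb, parallelism):
    """Rough input size per shuffle partition in MB, assuming an even spread."""
    return input_gb * 1024 / parallelism


def write_bulk_insert(df, base_path, options):
    """Write a Spark DataFrame with the given Hudi options.

    `df` is assumed to be a pyspark.sql.DataFrame and the Hudi Spark bundle
    is assumed to be on the classpath; neither is checked here.
    """
    df.write.format("org.apache.hudi").options(**options).mode("append").save(base_path)


if __name__ == "__main__":
    opts = bulk_insert_options("my_table")  # "my_table" is a placeholder name
    print(opts)
    # Compare how much input each partition would handle at two settings.
    for p in (100, 1500):
        print(p, round(approx_mb_per_partition(9.7, p), 1), "MB per partition")
```

   With a real DataFrame this would be called as `write_bulk_insert(df, "/path/to/table", {**opts, ...})`, where the extra entries are the record key and other options your table already uses.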

