Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/31 01:35:45 UTC

[GitHub] [hudi] xushiyan commented on issue #3751: [SUPPORT] Slow Write Speeds to Hudi

xushiyan commented on issue #3751:
URL: https://github.com/apache/hudi/issues/3751#issuecomment-1025300721


   > num-executors 19
   > executor-cores 1
   > executor-memory 6g
   
   @MikeBuh with this setting you have 19 executors x 1 core = 19 cores. Since each core can usually handle about 1.5-2 tasks concurrently, you can probably set Spark parallelism, shuffle partitions and the Hudi parallelisms to around 30-40. I'd suggest increasing executor cores to 3-5 to increase throughput, and tuning the other settings accordingly. You also want to keep Spark parallelism, shuffle partitions and the Hudi parallelism settings (there are a few of them) aligned with each other.
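The sizing rule of thumb above (total cores x 1.5-2 tasks per core, with one value shared by Spark and Hudi) can be sketched as follows. This is a minimal sketch, assuming the 1.5-2 factor from the comment; the helper names are hypothetical, and the four `hoodie.*.shuffle.parallelism` keys are the Hudi write-parallelism settings, of which only the one matching your write operation actually matters.

```python
import math

# Hudi write-parallelism settings; which one is used depends on the write operation.
HUDI_PARALLELISM_KEYS = (
    "hoodie.insert.shuffle.parallelism",
    "hoodie.upsert.shuffle.parallelism",
    "hoodie.bulkinsert.shuffle.parallelism",
    "hoodie.delete.shuffle.parallelism",
)


def suggest_parallelism(num_executors, executor_cores, per_core=(1.5, 2.0)):
    """Hypothetical helper: return a (low, high) range = total cores x tasks per core."""
    total_cores = num_executors * executor_cores
    low, high = per_core
    return math.ceil(total_cores * low), math.ceil(total_cores * high)


def aligned_parallelism_conf(parallelism):
    """Hypothetical helper: use one value for Spark and all Hudi parallelism settings."""
    value = str(parallelism)
    conf = {
        "spark.default.parallelism": value,
        "spark.sql.shuffle.partitions": value,
    }
    conf.update({key: value for key in HUDI_PARALLELISM_KEYS})
    return conf


print(suggest_parallelism(19, 1))  # current setup: 19 cores
print(suggest_parallelism(19, 4))  # with 4 executor cores, inside the suggested 3-5
print(aligned_parallelism_conf(38))
```

Note that `spark.default.parallelism` has to be set when the application starts (e.g. via `--conf` on `spark-submit`), while `spark.sql.shuffle.partitions` can also be changed at runtime and the `hoodie.*` keys are passed as write options.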
   
   > hoodie.datasource.write.row.writer.enable: true
   
   This is only for bulk insert as of now.
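One way to reflect this in job code is to only set the row-writer flag when the operation is `bulk_insert`. This is a minimal sketch with a hypothetical helper name; the behaviour it encodes is the one stated in the comment, as of early 2022.

```python
VALID_OPERATIONS = {"insert", "upsert", "bulk_insert", "delete"}


def hudi_write_options(table_name, operation):
    """Hypothetical helper: build Hudi write options for a given operation."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"unknown write operation: {operation}")
    opts = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
    }
    if operation == "bulk_insert":
        # Per the comment above, the row writer only applies to bulk_insert (early 2022).
        opts["hoodie.datasource.write.row.writer.enable"] = "true"
    return opts


print(hudi_write_options("my_table", "bulk_insert"))
print(hudi_write_options("my_table", "upsert"))
# With PySpark these would be used as, e.g.:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```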
   
   >  data seems to be skewed and thus not easy to partition using a field and ensuring even distribution
   
   Usually you'd use salting to handle skewed data. Performance won't improve much without handling the skew properly.
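The idea behind salting is to attach a random "salt" to each row so that rows sharing one hot key no longer all land in the same partition. The sketch below illustrates the general technique in plain Python rather than Spark; it is not Hudi-specific. As a simplification (an assumption, not the usual hash-of-key-plus-salt scheme), the salt shifts each row off its key's home partition, which guarantees a hot key is spread over `num_salts` partitions. If a salt is ever made part of a Hudi record key or partition path, it also changes how Hudi locates records on upsert, so that needs care.

```python
import random
import zlib
from collections import Counter


def home_partition(key, num_partitions):
    # Deterministic hash; Python's built-in str hash is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions, num_salts=1, seed=42):
    """Count rows per partition. Each row gets a random salt in [0, num_salts)
    that offsets it from its key's home partition (num_salts=1 means no salting)."""
    rng = random.Random(seed)
    sizes = Counter()
    for key in keys:
        salt = rng.randrange(num_salts)  # always 0 when num_salts == 1
        sizes[(home_partition(key, num_partitions) + salt) % num_partitions] += 1
    return sizes


# Skewed input: one hot key with 1000 rows, ten cold keys with 10 rows each.
rows = ["hot"] * 1000 + [f"cold-{i}" for i in range(10) for _ in range(10)]

unsalted = partition_sizes(rows, num_partitions=8)
salted = partition_sizes(rows, num_partitions=8, num_salts=8)
print("largest partition without salting:", max(unsalted.values()))
print("largest partition with salting:   ", max(salted.values()))
```

Without salting, the largest partition holds at least all 1000 hot rows; with 8 salts the hot key is split roughly evenly across the 8 partitions, so the slowest task has far less work.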
   
   Hope this helps.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
