Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/03 06:57:59 UTC

[GitHub] [hudi] MikeBuh commented on issue #5481: [SUPPORT] Slow Upsert When Reloading Data into Hudi Table

MikeBuh commented on issue #5481:
URL: https://github.com/apache/hudi/issues/5481#issuecomment-1115792891

   @yihua Thank you for the reply; however, please note the following:
   - The real-time job is working fine. I included its details only to show what is currently used for new data.
   - The batch reload job is the one having issues, failing with out-of-memory errors (exit code 137).
   - The Spark UI screenshots and detailed configs relate to this reload job.
   - Perhaps I was not clear: the resources used by the reload job are the default EMR ones specified in that config (re-added below for full clarity):
   
   **Spark Parameters**
   > spark.driver.cores: 5
   > spark.driver.memory: 24100m
   > spark.driver.memoryOverhead: 2680m
   > 
   > spark.executor.instances: 10
   > spark.executor.cores: 5
   > spark.executor.memory: 24100m
   > spark.executor.memoryOverhead: 2680m
   > spark.memory.storageFraction: 0.6
   > spark.memory.fraction: 0.7
   > spark.default.parallelism: 100
   > spark.sql.shuffle.partitions: 100
   > spark.kryoserializer.buffer.max: 128m
   > 
   > spark.driver.extraJavaOptions: -Xloggc:/var/log/spark-GClog.log -XX:+PrintGC -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
   > spark.executor.extraJavaOptions: -Xloggc:/var/log/spark-GClog.log -XX:+PrintGC -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
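   For reference, a rough sketch of how the parameters above could be passed to `spark-submit` (the application name at the end is a placeholder, not from this issue; the GC/JVM options are left out for brevity):

   ```shell
   # Hypothetical spark-submit invocation carrying the settings above.
   # "reload_job.py" is a placeholder for the actual reload application.
   spark-submit \
     --driver-cores 5 \
     --driver-memory 24100m \
     --conf spark.driver.memoryOverhead=2680m \
     --num-executors 10 \
     --executor-cores 5 \
     --executor-memory 24100m \
     --conf spark.executor.memoryOverhead=2680m \
     --conf spark.memory.storageFraction=0.6 \
     --conf spark.memory.fraction=0.7 \
     --conf spark.default.parallelism=100 \
     --conf spark.sql.shuffle.partitions=100 \
     --conf spark.kryoserializer.buffer.max=128m \
     reload_job.py
   ```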
   
   **Hudi Parameters**
   > hoodie.index.type: BLOOM
   > hoodie.datasource.write.operation: UPSERT
   > hoodie.upsert.shuffle.parallelism: 100
   > hoodie.payload.ordering.field: hoodie.datasource.write.precombine.field
   > hoodie.datasource.write.payload.class: org.apache.hudi.common.model.DefaultHoodieRecordPayload
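   A sketch of how these Hudi options might be applied in a PySpark upsert, assuming an existing DataFrame `df`; the table name and target path are illustrative placeholders, not from this issue:

   ```python
   # Hypothetical PySpark write carrying the Hudi settings above.
   # `df`, "my_table", and the S3 path are placeholders.
   hudi_options = {
       "hoodie.table.name": "my_table",
       "hoodie.index.type": "BLOOM",
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.upsert.shuffle.parallelism": "100",
       "hoodie.datasource.write.payload.class":
           "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
   }

   (df.write.format("hudi")
      .options(**hudi_options)
      .mode("append")
      .save("s3://bucket/path/to/table"))
   ```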
   
   Thanks once again for your prompt reply, and I hope you can assist me with this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org