Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/13 00:37:47 UTC

[GitHub] [hudi] yihua commented on issue #6919: [SUPPORT] recommended resource allocation

yihua commented on issue #6919:
URL: https://github.com/apache/hudi/issues/6919#issuecomment-1276883734

   @tommy810pp You can use multiple cores per executor for the Spark job.  In that case, make sure each executor is allocated enough memory to avoid OOM errors, since all tasks running concurrently in an executor share the same heap.  For example, on an EMR cluster of 10 m5.4xlarge instances (16 cores per instance), you can ingest hundreds of GB of data with the following setup, which runs one 16-core executor per instance:
   ```
   ./bin/spark-shell  \
        --master yarn \
        --deploy-mode client \
        --driver-memory 50g \
        --executor-memory 50g \
        --num-executors 10 \
        --executor-cores 16 \
        --jars /home/hadoop/hudi-spark3.2-bundle_2.12-0.13.0-SNAPSHOT.jar \
        --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
        --conf spark.kryoserializer.buffer=256m \
        --conf spark.kryoserializer.buffer.max=1024m \
        --conf spark.rdd.compress=true \
        --conf spark.memory.storageFraction=0.8 \
        --conf "spark.driver.defaultJavaOptions=-XX:+UseG1GC" \
        --conf "spark.executor.defaultJavaOptions=-XX:+UseG1GC" \
        --conf spark.ui.proxyBase="" \
        --conf spark.eventLog.enabled=true \
        --conf spark.eventLog.dir=hdfs:///var/log/spark/apps \
        --conf spark.sql.hive.convertMetastoreParquet=false \
        --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
        --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
   ```
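
As a rough sanity check on the memory side of the setup above, the sketch below adds Spark's executor memory overhead to the 50g heap and compares the total against a per-node YARN limit. It assumes Spark's documented default overhead (10% of executor memory, with a 384 MiB floor). The node limit of 57344 MiB is an assumed value for `yarn.nodemanager.resource.memory-mb` and is not taken from the comment, so read the real value from your own cluster. The helper names `container_mb` and `fits_on_node` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: does executor heap + overhead fit in one YARN container?

# container_mb HEAP_MB -> prints heap plus memory overhead, in MiB.
# Overhead is assumed to be max(10% of heap, 384 MiB), Spark's default.
container_mb() {
    local heap_mb=$1
    local overhead_mb=$(( heap_mb / 10 ))
    if (( overhead_mb < 384 )); then
        overhead_mb=384
    fi
    echo $(( heap_mb + overhead_mb ))
}

# fits_on_node HEAP_MB NODE_LIMIT_MB -> exit status 0 if the container fits.
fits_on_node() {
    (( $(container_mb "$1") <= $2 ))
}

heap_mb=$(( 50 * 1024 ))   # --executor-memory 50g
node_limit_mb=57344        # ASSUMED yarn.nodemanager.resource.memory-mb

echo "container request: $(container_mb "$heap_mb") MiB"
if fits_on_node "$heap_mb" "$node_limit_mb"; then
    echo "fits within the assumed node limit"
else
    echo "does not fit: lower --executor-memory"
fi
```

If the request exceeds the node limit, YARN cannot place the executor, so either `--executor-memory` or the overhead setting has to come down.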

