Posted to commits@hudi.apache.org by yi...@apache.org on 2022/05/25 00:30:08 UTC

[hudi] branch asf-site updated: [MINOR][DOCS] Update spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead in the tuning-guide. (#5670)

This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 9f2ea8563f [MINOR][DOCS] Update spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead in the tuning-guide. (#5670)
9f2ea8563f is described below

commit 9f2ea8563fe71ddcdfd9b10e40841948b5f0d586
Author: liuzhuang2017 <95...@users.noreply.github.com>
AuthorDate: Wed May 25 08:30:01 2022 +0800

    [MINOR][DOCS] Update spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead in the tuning-guide. (#5670)
---
 website/docs/tuning-guide.md                          | 6 +++---
 website/versioned_docs/version-0.10.1/tuning-guide.md | 6 +++---
 website/versioned_docs/version-0.11.0/tuning-guide.md | 6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/website/docs/tuning-guide.md b/website/docs/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/docs/tuning-guide.md
+++ b/website/docs/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
 
 **Input Parallelism**: By default, Hudi tends to over-partition the input (i.e., `withParallelism(1500)`) to ensure each Spark partition stays within the 2GB limit for inputs up to 500GB. Bump this up accordingly if you have larger inputs. We recommend setting the shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that it is at least input_data_size/500MB.
 
-**Off-heap memory**: Hudi writes parquet files, and that requires a good amount of off-heap memory proportional to the schema width. Consider setting `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead` if you are running into off-heap memory failures.
+**Off-heap memory**: Hudi writes parquet files, and that requires a good amount of off-heap memory proportional to the schema width. Consider setting `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead` if you are running into off-heap memory failures.
 
 **Spark Memory**: Typically, Hudi needs to be able to read a single file into memory to perform merges or compactions, so the executor memory should be sized to accommodate this. In addition, Hudi caches the input so that it can place data intelligently, and reserving some storage memory via `spark.memory.storageFraction` will generally help boost performance.
 
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
 spark.task.cpus 1
 spark.task.maxFailures 4
  
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
 spark.yarn.max.executor.failures 100
 ```
\ No newline at end of file
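
For reference, the un-prefixed keys in the hunk above were introduced in Spark 2.3, when the `spark.yarn.*` overhead settings were deprecated. A minimal Scala sketch of setting them in code, with illustrative values (3072 MiB matches the sample config above; 0.5 is Spark's default storage fraction); the application name is hypothetical, and the driver overhead still has to be passed at submit time because the driver JVM is already running when this code executes:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming Spark 2.3+ and a spark-submit launch on YARN.
// Values are illustrative, not tuned recommendations.
object OverheadConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-write-job")                        // hypothetical application name
      .config("spark.executor.memoryOverhead", "3072")  // interpreted as MiB when no unit is given
      .config("spark.memory.storageFraction", "0.5")    // keep some storage memory for Hudi's input caching
      .getOrCreate()

    // spark.driver.memoryOverhead must come from spark-defaults.conf or --conf,
    // since it only takes effect when the driver container is launched.
    println(spark.conf.get("spark.executor.memoryOverhead"))
    spark.stop()
  }
}
```
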
diff --git a/website/versioned_docs/version-0.10.1/tuning-guide.md b/website/versioned_docs/version-0.10.1/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/versioned_docs/version-0.10.1/tuning-guide.md
+++ b/website/versioned_docs/version-0.10.1/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
 
 **Input Parallelism**: By default, Hudi tends to over-partition the input (i.e., `withParallelism(1500)`) to ensure each Spark partition stays within the 2GB limit for inputs up to 500GB. Bump this up accordingly if you have larger inputs. We recommend setting the shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that it is at least input_data_size/500MB.
 
-**Off-heap memory**: Hudi writes parquet files, and that requires a good amount of off-heap memory proportional to the schema width. Consider setting `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead` if you are running into off-heap memory failures.
+**Off-heap memory**: Hudi writes parquet files, and that requires a good amount of off-heap memory proportional to the schema width. Consider setting `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead` if you are running into off-heap memory failures.
 
 **Spark Memory**: Typically, Hudi needs to be able to read a single file into memory to perform merges or compactions, so the executor memory should be sized to accommodate this. In addition, Hudi caches the input so that it can place data intelligently, and reserving some storage memory via `spark.memory.storageFraction` will generally help boost performance.
 
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
 spark.task.cpus 1
 spark.task.maxFailures 4
  
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
 spark.yarn.max.executor.failures 100
 ```
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.0/tuning-guide.md b/website/versioned_docs/version-0.11.0/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/versioned_docs/version-0.11.0/tuning-guide.md
+++ b/website/versioned_docs/version-0.11.0/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
 
 **Input Parallelism**: By default, Hudi tends to over-partition the input (i.e., `withParallelism(1500)`) to ensure each Spark partition stays within the 2GB limit for inputs up to 500GB. Bump this up accordingly if you have larger inputs. We recommend setting the shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that it is at least input_data_size/500MB.
 
-**Off-heap memory**: Hudi writes parquet files, and that requires a good amount of off-heap memory proportional to the schema width. Consider setting `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead` if you are running into off-heap memory failures.
+**Off-heap memory**: Hudi writes parquet files, and that requires a good amount of off-heap memory proportional to the schema width. Consider setting `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead` if you are running into off-heap memory failures.
 
 **Spark Memory**: Typically, Hudi needs to be able to read a single file into memory to perform merges or compactions, so the executor memory should be sized to accommodate this. In addition, Hudi caches the input so that it can place data intelligently, and reserving some storage memory via `spark.memory.storageFraction` will generally help boost performance.
 
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
 spark.task.cpus 1
 spark.task.maxFailures 4
  
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
 spark.yarn.max.executor.failures 100
 ```
\ No newline at end of file
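
On the sizing rule of thumb in the Input Parallelism paragraph (shuffle parallelism of at least input_data_size/500MB), a minimal Scala sketch of the arithmetic; the 750GB input size is a made-up example, and the key named in the output is one of the `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` settings the guide refers to:

```scala
// Minimal sketch of the guide's rule of thumb: shuffle parallelism of at least
// input_data_size / 500 MB. The input size below is hypothetical.
object ShuffleParallelismSizing {
  def main(args: Array[String]): Unit = {
    val inputSizeBytes       = 750L * 1024 * 1024 * 1024 // ~750 GB of input (illustrative)
    val targetPartitionBytes = 500L * 1024 * 1024        // 500 MB per Spark partition
    val parallelism = (inputSizeBytes + targetPartitionBytes - 1) / targetPartitionBytes
    println(s"hoodie.upsert.shuffle.parallelism >= $parallelism") // prints 1536 for this input
  }
}
```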