Posted to commits@spark.apache.org by do...@apache.org on 2024/01/15 21:50:34 UTC

(spark) branch master updated: [SPARK-46724][DOCS] Update tuning.md to use java 17 doc links

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 4207cbe67ead [SPARK-46724][DOCS] Update tuning.md to use java 17 doc links
4207cbe67ead is described below

commit 4207cbe67ead33f147aa79351ecb093fec8072f4
Author: Kent Yao <ya...@apache.org>
AuthorDate: Mon Jan 15 13:50:24 2024 -0800

    [SPARK-46724][DOCS] Update tuning.md to use java 17 doc links
    
    ### What changes were proposed in this pull request?
    
    SPARK-45315 drops JDK 8 and 11 support and makes 17 the default, which also makes G1 the default garbage collector; this PR updates tuning.md to use Java 17 doc links and fixes some wording about G1.
    
    ### Why are the changes needed?
    
    doc improvements

    ### Does this PR introduce _any_ user-facing change?
    
    no

    ### How was this patch tested?
    
    doc build
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    no
    
    Closes #44737 from yaooqinn/SPARK-46724.
    
    Authored-by: Kent Yao <ya...@apache.org>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 docs/tuning.md | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/docs/tuning.md b/docs/tuning.md
index 550ffb0f357b..94fe987175cf 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -41,12 +41,12 @@ Often, this will be the first thing you should tune to optimize a Spark applicat
 Spark aims to strike a balance between convenience (allowing you to work with any Java type
 in your operations) and performance. It provides two serialization libraries:
 
-* [Java serialization](https://docs.oracle.com/javase/8/docs/api/java/io/Serializable.html):
+* [Java serialization](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/io/Serializable.html):
   By default, Spark serializes objects using Java's `ObjectOutputStream` framework, and can work
   with any class you create that implements
-  [`java.io.Serializable`](https://docs.oracle.com/javase/8/docs/api/java/io/Serializable.html).
+  [`java.io.Serializable`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/io/Serializable.html).
   You can also control the performance of your serialization more closely by extending
-  [`java.io.Externalizable`](https://docs.oracle.com/javase/8/docs/api/java/io/Externalizable.html).
+  [`java.io.Externalizable`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/io/Externalizable.html).
   Java serialization is flexible but often quite slow, and leads to large
   serialized formats for many classes.
 * [Kryo serialization](https://github.com/EsotericSoftware/kryo): Spark can also use
@@ -204,7 +204,7 @@ their work directories), *not* on your driver program.
 
 To further tune garbage collection, we first need to understand some basic information about memory management in the JVM:
 
-* Java Heap space is divided in to two regions Young and Old. The Young generation is meant to hold short-lived objects
+* Java Heap space is divided into two regions Young and Old. The Young generation is meant to hold short-lived objects
   while the Old generation is intended for objects with longer lifetimes.
 
 * The Young generation is further divided into three regions \[Eden, Survivor1, Survivor2\].
@@ -232,10 +232,8 @@ temporary objects created during task execution. Some steps which may be useful
   value of the JVM's `NewRatio` parameter. Many JVMs default this to 2, meaning that the Old generation 
   occupies 2/3 of the heap. It should be large enough such that this fraction exceeds `spark.memory.fraction`.
   
-* Try the G1GC garbage collector with `-XX:+UseG1GC`. It can improve performance in some situations where
-  garbage collection is a bottleneck. Note that with large executor heap sizes, it may be important to
-  increase the [G1 region size](http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html) 
-  with `-XX:G1HeapRegionSize`.
+* Since 4.0.0, Spark uses JDK 17 by default, which also makes the G1GC garbage collector the default. Note that with
+  large executor heap sizes, it may be important to increase the G1 region size with `-XX:G1HeapRegionSize`.
 
 * As an example, if your task is reading data from HDFS, the amount of memory used by the task can be estimated using
   the size of the data block read from HDFS. Note that the size of a decompressed block is often 2 or 3 times the
@@ -245,7 +243,7 @@ temporary objects created during task execution. Some steps which may be useful
 * Monitor how the frequency and time taken by garbage collection changes with the new settings.
 
 Our experience suggests that the effect of GC tuning depends on your application and the amount of memory available.
-There are [many more tuning options](https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/index.html) described online,
+There are [many more tuning options](https://docs.oracle.com/en/java/javase/17/gctuning/introduction-garbage-collection-tuning.html) described online,
 but at a high level, managing how frequently full GC takes place can help in reducing the overhead.
 
 GC tuning flags for executors can be specified by setting `spark.executor.defaultJavaOptions` or `spark.executor.extraJavaOptions` in

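As a companion to the serialization hunk above, here is a minimal, hypothetical sketch of switching an application to Kryo; the ClickEvent class is invented for illustration, while spark.serializer and registerKryoClasses are the Spark settings the tuning guide refers to:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Hypothetical user type; any JVM class shipped between executors benefits from registration.
    case class ClickEvent(userId: Long, url: String)

    object KryoExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kryo-example")
          // Replace the default Java serialization with Kryo.
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          // Registration lets Kryo write a compact id instead of the full class name.
          .registerKryoClasses(Array(classOf[ClickEvent]))

        val spark = SparkSession.builder().config(conf).getOrCreate()
        val events = spark.sparkContext.parallelize(
          Seq(ClickEvent(1L, "/home"), ClickEvent(2L, "/docs/tuning.html")))
        println(events.count())
        spark.stop()
      }
    }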

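Likewise, a sketch of how the G1 region size and GC tuning flags discussed in the last two hunks can be handed to executors; the 16m region size and the -Xlog:gc* logging option are illustrative values, not recommendations from the patch:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    object GcTuningExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("gc-tuning-example")
          // On JDK 17, G1 is already the default collector, so -XX:+UseG1GC is redundant;
          // only the region size is overridden here, and GC logging is enabled to watch
          // how pause frequency and duration change with the new settings.
          .set("spark.executor.extraJavaOptions", "-XX:G1HeapRegionSize=16m -Xlog:gc*")

        val spark = SparkSession.builder().config(conf).getOrCreate()
        // ... run the job, then inspect executor logs in the Spark UI for GC timing.
        spark.stop()
      }
    }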
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org