Posted to issues@spark.apache.org by "Dimitris Batis (Jira)" <ji...@apache.org> on 2021/03/11 14:52:00 UTC
[jira] [Commented] (SPARK-34689) Spark Thrift Server: Memory leak for SparkSession objects
[ https://issues.apache.org/jira/browse/SPARK-34689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299603#comment-17299603 ]
Dimitris Batis commented on SPARK-34689:
----------------------------------------
After further examination, this seems to be related to SPARK-34087. Based on the pull requests on that ticket, I added ctx.sparkSession.listenerManager.clearListenerBus() to SparkSQLSessionManager#closeSession(), as in the attached "git diff" file. In local tests, SparkSession objects are now released properly, but I am not sure whether this is a complete solution or whether there are any side effects.
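The retention mechanism can be illustrated outside Spark with a minimal Java sketch. All class names below are hypothetical stand-ins, not Spark's: a long-lived listener bus keeps a strong reference to each per-connection session, so dropping the connection's own reference does not make the session collectible until the bus is explicitly cleared, which is what the clearListenerBus() call in closeSession() achieves in the attached diff.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Stand-in for a per-connection SparkSession-like object.
class Session {}

// Stand-in for a JVM-wide listener bus that outlives individual sessions.
class ListenerBus {
    private final List<Session> listeners = new ArrayList<>();
    void register(Session s) { listeners.add(s); }
    void clear(Session s)    { listeners.remove(s); } // cleanup analogous to clearListenerBus()
    int size()               { return listeners.size(); }
}

public class ListenerLeakSketch {
    public static void main(String[] args) {
        ListenerBus bus = new ListenerBus();

        Session session = new Session();
        WeakReference<Session> probe = new WeakReference<>(session);
        bus.register(session);

        // "Connection closed": the local reference is dropped, but the bus
        // still holds the session strongly, so a full GC cannot reclaim it.
        session = null;
        System.gc();
        System.out.println("retained after close: " + (probe.get() != null));

        // Fix pattern: unregister on session close, then the session becomes
        // unreachable and eligible for collection.
        bus.clear(probe.get());
        System.out.println("bus listeners left: " + bus.size());
    }
}
```

The point of the WeakReference probe is that it only reports the object as live while some strong reference remains; here the bus is the sole remaining strong holder, which mirrors the heap-dump observation above.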
> Spark Thrift Server: Memory leak for SparkSession objects
> ---------------------------------------------------------
>
> Key: SPARK-34689
> URL: https://issues.apache.org/jira/browse/SPARK-34689
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 3.0.1, 3.1.1
> Reporter: Dimitris Batis
> Priority: Major
> Attachments: heap_sparksession.png, heapdump_local_attempt_250_closed_connections.png, test_patch.diff
>
>
> When running the Spark Thrift Server (3.0.1, standalone cluster), we have noticed that each new JDBC connection creates a new SparkSession object. This object (and anything it references), however, remains in memory indefinitely even after the JDBC connection is closed, and full GCs do not remove it. After about 18 hours of heavy use, we get more than 46,000 such objects (heap_sparksession.png).
> In a small local installation test, I replicated the behavior by simply opening a JDBC connection, executing SHOW SCHEMAS and closing the connection (heapdump_local_attempt.png). For each connection, a new SparkSession object is created and never removed. I have noticed the same behavior in Spark 3.1.1 as well.
> Our settings are as follows. Please note that this was occurring even before we added the ExplicitGCInvokesConcurrent option (i.e. it happened even when a full GC was performed every 20 minutes).
> spark-defaults.conf:
> {code}
> spark.master spark://...:7077,...:7077
> spark.master.rest.enabled true
> spark.eventLog.enabled false
> spark.eventLog.dir file:///...
> spark.driver.cores 1
> spark.driver.maxResultSize 4g
> spark.driver.memory 5g
> spark.executor.memory 1g
> spark.executor.logs.rolling.maxRetainedFiles 2
> spark.executor.logs.rolling.strategy size
> spark.executor.logs.rolling.maxSize 1G
> spark.local.dir ...
> spark.sql.ui.retainedExecutions=10
> spark.ui.retainedDeadExecutors=10
> spark.worker.ui.retainedExecutors=10
> spark.worker.ui.retainedDrivers=10
> spark.ui.retainedJobs=30
> spark.ui.retainedStages=100
> spark.ui.retainedTasks=500
> spark.appStateStore.asyncTracking.enable=false
> spark.sql.shuffle.partitions=200
> spark.default.parallelism=200
> spark.task.reaper.enabled=true
> spark.task.reaper.threadDump=false
> spark.memory.offHeap.enabled=true
> spark.memory.offHeap.size=4g
> {code}
> spark-env.sh:
> {code}
> HADOOP_CONF_DIR="/.../hadoop/etc/hadoop"
> SPARK_WORKER_CORES=28
> SPARK_WORKER_MEMORY=54g
> SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=172800 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=40 "
> SPARK_DAEMON_JAVA_OPTS="-Dlog4j.configuration=file:///.../log4j.properties -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.dir="..." -Dspark.deploy.zookeeper.url=...:2181,...:2181,...:2181 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=40"
> {code}
> start-thriftserver.sh:
> {code}
> export SPARK_DAEMON_MEMORY=16g
> exec "${SPARK_HOME}"/sbin/spark-daemon.sh submit $CLASS 1 \
> --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
> --conf "spark.ui.retainedJobs=30" \
> --conf "spark.ui.retainedStages=100" \
> --conf "spark.ui.retainedTasks=500" \
> --conf "spark.sql.ui.retainedExecutions=10" \
> --conf "spark.appStateStore.asyncTracking.enable=false" \
> --conf "spark.cleaner.periodicGC.interval=20min" \
> --conf "spark.sql.autoBroadcastJoinThreshold=-1" \
> --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseG1GC -XX:MaxGCPauseMillis=200" \
> --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/.../thrift_driver_gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=7 -XX:GCLogFileSize=35M -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=11990 -XX:+ExplicitGCInvokesConcurrent" \
> --conf "spark.metrics.namespace=..." --name "..." --packages io.delta:delta-core_2.12:0.7.0 --hiveconf spark.ui.port=4038 --hiveconf spark.cores.max=22 --hiveconf spark.executor.cores=3 --hiveconf spark.executor.memory=6144M --hiveconf spark.scheduler.mode=FAIR --hiveconf spark.scheduler.allocation.file=.../conf/thrift-scheduler.xml \
> --conf spark.sql.thriftServer.incrementalCollect=true "$@"
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org