You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/04 06:40:39 UTC

[GitHub] [spark] Resol1992 opened a new pull request, #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Resol1992 opened a new pull request, #37404:
URL: https://github.com/apache/spark/pull/37404

   ### What changes were proposed in this pull request?
   Fixed memory leak caused by not clearing the FileStatusCache created by calling FileStatusCache.getOrCreate when close a SparkSession of Spark Thrift Server.
   
   ### Why are the changes needed?
   When create many sessions in Spark Thrift Server and do querys in each session, the fileStatus of files related will be cached.
   But these filestatus will not be cleared when these sessions are closed, the memory leak causes Driver OOM.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Tested by the steps in SPARK-39866.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r944632762


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala:
##########
@@ -36,6 +37,8 @@ import org.apache.spark.util.SizeEstimator
 object FileStatusCache {
   private var sharedCache: SharedInMemoryCache = _

Review Comment:
   @cloud-fan In fact, if we set the value of `spark.sql.hive.filesourcePartitionFileCacheSize` about the same as or even larger than the value of `spark.driver.memory`, it may cause OOM.
   Even if OOM doesn't occur, we should still clean these `FileStatus` objects in the `Cache` when we close a sparkSession, because they are useless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r937436693


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -169,6 +179,7 @@ class SessionResourceLoader(session: SparkSession) extends FunctionResourceLoade
    *
    * Note: this method seems not access any session state, but a Hive based `SessionState` needs
    * to add the jar to its hive client for the current session. Hence, it still needs to be in
+   * to add the jar to its hive client for the current session. Hence, it still needs to be in

Review Comment:
   sorry. It is a mistake



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r943073186


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala:
##########
@@ -36,6 +37,8 @@ import org.apache.spark.util.SizeEstimator
 object FileStatusCache {
   private var sharedCache: SharedInMemoryCache = _
 
+  private[spark] val sessionToCache = new mutable.HashMap[String, FileStatusCache]

Review Comment:
   @srowen Thanks for reminding, I should remove the entry from the map when closing a session, I will fix it.
   And with this fix, I think the memory problem should be fine, because the total size of `sessionToCache` is consist of `SessionUUID` and `FileStatusCache`, the total size of `FileStatusCache` is controlled by the max size of `sharedCache`, and the remaining parts is equal to (the number of SparkSession * the size of every `SessionUUID`), this would not cost too much memory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1345705737

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r943073186


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala:
##########
@@ -36,6 +37,8 @@ import org.apache.spark.util.SizeEstimator
 object FileStatusCache {
   private var sharedCache: SharedInMemoryCache = _
 
+  private[spark] val sessionToCache = new mutable.HashMap[String, FileStatusCache]

Review Comment:
   @srowen Thanks for reminding, I should remove the entry from the map when closing a session, I will fix it.
   And with this fix, I think the memory problem should be fine, because the total size of `sessionToCache` is consist of 'SessionUUID' and `FileStatusCache`, the total size of `FileStatusCache` is controlled by the max size of `sharedCache`, and the remaining parts is equal to (the number of SparkSession * the size of every `SessionUUID`), this would not cost too much memory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1205376699

   > jenkins, test this please
   
   no jenkins now :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r944125275


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala:
##########
@@ -36,6 +37,8 @@ import org.apache.spark.util.SizeEstimator
 object FileStatusCache {
   private var sharedCache: SharedInMemoryCache = _

Review Comment:
   I thought this is a global LRU cache, how does it cause OOM?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Zhangshunyu commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Zhangshunyu commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1206127023

   This is a very good fix when using multiple connections to query the same table and then closing the connections, I don't think relying on TTL is enough.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1206011690

   > > jenkins, test this please
   > 
   > no jenkins now :)
   
   thanks for reminding :)  @Resol1992 maybe you need to check your github action settings?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r947738299


##########
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala:
##########
@@ -88,6 +88,7 @@ private[hive] class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext:
     HiveThriftServer2.eventManager.onSessionClosed(sessionHandle.getSessionId.toString)
     val ctx = sparkSqlOperationManager.sessionToContexts.getOrDefault(sessionHandle, sqlContext)
     ctx.sparkSession.sessionState.catalog.getTempViewNames().foreach(ctx.uncacheTable)
+    ctx.sparkSession.sessionState.closeState()

Review Comment:
   this seems a bug to me for single session mode when all remote client sessions share the same instance of SparkSession/SessionState.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r949765561


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala:
##########
@@ -37,7 +37,7 @@ import org.apache.spark.util.SizeEstimator
 object FileStatusCache {
   private var sharedCache: SharedInMemoryCache = _
 
-  private[spark] val sessionToCache = new mutable.HashMap[String, FileStatusCache]
+  private[spark] val sessionToCache = new mutable.HashMap[String, Seq[FileStatusCache]]

Review Comment:
   better to use mutable.Buffer to avoid copy



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1206011894

   > > jenkins, test this please
   > 
   > no jenkins now :)
   
   @LuciferYang thanks for reminding :) 
   @Resol1992 maybe you need to check your github action settings?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r945726800


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala:
##########
@@ -49,7 +52,9 @@ object FileStatusCache {
           session.sqlContext.conf.metadataCacheTTL
         )
       }
-      sharedCache.createForNewClient()

Review Comment:
   reading the code more, I think this `FileStatusCache` is not per session, but per invocation. If we really want to clear all states for a session, we should keep a map of `Map[String, Seq[FileStatusCache]]`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1206010912

   > Different sessions can share this cache. Maybe you can set `spark.sql.metadataCacheTTLSeconds` to a positive value to workaround this issue?
   
   @wangyum IIUC different sessions' file status cache is separated, but they share memory, so clean file status when closing session asap is necessary: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala#L157


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1204880911

   reset this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1204889723

   jenkins, test this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1209284641

   also cc @yaooqinn @cloud-fan , could you also take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r937415355


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -169,6 +179,7 @@ class SessionResourceLoader(session: SparkSession) extends FunctionResourceLoade
    *
    * Note: this method seems not access any session state, but a Hive based `SessionState` needs
    * to add the jar to its hive client for the current session. Hence, it still needs to be in
+   * to add the jar to its hive client for the current session. Hence, it still needs to be in

Review Comment:
   duplicate comment, plz revert



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r937438874


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -79,7 +80,8 @@ private[sql] class SessionState(
     createQueryExecution: (LogicalPlan, CommandExecutionMode.Value) => QueryExecution,
     createClone: (SparkSession, SessionState) => SessionState,
     val columnarRules: Seq[ColumnarRule],
-    val adaptiveRulesHolder: AdaptiveRulesHolder) {
+    val adaptiveRulesHolder: AdaptiveRulesHolder,
+    sessionStateBuilder: Option[BaseSessionStateBuilder] = None) {

Review Comment:
   yes. we just need `SessionUUID` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r937419615


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -79,7 +80,8 @@ private[sql] class SessionState(
     createQueryExecution: (LogicalPlan, CommandExecutionMode.Value) => QueryExecution,
     createClone: (SparkSession, SessionState) => SessionState,
     val columnarRules: Seq[ColumnarRule],
-    val adaptiveRulesHolder: AdaptiveRulesHolder) {
+    val adaptiveRulesHolder: AdaptiveRulesHolder,
+    sessionStateBuilder: Option[BaseSessionStateBuilder] = None) {

Review Comment:
   can we directly pass `sessionUUID` here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r937431425


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -123,6 +125,14 @@ private[sql] class SessionState(
       plan: LogicalPlan,
       mode: CommandExecutionMode.Value = CommandExecutionMode.ALL): QueryExecution =
     createQueryExecution(plan, mode)
+
+  def closeSession(): Unit = {

Review Comment:
   `closeState()` or `close()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r937435895


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -123,6 +125,14 @@ private[sql] class SessionState(
       plan: LogicalPlan,
       mode: CommandExecutionMode.Value = CommandExecutionMode.ALL): QueryExecution =
     createQueryExecution(plan, mode)
+
+  def closeSession(): Unit = {

Review Comment:
   Ok. `closeState` is better



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r937435895


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -123,6 +125,14 @@ private[sql] class SessionState(
       plan: LogicalPlan,
       mode: CommandExecutionMode.Value = CommandExecutionMode.ALL): QueryExecution =
     createQueryExecution(plan, mode)
+
+  def closeSession(): Unit = {

Review Comment:
   Ok. `closeState`  would be better



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1204880514

   jenkins, test this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1205631679

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1206257121

   
   
   
   
   > Different sessions can share this cache. Maybe you can set `spark.sql.metadataCacheTTLSeconds` to a positive value to workaround this issue?
   
   @wangyum Thanks for your advise. I have tried to workaroud this with setting `spark.sql.metadataCacheTTLSeconds = 10`, but it does not work, the fileStatus objects still exist after sparkSession is closed.
   In fact, 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1207133142

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1204838993

   ok to test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
srowen commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r942659571


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileStatusCache.scala:
##########
@@ -36,6 +37,8 @@ import org.apache.spark.util.SizeEstimator
 object FileStatusCache {
   private var sharedCache: SharedInMemoryCache = _
 
+  private[spark] val sessionToCache = new mutable.HashMap[String, FileStatusCache]

Review Comment:
   Can this itself grow large? I'm also wary of maps that track stuff over long periods as a potential memory problem



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wzhfy commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
wzhfy commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r937419615


##########
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala:
##########
@@ -79,7 +80,8 @@ private[sql] class SessionState(
     createQueryExecution: (LogicalPlan, CommandExecutionMode.Value) => QueryExecution,
     createClone: (SparkSession, SessionState) => SessionState,
     val columnarRules: Seq[ColumnarRule],
-    val adaptiveRulesHolder: AdaptiveRulesHolder) {
+    val adaptiveRulesHolder: AdaptiveRulesHolder,
+    sessionStateBuilder: Option[BaseSessionStateBuilder] = None) {

Review Comment:
   can we directly pass sessionId here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Zhangshunyu commented on pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Zhangshunyu commented on PR #37404:
URL: https://github.com/apache/spark/pull/37404#issuecomment-1206126564

   This is a very good fix when using multiple connections to query the same table and then closing the connections, I don't think relying on TTL is enough.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Resol1992 commented on a diff in pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
Resol1992 commented on code in PR #37404:
URL: https://github.com/apache/spark/pull/37404#discussion_r961515565


##########
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala:
##########
@@ -88,6 +88,7 @@ private[hive] class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext:
     HiveThriftServer2.eventManager.onSessionClosed(sessionHandle.getSessionId.toString)
     val ctx = sparkSqlOperationManager.sessionToContexts.getOrDefault(sessionHandle, sqlContext)
     ctx.sparkSession.sessionState.catalog.getTempViewNames().foreach(ctx.uncacheTable)
+    ctx.sparkSession.sessionState.closeState()

Review Comment:
   @yaooqinn Thanks for your review.
   when `SparkSQLSessionManager.closeSession()` is called, it indicates that the sparkSession should be closed, so we should also call close `SparkSession.sessionState`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer
URL: https://github.com/apache/spark/pull/37404


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org