You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/01 16:02:33 UTC

[GitHub] [spark] tianshuang opened a new pull request, #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

tianshuang opened a new pull request, #36741:
URL: https://github.com/apache/spark/pull/36741

   ### What changes were proposed in this pull request?
   * Fixed memory leak caused `RawStore` cleanup not to take effect due to different `threadLocalMS` instances being manipulated
   * Fixed memory leak caused by `SparkExecuteStatementOperation` not calling `ThreadWithGarbageCleanup#cacheThreadLocalRawStore`.
   
   
   ### Why are the changes needed?
   We need to manipulate the same `threadLocalMS` instance for the `RawStore` cleanup mechanism to take effect, I have tried other fixes, manipulating the `threadLocalMS` instance in `HMSHandler`(loaded by IsolatedClassLoader$$anon$1) may be a suitable fix , I compiled Spark 2.4.4 by myself using this fix and verified it in the production environment, confirming that the memory leak is fixed.
   
   Reference: [SPARK-39357 pmCache memory leak caused by IsolatedClassLoader - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-39357)
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   I have added `SparkSQLEnv.sqlContext = sqlContext` to `SharedThriftServer#startThriftServer`, I don't know if this is the right way, in order to pass the unit test, please give suggestions, thank you.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
srowen commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1170187617

   Can't we just use a SoftReferenceMap or something ? is the leak just holding onto entries in that Map?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1312603067

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
tianshuang commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1170214998

   > Can't we just use a SoftReferenceMap or something ? is the leak just holding onto entries in that Map?
   
   In fact, the leak map([pmCache](https://github.com/datanucleus/datanucleus-api-jdo/blob/master/src/main/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L128)) caused by this problem belongs to [datanucleus-api-jdo](https://github.com/datanucleus/datanucleus-api-jdo), which is only a library that Spark depends on, and we should not modify its code because the problem is caused by the `IsolatedClassLoader` in Spark. The leak caused the memory to grow slowly and finally the STS SQL execution was very slow (continuous GC).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
tianshuang commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1170651968

   @marmbrus , Can you give some advice? The `IsolatedClassLoader` introduced by [SPARK-6907](https://github.com/apache/spark/commit/daa70bf135f23381f5f410aa95a1c0e5a2888568) seven years ago is related to this bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
tianshuang commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1312924982

   This problem should be solved.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1145590268

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
tianshuang commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1170246184

   As described at the beginning of this PR, I tried other fixes and none of them worked, and finally I came up with the current fix, which does seem hacky since I also didn't come up with a better way to clean it up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
tianshuang commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1170231007

   > This seems super hacky. Is there any way to do better cleanup?
   
   Yes, the current fix ensures that we always manipulate `threadLocalMS` instances in `HMSHandler`(loaded by IsolatedClassLoader$$anon$1) to avoid memory leaks. I don't know if this is the optimal fix, but it does solve the problem of memory leaks. There are two `threadLocalMS` instances in the JVM complicates the problem. I haven't come up with a better fix, please let me know if you have new ideas.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tianshuang commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
tianshuang commented on PR #36741:
URL: https://github.com/apache/spark/pull/36741#issuecomment-1162520261

   ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader
URL: https://github.com/apache/spark/pull/36741


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org