Posted to issues@spark.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/12/22 09:48:00 UTC

[jira] [Commented] (SPARK-41599) Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher

    [ https://issues.apache.org/jira/browse/SPARK-41599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651201#comment-17651201 ] 

Steve Loughran commented on SPARK-41599:
----------------------------------------

Either the fs is being created by {{FileSystem.newInstance()}} and the code isn't calling close() afterwards, or caching is disabled with "fs.$SCHEME.impl.disable.cache" set to true.

There's also {{HADOOP-17313. FileSystem.get to support slow-to-instantiate FS clients}}, which handles many threads calling get() on slow-to-create clients, but that only surfaced as an issue in large worker processes.
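
To make the failure modes concrete, here is a toy model in plain Java. It is not Hadoop's code: the classes User, Key, Client and Cache are hypothetical stand-ins, and the sketch assumes that a cache key holds the URI, the user object compared by reference, and a counter that makes every newInstance() entry distinct, with close() removing the entry. Under those assumptions it shows that get() with one shared user keeps a single entry, that a fresh user object per submission adds an entry each time, and that newInstance() entries pile up unless each client is closed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of a filesystem client cache. Hypothetical classes, not Hadoop's implementation. */
public class FsCacheSketch {

    /** Stand-in for a user identity; equals/hashCode are not overridden, so identity applies. */
    static final class User {
        final String name;
        User(String name) { this.name = name; }
    }

    /** Cache key: URI, user (compared by reference, an assumption), and a uniqueness counter. */
    static final class Key {
        final String uri;
        final User user;
        final long unique;
        Key(String uri, User user, long unique) {
            this.uri = uri;
            this.user = user;
            this.unique = unique;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return uri.equals(k.uri) && user == k.user && unique == k.unique;
        }
        @Override public int hashCode() {
            return 31 * (31 * uri.hashCode() + System.identityHashCode(user)) + Long.hashCode(unique);
        }
    }

    /** Stand-in for a filesystem client; closing it removes its entry from the cache. */
    static final class Client implements AutoCloseable {
        private final Cache cache;
        private final Key key;
        Client(Cache cache, Key key) { this.cache = cache; this.key = key; }
        @Override public void close() { cache.entries.remove(key); }
    }

    static final class Cache {
        final Map<Key, Client> entries = new HashMap<>();
        private final AtomicLong counter = new AtomicLong(1);

        /** Shared lookup: one entry per (uri, user) pair. */
        Client get(String uri, User user) {
            return entries.computeIfAbsent(new Key(uri, user, 0), k -> new Client(this, k));
        }

        /** Always creates a new client under a unique key; the caller must close it. */
        Client newInstance(String uri, User user) {
            Key k = new Key(uri, user, counter.getAndIncrement());
            Client c = new Client(this, k);
            entries.put(k, c);
            return c;
        }
    }

    /** get() with one shared user object: the cache stays at a single entry. */
    static int sharedUserGets(int n) {
        Cache cache = new Cache();
        User user = new User("alice");
        for (int i = 0; i < n; i++) cache.get("hdfs://nn", user);
        return cache.entries.size();
    }

    /** get() with a new user object per submission: one new entry each time. */
    static int freshUserGets(int n) {
        Cache cache = new Cache();
        for (int i = 0; i < n; i++) cache.get("hdfs://nn", new User("alice"));
        return cache.entries.size();
    }

    /** newInstance() without close(): entries accumulate. */
    static int newInstanceWithoutClose(int n) {
        Cache cache = new Cache();
        User user = new User("alice");
        for (int i = 0; i < n; i++) cache.newInstance("hdfs://nn", user);
        return cache.entries.size();
    }

    /** newInstance() inside try-with-resources: every entry is removed again. */
    static int newInstanceWithClose(int n) {
        Cache cache = new Cache();
        User user = new User("alice");
        for (int i = 0; i < n; i++) {
            try (Client c = cache.newInstance("hdfs://nn", user)) {
                // use the client here
            }
        }
        return cache.entries.size();
    }

    public static void main(String[] args) {
        System.out.println("shared user, get():          " + sharedUserGets(100));          // 1
        System.out.println("fresh user, get():           " + freshUserGets(100));           // 100
        System.out.println("newInstance() without close: " + newInstanceWithoutClose(100)); // 100
        System.out.println("newInstance() with close:    " + newInstanceWithClose(100));    // 0
    }
}
```

If the real cache behaves like this model, a steadily rising FileSystem$Cache$Key count in 'jmap -histo' would point at either a new user object per submission or unclosed newInstance() clients; which of the two applies has to be checked against the actual call sites.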

> Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-41599
>                 URL: https://issues.apache.org/jira/browse/SPARK-41599
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, YARN
>    Affects Versions: 3.1.2
>            Reporter: Maciej Smolenski
>            Priority: Major
>         Attachments: InProcLaunchFsIssue.scala
>
>
> When submitting a Spark application in a Kerberos environment, the credentials of the 'current user' (UserGroupInformation.getCurrentUser()) are modified.
> FileSystem.CACHE entries contain the 'current user' (with user credentials) as a key.
> Submitting many Spark applications using InProcessLauncher causes FileSystem.CACHE to grow steadily.
> Eventually the process exits with an OutOfMemory error.
> Code for reproduction is attached.
>  
> Output from running 'jmap -histo' on the reproduction JVM shows that the number of FileSystem$Cache$Key instances increases over time:
> time: #instances class
> 1671533274: 2 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533335: 11 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533395: 21 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533455: 30 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533515: 39 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533576: 48 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533636: 57 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533696: 66 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533757: 75 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533817: 84 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533877: 93 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533937: 102 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671533998: 111 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534058: 120 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534118: 135 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534178: 140 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534239: 150 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534299: 159 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534359: 168 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534419: 177 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534480: 186 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534540: 195 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534600: 204 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534661: 213 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534721: 222 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534781: 231 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534841: 240 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534902: 249 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671534962: 257 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671535022: 264 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671535083: 273 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671535143: 282 org.apache.hadoop.fs.FileSystem$Cache$Key
> 1671535203: 291 org.apache.hadoop.fs.FileSystem$Cache$Key



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
