Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/30 15:09:35 UTC

[GitHub] [spark] linehrr edited a comment on issue #24461: [SPARK-27434][CORE] Fix mem leak

linehrr edited a comment on issue #24461: [SPARK-27434][CORE] Fix mem leak 
URL: https://github.com/apache/spark/pull/24461#issuecomment-487990769
 
 
   I looked further into the Hadoop side. `FileSystem.get(URI uri, Configuration conf)` has caching built in unless it is explicitly disabled:
   `return conf.getBoolean(disableCacheName, false) ? createFileSystem(uri, conf) : CACHE.get(uri, conf);`
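The lookup above can be sketched without any Hadoop dependency. This is a simplified model under stated assumptions, not Hadoop's implementation: `FsGetSketch`, `FakeFs`, and the `disableCache` parameter are hypothetical names, and a plain string stands in for the real cache key. In Hadoop itself the flag is read from a per-scheme property of the form `fs.<scheme>.impl.disable.cache`.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the FileSystem cache lookup; not Hadoop's real code.
final class FsGetSketch {
    // Hypothetical stand-in for a FileSystem instance.
    static final class FakeFs {
        final String uri;
        FakeFs(String uri) { this.uri = uri; }
    }

    // Assumption: the URI string is the whole cache key in this model.
    private static final Map<String, FakeFs> CACHE = new HashMap<>();

    // Same shape as the quoted ternary: bypass the cache when it is disabled,
    // otherwise return the cached instance, creating it on first use.
    static synchronized FakeFs get(String uri, boolean disableCache) {
        if (disableCache) {
            return new FakeFs(uri);
        }
        return CACHE.computeIfAbsent(uri, FakeFs::new);
    }

    public static void main(String[] args) {
        FakeFs a = get("hdfs://nn:8020", false);
        FakeFs b = get("hdfs://nn:8020", false);
        FakeFs c = get("hdfs://nn:8020", true);
        System.out.println("cached calls share an instance: " + (a == b));
        System.out.println("cache-disabled call shares it:  " + (a == c));
    }
}
```

With caching enabled, repeated calls hand back the same object, which is why a caller that wants a private instance has to disable the cache or otherwise ask for a fresh one.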
   
   Because of this, the `FileSystem` object is cached in a map under a generated `Key`:
   
       Key(URI uri, Configuration conf) throws IOException {
           this(uri, conf, 0L);
       }

       Key(URI uri, Configuration conf, long unique) throws IOException {
           this.scheme = uri.getScheme() == null ? "" : StringUtils.toLowerCase(uri.getScheme());
           this.authority = uri.getAuthority() == null ? "" : StringUtils.toLowerCase(uri.getAuthority());
           this.unique = unique;
           this.ugi = UserGroupInformation.getCurrentUser();
       }
   
   So in theory, if the `baseLogDir` is the same and the Hadoop conf doesn't change, the object should be reused across Spark contexts. For some reason I haven't identified, though, it was not reused and a new one got created every time.
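To make the reuse condition concrete, here is a rough model of the key built only from the fields the quoted constructor stores. It is a sketch, not Hadoop's class: `FsKeySketch` is a hypothetical name, a plain `String` stands in for the `UserGroupInformation`, and it assumes equality compares exactly the stored fields (scheme, authority, user, unique).

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Simplified model of the quoted cache key; not Hadoop's actual class.
final class FsKeySketch {
    static final class Key {
        final String scheme;
        final String authority;
        final String user;   // assumption: a String stands in for the UGI
        final long unique;

        Key(URI uri, String user, long unique) {
            // Like the quoted constructor, only scheme and authority are taken
            // from the URI, both lowercased; the path is ignored.
            this.scheme = uri.getScheme() == null
                    ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
            this.authority = uri.getAuthority() == null
                    ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
            this.user = user;
            this.unique = unique;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme)
                    && authority.equals(k.authority)
                    && Objects.equals(user, k.user)
                    && unique == k.unique;
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, user, unique);
        }
    }

    public static void main(String[] args) {
        Key a = new Key(URI.create("hdfs://NameNode:8020/spark-logs"), "spark", 0L);
        Key b = new Key(URI.create("HDFS://namenode:8020/other/dir"), "spark", 0L);
        Key c = new Key(URI.create("hdfs://namenode:8020/spark-logs"), "other-user", 0L);
        System.out.println("same scheme/authority/user: " + a.equals(b));
        System.out.println("different user:             " + a.equals(c));
    }
}
```

In this model two URIs that differ only in path or letter case map to the same key, while a different user produces a different key. So if a new instance really is created every time, the user component of the key is one place to look.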
   
   On the other hand, the `close()` method is safer than you might think. First, it only closes one of the many cached `FileSystem` objects, not all of them, so you are most likely closing only the one you created. Also, closing a `FileSystem` does not actually CLOSE the underlying file system, according to the `close()` method in `FileSystem`:
   ```
           public void close() throws IOException {
               this.processDeleteOnExit();
               CACHE.remove(this.key, this);
           }
   ```
   
   It processes the pending delete-on-exit paths and removes that object from the cache, and that's it.
   So if other threads happen to hold this object, that is fine and they can keep using it.
   The most important gain from closing this `FileSystem` object is that it gets de-referenced from the cache, so GC can eventually reclaim it from the heap once nothing else holds a reference to it.
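The eviction behaviour described above can be illustrated with a small stand-alone model. This is only a sketch under assumptions: `FsCloseSketch` and `FakeFs` are hypothetical names, a string stands in for the cache key, and `ConcurrentMap.remove(key, value)` is used to approximate `CACHE.remove(this.key, this)`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified model of close() evicting one entry; not Hadoop's real code.
final class FsCloseSketch {
    static final ConcurrentMap<String, FakeFs> CACHE = new ConcurrentHashMap<>();

    // Hypothetical stand-in for a cached FileSystem instance.
    static final class FakeFs {
        final String key;
        FakeFs(String key) { this.key = key; }

        // Same shape as the quoted close(): remove the entry only if the
        // cache still maps this key to this exact instance.
        void close() {
            CACHE.remove(key, this);
        }
    }

    static FakeFs get(String key) {
        return CACHE.computeIfAbsent(key, FakeFs::new);
    }

    public static void main(String[] args) {
        FakeFs logs = get("hdfs://nn1");
        FakeFs other = get("hdfs://nn2");

        logs.close();

        System.out.println("nn1 still cached: " + CACHE.containsKey("hdfs://nn1"));
        System.out.println("nn2 still cached: " + (CACHE.get("hdfs://nn2") == other));
        System.out.println("next get() reuses closed one: " + (get("hdfs://nn1") == logs));
    }
}
```

In this model, closing one instance leaves every other cache entry untouched, and the closed object stays alive only while some caller still holds a reference to it.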
