Posted to common-issues@hadoop.apache.org by "Daryn Sharp (Commented) (JIRA)" <ji...@apache.org> on 2012/01/13 17:56:40 UTC

[jira] [Commented] (HADOOP-7973) FileSystem close has severe consequences

    [ https://issues.apache.org/jira/browse/HADOOP-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185669#comment-13185669 ] 

Daryn Sharp commented on HADOOP-7973:
-------------------------------------


We are seeing two specific failure cases with the same cause:
* An MR task that uses {{FsShell}}.  The shell opens a DFS, performs its action, and then closes the DFS.  When the MR framework later closes its input stream on that same filesystem, the close fails.
* User map task code that opens the default filesystem and subsequently closes it.  The MR input stream close then fails in the same way.
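To make the two failure cases concrete, here is a minimal, self-contained sketch of a process-wide cache handing the same instance to two callers. {{FakeFs}}, {{FsCacheDemo}} and the key string {{hdfs://nn:8020}} are hypothetical stand-ins, not Hadoop classes; the real cache and {{DFSClient}} are considerably more involved.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cached filesystem client; NOT the Hadoop API.
class FakeFs {
    private boolean closed = false;

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    // Models a stream close that needs the shared client to still be open.
    void closeStream() throws IOException {
        if (closed) {
            throw new IOException("Filesystem closed");
        }
    }
}

class FsCacheDemo {
    // One instance per key, shared by every caller in the process.
    private static final Map<String, FakeFs> CACHE = new HashMap<>();

    static synchronized FakeFs get(String key) {
        return CACHE.computeIfAbsent(key, k -> new FakeFs());
    }

    // Returns true if the framework's stream close fails after user code closed the shared instance.
    static boolean frameworkCloseFails() {
        FakeFs framework = get("hdfs://nn:8020"); // instance behind the MR input stream
        FakeFs userCode = get("hdfs://nn:8020");  // same cached instance, obtained by user code or FsShell
        userCode.close();                         // user code "cleans up"
        try {
            framework.closeStream();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(frameworkCloseFails()); // prints "true"
    }
}
```

Because both callers receive the identical object, a {{close}} that is perfectly reasonable from the user code's point of view invalidates the framework's reference as well.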

The problem is being seen with Oozie jobs, but is not unique to Oozie.  If the MR task opens its input/output streams with a DFS URI that lacks a port number, it gets a different filesystem instance than user code, which gets the default filesystem via {{fs.default.name}} (and that value does include the port number).  In that case the issue is masked, and arguably it's a bug that getting a filesystem with and without the default port returns different filesystem instances.
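A simplified sketch of why the port matters, assuming for illustration that the cache is keyed on the URI scheme plus authority. {{PortKeyDemo}}, the host {{namenode}} and port 8020 are illustrative assumptions. {{normalizedKey}} is a hypothetical normalization that fills in a default port before keying; it is shown only to illustrate what treating both forms as the same filesystem would mean, not as existing Hadoop behavior.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class PortKeyDemo {
    // Simplified, hypothetical cache key: scheme + authority, compared as a string.
    static String cacheKey(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    // Hypothetical normalization: substitute the default port when the URI has none.
    static String normalizedKey(URI uri, int defaultPort) {
        int port = uri.getPort() == -1 ? defaultPort : uri.getPort(); // getPort() is -1 when absent
        return uri.getScheme() + "://" + uri.getHost() + ":" + port;
    }

    private static final Map<String, Object> CACHE = new HashMap<>();

    static synchronized Object get(URI uri) {
        return CACHE.computeIfAbsent(cacheKey(uri), k -> new Object());
    }

    public static void main(String[] args) {
        URI noPortUri = URI.create("hdfs://namenode/user/data");
        URI withPortUri = URI.create("hdfs://namenode:8020/");
        System.out.println(get(noPortUri) == get(withPortUri)); // prints "false": two separate instances
        System.out.println(normalizedKey(noPortUri, 8020).equals(normalizedKey(withPortUri, 8020))); // prints "true"
    }
}
```

With the raw key, a {{close}} on the port-qualified instance never touches the port-less one, which is exactly why the framework's streams can survive user code closing the default filesystem.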

There are 3 approaches that can be taken:
# {{FsShell#close}} will be a no-op
# Closing a read stream will not generate an exception if the {{DFSClient}} is closed.
# {{DistributedFileSystem#close}} becomes a no-op.  The finalizer will close the {{DFSClient}}.
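A minimal sketch of what approach #3 amounts to, using a hypothetical {{SketchClient}} in place of {{DFSClient}} and a hypothetical {{NoOpCloseFs}} in place of {{DistributedFileSystem}}: {{close}} does nothing, and the client is shut down only from {{finalize}}. Finalizers are deprecated in current JDKs, where {{java.lang.ref.Cleaner}} is the usual replacement; the finalizer is used here only to mirror the proposal.

```java
import java.io.IOException;

// Hypothetical stand-in for DFSClient; NOT the Hadoop class.
class SketchClient {
    private boolean closed = false;

    void shutdown() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    // A stream close that needs a live client, like the MR input stream close.
    void closeStream() throws IOException {
        if (closed) {
            throw new IOException("Filesystem closed");
        }
    }
}

// Sketch of approach #3: close() is a no-op and the client is released by the finalizer.
class NoOpCloseFs {
    final SketchClient client = new SketchClient();

    public void close() {
        // Intentionally empty: other holders of this cached instance may still be using it.
    }

    // Runs only if and when the GC finalizes this object; timing is not guaranteed.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            client.shutdown();
        } finally {
            super.finalize();
        }
    }

    static boolean streamCloseSucceedsAfterUserClose() {
        NoOpCloseFs shared = new NoOpCloseFs();
        shared.close(); // user code or FsShell "closes" the cached instance
        try {
            shared.client.closeStream(); // the MR framework closes its stream afterwards
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(streamCloseSucceedsAfterUserClose()); // prints "true"
    }
}
```

The trade-off is visible in the sketch: a stray {{close}} can no longer break other holders, but the client's resources are only released when the garbage collector finalizes the object, which may happen late or not at all.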

#1 & #2 are simply workarounds for specific use cases.  The problem can still happen whenever user code or a library gets a filesystem and closes it.

#3 is a more comprehensive solution, given that an earlier JIRA decided not to add reference counting to cached filesystem objects.
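For contrast, a minimal sketch of the reference-counting design that the earlier JIRA decided against. {{RefCountedFs}}, {{acquire}} and {{refCount}} are hypothetical names; this is not how Hadoop's filesystem cache works.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a reference-counted cached filesystem. Not Hadoop code.
class RefCountedFs {
    private final AtomicInteger refs = new AtomicInteger(0);
    private volatile boolean clientClosed = false;

    // Every caller that obtains the cached instance takes one reference.
    RefCountedFs acquire() {
        refs.incrementAndGet();
        return this;
    }

    // Only the release that drops the count to zero shuts down the shared client.
    void close() {
        if (refs.decrementAndGet() == 0) {
            clientClosed = true;
        }
    }

    boolean isClientClosed() {
        return clientClosed;
    }

    int refCount() {
        return refs.get();
    }

    public static void main(String[] args) {
        RefCountedFs cached = new RefCountedFs();
        RefCountedFs framework = cached.acquire(); // MR framework's reference
        RefCountedFs user = cached.acquire();      // user code's reference
        user.close();
        System.out.println(framework.isClientClosed()); // prints "false"
        framework.close();
        System.out.println(framework.isClientClosed()); // prints "true"
    }
}
```

One practical cost visible even in this sketch: every {{acquire}} must be paired with exactly one {{close}}, so a caller that closes twice can still shut the client down underneath another holder.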

I'll post a patch for #3.  Please provide comments if there are superior solutions.
                
> FileSystem close has severe consequences
> ----------------------------------------
>
>                 Key: HADOOP-7973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7973
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 1.0.0
>            Reporter: Daryn Sharp
>            Priority: Blocker
>
> The way {{FileSystem#close}} works is very problematic.  Since the {{FileSystems}} are cached, any {{close}} by any caller will cause problems for every other reference to it.  Will add more detail in the comments.
