Posted to common-issues@hadoop.apache.org by "Sanjay Radia (JIRA)" <ji...@apache.org> on 2010/11/03 21:46:25 UTC

[jira] Created: (HADOOP-7018) FileContext's list operation should return local names in the status object rather than the full path.

FileContext's list operation should return local names in the status object rather than the full path.
------------------------------------------------------------------------------------------------------

                 Key: HADOOP-7018
                 URL: https://issues.apache.org/jira/browse/HADOOP-7018
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Sanjay Radia




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-7018) FileContext's list operation should return local names in the status object rather than the full path.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932155#action_12932155 ] 

Doug Cutting commented on HADOOP-7018:
--------------------------------------

To be clearer:
 - I understand that the relative path is useful and should be provided
 - The relative path could be provided via a new class returned by FileContext or via a new method on the existing FileStatus.
 - In either case, folks would need to update their code to use the new functionality.
 
My question is: why should one approach be preferred over the other?

If we add a new class:
 - we'd remove FileContext methods that return FileStatus
 - add new methods that return FileStatusInfo
 - any user programs that update to FileContext will also need to update all their uses of FileStatus.  A program could not easily update partially, e.g., still calling a library that accepts FileStatus but not FileStatusInfo.

If we add a new method:
 - user programs could update incrementally
 - user programs that adopt FileContext would not be compelled to update, so programs may take longer to transition.

Does that sound right or have I mis-analyzed things?  Are there other pros and cons to the two approaches?



[jira] Commented: (HADOOP-7018) FileContext's list operation should return local names in the status object rather than the full path.

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928002#action_12928002 ] 

Sanjay Radia commented on HADOOP-7018:
--------------------------------------

In the design of FileContext, I made FileStatus compatible with the original FileStatus of FileSystem, where the pathname is a full pathname.
This was done to allow an easy transition from FileSystem to FileContext; however, I believe this was a mistake.
Having a full path is messy in systems that have symbolic links and mount tables: the caller knows the path through which he accessed the file and can work out the
full path if he is really interested. In general, it is better to provide relative names where possible; they have fewer closure problems in a distributed system where there are
multiple independent contexts or roots.
Further, we recently added an optimization to not send the full path across the wire for the list directory operations.

The proposal is to create a new type, say FileStatusInfo, and have the several list operations return that type instead of FileStatus.
This will produce compile-time errors and prompt folks porting over to change their code. We can provide helper functions
that generate the full name for those apps that really want the full path or want to take the easy route in porting their code from FileSystem to FileContext.
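
A minimal sketch of what such a type and helper might look like, written in plain Java without any Hadoop dependency. FileStatusInfo is only the name floated above; its fields, the FileStatusInfos helper class, and the fullPath method are assumptions made for illustration, not an existing or agreed API.

```java
// Hypothetical sketch: FileStatusInfo is the name floated in this thread,
// not an existing Hadoop class. Fields and method names are assumptions.
final class FileStatusInfo {
    private final String localName; // name relative to the listed directory
    private final long length;
    private final boolean directory;

    FileStatusInfo(String localName, long length, boolean directory) {
        // A local name is a single path component, so it may not contain '/'.
        if (localName.isEmpty() || localName.contains("/")) {
            throw new IllegalArgumentException("expected a single path component: " + localName);
        }
        this.localName = localName;
        this.length = length;
        this.directory = directory;
    }

    String getLocalName() { return localName; }
    long getLength() { return length; }
    boolean isDirectory() { return directory; }
}

// Hypothetical helper for callers that still want the full path.
final class FileStatusInfos {
    private FileStatusInfos() {}

    // Joins the directory the caller listed with the entry's local name.
    static String fullPath(String listedDir, FileStatusInfo info) {
        String sep = listedDir.endsWith("/") ? "" : "/";
        return listedDir + sep + info.getLocalName();
    }
}

class FileStatusInfoDemo {
    public static void main(String[] args) {
        String listedDir = "/user/alice/data";
        FileStatusInfo[] listing = {
            new FileStatusInfo("part-00000", 1024L, false),
            new FileStatusInfo("logs", 0L, true),
        };
        for (FileStatusInfo info : listing) {
            System.out.println(info.getLocalName() + " -> " + FileStatusInfos.fullPath(listedDir, info));
        }
    }
}
```

Because the status object carries only the local name, the caller supplies the directory it listed, which keeps the result tied to the caller's own view of the namespace (mount table, symlinks) rather than to a path resolved elsewhere.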



[jira] [Commented] (HADOOP-7018) FileContext's list operation should return local names in the status object rather than the full path.

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020347#comment-13020347 ] 

Hairong Kuang commented on HADOOP-7018:
---------------------------------------

One pro of having a new class is the performance gain: ListStatus/getFileInfo would not need to convert a local name to a full path until it is needed. String operations turn out to be quite expensive in Java. The con of having a new class is that it requires more code changes, both on our side and for users.
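
The deferred conversion described here could be sketched roughly as follows. LazyPathStatus and its methods are hypothetical names used only for illustration; the sketch assumes the status object keeps the parent directory and local name separately and builds the full path string on first request.

```java
// Hypothetical sketch of deferring the local-name-to-full-path conversion.
// Not thread-safe; a real implementation would need to consider that.
final class LazyPathStatus {
    private final String parentDir;
    private final String localName;
    private String fullPath; // stays null until a caller asks for it

    LazyPathStatus(String parentDir, String localName) {
        this.parentDir = parentDir;
        this.localName = localName;
    }

    // Cheap: no string concatenation involved.
    String getLocalName() { return localName; }

    // Builds the full path once, on first use, then reuses the cached value.
    String getFullPath() {
        if (fullPath == null) {
            String sep = parentDir.endsWith("/") ? "" : "/";
            fullPath = parentDir + sep + localName;
        }
        return fullPath;
    }

    // Exposed only so the demo can show when the string was built.
    boolean isFullPathBuilt() { return fullPath != null; }
}

class LazyPathStatusDemo {
    public static void main(String[] args) {
        LazyPathStatus status = new LazyPathStatus("/user/alice/data", "part-00000");
        System.out.println("built after listing? " + status.isFullPathBuilt());
        System.out.println("full path: " + status.getFullPath());
        System.out.println("built after request? " + status.isFullPathBuilt());
    }
}
```

A listing of many entries whose full paths are never requested would then skip the concatenation entirely.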


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7018) FileContext's list operation should return local names in the status object rather than the full path.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928239#action_12928239 ] 

Doug Cutting commented on HADOOP-7018:
--------------------------------------

Why not just add a new method to FileStatus that returns the relative path?  The method that returns the absolute path could either be deprecated or retained to identify a canonical path.
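
As a rough illustration of this alternative, the sketch below uses a stand-in class rather than the real FileStatus. SimpleFileStatus, getRelativePath, and the two describe methods are hypothetical names; only the idea of keeping the existing full-path accessor alongside a new relative one comes from the comment above.

```java
// Hypothetical stand-in for the existing status class; not Hadoop's FileStatus.
class SimpleFileStatus {
    private final String parentDir;
    private final String relativePath;

    SimpleFileStatus(String parentDir, String relativePath) {
        this.parentDir = parentDir;
        this.relativePath = relativePath;
    }

    // Existing-style accessor: still returns the full path, so callers
    // that have not been updated keep compiling and behaving as before.
    String getPath() {
        String sep = parentDir.endsWith("/") ? "" : "/";
        return parentDir + sep + relativePath;
    }

    // New accessor (hypothetical name): the path relative to the listed directory.
    String getRelativePath() { return relativePath; }
}

class IncrementalUpdateDemo {
    // Stands in for a library that has not been updated yet.
    static String legacyDescribe(SimpleFileStatus status) {
        return "full:" + status.getPath();
    }

    // Stands in for application code that has adopted the new method.
    static String updatedDescribe(SimpleFileStatus status) {
        return "relative:" + status.getRelativePath();
    }

    public static void main(String[] args) {
        SimpleFileStatus status = new SimpleFileStatus("/user/alice/data", "part-00000");
        // Both kinds of caller accept the same object, so code can migrate piecemeal.
        System.out.println(legacyDescribe(status));
        System.out.println(updatedDescribe(status));
    }
}
```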



[jira] Commented: (HADOOP-7018) FileContext's list operation should return local names in the status object rather than the full path.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932145#action_12932145 ] 

Doug Cutting commented on HADOOP-7018:
--------------------------------------

I am advocating using the existing FileStatus class, not defining a new one.  Add a new method rather than a new class.



[jira] Commented: (HADOOP-7018) FileContext's list operation should return local names in the status object rather than the full path.

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932122#action_12932122 ] 

Sanjay Radia commented on HADOOP-7018:
--------------------------------------

Given that the API is evolving and there are no real users of the API so far, is there any benefit to retaining it?
Also, were you advocating reusing the same FileStatus class or creating a new one?
