Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2008/04/15 03:41:04 UTC

[jira] Created: (HADOOP-3257) Path should handle all characters

Path should handle all characters
---------------------------------

                 Key: HADOOP-3257
                 URL: https://issues.apache.org/jira/browse/HADOOP-3257
             Project: Hadoop Core
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.17.0
            Reporter: Arun C Murthy


Currently Path is limited by URI semantics in the sense that one cannot create files whose names include characters such as ":" etc.

HADOOP-2066 & HADOOP-3256 are manifestations of this problem. It would be nice if Path handled all characters correctly...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3257) Path should handle all characters

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588878#action_12588878 ] 

Runping Qi commented on HADOOP-3257:
------------------------------------


I think Hadoop needs to make a clear and deliberate decision about which characters are allowed or disallowed in DFS paths,
and provide tools (such as URL encode/decode) to help deal with the disallowed characters.
Personally, I don't see why it would be wrong to exclude ":" if doing so simplifies a lot of things.
I don't think it is worthwhile to invest a lot of effort in supporting all characters in paths.
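
A minimal sketch of the kind of encode/decode tool suggested above, written in Java since that is Hadoop's language. The class name PathNameCodec and the set of escaped characters (":", "/" and "%") are hypothetical choices for illustration, not an existing Hadoop API. "%" has to be in the set so that decoding is unambiguous.

```java
/**
 * Hypothetical helper (not a Hadoop API): percent-encodes an assumed set of
 * "disallowed" characters in a single file-name component, and decodes them back.
 */
final class PathNameCodec {
    // Assumed disallowed set. '%' must be included so decode() can tell
    // an escape apart from a literal percent sign. All entries are ASCII.
    private static final String DISALLOWED = "%:/";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (DISALLOWED.indexOf(c) >= 0) {
                // ASCII characters fit in two hex digits, e.g. ':' -> "%3A".
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0xF]);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                if (i + 2 >= encoded.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                // Integer.parseInt throws NumberFormatException on bad hex digits.
                int code = Integer.parseInt(encoded.substring(i + 1, i + 3), 16);
                sb.append((char) code);
                i += 2; // skip the two hex digits just consumed
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, encode("a:b") yields "a%3Ab" and decode turns it back into "a:b". Which characters belong in the disallowed set is exactly the decision the comment above asks Hadoop to make.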


[jira] Commented: (HADOOP-3257) Path should handle all characters

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589154#action_12589154 ] 

Doug Cutting commented on HADOOP-3257:
--------------------------------------

> Currently Path is limited by URI semantics in the sense that one cannot create files whose names include characters such as ":" etc.

Path is a convenience class that wraps a URI.  URIs are the underlying mechanism Hadoop uses to name files.  Hadoop only supports a subset of possible URIs (hierarchical URIs, normalized to remove double-slashes and so that non-root paths don't end in a slash).  Path enforces this subset.  Path also handles some compatibility issues, mostly to make it easier to include Windows drive letters in "file:" URIs when running on Windows.
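
The two normalization rules mentioned above (no double slashes, no trailing slash on non-root paths) can be sketched as follows. This is a hypothetical helper, not Hadoop's actual Path code. It operates on the path component only, because in a full URI string "//" introduces the authority and must be kept, and it leaves out the Windows drive-letter handling.

```java
/**
 * Hypothetical sketch of the normalization rules described above;
 * not Hadoop's Path implementation. Applies to the path component only.
 */
final class PathNormalizer {
    static String normalize(String path) {
        // Collapse every run of slashes into a single slash.
        String p = path.replaceAll("/+", "/");
        // Drop a trailing slash unless the whole path is the root "/".
        if (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }
}
```

For example, normalize("/user//alice/") returns "/user/alice", while "/" is left unchanged.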

So a path is not limited by "URI semantics"; it is implemented with URI syntax.  URIs permit escapes, so one can include arbitrary Unicode characters in a URI.  One *can* create URIs that include colons.  However, our Windows-compatibility code may make it awkward to get colons through the Path wrapper into a URI, and perhaps we can improve that.
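
A short illustration of the escaping point, using the JDK's java.net.URI class directly rather than Hadoop's Path (whose extra handling may behave differently). A colon written as %3A stays part of a relative path, and getPath() returns the decoded name. The file name "12%3A30.txt" is an arbitrary example.

```java
import java.net.URI;

/** Shows that a percent-escaped colon survives java.net.URI parsing as path data. */
final class EscapedColonDemo {
    public static void main(String[] args) {
        URI u = URI.create("12%3A30.txt");
        System.out.println(u.getScheme());  // null: the escaped colon is not a scheme separator
        System.out.println(u.getRawPath()); // 12%3A30.txt (escaped form)
        System.out.println(u.getPath());    // 12:30.txt (decoded form)
    }
}
```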

> It would be nice if Path handled all characters correctly...

What does "correctly" mean?  I think we need more specific issues before we can have a real discussion.  Escaping here is tricky, since we have code that takes file names from different filesystems, which require different escapes, and uses them to form paths.  I've commented on this previously:

https://issues.apache.org/jira/browse/HADOOP-2066?focusedCommentId=12558701#action_12558701

Two approaches are possible:
 - limit Paths to an interoperability subset, a common-denominator.  That's where we are today.
 - permit simpler and more automated escaping of certain characters.  That's a laudable goal.

I don't think we should simply say that Path must accept any string verbatim as a file name.  I think it is reasonable to permit syntax errors for clearly malformed paths.  It is also reasonable to permit colons in directory and file names.  If colons are unescaped in a relative path, then they can be confused for the URI scheme, and I think that interpretation trumps.
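
The scheme confusion can be seen directly with java.net.URI (again the JDK class, not Hadoop's Path). An unescaped colon that appears before any slash is read as the end of a scheme name; putting a slash in front of it, or escaping it as above, avoids that. The drive-letter case is the sort of input the Windows-compatibility code has to deal with. The file names are arbitrary examples.

```java
import java.net.URI;

/** Shows how java.net.URI interprets an unescaped colon in a relative reference. */
final class ColonAsSchemeDemo {
    public static void main(String[] args) {
        // Colon before any slash: "report" becomes the scheme, and the
        // result is an opaque URI with no path component.
        URI bare = URI.create("report:2008.txt");
        System.out.println(bare.getScheme()); // report
        System.out.println(bare.getPath());   // null

        // A leading "./" places a slash before the colon, so no scheme is parsed.
        URI dotted = URI.create("./report:2008.txt");
        System.out.println(dotted.getScheme()); // null
        System.out.println(dotted.getPath());   // ./report:2008.txt

        // A Windows drive letter is parsed the same way: "c" becomes the scheme.
        URI drive = URI.create("c:/tmp");
        System.out.println(drive.getScheme()); // c
        System.out.println(drive.getPath());   // /tmp
    }
}
```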
