Posted to common-dev@hadoop.apache.org by "Robert Chansler (JIRA)" <ji...@apache.org> on 2009/06/17 00:20:07 UTC

[jira] Created: (HADOOP-6059) Should HDFS restrict the names used for files?

Should HDFS restrict the names used for files?
----------------------------------------------

                 Key: HADOOP-6059
                 URL: https://issues.apache.org/jira/browse/HADOOP-6059
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs
    Affects Versions: 0.20.0
            Reporter: Robert Chansler


When reviewing the consequences of HADOOP-6017 (the name system could not start because a file name was interpreted as a regex and caused a fault), the discussion turned to improving the test set for file system functions by broadening the set of names used for testing. Presently, HDFS allows any name that does not contain a slash. _Should the space of names be restricted?_ If most funny names are unintended, the user might benefit from an early error indication. A contrary view is that restricting names is so 20th-century.
Should we or shouldn't we?
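HADOOP-6017 itself is not reproduced here, but the failure mode it describes is easy to illustrate. The following Python sketch is a hypothetical example, not HDFS code: the name `report[2009` and the two helper functions are made up. It shows a legal file name (no slash) that breaks when used directly as a regular expression, and how escaping it with `re.escape` avoids the problem.

```python
import re

# Hypothetical file name: legal in HDFS (no slash) but not a valid regex.
name = "report[2009"


def matches_unescaped(pattern_source: str, candidate: str) -> bool:
    # Bug pattern: the file name is compiled as a regular expression.
    return re.fullmatch(pattern_source, candidate) is not None


def matches_escaped(pattern_source: str, candidate: str) -> bool:
    # Fix: re.escape makes every metacharacter literal, so a name matches only itself.
    return re.fullmatch(re.escape(pattern_source), candidate) is not None


try:
    matches_unescaped(name, name)
except re.error as err:
    # "report[2009" opens a character class that is never closed.
    print("unescaped name is not a valid regex:", err)

print(matches_escaped(name, name))        # True
print(matches_unescaped("a.b", "axb"))    # True: '.' matches any character
print(matches_escaped("a.b", "axb"))      # False: '.' is literal once escaped
```

There are two failure modes here. An unescaped name can raise an error outright, or it can match silently: `a.b` used as a pattern also matches `axb`. A broader test set of names would exercise both.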

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6059) Should HDFS restrict the names used for files?

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720397#action_12720397 ] 

Jakob Homan commented on HADOOP-6059:
-------------------------------------

If it's decided to make naming more restrictive, a conversion process would need to be created (during upgrade?) for newly verboten file names. While working on the offline image viewer, I've seen names that were created inadvertently (with things such as embedded carriage returns). They're out there and would need to be accounted for.
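Here is one way to make such a conversion concrete for a single directory. This is a purely hypothetical sketch, not part of any Hadoop upgrade tool. It assumes the new rule forbids control characters (below ASCII 32, plus DEL) and rewrites each forbidden character as a visible `%XX` code. If the rewritten name would collide with an existing sibling, it appends a `~N` suffix. The functions `escape_name` and `plan_renames` are illustrative names only.

```python
from typing import Dict, Iterable


def is_forbidden_char(ch: str) -> bool:
    # Assumed rule for illustration: control characters below ASCII 32, plus DEL.
    return ord(ch) < 32 or ord(ch) == 127


def escape_name(name: str) -> str:
    # Replace each forbidden character with a visible %XX code (hex of its code point).
    return "".join(f"%{ord(ch):02X}" if is_forbidden_char(ch) else ch for ch in name)


def plan_renames(siblings: Iterable[str]) -> Dict[str, str]:
    """Map each newly forbidden name in one directory to a legal, unused name."""
    names = list(siblings)
    taken = set(names)
    plan: Dict[str, str] = {}
    for old in names:
        if not any(is_forbidden_char(ch) for ch in old):
            continue  # already legal: leave untouched
        base = escape_name(old)
        candidate = base
        suffix = 1
        while candidate in taken:
            # Never clobber an existing sibling (e.g. a real file named "a%0Ab").
            candidate = f"{base}~{suffix}"
            suffix += 1
        taken.add(candidate)
        plan[old] = candidate
    return plan


print(plan_renames(["ok.txt", "bad\rname", "a\nb", "a%0Ab"]))
```

With that input, `ok.txt` and the existing `a%0Ab` are left alone. The name containing a carriage return becomes `bad%0Dname`. The name containing a newline becomes `a%0Ab~1`, because `a%0Ab` is already taken. An upgrade-time conversion cannot skip collision handling of this kind, since the escaped form of one bad name may already exist as a legitimate file.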



[jira] Commented: (HADOOP-6059) Should HDFS restrict the names used for files?

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720616#action_12720616 ] 

Steve Loughran commented on HADOOP-6059:
----------------------------------------

+1 for restrictions. 

I'd go for rules that still allow lots of names while guaranteeing that:

* all valid HDFS names are valid within XML files. That is, at a minimum, the only characters below ASCII 32 allowed are tab, CR, and LF (and I can think of some good reasons to forbid those too). No < or > either.
* all valid HDFS names can be stored as strings in database tables.
* all valid names can be represented as strings in JSON documents, possibly with some escaping.
* the normal POSIX forbidden paths are still forbidden.

I have no pressing need for XML, JSON or in-database representation, but I can imagine it being useful in the future. Valid XML can also be used inside HTML reports, and you don't want anyone pulling XSS tricks by creating files with <script> in their names to catch out whoever browses the directory tree.
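The constraints above can be sketched as a name checker in Python. This is an illustration built on stated assumptions, not a proposed HDFS API, and `name_problems` is a hypothetical helper. The XML rule follows the XML 1.0 `Char` production. Below U+0020 that production admits only tab, LF and CR, and it also excludes surrogates, U+FFFE and U+FFFF. The `allow_tab_cr_lf` flag models the option of forbidding those three characters as well. The database constraint is left out because it depends on the database.

```python
import json
from typing import List

POSIX_RESERVED = {"", ".", ".."}


def is_allowed_char(c: str, allow_tab_cr_lf: bool) -> bool:
    cp = ord(c)
    if cp in (0x09, 0x0A, 0x0D):
        # Legal in XML 1.0, but optionally rejected as well.
        return allow_tab_cr_lf
    # Remaining ranges of the XML 1.0 Char production.
    return 0x20 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def name_problems(name: str, allow_tab_cr_lf: bool = True) -> List[str]:
    """Return the reasons a single path component would be rejected (empty if none)."""
    problems = []
    if name in POSIX_RESERVED:
        problems.append("reserved POSIX name")
    if "/" in name or "\0" in name:
        problems.append("contains '/' or NUL")
    if any(not is_allowed_char(c, allow_tab_cr_lf) for c in name):
        problems.append("character not allowed")
    if "<" in name or ">" in name:
        # Keeps names inert if pasted into XML or HTML reports.
        problems.append("contains '<' or '>'")
    return problems


print(name_problems("part-00000"))                   # []
print(name_problems("<script>"))                     # ["contains '<' or '>'"]
print(name_problems("bell\x07"))                     # ['character not allowed']
print(name_problems("a\tb", allow_tab_cr_lf=False))  # ['character not allowed']

# JSON is covered by escaping: dumps escapes control characters and quotes,
# and loads restores the original string.
tricky = 'tab\there "quoted"'
assert json.loads(json.dumps(tricky)) == tricky
```

Even with `<` and `>` forbidden, code that renders names into HTML reports should still escape them, for example with Python's `html.escape`, because `&` and quote characters remain legal in names.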




[jira] Commented: (HADOOP-6059) Should HDFS restrict the names used for files?

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720390#action_12720390 ] 

Robert Chansler commented on HADOOP-6059:
-----------------------------------------

Of course, the more restricted the name space, the easier the QA function. And you never have to explain how globbing works with Farsi. The HDFS milieu is complicated by the deliberate conflation of file names and URIs. Maybe there are three broad options:

1. Any string can be a name, except no '/', NUL, or "", "." or "..".
2. Any 8-bit character string can be a name, except no '/', NUL, or "", "." or "..". (loosely, like POSIX)
3. Any non-empty string from a specified list of printing characters. ([A-Z,a-z,0-9,_,-,:], for example)
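The three options differ mainly in what a single path component may contain. One way to state them precisely is with hypothetical Python predicates, which are not an HDFS API. Option 1 is modeled on Unicode strings. Option 2 is modeled on raw byte strings, which is the interpretation of "8-bit character string" assumed here. Option 3 uses the example whitelist, reading the commas in `[A-Z,a-z,0-9,_,-,:]` as list separators. That reading is also an assumption.

```python
import re

RESERVED_STR = {"", ".", ".."}
RESERVED_BYTES = {b"", b".", b".."}


def ok_option1(name: str) -> bool:
    # Option 1: any (Unicode) string except those containing '/' or NUL,
    # and except "", "." and "..".
    return name not in RESERVED_STR and "/" not in name and "\0" not in name


def ok_option2(name: bytes) -> bool:
    # Option 2 (loosely POSIX): the same exclusions applied to raw 8-bit bytes;
    # no character encoding is required.
    return name not in RESERVED_BYTES and b"/" not in name and b"\0" not in name


# Option 3: a non-empty string from an explicit whitelist. The example set is
# read as letters, digits, '_', ':' and '-' (hyphen last so it is literal).
OPTION3 = re.compile(r"[A-Za-z0-9_:-]+")


def ok_option3(name: str) -> bool:
    return OPTION3.fullmatch(name) is not None


for candidate in ["part-00000", "data.txt", "naïve", "a\nb", ".."]:
    print(repr(candidate),
          ok_option1(candidate),
          ok_option2(candidate.encode("utf-8")),
          ok_option3(candidate))

print(ok_option2(b"\xff\xfe"))  # True, although these bytes are not valid UTF-8
```

Two consequences are worth noting. The example whitelist in option 3 rejects `data.txt`, because `.` is not listed. Option 2 accepts byte strings that do not decode as UTF-8, and option 1, defined on character strings, cannot express those without some encoding convention.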
