Posted to common-dev@hadoop.apache.org by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org> on 2008/09/19 07:02:44 UTC

[jira] Created: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

New lines and leading spaces are not trimmed of a value when configuration is read
----------------------------------------------------------------------------------

                 Key: HADOOP-4212
                 URL: https://issues.apache.org/jira/browse/HADOOP-4212
             Project: Hadoop Core
          Issue Type: Bug
          Components: conf
    Affects Versions: 0.18.1
         Environment: Generic
            Reporter: Sreekanth Ramakrishnan
            Assignee: Sreekanth Ramakrishnan
            Priority: Minor


When a configuration value is read, leading and trailing spaces and newline characters are kept as part of the value.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Resolved: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by Rafael Turk <ra...@gmail.com>.
unsubscribe

[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642404#action_12642404 ] 

Sreekanth Ramakrishnan commented on HADOOP-4212:
------------------------------------------------

Most of the use cases, except filenames, don't require trailing whitespace; see for instance [HADOOP-4416|https://issues.apache.org/jira/browse/HADOOP-4416]. So shouldn't we just treat file paths as a special case and let users mark paths that have leading or trailing spaces in a special way? That way we can make sure that the user intends to use them instead of adding a space by accident.


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632606#action_12632606 ] 

Tom White commented on HADOOP-4212:
-----------------------------------

How about using XML's xml:space="preserve" instead (http://www.w3.org/TR/REC-xml/#sec-white-space)?
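To make the suggestion concrete, here is a minimal sketch of a reader that honours xml:space="preserve" and trims otherwise. It uses the JDK's non-validating DOM parser; the class and method names (XmlSpaceDemo, readValue) are hypothetical and this is not how Hadoop's Configuration actually loads values.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Hadoop.
class XmlSpaceDemo {
    // Parses a single <value> element and returns its text,
    // trimmed unless the element carries xml:space="preserve".
    public static String readValue(String valueXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // a plain, non-validating parser
            Element value = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(valueXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String text = value.getTextContent();
            // The attribute is reported to the application like any other attribute.
            boolean preserve = "preserve".equals(value.getAttribute("xml:space"));
            return preserve ? text : text.trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + readValue("<value xml:space=\"preserve\"> 100 </value>") + "]");
        System.out.println("[" + readValue("<value> 100 </value>") + "]");
    }
}
```

The parser itself does nothing special with the attribute here; the decision to trim or not is taken entirely by the reading code.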


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633215#action_12633215 ] 

Sreekanth Ramakrishnan commented on HADOOP-4212:
------------------------------------------------

The only place it would, and should, be depended on is filesystem paths, but does a trailing space make sense with respect to them? Won't it be problematic when looking at windows file system?

With respect to the xml:space attribute, it would be of use only if the parser we use is a validating XML parser. I don't see that we are using a validating XML parser when parsing the configuration. So should we now parse all configuration with validation turned on?

Moreover, there is a similar issue reported here [HADOOP-2366|https://issues.apache.org/jira/browse/HADOOP-2366]


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633212#action_12633212 ] 

Steve Loughran commented on HADOOP-4212:
----------------------------------------

Are there any places where preserving leading/trailing whitespace is depended on? 


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632511#action_12632511 ] 

Sreekanth Ramakrishnan commented on HADOOP-4212:
------------------------------------------------

Create the following property in the configuration file:
{noformat}
  <property>
    <name>test.property.name1</name>
    <value>
       100
    </value>
  </property>
{noformat}


Try to programmatically get the value of property _test.property.name1_: it is not equal to _"100"_ but to the raw element text, _"\n100\n"_ (plus whatever indentation surrounds the number).
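The effect can be reproduced without Hadoop at all. The sketch below reads the same kind of property with the JDK's DOM parser and shows that the element text carries the newlines and indentation; the class name RawValueDemo is made up for illustration and stands in for whatever code reads the configuration file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of Hadoop.
class RawValueDemo {
    // Returns the text of the first <value> element exactly as the DOM parser reports it.
    public static String rawValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("value").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<property>\n"
                + "  <name>test.property.name1</name>\n"
                + "  <value>\n"
                + "    100\n"
                + "  </value>\n"
                + "</property>";
        String raw = rawValue(xml);
        System.out.println(raw.equals("100"));        // false: newlines and indentation are included
        System.out.println(raw.trim().equals("100")); // true once the caller trims
    }
}
```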


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634704#action_12634704 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4212:
------------------------------------------------

> Won't it be problematic when looking at windows file system? 

Configuration is not part of the file system API.  It should support generic usage.

When I worked on HADOOP-2461, I thought that the property names should be trimmed but not the values.  Otherwise, it forbids the potential use of leading and trailing spaces.  If there is a need, the code using the conf values should do the trimming.


[jira] Updated: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4212:
-------------------------------------------

    Attachment: HADOOP-4212-TESTCASE.patch

Attaching test case used for testing this bug


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642508#action_12642508 ] 

Chris Douglas commented on HADOOP-4212:
---------------------------------------

bq. A user looks at the output of ./hadoop queue list and finds that there is a job queue A, without noticing the extra space, which is hidden in both the console and web output. The user then sets the jobconf to submit to job queue A. The system checks the queue information for the job, sees that there is no queue called A (without the space, that is), and submits the job to the default queue, which is wrong.
bq. You might argue that as an implementer, I should do checking with trimming the space, but then again this can cause bugs to due accidental mis-configuration.

I don't follow. Configuration should call trim before returning the queue name, but if the callee(s) were to call trim on the same String, it "can cause bugs [due to] accidental misconfiguration?"


[jira] Updated: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4212:
-------------------------------------------

    Attachment: HADOOP-4212-1.patch

Attaching patch with updated test case.

- Introducing an attribute on the value element called _preserve-whitespace_. If the attribute is set to true, the value is read as is, with its leading and trailing whitespace. Example:
{noformat}
  <property>
    <name>test.key.withoutwhitespace</name>
    <value>
      Test Value
    </value>
  </property>
  <property>
    <name>test.key.withwhitespace</name>
    <value preserve-whitespace="true">
      Test Value
    </value>
  </property>
{noformat}

The value of _test.key.withoutwhitespace_ would be "Test Value", and the value of _test.key.withwhitespace_ would be "\nTest Value\n" (with the surrounding indentation kept as well).
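The behaviour described for the patch can be sketched roughly as follows. This is not the code from HADOOP-4212-1.patch, which is not shown in this thread; it is an independent illustration using the JDK's DOM parser, and the class and method names (PreserveWhitespaceDemo, load) are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical loader, not the attached patch.
class PreserveWhitespaceDemo {
    // Loads <property> entries; values are trimmed unless
    // <value preserve-whitespace="true"> asks for the raw text.
    public static Map<String, String> load(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                // Property names are always trimmed.
                String name = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                Element value = (Element) prop.getElementsByTagName("value").item(0);
                boolean preserve = "true".equals(value.getAttribute("preserve-whitespace"));
                String text = value.getTextContent();
                result.put(name, preserve ? text : text.trim());
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>test.key.withoutwhitespace</name>"
                + "<value>\nTest Value\n</value></property>"
                + "<property><name>test.key.withwhitespace</name>"
                + "<value preserve-whitespace=\"true\">\nTest Value\n</value></property>"
                + "</configuration>";
        Map<String, String> conf = load(xml);
        System.out.println("[" + conf.get("test.key.withoutwhitespace") + "]");
        System.out.println("[" + conf.get("test.key.withwhitespace") + "]");
    }
}
```

Trimming by default and requiring an explicit opt-in to keep whitespace matches the intent stated above: a stray space added by accident is dropped, while a deliberate one has to be declared.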


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633665#action_12633665 ] 

Steve Loughran commented on HADOOP-4212:
----------------------------------------

- You can look for the xml:space attribute on any element and act on it yourself. When working with XSD-schema'd docs I think Xerces behaves differently when it hits it, but I forget the details.

- Yes, it would cause Windows to behave differently and not allow filenames with trailing spaces, or other strings. But I don't see that filenames with trailing spaces and carriage returns actually make sense, even on Windows. Spaces mid-path, maybe, but leading or trailing? Danger.

FWIW, I'm not using the XML format for our configurations; we use our own configuration format

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/src/org/smartfrog/services/hadoop/components/hadoopconfiguration.sf?view=markup

Looking at the current declarations, there's nowhere where whitespace is useful, and there are places (in comma-separated lists) where it may already be harmful and need filtering. There may be some inconsistency between filenames (HADOOP-2366) and user group information, where spaces between words are allowed in hadoop.job.ugi. I would propose:

- consistent filtering of spaces wherever lists are taken (strip leading and trailing whitespace from each entry),
- trimming leading and trailing whitespace from values.

What may make sense is to allow quoted whitespace, so you could have a list of directories, those in quotes would be passed down as is:

<name>dfs.data.dir</name>
<value>/mnt/hstore2/hdfs , "/home/user2/temp hadoop dir"</value> 

This would resolve to a list with two entries ["/mnt/hstore2/hdfs","/home/user2/temp hadoop dir"]
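One possible reading of the quoting proposal, as a small sketch: entries are split on commas outside quotes, unquoted entries are trimmed, and quoted entries are passed through untouched. The class and method names (QuotedListDemo, splitList) are hypothetical, and escaping of quote characters inside an entry is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the proposal; not Hadoop code.
class QuotedListDemo {
    // Splits a comma-separated value. Text inside double quotes is kept as is,
    // including spaces and commas; unquoted entries are trimmed.
    public static List<String> splitList(String value) {
        List<String> entries = new ArrayList<>();
        StringBuilder plain = new StringBuilder();   // text seen outside quotes
        StringBuilder quoted = new StringBuilder();  // text seen inside quotes
        boolean inQuotes = false;
        boolean sawQuote = false;
        for (char c : value.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                sawQuote = true;
            } else if (c == ',' && !inQuotes) {
                addEntry(entries, plain, quoted, sawQuote);
                plain.setLength(0);
                quoted.setLength(0);
                sawQuote = false;
            } else if (inQuotes) {
                quoted.append(c);
            } else {
                plain.append(c);
            }
        }
        addEntry(entries, plain, quoted, sawQuote);
        return entries;
    }

    private static void addEntry(List<String> entries, StringBuilder plain,
                                 StringBuilder quoted, boolean sawQuote) {
        if (sawQuote) {
            entries.add(quoted.toString());      // quoted: pass down unchanged
        } else {
            String trimmed = plain.toString().trim();
            if (!trimmed.isEmpty()) {            // ignore empty unquoted entries
                entries.add(trimmed);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(splitList("/mnt/hstore2/hdfs , \"/home/user2/temp hadoop dir\""));
    }
}
```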





[jira] Resolved: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HADOOP-4212.
--------------------------------------------

    Resolution: Duplicate

This duplicates HADOOP-2366.

> New lines and leading spaces are not trimmed of a value when configuration is read
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-4212
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4212
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>    Affects Versions: 0.18.1
>         Environment: Generic
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Minor
>         Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch
>
>
> While configuration value is read the leading and trailing spaces and new line characters are taken into account.


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642433#action_12642433 ] 

Sreekanth Ramakrishnan commented on HADOOP-4212:
------------------------------------------------

I think we should have a generic way of reading values, rather than expecting each implementer of a new getXXX() method in Configuration, or of a class that subclasses Configuration, to remember to deal with leading or trailing spaces.

bq. In addition to resolving class names, it would be reasonable to trim values interpreted in getInt, getLong, etc., though it would still be an incompatible change
As per the suggestion, all that remains untouched among the Configuration getXXX methods are getLocalPath() and getFile(). I still feel that the value returned by getString() should be trimmed before it is passed to the user. Here is one use case:


A Hadoop administrator configures a job queue with the name A and accidentally adds a space at the end.

A user looks at the output of ./hadoop queue list and finds that there is a job queue A, without noticing the extra space, which is hidden in both the console and web output. The user then sets the jobconf to submit to job queue A. The system checks the queue information for the job, sees that there is no queue called A (without the space, that is), and submits the job to the default queue, which is wrong.

You might argue that as an implementer, I should do checking with trimming the space, but then again this can cause bugs to due accidental mis-configuration. I would lean towards trimming the space in the getString method unless explicitly told not to.


[jira] Commented: (HADOOP-4212) New lines and leading spaces are not trimmed of a value when configuration is read

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642416#action_12642416 ] 

Chris Douglas commented on HADOOP-4212:
---------------------------------------

bq. Most of the use cases, except filenames, don't require trailing whitespace; see for instance HADOOP-4416. So shouldn't we just treat file paths as a special case and let users mark paths that have leading or trailing spaces in a special way? That way we can make sure that the user intends to use them instead of adding a space by accident.

I disagree. Configuration should assume the semantics of the value Strings are known to the user defining the property. Where the user _instructs_ the Configuration to interpret the value, the semantics are explicit and sanitizing input is reasonable. In addition to resolving class names, it would be reasonable to trim values interpreted in getInt, getLong, etc., though it would still be an incompatible change (IIRC, the default value is returned for invalid input). It's certainly an incompatible change in the general case. I think we should close this issue as "Won't fix" and consider broadening the scope of HADOOP-4416.
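The distinction drawn here, raw strings returned untouched while typed accessors sanitize their input, might look like the sketch below. TrimmingGetters is a hypothetical stand-in backed by a plain map, not Hadoop's Configuration class; the fallback to the default value on invalid input follows the behaviour recalled above and is an assumption about what a real implementation would do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object; not Hadoop's Configuration.
class TrimmingGetters {
    private final Map<String, String> props = new HashMap<>();

    public void set(String name, String value) {
        props.put(name, value);
    }

    // Raw accessor: the caller knows the semantics of the string, so nothing is altered.
    public String get(String name) {
        return props.get(name);
    }

    // Typed accessor: the caller asked for an int, so surrounding whitespace
    // cannot be meaningful and is removed before parsing.
    public int getInt(String name, int defaultValue) {
        String raw = props.get(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // invalid input falls back to the default
        }
    }

    public static void main(String[] args) {
        TrimmingGetters conf = new TrimmingGetters();
        conf.set("test.property.name1", "\n100\n");
        System.out.println(conf.getInt("test.property.name1", -1));
        System.out.println(conf.get("test.property.name1").length());
    }
}
```

Note that this is still an observable change for typed values: a value such as "\n100\n" that previously produced the default would now parse as 100, which is the incompatibility mentioned above.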
