Posted to mapreduce-issues@hadoop.apache.org by "David Rosenstrauch (JIRA)" <ji...@apache.org> on 2011/02/01 17:38:29 UTC

[jira] Created: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Enhance MultipleOutputs to allow additional characters in the named output name
-------------------------------------------------------------------------------

                 Key: MAPREDUCE-2293
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 0.21.0
            Reporter: David Rosenstrauch
            Priority: Minor


Currently you are only allowed to use alphanumeric characters in a named output name in the MultipleOutputs class.  This is a bit of an onerous restriction, as it would be extremely convenient to be able to use non-alphanumerics in the name too.  (E.g., a '.' character would be very helpful, so that you can use the named output name for holding a file name/extension.  Perhaps '-' and '_' characters as well.)

The restriction seems somewhat arbitrary: it appears to be enforced only in the checkTokenName method.  (Though I don't know whether loosening this restriction has any downstream impact.)

It would be extremely helpful to have this fixed, though!
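For reference, a rough sketch of what an alphanumeric-only name check of the kind described above behaves like. It is modeled on the described behaviour of checkTokenName, not copied from Hadoop's source; the class name and the isAccepted helper are hypothetical.

```java
// Hypothetical sketch of an alphanumeric-only named-output check, modeled on
// the behaviour described for checkTokenName (not Hadoop's actual source).
class NamedOutputCheck {

    /** Throws if the name is null/empty or contains anything outside [A-Za-z0-9]. */
    static void checkTokenName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Name cannot be null or empty");
        }
        for (char ch : name.toCharArray()) {
            boolean alnum = (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= '0' && ch <= '9');
            if (!alnum) {
                throw new IllegalArgumentException(
                        "Name cannot contain a '" + ch + "' char");
            }
        }
    }

    /** Hypothetical convenience wrapper: true if checkTokenName accepts the name. */
    static boolean isAccepted(String name) {
        try {
            checkTokenName(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Only "summary" passes; '.', '-' and '_' are all rejected.
        for (String n : new String[] {"summary", "report.csv", "by-day", "by_day"}) {
            System.out.println(n + " -> " + (isAccepted(n) ? "accepted" : "rejected"));
        }
    }
}
```

Under a rule like this, every name the reporter asks for ('.', '-', '_') is rejected at job-setup time, before any file is written.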

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2293:
-------------------------------

    Attachment: mapreduce.mo.removecheck.r3.diff

Alejandro,

I've incorporated your requested conditions into the old-API MO (MultipleOutputs). The new API has no concept of single vs. multi named outputs, so the restriction is not required there.
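As an illustration only, one plausible shape for a relaxed check that still honours the multi named output separator. This does not reproduce the r3 patch: the extra symbol set ('.', '-', '_') is taken from the issue request, and the rule "no '_' in multi named outputs" is an assumption based on the FIXNAME_VARNAME concern raised in this thread. The class and method names are hypothetical.

```java
// Hypothetical relaxed named-output check for an API with both "single" and
// "multi" named outputs. The allowed symbols and the multi-output '_' rule
// are assumptions for illustration, not the committed patch.
class RelaxedNameCheck {

    // Assumed extra symbols beyond [A-Za-z0-9], as requested in the issue.
    private static final String EXTRA_SYMBOLS = ".-_";

    static boolean isValid(String name, boolean multi) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (char ch : name.toCharArray()) {
            boolean alnum = (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= '0' && ch <= '9');
            if (alnum) continue;
            // '_' separates FIXNAME from VARNAME in multi named outputs,
            // so in this sketch it is only tolerated for single named outputs.
            if (ch == '_' && multi) return false;
            if (EXTRA_SYMBOLS.indexOf(ch) < 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("report.csv", false)); // true
        System.out.println(isValid("by_day", false));     // true
        System.out.println(isValid("by_day", true));      // false
        System.out.println(isValid("a,b", false));        // false
    }
}
```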

Updated patch for trunk.

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff

[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2293:
-------------------------------

    Status: Open  (was: Patch Available)

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274019#comment-13274019 ] 

Harsh J commented on MAPREDUCE-2293:
------------------------------------

The javac warnings are caused by deprecation markers that have appeared on trunk since the last time I worked on this patch:

{code}

1438c1438
< [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java:[444,14] [deprecation] Job(org.apache.hadoop.conf.Configuration) in org.apache.hadoop.mapreduce.Job has been deprecated
---
> [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java:[440,14] [deprecation] Job(org.apache.hadoop.conf.Configuration) in org.apache.hadoop.mapreduce.Job has been deprecated
1658c1658,1659
< [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestMultipleOutputs.java:[254,6] [deprecation] Reader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration) in org.apache.hadoop.io.SequenceFile.Reader has been deprecated
---
> [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestMultipleOutputs.java:[307,6] [deprecation] Reader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration) in org.apache.hadoop.io.SequenceFile.Reader has been deprecated
> [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestMultipleOutputs.java:[325,6] [deprecation] Reader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration) in org.apache.hadoop.io.SequenceFile.Reader has been deprecated
1661c1662,1663
< [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestMRMultipleOutputs.java:[215,6] [deprecation] Reader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration) in org.apache.hadoop.io.SequenceFile.Reader has been deprecated
---
> [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestMRMultipleOutputs.java:[255,6] [deprecation] Reader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration) in org.apache.hadoop.io.SequenceFile.Reader has been deprecated
> [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestMRMultipleOutputs.java:[273,6] [deprecation] Reader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration) in org.apache.hadoop.io.SequenceFile.Reader has been deprecated
{code}

I'll fix these shortly.
                
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff

[jira] Commented: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989931#comment-12989931 ] 

Allen Wittenauer commented on MAPREDUCE-2293:
---------------------------------------------

If InputFormats can't take a file with commas or spaces, then that's a bug in InputFormat.  Even so, that's not a reason to restrict the character set so gratuitously.  Just because a file is generated as output doesn't mean it is going to be used as input for some other MR phase.

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055166#comment-13055166 ] 

Hadoop QA commented on MAPREDUCE-2293:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483858/mapreduce.mo.removecheck.r2.diff
  against trunk revision 1139400.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these core unit tests:
                  org.apache.hadoop.cli.TestMRCLI
                  org.apache.hadoop.fs.TestFileSystem

    -1 contrib tests.  The patch failed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/421//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/421//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/421//console

This message is automatically generated.

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff

[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2293:
-------------------------------

    Attachment: mapreduce.mo.removecheck.r5.diff

New patch that replaces the deprecated SequenceFile.Reader constructors previously used in the tests.

Any concerns about the core changes, though? This has been open for almost two years now, and the patch still applies :)
                
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff, mapreduce.mo.removecheck.r5.diff

[jira] Commented: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989825#comment-12989825 ] 

Alejandro Abdelnur commented on MAPREDUCE-2293:
-----------------------------------------------

David,

The current implementation uses '_' as the separator for multi named outputs (FIXNAME_VARNAME), so allowing '_' as part of the FIXNAME or VARNAME would break things.
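A small illustration of the ambiguity: once '_' is both a legal name character and the FIXNAME_VARNAME separator, different (FIXNAME, VARNAME) pairs can produce the same output name. The join helper is hypothetical and only mirrors the naming scheme described above.

```java
// Shows why '_' is ambiguous when it is also the separator used to build
// multi named output names (FIXNAME_VARNAME). join() is a hypothetical helper.
class SeparatorAmbiguity {

    static String join(String fixName, String varName) {
        return fixName + "_" + varName;
    }

    public static void main(String[] args) {
        String a = join("sales_eu", "2011");   // FIXNAME contains '_'
        String b = join("sales", "eu_2011");   // VARNAME contains '_'
        System.out.println(a);                 // sales_eu_2011
        System.out.println(b);                 // sales_eu_2011
        System.out.println(a.equals(b));       // true: the split point is lost
    }
}
```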

Allen,

Regarding allowing other characters: even if HDFS supports UTF, I'm not sure what will happen if you have SPACEs and COMMAs in the name of a file and you use that file as input for an MR job. Will the InputFormat class treat the file as 2 different PATHs?
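To make the comma concern concrete: APIs that take a comma-separated list of input paths (for example the String-based FileInputFormat.addInputPaths) have to split that string somewhere. The sketch below is a deliberately naive model of such a split, NOT Hadoop's parser; it only shows how a comma inside a file name can turn one path into two.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a comma-separated input-path string. This is not
// Hadoop's parser; it only illustrates the hazard of commas in file names.
class CommaSeparatedPaths {

    static List<String> parse(String commaSeparatedPaths) {
        return Arrays.asList(commaSeparatedPaths.split(","));
    }

    public static void main(String[] args) {
        // One intended file whose name contains a comma...
        List<String> paths = parse("/out/sales,eu-r-00000");
        // ...is read back as two unrelated paths.
        System.out.println(paths);        // [/out/sales, eu-r-00000]
        System.out.println(paths.size()); // 2
    }
}
```

Whether real Hadoop code escapes commas in a given code path is a separate question; the sketch only shows why the question matters.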

That said, allowing a reduced set of symbols would be possible.


> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff

[jira] Updated: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J Chouraria (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria updated MAPREDUCE-2293:
-----------------------------------------

    Attachment: mapreduce.mo.removecheck.r1.diff

For extensions to be applied, I think one needs to change the chosen OutputFormat class. MultipleOutputs only controls the part of the file name before the '-XXXXX' (partition) numbering in Map/Reduce outputs, not the part after it.
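A sketch of why an extension placed in the named output name lands in the wrong spot. It assumes the "<name>-<m|r>-<NNNNN><extension>" pattern that the new API's FileOutputFormat.getUniqueFile produces; the uniqueFile method here is a stand-alone re-implementation for illustration, not Hadoop code.

```java
// Stand-alone sketch of the "<name>-<m|r>-<NNNNN><extension>" file-name
// pattern (assumed to match FileOutputFormat.getUniqueFile's format).
class OutputFileNames {

    static String uniqueFile(String name, boolean isMap, int partition, String extension) {
        return String.format("%s-%c-%05d%s", name, isMap ? 'm' : 'r', partition, extension);
    }

    public static void main(String[] args) {
        // Extension embedded in the named output name ends up before the
        // partition suffix.
        System.out.println(uniqueFile("report.csv", false, 3, ""));  // report.csv-r-00003
        // Extension supplied separately (the OutputFormat's job) comes last.
        System.out.println(uniqueFile("report", false, 3, ".csv"));  // report-r-00003.csv
    }
}
```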

I've posted a patch that removes the check (from both the old and new APIs). I can't post the result of an {{ant test-patch}} since that isn't working for me right now (the Mumak build is failing in MR trunk for some reason). I'll post it when I get it working.

In my opinion this should be marked as an Incompatible Change, since it removes a strong validation. People may also be relying on the MultiName_OutputName-Partition syntax (via string splits, etc.) in the stable-API MO class.
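To illustrate the compatibility risk: downstream code that recovers the named output from a file name by splitting on '-' works only while names are guaranteed to contain no '-'. Both parsers below are hypothetical examples of user code, not part of Hadoop; the suffix-based one assumes names end in "-m-" or "-r-" followed by the partition digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical downstream parsers for names like MultiName_OutputName-r-00000.
class OutputNameParsing {

    // Assumed suffix shape: "-", task type m/r, "-", partition digits.
    private static final Pattern SUFFIX = Pattern.compile("^(.*)-[mr]-\\d+$");

    /** Fragile: assumes the name part never contains '-'. */
    static String namedOutputBySplit(String fileName) {
        return fileName.split("-")[0];
    }

    /** Sturdier: strips only the trailing "-m-NNNNN" / "-r-NNNNN" suffix. */
    static String namedOutputBySuffix(String fileName) {
        Matcher m = SUFFIX.matcher(fileName);
        return m.matches() ? m.group(1) : fileName;
    }

    public static void main(String[] args) {
        String legacy = "sales_eu-r-00000";
        String relaxed = "sales_eu-west-r-00000"; // '-' now allowed in the name
        System.out.println(namedOutputBySplit(legacy));   // sales_eu
        System.out.println(namedOutputBySplit(relaxed));  // sales_eu (wrong)
        System.out.println(namedOutputBySuffix(relaxed)); // sales_eu-west
    }
}
```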

Also, I'm curious whether allowing any character is a good idea Path-wise. Does HDFS have any restrictions on file names? I haven't seen any documentation on it (although I think it is fairly POSIX-compliant), but HDFS-13 points out that there may be some trouble. Any thoughts on that?

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff

[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2293:
-------------------------------

    Attachment: mapreduce.mo.removecheck.r4.diff

Rebased patch to current trunk. Still holds good.

Any further comments on the patch, anyone? There is still fairly strong user demand for this, given that MultipleOutputs lacks extensibility due to its various static methods.
                
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.24.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff

[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2293:
-------------------------------

    Status: Patch Available  (was: Open)

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff

[jira] Commented: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989437#comment-12989437 ] 

Todd Lipcon commented on MAPREDUCE-2293:
----------------------------------------

I asked Alejandro to take a look at this. In the original patch (HADOOP-3149) he said:
{quote}
Limiting the names to [a-zA-Z0-9] has a purpose, the names are used to create the files under the output dir, you don't want to get funny characters in the leafname
{quote}
but I'm not clear why :)

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff

[jira] Assigned: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J Chouraria (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J Chouraria reassigned MAPREDUCE-2293:
--------------------------------------------

    Assignee: Harsh J Chouraria

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "David Rosenstrauch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191288#comment-13191288 ] 

David Rosenstrauch commented on MAPREDUCE-2293:
-----------------------------------------------

:-(
                
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.24.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff

[jira] Commented: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J Chouraria (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989350#comment-12989350 ] 

Harsh J Chouraria commented on MAPREDUCE-2293:
----------------------------------------------

Forgot to add, the tests TestMultipleOutputs and TestMRMultipleOutputs both pass with the changes in the previously attached patch.

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff

[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2293:
-------------------------------

    Target Version/s: 3.0.0
       Fix Version/s:     (was: 0.24.0)
    
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff

[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2293:
-------------------------------

    Fix Version/s: 0.23.0
           Status: Patch Available  (was: Open)

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068742#comment-13068742 ] 

Alejandro Abdelnur commented on MAPREDUCE-2293:
-----------------------------------------------

Would it be OK if '_' is not allowed in a named output name or a multi-name name?

The reason is that there must be a separator character to avoid filename collisions.

If there were no such character, the following two named outputs could both be configured for a job:

* named-output/multi-name FOO with multi-name BAR produces a file named FOO_BAR-#####
* named-output/no-multi-name FOO_BAR produces a file named FOO_BAR-#####

That would mean that data written to two logical locations ends up mixed in the same physical location.
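A minimal sketch of the collision described above. It assumes the naming scheme given in the comment (NAME_MULTINAME-##### for a multi named output, NAME-##### for a plain one, with a zero-padded five-digit part number); the class and method names are hypothetical and this is not the MultipleOutputs source.

```java
/**
 * Hypothetical sketch of MultipleOutputs-style file naming, used only to show how
 * allowing '_' in names could make two different outputs share one file name.
 */
final class NamedOutputFiles {

    private NamedOutputFiles() {}

    /**
     * Assumed scheme (taken from the comment, not from the Hadoop source):
     * multi named output -> NAME_MULTINAME-#####, plain named output -> NAME-#####.
     */
    static String fileName(String namedOutput, String multiName, int part) {
        String base = (multiName == null) ? namedOutput : namedOutput + "_" + multiName;
        return base + "-" + String.format("%05d", part);
    }

    public static void main(String[] args) {
        String multi = fileName("FOO", "BAR", 0);    // named output FOO, multi-name BAR
        String plain = fileName("FOO_BAR", null, 0); // plain named output FOO_BAR
        System.out.println(multi + " / " + plain + " collide: " + multi.equals(plain));
    }
}
```

Because '_' doubles as the separator, the mapping from (named output, multi-name) to file name stops being one-to-one as soon as '_' is allowed inside a name.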



> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff

[jira] Commented: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989795#comment-12989795 ] 

Allen Wittenauer commented on MAPREDUCE-2293:
---------------------------------------------

It would be very useful to have the HDFS folks chime in... my understanding is that HDFS is essentially UTF-16 by virtue of using Java String everywhere.  So it makes complete sense to only block the character /. 

I'm curious as to why the NULL character causes issues.

If users want to shoot themselves in the foot by naming things inconsistently, that isn't our place to get in their way.

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055700#comment-13055700 ] 

Harsh J commented on MAPREDUCE-2293:
------------------------------------

To clarify, the " " (space) is the only real constraint in the current implementation. The comma was the only other restriction I added, for the sanity of subsequent jobs, since FileInputFormat uses plain paths and commas to separate path lists.

The restriction on the comma (,) can certainly be lifted. The dependency on " " (space) in the MO configuration could be removed too, but that would require using a control or escape character as the delimiter plus base64 encoding (so that the XML configuration accepts it), which would complicate the code a bit and make the conf values hard to read. Can we agree on restricting only " " and ","?
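A minimal sketch of why a space inside a name breaks things, assuming (as the comment implies) that all configured named outputs are stored together in one space-separated configuration value. The class and method names are hypothetical; the real MultipleOutputs configuration keys are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: all named outputs kept in one space-separated configuration
 * value, which is what makes ' ' unsafe inside a name.
 */
final class NamedOutputList {

    private NamedOutputList() {}

    /** Appends a name to the space-separated value (assumed storage format). */
    static String add(String current, String name) {
        return (current == null || current.isEmpty()) ? name : current + " " + name;
    }

    /** Reads the names back by splitting on single spaces. */
    static List<String> parse(String value) {
        if (value == null || value.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(value.split(" "));
    }

    public static void main(String[] args) {
        String ok = add(add("", "text"), "seq");
        String broken = add(add("", "text"), "my output"); // second name contains a space
        System.out.println(parse(ok));     // two names, as configured
        System.out.println(parse(broken)); // three names: "my output" was split apart
    }
}
```

The comma check guards a different case: a later job handing the output directory to FileInputFormat, which treats commas as path-list separators.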

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508353#comment-13508353 ] 

Hadoop QA commented on MAPREDUCE-2293:
--------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12555681/mapreduce.mo.removecheck.r5.diff
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3090//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3090//console

This message is automatically generated.
                
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff, mapreduce.mo.removecheck.r5.diff

[jira] Commented: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "David Rosenstrauch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989661#comment-12989661 ] 

David Rosenstrauch commented on MAPREDUCE-2293:
-----------------------------------------------

Alejandro:  if you read my bug report a little more carefully, you'll see that I wasn't suggesting that the valid characters in a named output name be completely unrestricted, but rather that the valid character set be loosened a bit.  Given the feedback provided here, I don't see any reason why it couldn't at least be relaxed to allow the characters [._-].  All of those characters a) wouldn't break anything, and b) are very commonly used in filenames, and so really should be supported IMO.
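A minimal sketch of the relaxed check proposed here: ASCII letters, digits, and the characters '.', '_' and '-'. The class and method names are hypothetical. Alejandro's collision concern about '_', raised elsewhere in this thread, would still apply to this version.

```java
import java.util.regex.Pattern;

/** Hypothetical validator allowing ASCII alphanumerics plus '.', '_' and '-'. */
final class RelaxedNameCheck {

    // One or more ASCII letters, digits, '.', '_' or '-' ('-' is literal at the end of the class).
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9._-]+");

    private RelaxedNameCheck() {}

    /** Throws IllegalArgumentException if the name is null, empty, or has other characters. */
    static void checkName(String name) {
        if (name == null || !VALID.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid named output name: " + name);
        }
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(String name) {
        try {
            checkName(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String n : new String[] {"part", "report.csv", "a_b-c", "a/b", "a b", ""}) {
            System.out.println("'" + n + "' -> " + isValid(n));
        }
    }
}
```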

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096666#comment-13096666 ] 

Hadoop QA commented on MAPREDUCE-2293:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12492900/mapreduce.mo.removecheck.r3.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-mapreduce-project.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-hs.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-shuffle.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-jobclient.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/598//console

This message is automatically generated.

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff

[jira] Commented: (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989460#comment-12989460 ] 

Alejandro Abdelnur commented on MAPREDUCE-2293:
-----------------------------------------------

The reasons for disallowing special chars were:

* to be consistent with the default output names part-#####
* to ensure the final name is valid file name (i.e. using '/' or ' ' would break things)
* to avoid collisions between a named-output and a multinamed-output that is a prefix of the named-output

IMO these restrictions should stay.
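For comparison with the relaxed proposals in this thread, here is a sketch approximating the alphanumeric-only restriction defended above (the issue says it is enforced in checkTokenName). It is written for illustration and is not copied from the Hadoop source; the class name is hypothetical.

```java
/** Hypothetical approximation of an ASCII alphanumeric-only named output check. */
final class StrictNameCheck {

    private StrictNameCheck() {}

    /** Rejects null, empty, and any name containing a non-alphanumeric character. */
    static void checkTokenName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Name cannot be null or empty");
        }
        for (char ch : name.toCharArray()) {
            boolean alnum = (ch >= 'A' && ch <= 'Z')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= '0' && ch <= '9');
            if (!alnum) {
                // '/', ' ', '_', '.', '-' and every other non-alphanumeric char end up here.
                throw new IllegalArgumentException("Name cannot have a '" + ch + "' char");
            }
        }
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(String name) {
        try {
            checkTokenName(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String n : new String[] {"text", "Part01", "a_b", "a.b", "a/b", ""}) {
            System.out.println("'" + n + "' -> " + isValid(n));
        }
    }
}
```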


> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff

[jira] [Updated] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2293:
-------------------------------

    Attachment: mapreduce.mo.removecheck.r2.diff

New patch that removes the alphanumeric-only restriction and introduces only two sanity checks, rejecting the comma ({{,}}) and the space ({{ }}). 

Slashes ({{/}}) are allowed, so subdirectories may be created within outputs as well (this has often been asked for).

Adds test cases for some special characters (.-_), and also adds tests that check for subdirectory existence and verify subdirectory output.
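A minimal sketch of the check this patch describes: only the comma and the space are rejected, so '/' passes and becomes a subdirectory when the name is joined with the output directory. The class and method names are hypothetical, and the actual patch may be structured differently.

```java
/** Hypothetical sketch of a named output check that rejects only ',' and ' '. */
final class PermissiveNameCheck {

    private PermissiveNameCheck() {}

    static void checkName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Name cannot be null or empty");
        }
        if (name.indexOf(',') >= 0) {
            // Commas separate paths in FileInputFormat input lists.
            throw new IllegalArgumentException("Name cannot contain ','");
        }
        if (name.indexOf(' ') >= 0) {
            // Space is used as a delimiter in the MultipleOutputs configuration.
            throw new IllegalArgumentException("Name cannot contain ' '");
        }
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(String name) {
        try {
            checkName(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Joins the output directory and a validated name; a '/' in the name yields a subdirectory. */
    static String outputPath(String outputDir, String name) {
        checkName(name);
        return outputDir + "/" + name;
    }

    public static void main(String[] args) {
        System.out.println(outputPath("/user/out", "logs/2011/errors"));
        System.out.println(isValid("a,b") + " " + isValid("a b"));
    }
}
```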

> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274016#comment-13274016 ] 

Hadoop QA commented on MAPREDUCE-2293:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12526635/mapreduce.mo.removecheck.r4.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 2 new or modified test files.

    -1 javac.  The applied patch generated 1935 javac compiler warnings (more than the trunk's current 1933 warnings).

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2384//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2384//artifact/trunk/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2384//console

This message is automatically generated.
                
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff, mapreduce.mo.removecheck.r4.diff
>
>
> Currently you are only allowed to use alphanumeric characters in a named output name in the MultipleOutputs class.  This is a bit of an onerous restriction, as it would be extremely convenient to be able to use non-alphanumerics in the name too.  (E.g., a '.' character would be very helpful, so that you can use the named output name for holding a file name/extension.  Perhaps '-' and '_' characters as well.)
> The restriction seems to be somewhat arbitrary: it appears to be enforced only in the checkTokenName method.  (Though I don't know if there's any downstream impact from loosening this restriction.)
> It would be extremely helpful to have this fixed, though!
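The description above says the restriction is a character whitelist enforced in checkTokenName. The following is a minimal, self-contained Java sketch, not Hadoop's actual code, that mimics an alphanumeric-only check of that kind and a relaxed variant that also accepts the '.', '-' and '_' characters the reporter suggests. The class and method names (NamedOutputCheck, checkTokenNameStrict, checkTokenNameRelaxed, isValid) are hypothetical, and the exact relaxed character set is an assumption, not something decided in this issue.

```java
/**
 * Hypothetical sketch (not Hadoop source): an alphanumeric-only name check
 * in the spirit of MultipleOutputs.checkTokenName, plus a relaxed variant.
 */
public class NamedOutputCheck {

  // Extra characters the reporter asked for; an assumed list, not a committed one.
  private static final String EXTRA_ALLOWED = ".-_";

  private static boolean isAsciiAlphanumeric(char ch) {
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9');
  }

  /** Strict check: rejects null, empty, or any non-alphanumeric character. */
  public static void checkTokenNameStrict(String name) {
    check(name, "");
  }

  /** Relaxed check: additionally permits the characters in EXTRA_ALLOWED. */
  public static void checkTokenNameRelaxed(String name) {
    check(name, EXTRA_ALLOWED);
  }

  // Shared validation: every char must be ASCII alphanumeric or in 'extra'.
  private static void check(String name, String extra) {
    if (name == null || name.isEmpty()) {
      throw new IllegalArgumentException("Name cannot be null or empty");
    }
    for (char ch : name.toCharArray()) {
      if (!isAsciiAlphanumeric(ch) && extra.indexOf(ch) < 0) {
        throw new IllegalArgumentException("Name cannot contain a '" + ch + "' char");
      }
    }
  }

  /** Convenience wrapper: true if the name passes the chosen check. */
  public static boolean isValid(String name, boolean relaxed) {
    try {
      if (relaxed) {
        checkTokenNameRelaxed(name);
      } else {
        checkTokenNameStrict(name);
      }
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("text", false));     // true
    System.out.println(isValid("part.csv", false)); // false
    System.out.println(isValid("part.csv", true));  // true
    System.out.println(isValid("a/b", true));       // false
  }
}
```

In this sketch '/' stays rejected even in the relaxed mode, since a named output name is used as part of an output file name; whether other downstream code depends on the stricter rule is exactly the open question raised in the description.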

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

Posted by "Harsh J (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188922#comment-13188922 ] 

Harsh J commented on MAPREDUCE-2293:
------------------------------------

It's a bit tedious to maintain patches that wait for review for a very long time. If there is no interest in having this (given that the new API already has one way to bypass the checks), we can close it for now.
                
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.24.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff, mapreduce.mo.removecheck.r3.diff