Posted to mapreduce-issues@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2011/07/21 02:35:58 UTC

[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to allow additional characters in the named output name

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068742#comment-13068742 ] 

Alejandro Abdelnur commented on MAPREDUCE-2293:
-----------------------------------------------

I'm OK if '_' is not allowed in the named output name or the multi name?

The reason is that there must be a separator character to avoid filename collisions.

If there is no such character, the following 2 named outputs could be configured for a job:

* named-output/multi-name FOO with multi-name BAR produces a file named FOO_BAR-#####
* named-output/no-multi-name FOO_BAR produces a file named FOO_BAR-#####

And that would mean that data written to 2 logical locations ends up mixed in the same physical location.
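
To make the collision concrete, here is a minimal sketch of the naming scheme described above. It is not the actual MultipleOutputs code: the class NamedOutputCollision and its methods are hypothetical, and it assumes only that a multi named output's base file name is the named output, then '_', then the multi name, while a plain named output uses its name as-is.

```java
// Hypothetical illustration only; not taken from the MultipleOutputs source.
class NamedOutputCollision {
    // Assumed separator between the named output and the multi name.
    static final char SEPARATOR = '_';

    // Base file name for a multi named output, e.g. ("FOO", "BAR") gives "FOO_BAR".
    static String multiBaseName(String namedOutput, String multiName) {
        return namedOutput + SEPARATOR + multiName;
    }

    // Base file name for a plain (non-multi) named output: the name itself.
    static String plainBaseName(String namedOutput) {
        return namedOutput;
    }

    // A name cannot cause a collision if it never contains the separator.
    static boolean isSafeName(String name) {
        return !name.isEmpty() && name.indexOf(SEPARATOR) < 0;
    }

    public static void main(String[] args) {
        String multi = multiBaseName("FOO", "BAR");
        String plain = plainBaseName("FOO_BAR");
        // Both logical outputs map to the same base file name.
        System.out.println(multi + " vs " + plain + " collide: " + multi.equals(plain));
        // Rejecting the separator in names rules the second output out.
        System.out.println("FOO_BAR allowed: " + isSafeName("FOO_BAR"));
    }
}
```

With a check like isSafeName applied to both the named output name and the multi name, other characters such as '.' or '-' could presumably be allowed without reintroducing the collision, as long as the separator itself stays reserved.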



> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2293
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: David Rosenstrauch
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff
>
>
> Currently you are only allowed to use alpha-numeric characters in a named output name in the MultipleOutputs class.  This is a bit of an onerous restriction, as it would be extremely convenient to be able to use non alpha-numerics in the name too.  (E.g., a '.' character would be very helpful, so that you can use the named output name for holding a file name/extension.  Perhaps '-' and a '_' characters as well.)
> The restriction seems to be somewhat arbitrary - it appears to be only enforced in the checkTokenName method.  (Though I don't know if there's any downstream impact by loosening this restriction.)
> Would be extremely helpful/useful to have this fixed though!
