Posted to mapreduce-issues@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2011/07/21 02:35:58 UTC
[jira] [Commented] (MAPREDUCE-2293) Enhance MultipleOutputs to
allow additional characters in the named output name
[ https://issues.apache.org/jira/browse/MAPREDUCE-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068742#comment-13068742 ]
Alejandro Abdelnur commented on MAPREDUCE-2293:
-----------------------------------------------
I'm OK with this as long as '_' is not allowed in the named output name or the multi name.
The reason is that there must be a separator character to avoid filename collisions.
If there is no such character, the following 2 named outputs could be configured for a job:
* named-output/multi-name FOO with multi-name BAR produces a file named FOO_BAR-#####
* named-output/no-multi-name FOO_BAR produces a file named FOO_BAR-#####
That would mean that data written to 2 logical locations ends up mixed in the same physical location.
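To make the collision concrete, here is a minimal sketch in Java. It is not the MultipleOutputs code. The class NamedOutputCollision and the helpers baseFileName and isSafeName are hypothetical, and the naming scheme (named output, then '_', then multi name) is assumed from the two file names above.

```java
// Hypothetical illustration only; this is not the MultipleOutputs implementation.
class NamedOutputCollision {

    // Assumed naming scheme, taken from the example above: a multi named
    // output yields "<namedOutput>_<multiName>", a plain one "<namedOutput>".
    // The "-#####" part suffix is left out because it is the same in both cases.
    static String baseFileName(String namedOutput, String multiName) {
        return multiName == null ? namedOutput : namedOutput + "_" + multiName;
    }

    // Hypothetical validation: reject any name containing the separator.
    static boolean isSafeName(String name) {
        return name.indexOf('_') < 0;
    }

    public static void main(String[] args) {
        String multi = baseFileName("FOO", "BAR");    // multi named output FOO, multi name BAR
        String plain = baseFileName("FOO_BAR", null); // plain named output FOO_BAR
        System.out.println(multi + " / " + plain + " collide: " + multi.equals(plain));
        System.out.println("FOO_BAR accepted: " + isSafeName("FOO_BAR"));
    }
}
```

If '_' is rejected in both kinds of name, the second configuration cannot be created, so the two base names can never be equal.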
> Enhance MultipleOutputs to allow additional characters in the named output name
> -------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2293
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2293
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.21.0
> Reporter: David Rosenstrauch
> Assignee: Harsh J
> Priority: Minor
> Fix For: 0.23.0
>
> Attachments: mapreduce.mo.removecheck.r1.diff, mapreduce.mo.removecheck.r2.diff
>
>
> Currently you are only allowed to use alpha-numeric characters in a named output name in the MultipleOutputs class. This is a bit of an onerous restriction, as it would be extremely convenient to be able to use non alpha-numerics in the name too. (E.g., a '.' character would be very helpful, so that you can use the named output name for holding a file name/extension. Perhaps '-' and a '_' characters as well.)
> The restriction seems to be somewhat arbitrary - it appears to be only enforced in the checkTokenName method. (Though I don't know if there's any downstream impact by loosening this restriction.)
> Would be extremely helpful/useful to have this fixed though!
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira