Posted to common-dev@hadoop.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2009/03/11 11:55:50 UTC

[jira] Resolved: (HADOOP-2954) In streaming, map-output cannot have empty keys

     [ https://issues.apache.org/jira/browse/HADOOP-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved HADOOP-2954.
---------------------------------------------

    Resolution: Duplicate

Fixed by HADOOP-3040

> In streaming, map-output cannot have empty keys
> -----------------------------------------------
>
>                 Key: HADOOP-2954
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2954
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Sameer Paranjpye
>
> Here is the analysis when both the mapper and the reducer are /bin/cat,
> with the default key field separator '\t' (tab).
> For example, if the input line is:
> \tSDSDFIKSDFSDFJS
> the input for the mapper ('cat' in this case) is:
> \tSDSDFIKSDFSDFJS
> -
> the output of the mapper is split into a key, value pair as below:
> (key, value) -> (\tSDSDFIKSDFSDFJS, "")
> (i.e. the value is empty)
> the function that splits the output into a (key, value) pair for
> streaming jobs ignores the first character of the line when looking for the separator
> -
> from the above (key, value) pair, the input for the reducer is:
> (key followed by separator followed by value)
> \tSDSDFIKSDFSDFJS\t
> if the reducer is set to NONE, the above line is the output of
> the map task
> -
> the output of the reducer ('cat' in this case) is:
> \tSDSDFIKSDFSDFJS\t
> -
> if lines start with the field separator, map outputs that ought to share
> the same (empty) key can be assigned to different reducers, because a
> line may contain more than one instance of the field separator. For example:
> input-line=\tABCDEFGH
> key=\tABCDEFGH
> value=
> (value is empty)
> output-line=\tABCDEFGH\t
> input-line=\tABCDEFGHYH\tJHUHJH
> key=\tABCDEFGHYH
> value=JHUHJH
> output-line=\tABCDEFGHYH\tJHUHJH
> assuming the defaults (HashPartitioner), they are likely to be assigned to
> different reducers because the keys are different.
> The streaming contract says that everything from the beginning of the line up to the first tab is the key, so the key should be the empty string. But it is not.
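The splitting behaviour described in the report can be reproduced with a short model. This is a minimal Python sketch under stated assumptions, not the actual streaming code: split_buggy, split_contract and join_kv are hypothetical names; split_buggy assumes the separator search starts at index 1 (which matches the reported outcomes); and treating a line with no separator as (whole line, empty value) is also an assumption.

```python
SEP = "\t"  # default key field separator


def split_buggy(line: str, sep: str = SEP) -> tuple[str, str]:
    """Models the reported behaviour (assumption): the separator search
    starts at index 1, so a leading separator is never found."""
    pos = line.find(sep, 1)
    if pos == -1:
        # Assumed: no separator means the whole line is the key.
        return line, ""
    return line[:pos], line[pos + len(sep):]


def split_contract(line: str, sep: str = SEP) -> tuple[str, str]:
    """Follows the streaming contract: the key is everything before the
    first separator, even when that is the empty string."""
    pos = line.find(sep)
    if pos == -1:
        # Assumed: no separator means the whole line is the key.
        return line, ""
    return line[:pos], line[pos + len(sep):]


def join_kv(key: str, value: str, sep: str = SEP) -> str:
    """Rebuilds the line passed on to the reducer: key, separator, value."""
    return key + sep + value


line = "\tSDSDFIKSDFSDFJS"
key, value = split_buggy(line)
print(repr((key, value)))          # ('\tSDSDFIKSDFSDFJS', '')
print(repr(join_kv(key, value)))   # '\tSDSDFIKSDFSDFJS\t'
print(repr(split_contract(line)))  # ('', 'SDSDFIKSDFSDFJS')
```

With split_contract, both example lines from the report get the empty key, which is what the contract requires; with split_buggy they get two different keys, and the rebuilt line picks up the trailing tab seen in the reducer output.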
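To see why the two example records are likely to reach different reducers, here is a minimal sketch of hash partitioning. It mirrors the shape of HashPartitioner's formula, (hashCode & Integer.MAX_VALUE) % numReduceTasks; the hash function text_hash is a hypothetical stand-in, not Hadoop's exact Text.hashCode, and the reducer counts are arbitrary.

```python
def text_hash(s: str) -> int:
    """Hypothetical stand-in hash: a 31-based polynomial over the UTF-8
    bytes, wrapped to a signed 32-bit int. Not Hadoop's exact Text.hashCode."""
    h = 1
    for b in s.encode("utf-8"):
        signed = b - 256 if b >= 128 else b  # treat bytes as signed, Java-style
        h = (31 * h + signed) & 0xFFFFFFFF    # emulate 32-bit overflow
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Same formula shape as HashPartitioner:
    (hashCode & Integer.MAX_VALUE) % numReduceTasks."""
    return (text_hash(key) & 0x7FFFFFFF) % num_reducers


# Keys produced by the reported (buggy) split for the two example lines.
buggy_keys = ["\tABCDEFGH", "\tABCDEFGHYH"]
# Keys the streaming contract calls for: the empty string in both cases.
contract_keys = ["", ""]

for n in (2, 3, 4, 5):
    print(n,
          [hash_partition(k, n) for k in buggy_keys],
          [hash_partition(k, n) for k in contract_keys])
```

Whatever the hash, equal keys always go to the same partition, so records with the empty key would meet at one reducer; the two buggy keys differ, so nothing guarantees they land together.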

-- 
This message is automatically generated by JIRA.