Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/17 21:13:04 UTC

[jira] [Resolved] (MAPREDUCE-594) Streaming: org.apache.hadoop.mapred.lib.IdentityMapper should not insert unnecessary keys

     [ https://issues.apache.org/jira/browse/MAPREDUCE-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-594.
----------------------------------------

    Resolution: Fixed

Closing this as fixed for a variety of reasons (an alternative has been provided, you can now provide your own input format, etc.).

> Streaming: org.apache.hadoop.mapred.lib.IdentityMapper should not insert unnecessary keys
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-594
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-594
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: arkady borkovsky
>
> When a streaming command specifies
> -mapper org.apache.hadoop.mapred.lib.IdentityMapper
> the reducer should receive exactly the same text lines as were present in the input.
> The only modification should be the reordering of the input.
> Currently, org.apache.hadoop.mapred.lib.IdentityMapper inserts the offsets in the input as keys, which renders it useless.
> Moreover, in the latest release org.apache.hadoop.mapred.lib.IdentityMapper just crashes:
> >java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:331)
>         at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> (I am opening only one bug: since it was broken anyway, the new behavior does not actually make it any worse than before.)
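
The behavior described in the report can be sketched without any Hadoop dependency. The following is only an illustration, assuming a line-oriented input format that hands the mapper (byte offset, line) records and a streaming reducer that sees each record as "key<TAB>value". The class IdentityMapperSketch and its methods readRecords, identityMap and lineOnlyMap are hypothetical names made up for this sketch; they are not part of Hadoop or of the fix referred to in the resolution.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class IdentityMapperSketch {
    /** Hypothetical stand-in for one input record; not a Hadoop type. */
    static final class OffsetLine {
        final long offset;
        final String line;

        OffsetLine(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Assumption: the input format yields (byte offset, line text) pairs. */
    static List<OffsetLine> readRecords(String input) {
        List<OffsetLine> records = new ArrayList<>();
        if (input.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : input.split("\n")) {
            records.add(new OffsetLine(offset, line));
            // Advance past the line's bytes plus the newline terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    /** Identity mapping: the offset key is passed through and reaches the reducer. */
    static List<String> identityMap(List<OffsetLine> records) {
        List<String> out = new ArrayList<>();
        for (OffsetLine r : records) {
            out.add(r.offset + "\t" + r.line);
        }
        return out;
    }

    /** What the reporter asks for: the reducer sees only the original lines. */
    static List<String> lineOnlyMap(List<OffsetLine> records) {
        List<String> out = new ArrayList<>();
        for (OffsetLine r : records) {
            out.add(r.line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<OffsetLine> records = readRecords("foo bar\nbaz\n");
        System.out.println(identityMap(records)); // lines prefixed with 0 and 8
        System.out.println(lineOnlyMap(records)); // lines exactly as in the input
    }
}
```

In this sketch the offset key is a long while the line is text, which mirrors the LongWritable versus Text mismatch shown in the stack trace above.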



--
This message was sent by Atlassian JIRA
(v6.2#6252)