Posted to common-dev@hadoop.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/01/23 17:19:49 UTC

[jira] Created: (HADOOP-920) MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks

MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks
----------------------------------------------------------------------------------------------------

                 Key: HADOOP-920
                 URL: https://issues.apache.org/jira/browse/HADOOP-920
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.11.0
            Reporter: Andrzej Bialecki 
             Fix For: 0.11.0


Let's assume a job uses different key/value classes for the output of map tasks and for the final output of reduce tasks.

When executing map tasks, the classes returned from JobConf.getMapOutputKeyClass() / getMapOutputValueClass() should be used, and when executing reduce tasks, the classes returned from JobConf.getOutputKeyClass() / getOutputValueClass() should be used.

Currently, both map and reduce tasks use getMapOutputKeyClass/getMapOutputValueClass when using MapFileOutputFormat, and both always use getOutputKeyClass/getOutputValueClass when using SequenceFileOutputFormat. This causes exceptions, because Mapper / Reducer implementations output key/value classes other than the ones expected.
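To make the failure mode concrete, here is a minimal stand-alone sketch. It is not Hadoop code: CheckedWriter and reduceOutputAccepted are hypothetical names, and String / Long merely stand in for the job's map output key class and final output key class. It only illustrates that a writer created with the map output key class rejects the keys a reducer emits when the two classes differ.

```java
/** Stand-in model, not Hadoop code: a writer built with the wrong key class rejects reduce output. */
final class KeyClassMismatchDemo {

    /** Hypothetical stand-in for a file writer that accepts exactly one key class. */
    static final class CheckedWriter {
        private final Class<?> keyClass;

        CheckedWriter(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        /** Rejects any key whose runtime class differs from the configured one. */
        void append(Object key) {
            if (key.getClass() != keyClass) {
                throw new IllegalArgumentException(
                    "wrong key class: " + key.getClass().getName()
                        + " is not " + keyClass.getName());
            }
        }
    }

    /** Returns true if a writer created for writerKeyClass accepts the given reduce key. */
    static boolean reduceOutputAccepted(Class<?> writerKeyClass, Object reduceKey) {
        try {
            new CheckedWriter(writerKeyClass).append(reduceKey);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed job setup: map tasks emit String keys, reduce tasks emit Long keys.
        Class<?> mapOutputKeyClass = String.class;
        Class<?> outputKeyClass = Long.class;
        Object reduceKey = Long.valueOf(42L);

        // Writer created from the map output class: the reducer's key is rejected.
        System.out.println(reduceOutputAccepted(mapOutputKeyClass, reduceKey)); // false
        // Writer created from the final output class: the reducer's key is accepted.
        System.out.println(reduceOutputAccepted(outputKeyClass, reduceKey));    // true
    }
}
```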

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-920) MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466773 ] 

Hadoop QA commented on HADOOP-920:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12349457/key-value-class.patch applied and successfully tested against trunk revision r498829.

[jira] Resolved: (HADOOP-920) MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  resolved HADOOP-920.
--------------------------------------

    Resolution: Fixed
      Assignee: Andrzej Bialecki 

Fixed by reverting an accidental change introduced by HADOOP-115.

[jira] Updated: (HADOOP-920) MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated HADOOP-920:
-------------------------------------

    Attachment: key-value-class.patch

Proposed fix, which uses different methods depending on whether we are in a map or a reduce task.

[jira] Updated: (HADOOP-920) MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated HADOOP-920:
-------------------------------------

    Attachment:     (was: key-value-class.patch)

[jira] Updated: (HADOOP-920) MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated HADOOP-920:
-------------------------------------

    Attachment: key-value-class.patch

[jira] Updated: (HADOOP-920) MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-920:
--------------------------------

    Status: Open  (was: Patch Available)

OutputFormats are only used when reducing, to generate the final output.  They're not used when creating intermediate output.  So the bug here is that MapFileOutputFormat calls job.getMapOutput{Key,Value}Class()--those methods should only be called by the MapReduce kernel when generating intermediate output and should not be called by an OutputFormat implementation.  This bug was introduced by HADOOP-115.

http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/MapFileOutputFormat.java?p2=%2Flucene%2Fhadoop%2Ftrunk%2Fsrc%2Fjava%2Forg%2Fapache%2Fhadoop%2Fmapred%2FMapFileOutputFormat.java&p1=%2Flucene%2Fhadoop%2Ftrunk%2Fsrc%2Fjava%2Forg%2Fapache%2Fhadoop%2Fmapred%2FMapFileOutputFormat.java&r1=407355&r2=407354&view=diff&pathrev=407355

The proper fix, I think, is to undo that change to this file.
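The division of responsibility described in this comment can be sketched as follows. This is a stand-alone model under stated assumptions, not Hadoop code: Settings, intermediateClasses, and finalOutputClasses are hypothetical names, and the plain Java classes stand in for the configured key/value types. The point is only that the framework side reads the map output classes, while an OutputFormat reads nothing but the final output classes.

```java
/** Stand-in model, not Hadoop code: which side reads which key/value class setting. */
final class OutputClassRoles {

    /** Hypothetical holder for the four class settings discussed in this thread. */
    static final class Settings {
        final Class<?> mapOutputKey;
        final Class<?> mapOutputValue;
        final Class<?> outputKey;
        final Class<?> outputValue;

        Settings(Class<?> mapOutputKey, Class<?> mapOutputValue,
                 Class<?> outputKey, Class<?> outputValue) {
            this.mapOutputKey = mapOutputKey;
            this.mapOutputValue = mapOutputValue;
            this.outputKey = outputKey;
            this.outputValue = outputValue;
        }
    }

    /** Framework side: intermediate (map) output is typed by the map output classes. */
    static Class<?>[] intermediateClasses(Settings s) {
        return new Class<?>[] { s.mapOutputKey, s.mapOutputValue };
    }

    /** OutputFormat side: final output is typed only by the job's output classes. */
    static Class<?>[] finalOutputClasses(Settings s) {
        return new Class<?>[] { s.outputKey, s.outputValue };
    }

    public static void main(String[] args) {
        // Assumed job: maps emit (String, Integer), reduces emit (String, Long).
        Settings s = new Settings(String.class, Integer.class, String.class, Long.class);

        System.out.println(intermediateClasses(s)[1].getSimpleName()); // Integer
        System.out.println(finalOutputClasses(s)[1].getSimpleName());  // Long
    }
}
```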

[jira] Updated: (HADOOP-920) MapFileOutputFormat and SequenceFileOutputFormat use incorrect key/value classes in map/reduce tasks

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated HADOOP-920:
-------------------------------------

    Status: Patch Available  (was: Open)
