Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/04/01 10:21:26 UTC

[jira] Reopened: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.

     [ http://issues.apache.org/jira/browse/HADOOP-115?page=all ]
     
Owen O'Malley reopened HADOOP-115:
----------------------------------


Let's reopen this. I've had discussions with Runping today, and it seems to me that:

1. It is basically free with respect to the framework.
2. It allows more applications to be written using the framework rather than working around the framework.
3. It is less clear that we should allow the user to change the key type in the reduce, but since the current API does allow them to change the value (if not the type), I think we should be consistent and allow a type change too.

I propose:

1. Add {set,get}MapOutput{Key,Value}Class functions in JobConf.
2. The default values for getMapOutput{Key,Value}Class are the values from getOutput{Key,Value}Class.
3. Always check the types in the output collector rather than in the OutputFormat, so that even text output files are checked for type correctness.
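
The three items above can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchJobConf and CheckedCollector are hypothetical stand-ins, not the real JobConf or OutputCollector, and the property keys are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for JobConf; not the real Hadoop class.
class SketchJobConf {
    // Made-up property keys, used only inside this sketch.
    private final Map<String, Class<?>> classes = new HashMap<>();

    void setOutputKeyClass(Class<?> c) { classes.put("output.key", c); }
    void setOutputValueClass(Class<?> c) { classes.put("output.value", c); }

    // Item 1: separate setters for the intermediate (map output) types.
    void setMapOutputKeyClass(Class<?> c) { classes.put("map.output.key", c); }
    void setMapOutputValueClass(Class<?> c) { classes.put("map.output.value", c); }

    // Returns null if never set; a real configuration would supply a default.
    Class<?> getOutputKeyClass() { return classes.get("output.key"); }
    Class<?> getOutputValueClass() { return classes.get("output.value"); }

    // Item 2: fall back to the final output classes when nothing was set.
    Class<?> getMapOutputKeyClass() {
        Class<?> c = classes.get("map.output.key");
        return c != null ? c : getOutputKeyClass();
    }

    Class<?> getMapOutputValueClass() {
        Class<?> c = classes.get("map.output.value");
        return c != null ? c : getOutputValueClass();
    }
}

// Item 3: a collector that checks types itself, whatever the output format is.
class CheckedCollector {
    private final Class<?> keyClass;
    private final Class<?> valueClass;
    final List<Object[]> records = new ArrayList<>();

    CheckedCollector(Class<?> keyClass, Class<?> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    void collect(Object key, Object value) {
        // Exact class match; subclasses are rejected in this sketch.
        if (key.getClass() != keyClass) {
            throw new IllegalArgumentException("wrong key class: " + key.getClass().getName());
        }
        if (value.getClass() != valueClass) {
            throw new IllegalArgumentException("wrong value class: " + value.getClass().getName());
        }
        records.add(new Object[] {key, value});
    }
}
```

With this shape, a job that never calls the map-output setters behaves as before, and a text-output job still fails fast on a wrongly typed record.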

We should include a javadoc comment for setMapOutputKeyClass that warns that changing the key in the reduce means your output is NOT sorted.
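
To illustrate why the output stops being sorted, here is a small self-contained sketch. It assumes a toy model in which a TreeMap stands in for the framework's sort, and a hypothetical reduce that emits the reversed string as its new key. None of this is Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class ReduceKeyChangeDemo {
    // Hypothetical reduce step: emits a key different from the one it received.
    static String reduceKey(String mapKey) {
        return new StringBuilder(mapKey).reverse().toString();
    }

    // Models the framework: keys reach the reduce in sorted order, and the
    // reduce output is written in the order the reduce calls happen.
    static List<String> run(List<String> mapKeys) {
        TreeMap<String, Integer> sorted = new TreeMap<>();
        for (String k : mapKeys) {
            sorted.merge(k, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (String k : sorted.keySet()) {
            out.add(reduceKey(k));
        }
        return out;
    }

    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> out = run(Arrays.asList("ca", "ab", "ba"));
        // prints: [ba, ab, ac] sorted=false
        System.out.println(out + " sorted=" + isSorted(out));
    }
}
```

The reduce sees ab, ba, ca in order, but the emitted keys ba, ab, ac are no longer in order, which is the case the javadoc warning would cover.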

> Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose  key/value classes that are different from those for map output.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-115
>          URL: http://issues.apache.org/jira/browse/HADOOP-115
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Reporter: Runping Qi

>
> When map tasks write intermediate data out, they always use the SequentialFile RecordWriter with key/value classes from the job object.
> When the reducers write the final results out, the output format is obtained from the job object. By default it is TextOutputFormat, and there is no conflict.
> However, if one wants to use SequentialFileFormat for the final results, the key/value classes are also obtained from the job object, the same as for the map tasks' output. This is a problem: the map outputs and reducer outputs cannot use different key/value classes if one wants the reducers to generate output in SequentialFileFormat.
> A simple fix would be to add two more attributes to the JobConf class: mapOutputKeyClass and mapOutputValueClass. That would allow the user to have different key/value classes for the intermediate and final outputs.



Re: [jira] Reopened: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.

Posted by Andrew McNabb <am...@mcnabbs.org>.
On Sat, Apr 01, 2006 at 09:21:26AM +0100, Owen O'Malley (JIRA) wrote:
> 
> Let's reopen this. I've had discussions with Runping today, and it seems to me that:


> 3. It is less clear that we should allow the user to change the key
> type in the reduce, but since the current API does allow them to
> change the value (if not the type), I think we should be consistent
> and allow a type change too.

I strongly disagree.  I think it's unnecessary, and I think it breaks
the model too much.  Google has many thousands of map reduce
applications, and they haven't broken the model yet.  I don't really
care about output formats, but I think we're just asking for trouble if
we allow the type of the reduce output to be different from the type of
the map output.

-- 
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868