Posted to common-user@hadoop.apache.org by Jun Young Kim <ju...@gmail.com> on 2011/01/20 10:32:08 UTC

MultipleOutputs is not working on 0.20.2 properly.

Hi,

I am using Hadoop 0.20.2 on my cluster.

To write multiple output files from a reducer, I want to use the
MultipleOutputs class.

With this class, I need to call addNamedOutput.


      addNamedOutput

public static void addNamedOutput(JobConf conf,
                                  String namedOutput,
                                  Class<? extends OutputFormat> outputFormatClass,
                                  Class<?> keyClass,
                                  Class<?> valueClass)

    Adds a named output for the job.

    Parameters:
        conf - job conf to add the named output
        namedOutput - named output name, it has to be a word, letters
        and numbers only, cannot be the word 'part' as that is reserved
        for the default output.
        outputFormatClass - OutputFormat class.
        keyClass - key class
        valueClass - value class
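
The naming rule quoted above (letters and numbers only, and not the word 'part') can be checked before a job is submitted. The following is only a standalone sketch of that rule as it reads in the Javadoc; the class name NamedOutputNames and the method isValid are hypothetical, and this is not Hadoop's own validation code.

```java
import java.util.regex.Pattern;

public class NamedOutputNames {

    // Assumed reading of the Javadoc rule: one or more ASCII letters or digits.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    /** Returns true if the name satisfies the rule quoted from the Javadoc. */
    public static boolean isValid(String name) {
        return name != null
                && WORD.matcher(name).matches()
                && !"part".equals(name); // 'part' is reserved for the default output
    }

    public static void main(String[] args) {
        String[] candidates = {"text", "seq2", "part", "my-output", ""};
        for (String name : candidates) {
            System.out.println("'" + name + "' valid: " + isValid(name));
        }
    }
}
```

A check like this only mirrors the documented rule; addNamedOutput remains the authority on what is accepted.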


As you can see, this method takes a JobConf as its first argument,
but JobConf is deprecated in 0.20.2.

Additionally, the MultipleOutputs class exists only as
org.apache.hadoop.mapred.lib.MultipleOutputs
(not as org.apache.hadoop.mapreduce.lib.MultipleOutputs).

These are the related discussions about this problem:
https://issues.apache.org/jira/browse/HADOOP-3149
https://issues.apache.org/jira/browse/MAPREDUCE-370


How can I set up multiple outputs on my version?
Thanks.

-- 

-----
Junyoung Kim (juneng603@gmail.com)


Re: MultipleOutputs is not working on 0.20.2 properly.

Posted by Jun Young Kim <ju...@gmail.com>.
Anyway, Cloudera's version (0.20.2-737) is working. ;)

--
Junyoung Kim (juneng603@gmail.com)


On 01/20/2011 07:58 PM, Harsh J wrote:
> The MAPREDUCE-370 is fixed in 0.21 releases of Hadoop. You can
> use/upgrade-to that release if it is no trouble.
>
> If it is of any help, the "deprecated" MapReduce API in 0.20.2 has
> been unmarked as so in the upcoming 0.20.3 (and is back as the stable
> API, while new API is marked evolving/unstable) and is perfectly okay
> to use without worrying about any deprecation (it is even supported in
> 0.21).
>
> Otherwise, you can consider switching to Cloudera's Distribution for
> Hadoop [CDH] (From http://cloudera.com) or other such distributions
> that have the mentioned patches back-ported to 0.20.x; if you wish to
> stick to the 0.20.x releases.
>
> I know for a fact that the current CDH2 and CDH3 releases have the new
> API MultipleOutputs support (and some more fixes).
>

Re: MultipleOutputs is not working on 0.20.2 properly.

Posted by Jun Young Kim <ju...@gmail.com>.
As far as I know, there is a Maven repository for using 0.21.0.

Cloudera and Riptano also support only the 0.20.x versions.

Is there any repository for the 0.21.x version of Hadoop?

Thanks.

--
Junyoung Kim (juneng603@gmail.com)



Re: MultipleOutputs is not working on 0.20.2 properly.

Posted by Harsh J <qw...@gmail.com>.
MAPREDUCE-370 is fixed in the 0.21 releases of Hadoop. You can
use or upgrade to that release if it is no trouble.

If it is of any help, the "deprecated" MapReduce API in 0.20.2 has
had its deprecation removed in the upcoming 0.20.3 (it is back as the
stable API, while the new API is marked evolving/unstable), so it is
perfectly okay to use without worrying about any deprecation (it is
even supported in 0.21).
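
For reference, here is a minimal sketch of what using the old-API MultipleOutputs on 0.20.2 might look like. It assumes the Hadoop 0.20.2 jars are on the classpath, the default TextInputFormat and identity mapper, and input/output paths passed as arguments. The class names MultiOutJob and SplitReducer, the named outputs "shortlines" and "longlines", and the 80-character routing rule are all made up for illustration.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MultiOutJob {

    /** Hypothetical reducer that routes each value to one of two named outputs. */
    public static class SplitReducer extends MapReduceBase
            implements Reducer<LongWritable, Text, LongWritable, Text> {

        private MultipleOutputs mos;

        @Override
        public void configure(JobConf conf) {
            // Built from the same JobConf that addNamedOutput was called on.
            mos = new MultipleOutputs(conf);
        }

        @SuppressWarnings("unchecked")
        public void reduce(LongWritable key, Iterator<Text> values,
                           OutputCollector<LongWritable, Text> output,
                           Reporter reporter) throws IOException {
            while (values.hasNext()) {
                Text value = values.next();
                // Illustrative rule only: split lines by length.
                String target = value.getLength() > 80 ? "longlines" : "shortlines";
                mos.getCollector(target, reporter).collect(key, value);
            }
        }

        @Override
        public void close() throws IOException {
            // Closes the named outputs so their files are flushed.
            mos.close();
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MultiOutJob.class);
        conf.setJobName("multiple-outputs-sketch");
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setReducerClass(SplitReducer.class);

        // Register each named output before submitting the job.
        MultipleOutputs.addNamedOutput(conf, "shortlines",
                TextOutputFormat.class, LongWritable.class, Text.class);
        MultipleOutputs.addNamedOutput(conf, "longlines",
                TextOutputFormat.class, LongWritable.class, Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

The sketch stays entirely within org.apache.hadoop.mapred, so it does not depend on the MAPREDUCE-370 backport.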

Otherwise, if you wish to stick to the 0.20.x releases, you can
consider switching to Cloudera's Distribution for Hadoop [CDH] (from
http://cloudera.com) or other such distributions that have the
mentioned patches back-ported to 0.20.x.

I know for a fact that the current CDH2 and CDH3 releases have the new
API MultipleOutputs support (and some more fixes).

-- 
Harsh J
www.harshj.com