Posted to common-user@hadoop.apache.org by Mikhail Yakshin <gr...@gmail.com> on 2009/01/28 16:53:14 UTC

Hadoop 0.19, Cascading 1.0 and MultipleOutputs problem

Hi,

We have a system based on Hadoop 0.18 / Cascading 0.8.1, and I'm now
trying to port it to Hadoop 0.19 / Cascading 1.0. The first serious
problem I've run into is that we use MultipleOutputs extensively in
our jobs, which work with sequence files that store Cascading Tuples.

Since Cascading 0.9, Tuple no longer implements WritableComparable;
it relies on Hadoop's generic serialization framework instead.
However, in Hadoop 0.19, MultipleOutputs still requires the older
WritableComparable interface. Thus, trying to do something like:

MultipleOutputs.addNamedOutput(conf, "output-name",
MySpecialMultiSplitOutputFormat.class, Tuple.class, Tuple.class);
mos = new MultipleOutputs(conf);
...
mos.getCollector("output-name", reporter).collect(tuple1, tuple2);

yields an error:

java.lang.RuntimeException: java.lang.RuntimeException: class cascading.tuple.Tuple not org.apache.hadoop.io.WritableComparable
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
	at org.apache.hadoop.mapred.lib.MultipleOutputs.getNamedOutputKeyClass(MultipleOutputs.java:252)
	at org.apache.hadoop.mapred.lib.MultipleOutputs$InternalFileOutputFormat.getRecordWriter(MultipleOutputs.java:556)
	at org.apache.hadoop.mapred.lib.MultipleOutputs.getRecordWriter(MultipleOutputs.java:425)
	at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:511)
	at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:476)
	at my.namespace.MyReducer.reduce(MyReducer.java:xxx)
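
The top frames of the trace suggest that the named output's key class is
looked up from the configuration and checked against an interface, and that
the check fails for Tuple. A minimal sketch of that kind of check follows. It
uses stand-in types only: WritableComparable, Text and Tuple below are
hypothetical placeholders declared inside the example, not the real Hadoop or
Cascading classes, and requireImplements is an illustrative helper rather
than Hadoop's actual code.

```java
public class InterfaceCheckSketch {
    // Stand-in for org.apache.hadoop.io.WritableComparable (hypothetical).
    interface WritableComparable {}

    // Stand-in for a key type that implements the older interface.
    static final class Text implements WritableComparable {}

    // Stand-in for cascading.tuple.Tuple, which does not implement it.
    static final class Tuple {}

    // Illustrative helper: reject any class that does not implement xface.
    static <U> Class<? extends U> requireImplements(Class<?> theClass, Class<U> xface) {
        if (!xface.isAssignableFrom(theClass)) {
            throw new RuntimeException(theClass + " not " + xface.getName());
        }
        return theClass.asSubclass(xface);
    }

    public static void main(String[] args) {
        // Passes: Text implements the stand-in interface.
        System.out.println(requireImplements(Text.class, WritableComparable.class));
        try {
            // Fails: Tuple does not implement the stand-in interface.
            requireImplements(Tuple.class, WritableComparable.class);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, any key class registered through addNamedOutput would
have to pass such a check, regardless of which serialization is configured
for the job.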

Is there any known workaround for this? Is any work under way to make
MultipleOutputs use generic Hadoop serialization?

-- 
WBR, Mikhail Yakshin

Re: Hadoop 0.19, Cascading 1.0 and MultipleOutputs problem

Posted by Mikhail Yakshin <gr...@gmail.com>.

On Wed, Feb 4, 2009 at 10:07 AM, Alejandro Abdelnur <tu...@gmail.com> wrote:
> Mikhail,
>
> You are right, please open a Jira on this.
>
> Alejandro

Done:
https://issues.apache.org/jira/browse/HADOOP-5167

-- 
WBR, Mikhail Yakshin

Re: Hadoop 0.19, Cascading 1.0 and MultipleOutputs problem

Posted by Alejandro Abdelnur <tu...@gmail.com>.

Mikhail,

You are right, please open a Jira on this.

Alejandro


On Wed, Jan 28, 2009 at 9:23 PM, Mikhail Yakshin <gr...@gmail.com> wrote:

> Hi,
>
> We have a system based on Hadoop 0.18 / Cascading 0.8.1 and now I'm
> trying to port it to Hadoop 0.19 / Cascading 1.0. The first serious
> problem I've got into that we're extensively using MultipleOutputs in
> our jobs dealing with sequence files that store Cascading's Tuples.
>
> Since Cascading 0.9, Tuples stopped being WritableComparable and
> implemented generic Hadoop serialization interface and framework.
> However, in Hadoop 0.19, MultipleOutputs require use of older
> WritableComparable interface. Thus, trying to do something like:
>
> MultipleOutputs.addNamedOutput(conf, "output-name",
> MySpecialMultiSplitOutputFormat.class, Tuple.class, Tuple.class);
> mos = new MultipleOutputs(conf);
> ...
> mos.getCollector("output-name", reporter).collect(tuple1, tuple2);
>
> yields an error:
>
> java.lang.RuntimeException: java.lang.RuntimeException: class cascading.tuple.Tuple not org.apache.hadoop.io.WritableComparable
>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getNamedOutputKeyClass(MultipleOutputs.java:252)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs$InternalFileOutputFormat.getRecordWriter(MultipleOutputs.java:556)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getRecordWriter(MultipleOutputs.java:425)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:511)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:476)
>        at my.namespace.MyReducer.reduce(MyReducer.java:xxx)
>
> Is there any known workaround for that? Any progress going on to make
> MultipleOutputs use generic Hadoop serialization?
>
> --
> WBR, Mikhail Yakshin
>