Posted to common-user@hadoop.apache.org by Mikhail Yakshin <gr...@gmail.com> on 2009/01/28 16:53:14 UTC
Hadoop 0.19, Cascading 1.0 and MultipleOutputs problem
Hi,
We have a system based on Hadoop 0.18 / Cascading 0.8.1, and I'm now
trying to port it to Hadoop 0.19 / Cascading 1.0. The first serious
problem I've run into is that we make extensive use of MultipleOutputs
in our jobs, which work with sequence files that store Cascading's Tuples.
Since Cascading 0.9, Tuples are no longer WritableComparable; they use
Hadoop's generic serialization interface and framework instead.
However, in Hadoop 0.19, MultipleOutputs still requires the older
WritableComparable interface. Thus, trying to do something like:
MultipleOutputs.addNamedOutput(conf, "output-name",
MySpecialMultiSplitOutputFormat.class, Tuple.class, Tuple.class);
mos = new MultipleOutputs(conf);
...
mos.getCollector("output-name", reporter).collect(tuple1, tuple2);
yields an error:
java.lang.RuntimeException: java.lang.RuntimeException: class cascading.tuple.Tuple not org.apache.hadoop.io.WritableComparable
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
at org.apache.hadoop.mapred.lib.MultipleOutputs.getNamedOutputKeyClass(MultipleOutputs.java:252)
at org.apache.hadoop.mapred.lib.MultipleOutputs$InternalFileOutputFormat.getRecordWriter(MultipleOutputs.java:556)
at org.apache.hadoop.mapred.lib.MultipleOutputs.getRecordWriter(MultipleOutputs.java:425)
at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:511)
at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:476)
at my.namespace.MyReducer.reduce(MyReducer.java:xxx)
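For readers following the trace: the innermost frame suggests the failure is an interface check made when the named output's key class is read back from the job configuration. The sketch below reproduces that kind of check in plain Java, so it runs without Hadoop or Cascading on the classpath. Every name in it (NamedOutputClassCheck, LegacyWritableComparable, LegacyKey, GenericTuple, getClassChecked, the property keys) is a hypothetical stand-in chosen for illustration, not the actual Hadoop or Cascading code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a job configuration that stores class names
// and validates them against a required interface when they are read back.
public class NamedOutputClassCheck {

    // Hypothetical stand-in for org.apache.hadoop.io.WritableComparable.
    public interface LegacyWritableComparable {}

    // Hypothetical key type that implements the legacy interface.
    public static class LegacyKey implements LegacyWritableComparable {}

    // Hypothetical stand-in for a Tuple that relies on pluggable
    // serialization and does not implement the legacy interface.
    public static class GenericTuple {}

    // Property name -> fully qualified class name.
    private final Map<String, String> props = new HashMap<>();

    // Storing a class performs no interface check at all.
    public void setClass(String name, Class<?> cls) {
        props.put(name, cls.getName());
    }

    // Reading the class back enforces the interface; this is the kind of
    // check that the "class X not Y" message in the trace points at.
    public <U> Class<? extends U> getClassChecked(String name, Class<U> xface) {
        String className = props.get(name);
        if (className == null) {
            throw new RuntimeException("no class configured for " + name);
        }
        Class<?> cls;
        try {
            cls = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
        if (!xface.isAssignableFrom(cls)) {
            throw new RuntimeException(cls + " not " + xface.getName());
        }
        return cls.asSubclass(xface);
    }

    public static void main(String[] args) {
        NamedOutputClassCheck conf = new NamedOutputClassCheck();

        // A key type implementing the legacy interface passes the check.
        conf.setClass("example.legacy.key", LegacyKey.class);
        System.out.println(conf.getClassChecked(
                "example.legacy.key", LegacyWritableComparable.class));

        // A type without the legacy interface is stored without complaint
        // but is rejected when it is read back.
        conf.setClass("example.tuple.key", GenericTuple.class);
        try {
            conf.getClassChecked("example.tuple.key", LegacyWritableComparable.class);
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, storing the class succeeds and the error only surfaces when the class is read back, which would be consistent with the trace above starting in getCollector rather than in addNamedOutput.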
Is there a known workaround for this? Is any work under way to make
MultipleOutputs use generic Hadoop serialization?
--
WBR, Mikhail Yakshin
Re: Hadoop 0.19, Cascading 1.0 and MultipleOutputs problem
Posted by Mikhail Yakshin <gr...@gmail.com>.
On Wed, Feb 4, 2009 at 10:07 AM, Alejandro Abdelnur <tu...@gmail.com> wrote:
> Mikhail,
>
> You are right, please open a Jira on this.
>
> Alejandro
Done:
https://issues.apache.org/jira/browse/HADOOP-5167
--
WBR, Mikhail Yakshin
Re: Hadoop 0.19, Cascading 1.0 and MultipleOutputs problem
Posted by Alejandro Abdelnur <tu...@gmail.com>.
Mikhail,
You are right, please open a Jira on this.
Alejandro