Posted to user@avro.apache.org by Xiaming Chen <ch...@gmail.com> on 2013/10/02 15:28:43 UTC
Avro new mapreduce API
Hi there,
Can you give me some examples or an explanation of programming with
the pure org.apache.avro.mapreduce interfaces?
-------- Save your time, continue if you know how --------
All of my programs are written with Hadoop's new MR1 interfaces
(org.apache.hadoop.mapreduce), so I want to use Avro's new
org.apache.avro.mapreduce package too. But it doesn't work for me.
The program takes Avro data as input and outputs the same.
The main idea behind my program is subclassing Hadoop's Mapper
and Reducer against the Avro-wrapped key/value types.
Here is a block of my job driver:
AvroJob.setInputKeySchema(job, NetflowRecord.getClassSchema());
AvroJob.setOutputKeySchema(job, NetflowRecord.getClassSchema());
job.setMapperClass(MyAvroMap.class);
job.setReducerClass(MyAvroReduce.class);
job.setInputFormatClass(AvroKeyInputFormat.class);
job.setOutputFormatClass(AvroKeyOutputFormat.class);
job.setMapOutputKeyClass(AvroKey.class);
job.setMapOutputValueClass(AvroValue.class);
job.setOutputKeyClass(AvroKey.class);
job.setOutputValueClass(NullWritable.class);
The definitions of the MyAvroMap and MyAvroReduce subclasses, respectively, are
public static class MyAvroMap extends Mapper<AvroKey<NetflowRecord>, NullWritable,
AvroKey<CharSequence>, AvroValue<NetflowRecord>>{ ... }
public static class MyAvroReduce extends Reducer<AvroKey<CharSequence>, AvroValue<NetflowRecord>,
AvroKey<NetflowRecord>, NullWritable>{ ... }
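For reference, the bodies look roughly like this (an untested sketch; getSrcAddr is a made-up accessor standing in for whatever grouping field the record schema actually has):

```java
import java.io.IOException;

import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class NetflowJob {
    public static class MyAvroMap extends Mapper<AvroKey<NetflowRecord>, NullWritable,
            AvroKey<CharSequence>, AvroValue<NetflowRecord>> {
        @Override
        protected void map(AvroKey<NetflowRecord> key, NullWritable value, Context context)
                throws IOException, InterruptedException {
            NetflowRecord record = key.datum();
            // Re-key each record by some field (getSrcAddr is hypothetical).
            context.write(new AvroKey<CharSequence>(record.getSrcAddr().toString()),
                          new AvroValue<NetflowRecord>(record));
        }
    }

    public static class MyAvroReduce extends Reducer<AvroKey<CharSequence>, AvroValue<NetflowRecord>,
            AvroKey<NetflowRecord>, NullWritable> {
        @Override
        protected void reduce(AvroKey<CharSequence> key, Iterable<AvroValue<NetflowRecord>> values,
                Context context) throws IOException, InterruptedException {
            // Pass each grouped record through to the Avro output.
            for (AvroValue<NetflowRecord> v : values) {
                context.write(new AvroKey<NetflowRecord>(v.datum()), NullWritable.get());
            }
        }
    }
}
```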
The mentioned NetflowRecord is my Avro record class. And I get a runtime exception:
java.lang.ClassCastException: class org.apache.avro.hadoop.io.AvroKey
Reading Hadoop's and Avro's source code, I found that the exception
is thrown by JobConf, which checks that the map output key class is a
subclass of WritableComparable, like this (Hadoop 1.2.1, line 759):
WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class));
But Avro's source shows that AvroKey and AvroValue are just simple wrappers
that do **not** implement Hadoop's Writable* interfaces.
I believe, even without testing, that I could get around this using the old
mapred interfaces, but that's not what I want.
Sincerely,
Jamin
Re: Avro new mapreduce API
Posted by Xiaming Chen <ch...@gmail.com>.
Hi Johannes,
Thanks for your reminder. It's solved after adding the mapper output key/value schema settings.
The new mapreduce API is more convenient than mapred's. I love this way.
Best Regards,
Jamin
Re: Avro new mapreduce API
Posted by Johannes Schulte <jo...@gmail.com>.
Hi,
you should try using the static methods of AvroJob to configure your map
output key and value schemas. This takes care of configuring the right
key comparators for you. So instead of writing
job.setMapOutputKeyClass(AvroKey.class);
job.setMapOutputValueClass(AvroValue.class);
write
AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.STRING));
AvroJob.setMapOutputValueSchema(job, NetflowRecord.getClassSchema());
and do the same for the job's final output key with AvroJob.setOutputKeySchema.
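Putting it together, the driver block from your mail would become roughly this (an untested configuration sketch; AvroJob registers the Avro serialization and shuffle comparator for you when you declare the schemas):

```java
// Input/output record schemas (as in the original driver).
AvroJob.setInputKeySchema(job, NetflowRecord.getClassSchema());
AvroJob.setOutputKeySchema(job, NetflowRecord.getClassSchema());

job.setMapperClass(MyAvroMap.class);
job.setReducerClass(MyAvroReduce.class);

job.setInputFormatClass(AvroKeyInputFormat.class);
job.setOutputFormatClass(AvroKeyOutputFormat.class);

// Instead of job.setMapOutputKeyClass/setMapOutputValueClass with the raw
// AvroKey/AvroValue classes, declare the intermediate Avro schemas:
AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.STRING));
AvroJob.setMapOutputValueSchema(job, NetflowRecord.getClassSchema());

job.setOutputValueClass(NullWritable.class);
```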
Cheers,
Johannes