Posted to hdfs-user@hadoop.apache.org by Rahul Bhattacharjee <ra...@gmail.com> on 2012/08/17 18:25:22 UTC

Avro data file support in Hadoop

Hello,

I was trying to write an MR job to process an Avro data file, which
contains serialized objects conforming to a schema.

The schema is something like this:

name:string
surname:string
age:long
address:string
country:string
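
In .avsc form it would look roughly like this (the record name
"Person" is just a placeholder):

{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "name",    "type": "string"},
    {"name": "surname", "type": "string"},
    {"name": "age",     "type": "long"},
    {"name": "address", "type": "string"},
    {"name": "country", "type": "string"}
  ]
}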

1) My plan was to read the Avro data file using AvroMapper.
2) Create a custom key (CustomKey) containing [name,surname,age] and a
custom value (CustomValue) containing [address,country].
3) Collect a Pair of CustomKey and CustomValue.
4) Create an AvroReducer whose expected input is a CustomKey and an
iterator of CustomValue.
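
To make the plan concrete, here is a stripped-down sketch of the job,
using generic records in place of my actual CustomKey/CustomValue
classes (all class names here are illustrative, against the old
org.apache.avro.mapred API):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class PersonJob {

  static final Schema PERSON_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"surname\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"long\"},"
      + "{\"name\":\"address\",\"type\":\"string\"},"
      + "{\"name\":\"country\",\"type\":\"string\"}]}");

  // The key carries [name, surname, age].
  static final Schema KEY_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"CustomKey\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"surname\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"long\"}]}");

  // The value carries [address, country].
  static final Schema VALUE_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"CustomValue\",\"fields\":["
      + "{\"name\":\"address\",\"type\":\"string\"},"
      + "{\"name\":\"country\",\"type\":\"string\"}]}");

  public static class PersonMapper extends
      AvroMapper<GenericRecord, Pair<GenericRecord, GenericRecord>> {
    @Override
    public void map(GenericRecord person,
        AvroCollector<Pair<GenericRecord, GenericRecord>> collector,
        Reporter reporter) throws IOException {
      // Split each input record into the key and value halves.
      GenericRecord key = new GenericData.Record(KEY_SCHEMA);
      key.put("name", person.get("name"));
      key.put("surname", person.get("surname"));
      key.put("age", person.get("age"));
      GenericRecord value = new GenericData.Record(VALUE_SCHEMA);
      value.put("address", person.get("address"));
      value.put("country", person.get("country"));
      Pair<GenericRecord, GenericRecord> pair =
          new Pair<GenericRecord, GenericRecord>(
              Pair.getPairSchema(KEY_SCHEMA, VALUE_SCHEMA));
      pair.key(key);
      pair.value(value);
      collector.collect(pair);
    }
  }

  public static class PersonReducer extends
      AvroReducer<GenericRecord, GenericRecord, GenericRecord> {
    @Override
    public void reduce(GenericRecord key, Iterable<GenericRecord> values,
        AvroCollector<GenericRecord> collector, Reporter reporter)
        throws IOException {
      // Placeholder reduction: emit the key once per group.
      collector.collect(key);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(PersonJob.class);
    conf.setJobName("avro-person");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    AvroJob.setInputSchema(conf, PERSON_SCHEMA);
    AvroJob.setMapOutputSchema(conf,
        Pair.getPairSchema(KEY_SCHEMA, VALUE_SCHEMA));
    AvroJob.setOutputSchema(conf, KEY_SCHEMA);
    AvroJob.setMapperClass(conf, PersonMapper.class);
    AvroJob.setReducerClass(conf, PersonReducer.class);
    JobClient.runJob(conf);
  }
}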

I am not able to get the reducer to work. It looks like AvroJob expects
the mapper output key and output value to be of type AvroKey and
AvroValue. I even tried wrapping my custom key and value in AvroKey and
AvroValue, but it didn't help.

From the exception, it looks like the AvroReducer receives a
GenericData.Record instead of my custom key type.
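
To pin the runtime type down, I plan to swap in a throwaway diagnostic
reducer along these lines (generic types stand in for my custom ones):

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroReducer;
import org.apache.hadoop.mapred.Reporter;

// Throwaway reducer that logs the runtime class of the incoming key,
// to confirm whether the framework hands over GenericData.Record or
// my CustomKey type.
public class DebugReducer extends
    AvroReducer<GenericRecord, GenericRecord, GenericRecord> {
  @Override
  public void reduce(GenericRecord key, Iterable<GenericRecord> values,
      AvroCollector<GenericRecord> collector, Reporter reporter)
      throws IOException {
    System.err.println("reduce key class: " + key.getClass().getName());
    collector.collect(key);  // pass the key through unchanged
  }
}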

Going further, I wish to do a secondary sort based on age, using
[name,surname] in a grouping comparator together with a custom sort
comparator.

I am not sure whether the Avro processing helpers in Hadoop support
secondary sort, custom writables, or custom writable comparables.
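
If it is supported, the rough idea I had is to keep the natural key
order [name,surname,age] for sorting and group with a raw comparator
built on a copy of the key schema in which age is marked
{"order":"ignore"}, so that grouping compares [name,surname] only. A
sketch, assuming the shuffled key bytes are plain Avro binary:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryData;
import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.io.RawComparator;

// Hypothetical grouping comparator: compares the serialized key bytes
// against a variant of the key schema whose "age" field is marked
// {"order":"ignore"}, so grouping considers [name, surname] only while
// the default sort still orders by name, surname, then age.
public class NameSurnameGroupingComparator
    implements RawComparator<AvroKey<GenericRecord>> {

  private static final Schema GROUP_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"CustomKey\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"surname\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"long\",\"order\":\"ignore\"}]}");

  @Override
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // BinaryData.compare walks the binary encoding field by field and
    // skips fields whose order is "ignore".
    return BinaryData.compare(b1, s1, b2, s2, GROUP_SCHEMA);
  }

  @Override
  public int compare(AvroKey<GenericRecord> x, AvroKey<GenericRecord> y) {
    // Only the raw byte form is used during the shuffle.
    throw new UnsupportedOperationException();
  }
}

This would then be registered with
conf.setOutputValueGroupingComparator(NameSurnameGroupingComparator.class).
Since the key fields are already ordered [name,surname,age], the default
Avro binary sort should give the secondary sort on age for free, if I
understand the comparator semantics correctly.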

Any pointers regarding Avro data processing with Hadoop would be
greatly appreciated.

Please let me know if any more information is required from me.

Thanks,
Rahul