Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2010/09/17 21:27:33 UTC

[jira] Updated: (AVRO-669) Avro Mapreduce Doesn't Work With Reflect Schemas

     [ https://issues.apache.org/jira/browse/AVRO-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-669:
------------------------------

    Attachment: AVRO-669.patch

Here's a patch that implements ReflectData.compare() for all types except Object[] and byte[].  It moves the getField() and setField() methods from the datum readers and writers to GenericData, so that the generic implementation of record comparison can be shared.  This also eliminates a little duplicated code.
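
For illustration only, here is a rough sketch of the kind of reflection-based record comparison such a patch enables.  This is not the attached patch: the class and helper names below are made up, and it sidesteps nested schemas by assuming field values are Comparable.

    // Rough sketch only -- not the attached AVRO-669.patch.
    // Compares two reflect records field by field, in schema order,
    // reading each value through java.lang.reflect.Field.
    import java.lang.reflect.Field;

    import org.apache.avro.Schema;

    public class ReflectRecordCompareSketch {

      @SuppressWarnings("unchecked")
      public static int compareRecords(Object o1, Object o2, Schema schema) {
        for (Schema.Field f : schema.getFields()) {
          Object v1 = getFieldValue(o1, f.name());
          Object v2 = getFieldValue(o2, f.name());
          // Assumes the values are Comparable; a real implementation would
          // recurse through compare() according to each field's schema.
          int diff = ((Comparable<Object>) v1).compareTo(v2);
          if (diff != 0) {
            return diff;
          }
        }
        return 0;
      }

      // Looks up a field value by name via reflection; a real implementation
      // would cache the java.lang.reflect.Field objects.
      private static Object getFieldValue(Object record, String name) {
        try {
          Field field = record.getClass().getDeclaredField(name);
          field.setAccessible(true);
          return field.get(record);
        } catch (NoSuchFieldException | IllegalAccessException e) {
          throw new RuntimeException(e);
        }
      }
    }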

It doesn't add full end-to-end MapReduce tests for reflect-based data, so I have not yet verified that the whole path works, but it should address the proximate cause of the failure reported below.

AVRO-638 is related.  When that's complete, it should be possible to switch the mapred code to always use ReflectDatumReader, ReflectDatumWriter and ReflectData, since these should then work for specific and generic data as well.

Please tell me if this helps.

> Avro Mapreduce Doesn't Work With Reflect Schemas
> ------------------------------------------------
>
>                 Key: AVRO-669
>                 URL: https://issues.apache.org/jira/browse/AVRO-669
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Ron Bodkin
>             Fix For: 1.5.0
>
>         Attachments: AVRO-669.patch
>
>
> I'm trying to get the Avro trunk code (from Subversion) to work with a simple example of a reflection-defined schema, using a class I created. I use a ReflectDatumWriter to write a set of records to a file, e.g.,
>         DatumWriter<Record> writer = new ReflectDatumWriter<Record>(Record.class);
>         DataFileWriter<Record> file = new DataFileWriter<Record>(writer);
> However, when I try to read that data back in using an AvroMapper, it fails with an exception, as shown below. It turns out that the mapreduce implementation hard-codes a dependence on the SpecificDatum readers and writers.
> I've tried switching to the reflect-based equivalents in five places (counting AvroSerialization's getDeserializer and getSerializer separately) to get an end-to-end reflect data example working:
> AvroFileInputFormat
> AvroFileOutputFormat
> AvroSerialization (getDeserializer and getSerializer)
> AvroKeyComparator
> However, switching to use reflection for AvroKeyComparator doesn't work:
> java.lang.UnsupportedOperationException
> 	at org.apache.avro.reflect.ReflectData.compare(ReflectData.java:427)
> 	at org.apache.avro.mapred.AvroKeyComparator.compare(AvroKeyComparator.java:46)
> It should be possible to implement compare() for reflect data, much like GenericData's implementation, but using the field name (or, better yet, a cached java.lang.reflect.Field) to read each field's value...
> Original exception:
> java.lang.ClassCastException: tba.mr.sample.avro.Record cannot be cast to org.apache.avro.generic.IndexedRecord
> 	at org.apache.avro.generic.GenericDatumReader.setField(GenericDatumReader.java:152)
> 	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
> 	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:198)
> 	at org.apache.avro.mapred.AvroRecordReader.next(AvroRecordReader.java:63)
> 	at org.apache.avro.mapred.AvroRecordReader.next(AvroRecordReader.java:33)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
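
For context, the snippet in the report above omits the schema, file-creation and close steps.  A complete reflect-based write might look roughly like the following; the Record class and the output path here are placeholders, not part of the original report.

    // Rough sketch of the reflect-based write path described above;
    // the Record class and output path are placeholders.
    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.io.DatumWriter;
    import org.apache.avro.reflect.ReflectData;
    import org.apache.avro.reflect.ReflectDatumWriter;

    public class WriteReflectRecords {

      // Placeholder for the reporter's reflection-defined record class.
      public static class Record {
        int id;
        String name;
      }

      public static void main(String[] args) throws Exception {
        // Derive the Avro schema from the class via reflection.
        Schema schema = ReflectData.get().getSchema(Record.class);

        DatumWriter<Record> writer = new ReflectDatumWriter<Record>(Record.class);
        DataFileWriter<Record> file = new DataFileWriter<Record>(writer);
        file.create(schema, new File("records.avro"));

        Record r = new Record();
        r.id = 1;
        r.name = "example";
        file.append(r);

        file.close();
      }
    }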
