Posted to user@avro.apache.org by Russell Jurney <ru...@gmail.com> on 2012/03/24 03:01:45 UTC
Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException:
64 / avro.io.SchemaResolutionException: Can't access branch index 64 for
union with 2 branches / `read_data': Writer's schema and Reader's schema
["string","null"] do not match.
I have a problem record I've written in Avro that crashes anything which
tries to read it :(
Can anyone make sense of these errors?
The exception in Pig/AvroStorage is this:
java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
> at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
> at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
> at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
> ... 7 more
When reading the record in Python:
File "/me/Collecting-Data/src/python/cat_avro", line 21, in <module>
> for record in df_reader:
> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
> datum = self.datum_reader.read(self.datum_decoder)
> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
> return self.read_data(self.writers_schema, self.readers_schema, decoder)
> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
> return self.read_record(writers_schema, readers_schema, decoder)
> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
> field_val = self.read_data(field.type, readers_field.type, decoder)
> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
> return self.read_union(writers_schema, readers_schema, decoder)
> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 650, in read_union
> raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
> avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches
When reading the record in Ruby:
/Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data': Writer's schema and Reader's schema ["string","null"] do not match. (Avro::IO::SchemaMatchException)
--
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
Re: Problem: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64 / avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches / `read_data': Writer's schema and Reader's schema ["string","null"] do not match.
Posted by Russell Jurney <ru...@gmail.com>.
Thanks, Scott. Looking at the raw data, it seems to have been a truncated record caused by UTF encoding problems.
Russell Jurney http://datasyndrome.com
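
The thread does not say how the record came to be truncated, so the following is only a hypothetical illustration of how a UTF-8 mix-up could produce the errors above. In Avro's binary encoding a string is written as a long giving its length in bytes, followed by the UTF-8 bytes. The sketch assumes a made-up writer bug (prefixing the character count instead of the byte count) and a made-up two-field record: a plain string followed by a ["string","null"] union. The function names are invented for this example and are not part of any Avro library.

```python
import io

def write_long(n):
    """Encode an int as an Avro long: zigzag, then 7 bits per byte."""
    n = (n << 1) ^ (n >> 63)
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def read_long(stream):
    """Decode one Avro long from a binary stream."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("ran out of bytes while reading a long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)

def encode_record(text, buggy):
    """Write a string field, then a union field set to null (branch 1)."""
    data = text.encode("utf-8")
    # Hypothetical bug: character count used where the byte count belongs.
    length = len(text) if buggy else len(data)
    return write_long(length) + data + write_long(1)

def decode_record(raw):
    """Return the raw string bytes and the union branch index that follows."""
    stream = io.BytesIO(raw)
    body = stream.read(read_long(stream))
    return body, read_long(stream)

text = "caf\u00e9"  # 4 characters, 5 bytes in UTF-8
print(decode_record(encode_record(text, buggy=False)))
print(decode_record(encode_record(text, buggy=True)))
```

With the correct byte count the reader gets all five bytes and branch index 1. With the character count it stops one byte short, so the string is cut in the middle of a multi-byte character and the leftover byte is misread as part of the union index, which then falls outside the two branches.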
On Mar 23, 2012, at 7:59 PM, Scott Carey <sc...@apache.org> wrote:
>
> It appears to be reading a union index and failing in there somehow. If it did not have any of the pig AvroStorage stuff in there I could tell you more.
>
> What does avro-tools.jar's 'tojson' tool do? (java -jar avro-tools-1.6.3.jar tojson <file> | your_favorite_text_reader)
> Which version of Avro produced the Java stack trace?
Re: Problem: java.io.IOException:
java.lang.ArrayIndexOutOfBoundsException: 64 /
avro.io.SchemaResolutionException: Can't access branch index 64 for union
with 2 branches / `read_data': Writer's schema and Reader's schema
["string","null"] do not match.
Posted by Scott Carey <sc...@apache.org>.
It appears to be reading a union index and failing in there somehow. If it
did not have any of the pig AvroStorage stuff in there I could tell you
more.
What does avro-tools.jar's 'tojson' tool do? (java -jar
avro-tools-1.6.3.jar tojson <file> | your_favorite_text_reader)
Which version of Avro produced the Java stack trace?
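
For readers unfamiliar with what "reading a union index" involves: in Avro's binary encoding, a union value starts with a long (variable-length, zigzag-encoded) giving the zero-based position of the branch, followed by the value for that branch. The sketch below is a minimal stand-alone decoder written for illustration, not the code from any Avro library; the function names are invented here, and the byte sequence 0x80 0x01 is just one input that decodes to 64, not necessarily what was in the problem file.

```python
import io

def read_long(stream):
    """Decode one Avro long: 7 bits per byte, low bits first, then zigzag."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("ran out of bytes while reading a long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)

def read_union_index(stream, branches):
    """Read the branch index of a union and check it against the schema."""
    index = read_long(stream)
    if not 0 <= index < len(branches):
        raise ValueError(
            "Can't access branch index %d for union with %d branches"
            % (index, len(branches))
        )
    return index

branches = ["string", "null"]

# A well-formed null: the single byte 0x02 decodes to index 1.
print(read_union_index(io.BytesIO(b"\x02"), branches))

# Misaligned or corrupt input: 0x80 0x01 decodes to index 64.
try:
    read_union_index(io.BytesIO(b"\x80\x01"), branches)
except ValueError as err:
    print(err)
```

An out-of-range index like this usually means the decoder is no longer positioned at the start of the union, which is consistent with all three language bindings failing at the same field.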