Posted to user@avro.apache.org by Felix Xu <yg...@gmail.com> on 2011/03/30 17:49:18 UTC

A strange problem when I am trying to read avro record with a subset of the schema.

Hi all. I am trying to read an Avro file using a subset of the schema it was
written with (because I do not need all the details), and I have run into a
strange problem.
1. I write data using this schema:
{
    "name": "relation",
    "type": "record",
    "fields": [
        {
            "name": "timestamp",
            "type": "long"
        },
        {
            "name": "type",
            "type": {
                "type": "map",
                "values": {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "sdf",
                        "fields": [
                            {
                                "name": "device",
                                "type": "string"
                            },
                            {
                                "name": "children",
                                "type": {
                                    "type": "array",
                                    "items": "string"
                                }
                            }
                        ]
                    }
                }
            }
        }
    ]
}

2. Here is a JSONObject conforming to that schema:
{
    "timestamp": 1234567890,
    "type": {
        "WMA": [
            {
                "device": "WMA1",
                "children": ["WMB1", "WMB2"]
            },
            {
                "device": "WMA2",
                "children": ["WMB1", "WMB2"]
            }
        ]
    }
}
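For reference, this is roughly how the record in step 2 is laid out under the writer schema, following the binary encoding in the Avro specification: zig-zag varints for "long", length-prefixed UTF-8 for "string", and arrays and maps written as counted blocks that end with a zero count. It is a hand-written sketch, not Avro's BinaryEncoder. It leaves out the header, block framing and sync markers of an Avro data file, and RelationEncoder, Sdf and encodeRelation are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hand-written sketch of the Avro binary encoding of the "relation" writer schema.
// Not Avro's BinaryEncoder; data-file framing (header, blocks, sync markers) is left out.
final class RelationEncoder {

    // One value of record "sdf" under the writer schema: device, then children.
    static final class Sdf {
        final String device;
        final List<String> children;

        Sdf(String device, List<String> children) {
            this.device = device;
            this.children = children;
        }
    }

    // Avro int/long: zig-zag, then a base-128 varint with the low 7 bits first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Avro array of strings: one block (item count, then items), then a zero count.
    static void writeStringArray(ByteArrayOutputStream out, List<String> items) {
        if (!items.isEmpty()) {
            writeLong(out, items.size());
            for (String s : items) {
                writeString(out, s);
            }
        }
        writeLong(out, 0);
    }

    // Record fields go back to back in writer-schema order, with no names or tags.
    static byte[] encodeRelation(long timestamp, Map<String, List<Sdf>> type) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, timestamp);                          // field "timestamp"
        if (!type.isEmpty()) {
            writeLong(out, type.size());                    // one map block
            for (Map.Entry<String, List<Sdf>> e : type.entrySet()) {
                writeString(out, e.getKey());               // map key
                List<Sdf> items = e.getValue();             // map value: array of sdf
                if (!items.isEmpty()) {
                    writeLong(out, items.size());           // one array block
                    for (Sdf s : items) {
                        writeString(out, s.device);         // sdf.device
                        writeStringArray(out, s.children);  // sdf.children
                    }
                }
                writeLong(out, 0);                          // end of the array
            }
        }
        writeLong(out, 0);                                  // end of the map
        return out.toByteArray();
    }
}
```

Encoding the record from step 2 this way gives 47 bytes, starting with A4 8B B0 99 09 for the timestamp. Nothing in those bytes marks where "device" ends and "children" begins. A reader can only stay in step by consuming or skipping every writer field, in the writer's order.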

3. I write that record successfully, and reading it back works fine if I use
this schema as the reader schema:
{
    "name": "relation",
    "type": "record",
    "fields": [
        {
            "name": "timestamp",
            "type": "long"
        },
        {
            "name": "type",
            "type": {
                "type": "map",
                "values": {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "sdf",
                        "fields": [
                            {
                                "name": "children",
                                "type": {
                                    "type": "array",
                                    "items": "string"
                                }
                            }
                        ]
                    }
                }
            }
        }
    ]
}

The result is:
{
    "timestamp": 1234567890,
    "type": {
        "WMA": [
            {
                "children": ["WMB1", "WMB2"]
            },
            {
                "children": ["WMB1", "WMB2"]
            }
        ]
    }
}
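In step 3 the dropped field, "device", comes first in the writer's "sdf" record, so a reader only has to skip one string before reading "children" normally. The sketch below walks the writer's bytes that way for this one pair of schemas. It is a simplified illustration that assumes the unframed binary encoding from the Avro specification. It is not Avro's ResolvingDecoder, and ChildrenOnlyReader is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of step 3: the writer wrote sdf as (device, children),
// the reader only declares children, so device is skipped and children is read.
// Hand-written for this one pair of schemas; not Avro's ResolvingDecoder.
final class ChildrenOnlyReader {
    private final byte[] buf;
    private int pos;
    long timestamp;

    ChildrenOnlyReader(byte[] buf) { this.buf = buf; }

    int position() { return pos; }

    // Avro int/long: base-128 varint (low 7 bits first), then undo the zig-zag.
    long readLong() {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Array/map block count; a negative count is followed by the block's size in bytes.
    long readBlockCount() {
        long n = readLong();
        if (n < 0) {
            readLong();
            n = -n;
        }
        return n;
    }

    String readString() {
        int len = (int) readLong();
        if (len < 0) throw new IllegalStateException("negative string length " + len);
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    void skipString() {
        int len = (int) readLong();
        if (len < 0) throw new IllegalStateException("negative string length " + len);
        pos += len;
    }

    List<String> readStringArray() {
        List<String> out = new ArrayList<>();
        for (long n = readBlockCount(); n != 0; n = readBlockCount()) {
            for (long i = 0; i < n; i++) {
                out.add(readString());
            }
        }
        return out;
    }

    // Walks the writer's layout, keeping only what the reader schema declares.
    Map<String, List<List<String>>> readRelation() {
        timestamp = readLong();                               // "timestamp": read
        Map<String, List<List<String>>> type = new LinkedHashMap<>();
        for (long m = readBlockCount(); m != 0; m = readBlockCount()) {
            for (long i = 0; i < m; i++) {
                String key = readString();
                List<List<String>> items = new ArrayList<>();
                for (long a = readBlockCount(); a != 0; a = readBlockCount()) {
                    for (long j = 0; j < a; j++) {
                        skipString();                         // sdf.device: writer-only, skipped
                        items.add(readStringArray());         // sdf.children: read
                    }
                }
                type.put(key, items);
            }
        }
        return type;
    }
}
```

Given the 47-byte binary encoding of the step 2 record (without data-file framing), readRelation() returns {WMA=[[WMB1, WMB2], [WMB1, WMB2]]}, sets timestamp to 1234567890 and finishes exactly at the end of the buffer.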

4. But if I want to ignore the "children" field instead of "device", I use
this schema for reading:
{
    "name": "relation",
    "type": "record",
    "fields": [
        {
            "name": "timestamp",
            "type": "long"
        },
        {
            "name": "type",
            "type": {
                "type": "map",
                "values": {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "sdf",
                        "fields": [
                            {
                                "name": "device",
                                "type": "string"
                            }
                        ]
                    }
                }
            }
        }
    ]
}

Unfortunately, I get this exception:

java.lang.ArrayIndexOutOfBoundsException: -8
cause:java.lang.ArrayIndexOutOfBoundsException
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:122)
at org.apache.avro.io.BinaryDecoder.skipString(BinaryDecoder.java:262)
at org.apache.avro.io.ValidatingDecoder.skipString(ValidatingDecoder.java:113)
at org.apache.avro.io.ParsingDecoder.skipTopSymbol(ParsingDecoder.java:60)
at org.apache.avro.io.parsing.SkipParser.skipTo(SkipParser.java:71)
at org.apache.avro.io.parsing.SkipParser.skipRepeater(SkipParser.java:83)
at org.apache.avro.io.ValidatingDecoder.skipArray(ValidatingDecoder.java:195)
at org.apache.avro.io.ParsingDecoder.skipTopSymbol(ParsingDecoder.java:70)
at org.apache.avro.io.parsing.SkipParser.skipTo(SkipParser.java:71)
at org.apache.avro.io.parsing.SkipParser.skipSymbol(SkipParser.java:93)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:226)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:162)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:196)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:140)
at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:233)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:167)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:223)
at AvroUtilTest.read(AvroUtilTest.java:77)
at AvroUtilTest.main(AvroUtilTest.java:61)
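Step 4 is the mirror image of step 3: "device" is read, and the trailing "children" array has to be skipped, which means consuming every block up to and including the zero count that ends it. The trace above fails inside this kind of skip (skipArray calling skipString). The hand-written sketch below is not Avro's ResolvingDecoder or SkipParser. DeviceOnlyReader and its deliberately broken buggySkip mode are hypothetical, included only to show how a skip that consumes the wrong number of bytes surfaces later as a nonsensical length.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of step 4: the writer wrote sdf as (device, children), the reader
// only declares device, so the trailing children array must be skipped in full.
// Hand-written for this one pair of schemas; not Avro's ResolvingDecoder or SkipParser.
final class DeviceOnlyReader {
    private final byte[] buf;
    private final boolean buggySkip;  // true = deliberately broken skip, for illustration
    private int pos;

    DeviceOnlyReader(byte[] buf, boolean buggySkip) {
        this.buf = buf;
        this.buggySkip = buggySkip;
    }

    int position() { return pos; }

    // Avro int/long: base-128 varint (low 7 bits first), then undo the zig-zag.
    long readLong() {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Array/map block count; a negative count is followed by the block's size in bytes.
    long readBlockCount() {
        long n = readLong();
        if (n < 0) {
            readLong();
            n = -n;
        }
        return n;
    }

    int readLength() {
        int len = (int) readLong();
        if (len < 0) throw new IllegalStateException("negative string length " + len + " at byte " + pos);
        return len;
    }

    String readString() {
        int len = readLength();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    void skipString() {
        pos += readLength();
    }

    // Correct: consume blocks until the zero count that ends the array.
    // Buggy: stop after the first block and leave that zero count unread.
    void skipStringArray() {
        for (long n = readBlockCount(); n != 0; n = readBlockCount()) {
            for (long i = 0; i < n; i++) {
                skipString();
            }
            if (buggySkip) return;
        }
    }

    // Returns the device names; the timestamp is read but not returned by this sketch.
    List<String> readDevices() {
        readLong();                                        // "timestamp"
        List<String> devices = new ArrayList<>();
        for (long m = readBlockCount(); m != 0; m = readBlockCount()) {
            for (long i = 0; i < m; i++) {
                readString();                              // map key
                for (long a = readBlockCount(); a != 0; a = readBlockCount()) {
                    for (long j = 0; j < a; j++) {
                        devices.add(readString());         // sdf.device: read
                        skipStringArray();                 // sdf.children: writer-only, skipped
                    }
                }
            }
        }
        return devices;
    }
}
```

With the correct skip, readDevices() returns [WMA1, WMA2] and ends exactly at the end of the 47-byte encoding of the step 2 record. With buggySkip set, the unread end-of-array zero is taken as the next "device" (an empty string), the length byte of "WMA2" becomes an item count of 4, and the letter W is then read as a string length of -44, which throws. This does not reproduce the -8 above and says nothing definite about where Avro 1.5.0 went wrong. It only shows why a misaligned skip fails far from its cause.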

Re: A strange problem when I am trying to read avro record with a subset of the schema.

Posted by Felix Xu <yg...@gmail.com>.
Yes, I submitted that.

2011/4/1 Scott Carey <sc...@richrelevance.com>

> FYI for those not on the avro-dev mailing list, there is a related JIRA
> now:
> https://issues.apache.org/jira/browse/AVRO-793

Re: A strange problem when I am trying to read avro record with a subset of the schema.

Posted by Scott Carey <sc...@richrelevance.com>.
FYI for those not on the avro-dev mailing list, there is a related JIRA now:
https://issues.apache.org/jira/browse/AVRO-793

On 3/30/11 7:45 PM, "Felix Xu" <yg...@gmail.com> wrote:

> Okay, is there a JIRA issue related to this problem?
> My Avro version is 1.5.0.

Re: A strange problem when I am trying to read avro record with a subset of the schema.

Posted by Felix Xu <yg...@gmail.com>.
Okay, is there a JIRA issue related to this problem?
My Avro version is 1.5.0.

2011/3/31 Scott Carey <sc...@richrelevance.com>

> There was a bug at some point in schema resolution where dropping the last
> field of a record caused a problem. It's possible that either:
>
> you are using a version where this isn't fixed,
> or
> the fix did not work for array types.

Re: A strange problem when I am trying to read avro record with a subset of the schema.

Posted by Scott Carey <sc...@richrelevance.com>.
There was a bug at some point in schema resolution where dropping the last field of a record caused a problem. It's possible that either:

you are using a version where this isn't fixed,
or
the fix did not work for array types.
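Why the last field is the awkward case can be pictured with a toy model of field resolution: walk the writer's fields in order and either read or skip each one. The trailing skip is the easy one to lose, because by then every field the reader asked for has already been filled in. FieldPlan and its methods are hypothetical; this is a simplified model, not how Avro's resolver is actually implemented, and not a description of the earlier bug or its fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of record resolution by field name: walk the writer's fields in
// order, READ the ones the reader declares and SKIP the rest. Not Avro's resolver code.
final class FieldPlan {

    static List<String> plan(List<String> writerFields, List<String> readerFields) {
        List<String> actions = new ArrayList<>();
        for (String f : writerFields) {
            actions.add((readerFields.contains(f) ? "READ " : "SKIP ") + f);
        }
        return actions;
    }

    // A resolver that stops as soon as every reader field has been read never emits
    // the trailing SKIPs, so whatever is read next starts in the wrong place.
    // Assumes every reader field exists in the writer (defaults are not modeled).
    static List<String> planStoppingEarly(List<String> writerFields, List<String> readerFields) {
        List<String> actions = new ArrayList<>();
        int remaining = readerFields.size();
        for (String f : writerFields) {
            if (remaining == 0) {
                break;  // trailing writer-only fields are silently left unskipped
            }
            boolean read = readerFields.contains(f);
            actions.add((read ? "READ " : "SKIP ") + f);
            if (read) {
                remaining--;
            }
        }
        return actions;
    }
}
```

For the "sdf" record, plan gives [SKIP device, READ children] for step 3 and [READ device, SKIP children] for step 4. planStoppingEarly drops the trailing SKIP children in step 4, but with the writer's fields reversed (children, then device) it still produces [SKIP children, READ device]. That pattern is consistent with the reversed writer order working in this thread, though the model does not establish that Avro 1.5.0 failed this way.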

On 3/30/11 7:17 PM, "Felix Xu" <yg...@gmail.com> wrote:

> Wow, it's amazing.
> I did #2 and it worked.
> What's the problem? How can I fix it?

Re: A strange problem when I am trying to read avro record with a subset of the schema.

Posted by Felix Xu <yg...@gmail.com>.
Wow, it's amazing.
I did #2 and it worked.
What's the problem? How can I fix it?

2011/3/31 Scott Carey <sc...@richrelevance.com>

> 1: What version of Avro is this?
> 2: If you reverse the order of the fields of "sdf" in the schema you write
> with (array, then string), are the results the same?
>
> This looks like a bug; file a JIRA ticket, and if you have a test case or
> code snippet that reproduces it, attach that to the ticket.
>
> Thanks!
>
> -Scott

Re: A strange problem when I am trying to read avro record with a subset of the schema.

Posted by Scott Carey <sc...@richrelevance.com>.
1: What version of Avro is this?
2: If you change the schema you write with by reversing the order of the fields of "sdf" (array, then string), are the results the same?

This looks like a bug. Please file a JIRA ticket, and if you have a test case or code snippet that reproduces it, attach that to the ticket.

Thanks!

-Scott
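Both the stack trace quoted above and Scott's second question turn on how a reader skips a field its schema leaves out: the failing frames run from `ResolvingDecoder.doAction` through `ValidatingDecoder.skipArray` to `BinaryDecoder.skipString`, i.e. an array being skipped during record resolution, presumably the writer-only `children` field. The Java sketch below is not Avro's decoder and does not reproduce the bug. It hand-encodes one `sdf` record following the binary encoding in the Avro specification (zig-zag varint longs, length-prefixed strings, arrays as count-prefixed blocks ending in a zero count) and shows which bytes a reader that keeps only `device` has to consume to skip `children`, in both the original and the reversed field order. The class and method names (`SdfSkipSketch`, `encodeSdf`, `readDeviceOnly`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hand-written sketch of the Avro binary encoding for one "sdf" record
// (device: string, children: array<string>). Not Avro's own decoder.
public class SdfSkipSketch {

    // Avro int/long: zig-zag, then 7 bits per byte, high bit = "more bytes".
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string: UTF-8 byte length as a long, then the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Avro array: blocks of (count, items), ended by a zero count.
    // This writer emits at most one block.
    static void writeStringArray(ByteArrayOutputStream out, List<String> items) {
        if (!items.isEmpty()) {
            writeLong(out, items.size());
            for (String s : items) writeString(out, s);
        }
        writeLong(out, 0);
    }

    // Minimal cursor over an encoded record.
    static final class Reader {
        final byte[] buf;
        int pos;

        Reader(byte[] buf) { this.buf = buf; }

        long readLong() {
            long z = 0;
            int shift = 0;
            int b;
            do {
                b = buf[pos++] & 0xFF;
                z |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return (z >>> 1) ^ -(z & 1); // undo zig-zag
        }

        String readString() {
            int len = (int) readLong();
            String s = new String(buf, pos, len, StandardCharsets.UTF_8);
            pos += len;
            return s;
        }

        void skipString() { pos += (int) readLong(); }

        // Skip array<string>: read block counts until 0. A negative count is
        // followed by the block's size in bytes, so the block can be jumped over.
        void skipStringArray() {
            for (long count = readLong(); count != 0; count = readLong()) {
                if (count < 0) {
                    pos += (int) readLong();
                } else {
                    for (long i = 0; i < count; i++) skipString();
                }
            }
        }
    }

    // Writer side: fields in schema order; childrenFirst = Scott's reversed order.
    static byte[] encodeSdf(String device, List<String> children, boolean childrenFirst) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (childrenFirst) writeStringArray(out, children);
        writeString(out, device);
        if (!childrenFirst) writeStringArray(out, children);
        return out.toByteArray();
    }

    // Reader schema keeps only "device": read it and skip "children" wherever it sits.
    static String readDeviceOnly(byte[] data, boolean childrenFirst) {
        Reader r = new Reader(data);
        if (childrenFirst) r.skipStringArray();
        String device = r.readString();
        if (!childrenFirst) r.skipStringArray();
        if (r.pos != data.length) throw new IllegalStateException("bytes left over");
        return device;
    }

    public static void main(String[] args) {
        List<String> children = List.of("WMB1", "WMB2");
        for (boolean childrenFirst : new boolean[] {false, true}) {
            byte[] data = encodeSdf("WMA1", children, childrenFirst);
            // Prints "WMA1 (17 bytes)" for both field orders.
            System.out.println(readDeviceOnly(data, childrenFirst) + " (" + data.length + " bytes)");
        }
    }
}
```

In this sketch, skipping `children` for the sample record means consuming exactly 12 bytes: one count byte, two 5-byte strings, and the zero terminator. A decoder that consumed more or fewer bytes while skipping would leave every following read misaligned, which would be consistent with an exception surfacing inside `BinaryDecoder.readInt`, though the sketch does not establish that this is what happens inside Avro.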
