Posted to user@avro.apache.org by Jonathan Coveney <jc...@gmail.com> on 2013/04/04 18:21:43 UTC

Do the values in the json object have to be ordered?

I think an example is most useful:

https://gist.github.com/jcoveney/5311795

I realize that the Python implementation isn't as strict as the Java
implementation, but this result is still a bit surprising.

Basically, is it the case that the Java generic reader expects the JSON
object's keys to be in the same order as the schema's fields? This is what
the gist is trying to show. I have a simple record definition, and then two
JSON objects that match that definition and are identical except for the
order of their keys.

In Python both objects are read successfully, which you'd expect, but in
Java the second one fails. I get the following:

First successful!
Exception in thread "main" java.lang.RuntimeException:
org.apache.avro.AvroTypeException: Expected field name first got second
    at com.spotify.hadoop.mapred.Hrm.main(Hrm.java:43)
Caused by: org.apache.avro.AvroTypeException: Expected field name first got
second
    at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:437)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:121)
    at org.apache.avro.io.JsonDecoder.readInt(JsonDecoder.java:148)
    at
org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
    at
org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:341)
    at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:146)
    at
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
    at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
    at com.spotify.hadoop.mapred.Hrm.main(Hrm.java:38)

Am I doing something dumb or wrong? Per the JSON spec, objects are
unordered, so it seems very problematic that the decoder expects the keys
to be ordered.

Thank you,
Jon

Re: Do the values in the json object have to be ordered?

Posted by Jonathan Coveney <jc...@gmail.com>.
Looks like an issue in my version of Avro:
https://issues.apache.org/jira/browse/AVRO-895?attachmentSortBy=dateTime

We're using 1.5.4... I guess it's time for an upgrade. Does anyone know if
there are any backwards compatibility issues between those versions?



Re: Do the values in the json object have to be ordered?

Posted by Jonathan Coveney <jc...@gmail.com>.
Yeah, I'd love to have Doug's thoughts.

Short of a bug fix, I guess I could work around this by providing my own
decoder, though that seems like a bit of work. I could also make a builder
for my schemas, then traverse the JSON map and build the record up, but
making my own decoder would probably be less work than that.

I would appreciate any thoughts on a good workaround, or on whether I should
just try to patch it (assuming it is a bug) and backport the fix (something
which I would like to avoid, but will do if I have to).
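
A rough sketch of a lighter workaround than a custom decoder: rewrite the
JSON text so its keys follow the schema's field order before handing it to a
strict decoder. The field names first and second come from the exception
message; the schema itself, the int types, and the helper name
reorder_to_schema are assumptions made up for illustration. It is written in
Python only to stay dependency-free, it is not Avro code, and it handles
only a flat top-level record (nested records would need the same treatment
recursively).

```python
import json
from collections import OrderedDict

# Hypothetical schema, assumed for illustration; the real one lives in the gist.
SCHEMA = {
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "first", "type": "int"},
        {"name": "second", "type": "int"},
    ],
}


def reorder_to_schema(json_text, schema):
    """Return JSON text whose top-level keys follow the schema's field order."""
    datum = json.loads(json_text)
    field_names = [field["name"] for field in schema["fields"]]

    # Fail early if the object does not carry every field the schema declares.
    missing = [name for name in field_names if name not in datum]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    # Rebuild the object in schema order; keys not in the schema are dropped.
    ordered = OrderedDict((name, datum[name]) for name in field_names)
    return json.dumps(ordered)


swapped = '{"second": 2, "first": 1}'
print(reorder_to_schema(swapped, SCHEMA))  # {"first": 1, "second": 2}
```

The reordered text could then be passed to the order-sensitive decoder
unchanged, which avoids touching the decoder itself until an upgrade is
possible.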



Re: Do the values in the json object have to be ordered?

Posted by Philip Zeyliger <ph...@cloudera.com>.
It smells like a bug to me.  Doug typically has more insight here about the
Java implementation.  I'm mainly a user of the Specific* hierarchy and not
the Generic one.

-- Philip


On Thu, Apr 4, 2013 at 10:28 AM, Jonathan Coveney <jc...@gmail.com> wrote:

> Should I consider this a bug and fix it? I'm very surprised nobody has run
> into this before. Or is this considered "correct" by Avro, and it just
> happens that Avro violates the JSON spec? IMHO I'd go with the former, but
> I'd love input from the powers that be.

Re: Do the values in the json object have to be ordered?

Posted by Jonathan Coveney <jc...@gmail.com>.
Should I consider this a bug and fix it? I'm very surprised nobody has run
into this before. Or is this considered "correct" by Avro, and it just
happens that Avro violates the JSON spec? IMHO I'd go with the former, but
I'd love input from the powers that be.



Re: Do the values in the json object have to be ordered?

Posted by Francis Galiegue <fg...@gmail.com>.
On Thu, Apr 4, 2013 at 6:21 PM, Jonathan Coveney <jc...@gmail.com> wrote:
> Am I doing something dumb or wrong? Per the JSON spec, objects are
> unordered, so it seems very problematic that the decoder expects the keys
> to be ordered.

Indeed, this contradicts the JSON spec. Order does not matter in JSON.

Jackson however deserializes JSON with a LinkedHashMap by default. I
suppose Avro takes advantage of this, but it still contradicts the
spec.
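
To make the distinction concrete, here is a small Python stand-in (not
Jackson or Avro code; the two sample objects are assumed from the field
names in the exception). Parsed as objects, the two documents compare equal;
parsed into an order-preserving list of pairs, roughly analogous to reading
into a LinkedHashMap, they differ, and a consumer that walks the pairs in
sequence will see that difference.

```python
import json

IN_ORDER = '{"first": 1, "second": 2}'
SWAPPED = '{"second": 2, "first": 1}'

# Parsed as JSON objects (Python dicts), key order does not affect equality.
same_object = json.loads(IN_ORDER) == json.loads(SWAPPED)

# Parsed as ordered lists of (key, value) pairs, the order becomes visible.
pairs_in_order = json.loads(IN_ORDER, object_pairs_hook=list)
pairs_swapped = json.loads(SWAPPED, object_pairs_hook=list)
same_sequence = pairs_in_order == pairs_swapped

print(same_object, same_sequence)  # True False
```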

--
Francis Galiegue, fgaliegue@gmail.com
JSON Schema in Java: http://json-schema-validator.herokuapp.com