Posted to user@avro.apache.org by Jonathan Coveney <jc...@gmail.com> on 2013/04/09 11:06:34 UTC

Picking up default value for a union?

I have the following schema: {"name":"hey", "type":"record",
"fields":[{"name":"a","type":["null","string"],"default":null}]}

I am trying to deserialize the following against this schema using Java and
the GenericDatumReader: {}

I get the following error:
Caused by: org.apache.avro.AvroTypeException: Expected start-union. Got END_OBJECT
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
    at com.spotify.hadoop.JsonTester.main(JsonTester.java:40)

I'm not seeing any immediate issues online around this...is this expected?
I'm reading it in as such:

Schema avroSchema = new Schema.Parser().parse(schemaLine);
GenericDatumReader<Object> reader = new GenericDatumReader<Object>(avroSchema);
Object datum = reader.read(null, DecoderFactory.get().jsonDecoder(avroSchema, dataLine));

I'm going to see what's up and why it isn't picking up the default, but
imagined you guys might know what's up?

Thanks,
Jon

Re: Picking up default value for a union?

Posted by Jonathan Coveney <jc...@gmail.com>.
Thank you both. Makes sense.



Re: Picking up default value for a union?

Posted by Scott Carey <sc...@apache.org>.
Minor addition, the default value should be

null

not 

"null"

-- the latter is a string, the former is null.

http://avro.apache.org/docs/current/spec.html#schema_record





Re: Picking up default value for a union?

Posted by Martin Kleppmann <ma...@rapportive.com>.
With Avro, it is generally assumed that your reader is working with
the exact same schema as the data was written with. If you want to
change your schema, e.g. add a field to a record, you still need the
exact same schema as was used for writing (the "writer's schema"), but
you can also give the decoder a second schema (the "reader's schema"),
and Avro will map data from the writer's schema into the reader's
schema for you ("schema evolution").

This requirement of having the exact same schema as the writer makes
more sense with Avro's binary encoding, because it allows Avro to omit
the field names, which makes the encoding very compact. The
requirement makes less sense if you're using the JSON encoding, where
field names are inevitably part of the JSON. I think this behaviour is
expected, but I agree that it's a bit surprising, so perhaps it's
worth discussing whether we should change it.

To answer your question, your input data {} looks like it was written
with a writer schema of {"name":"hey", "type":"record", "fields":[]}
so try using that as your writer schema. Then if you specify
{"name":"hey", "type":"record",
"fields":[{"name":"a","type":["null","string"],"default":"null"}]} as
your reader schema, you should find that the resolving decoder fills
in the field "a" with the default null.
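
A minimal sketch of the two-schema approach described above, assuming the Avro Java library is on the classpath. The class name ReaderDefaultSketch and the helper decodeFieldA are hypothetical names used only for illustration, and the reader schema here uses the JSON value null as the default (not the string "null"), as pointed out elsewhere in this thread. Each schema is parsed with its own Schema.Parser because both records share the name "hey".

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

// Hypothetical class name; a sketch, not code from the thread.
class ReaderDefaultSketch {
    // Writer's schema: a record with no fields, matching the input {}.
    static final String WRITER_JSON =
        "{\"name\":\"hey\",\"type\":\"record\",\"fields\":[]}";

    // Reader's schema: adds field "a" with a JSON null default.
    static final String READER_JSON =
        "{\"name\":\"hey\",\"type\":\"record\",\"fields\":"
        + "[{\"name\":\"a\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    // Decodes dataLine with the writer's schema, resolves it into the
    // reader's schema, and returns the value of field "a".
    static Object decodeFieldA(String dataLine) throws IOException {
        // Separate parsers, since one parser refuses to define "hey" twice.
        Schema writer = new Schema.Parser().parse(WRITER_JSON);
        Schema reader = new Schema.Parser().parse(READER_JSON);

        // Passing both schemas makes the datum reader resolve between them.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<GenericRecord>(writer, reader);

        // The JSON decoder is built from the writer's schema, not the reader's.
        Decoder decoder = DecoderFactory.get().jsonDecoder(writer, dataLine);

        GenericRecord record = datumReader.read(null, decoder);
        return record.get("a");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decodeFieldA("{}"));
    }
}
```

The point of the design is that the JSON decoder only ever sees the writer's schema, so it never expects a union for "a"; the default is supplied during schema resolution rather than by the decoder.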

Best,
Martin


Re: Picking up default value for a union?

Posted by Jonathan Coveney <jc...@gmail.com>.
Stepping through the code, it looks like the code only uses defaults for
writing, not for reading. I.e., at read time it assumes that the defaults were
already filled in. It seems like if the reader evolved the schema to
include new fields, it would be desirable for the defaults to get filled in
if not present? But stepping through, on reading the defaults are
completely ignored.



Re: Picking up default value for a union?

Posted by Jonathan Coveney <jc...@gmail.com>.
Please note: {"name":"hey", "type":"record",
"fields":[{"name":"a","type":["null","string"],"default":"null"}]} also
doesn't work

