Posted to user@avro.apache.org by Markus Weimer <we...@yahoo-inc.com> on 2010/08/03 01:28:26 UTC

Confused about default values

Hi,

I added the following line to a schema, regenerated the static Java classes
for it, and recompiled my code:

{"name": "bias", "type":"double", "default":"0.0"}

When I now try to read a file written before the change, I get an error:

Exception in thread "main" java.io.EOFException
        at org.apache.avro.io.BinaryDecoder.readDouble(BinaryDecoder.java:154)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:82)
        at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:273)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:74)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:154)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:72)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:61)


I assumed that it would just return 0.0 for the fields not present in the
file. Is this a bug on my end?

Thanks,

Markus


Re: Confused about default values

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
Also, if this turns out to be the issue, please file a JIRA to ensure we
provide a clear error message to the user when we parse the schema.

On Mon, Aug 2, 2010 at 4:43 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:

> Hey,
>
> I think the issue is that you put "0.0" in quotes. Try just 0.0.
>
> Later,
> Jeff

Re: Confused about default values

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
Hey,

I think the issue is that you put "0.0" in quotes. Try just 0.0.

Later,
Jeff

On Mon, Aug 2, 2010 at 4:40 PM, Doug Cutting <cu...@apache.org> wrote:

> That sounds like something that should work.  Can you submit a bug report,
> ideally with a complete test case?  Thanks!
>
> Doug

Re: Confused about default values

Posted by Doug Cutting <cu...@apache.org>.
That sounds like something that should work.  Can you submit a bug 
report, ideally with a complete test case?  Thanks!

Doug

Re: Confused about default values

Posted by Markus Weimer <we...@yahoo-inc.com>.
Hi,

Thanks for the suggestions! I tried changing "0.0" to 0.0, with no success. Please note that I am on Avro 1.2, as there is an incompatibility between Hadoop 0.20 and newer versions of Avro.

I think how I (de-)serialize the object may point to the answer. I read the Avro instance directly from an InputStream. The data in the stream was serialized using the following code:

public static void store(final SpecificRecord m, final OutputStream out) throws IOException {
    // Writes only the binary-encoded record; the schema itself is not written to the stream.
    final SpecificDatumWriter datumWriter = new SpecificDatumWriter(m.getSchema());
    final BinaryEncoder enc = new BinaryEncoder(out);
    datumWriter.write(m, enc);
    enc.flush();
}

I read from the stream using:

public static SpecificRecord load(final InputStream in) throws IOException {
    // Only the current schema of the generated class is supplied here;
    // the schema the data was written with is never passed in.
    final SpecificDatumReader reader = new SpecificDatumReader(THECLASS._SCHEMA);
    final BinaryDecoder decoder = new BinaryDecoder(in);
    return ( SpecificRecord ) reader.read(null, decoder);
}

Presumably, this does not serialize the schema with the data, correct? That would explain the problem. I know that Avro data files serialize the schema at the beginning. Is there a similar tool for writing to streams?
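
One way to make a raw stream self-describing, in the spirit of what Avro data files do, is to write the writer's schema ahead of the payload. The following is a minimal sketch using only java.io, not Avro's container format: the class SchemaHeaderSketch, its nested Framed type, its methods, and the length-prefixed layout are all hypothetical choices made for illustration. It is worth checking the Avro release in use for a built-in stream-capable container before adopting anything like this.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Avro: prefixes a binary payload with the
// JSON text of the schema it was written with.
final class SchemaHeaderSketch {

    // What load() hands back: the writer's schema text plus the raw payload bytes.
    static final class Framed {
        final String writerSchemaJson;
        final byte[] payload;

        Framed(String writerSchemaJson, byte[] payload) {
            this.writerSchemaJson = writerSchemaJson;
            this.payload = payload;
        }
    }

    // Layout (an assumption of this sketch): int length + schema bytes, int length + payload bytes.
    static void store(String writerSchemaJson, byte[] payload, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        byte[] schemaBytes = writerSchemaJson.getBytes(StandardCharsets.UTF_8);
        data.writeInt(schemaBytes.length);
        data.write(schemaBytes);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Reads the header first, so the caller learns the writer's schema before decoding the payload.
    static Framed load(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] schemaBytes = new byte[data.readInt()];
        data.readFully(schemaBytes);
        byte[] payload = new byte[data.readInt()];
        data.readFully(payload);
        return new Framed(new String(schemaBytes, StandardCharsets.UTF_8), payload);
    }
}
```

The schema text recovered by load() would then be parsed and handed to the datum reader as the schema the data was written with, alongside the current schema of the generated class.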

Thanks,

Markus

On 8/2/10 6:01 PM, "Scott Carey" <sc...@richrelevance.com> wrote:

How was this GenericDatumReader constructed?  Is it used to read from an Avro file or from something else?

Note that you may have to set the "expected" schema separately from the actual schema.  Avro needs to know what the schema was when the data was written. In an Avro data file, that schema is persisted with the data and set automatically on read.


Re: Confused about default values

Posted by Scott Carey <sc...@richrelevance.com>.
How was this GenericDatumReader constructed?  Is it used to read from an Avro file or from something else? 

Note that you may have to set the "expected" schema separately from the actual schema.  Avro needs to know what the schema was when the data was written. In an Avro data file, that schema is persisted with the data and set automatically on read.
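
The effect of a missing writer schema can be reproduced without Avro at all. The sketch below is a toy model, not Avro's implementation: a "schema" is just an ordered list of double-valued field names, and SchemaResolutionSketch, write, readNaive, and readResolved are hypothetical names. It shows why a reader that knows only the new schema runs off the end of old data, and how knowing both schemas lets a default fill the gap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Avro classes or methods.
final class SchemaResolutionSketch {

    // Writes the fields in schema order, with no names or tags in the output.
    static byte[] write(List<String> writerSchema, Map<String, Double> record) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String field : writerSchema) {
            out.writeDouble(record.get(field));
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads with the reader's schema alone: every field is assumed to be present in the bytes.
    static Map<String, Double> readNaive(List<String> readerSchema, byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Map<String, Double> record = new LinkedHashMap<>();
        for (String field : readerSchema) {
            // Throws java.io.EOFException once the bytes run out.
            record.put(field, in.readDouble());
        }
        return record;
    }

    // Reads with both schemas: decode exactly what the writer wrote, then fill in defaults.
    static Map<String, Double> readResolved(List<String> writerSchema, List<String> readerSchema,
            Map<String, Double> defaults, byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Map<String, Double> written = new LinkedHashMap<>();
        for (String field : writerSchema) {
            written.put(field, in.readDouble());
        }
        Map<String, Double> record = new LinkedHashMap<>();
        for (String field : readerSchema) {
            if (written.containsKey(field)) {
                record.put(field, written.get(field));
            } else if (defaults.containsKey(field)) {
                record.put(field, defaults.get(field));
            } else {
                throw new IOException("no value and no default for field " + field);
            }
        }
        return record;
    }
}
```

With data written under the schema ["weight"] and read under ["weight", "bias"], readNaive fails with an EOFException, much like the trace in the original post, while readResolved takes bias from the defaults map.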

