Posted to user@avro.apache.org by Devajyoti Sarkar <ds...@q-kk.com> on 2011/01/17 13:53:33 UTC

Setting bytes in Java

Hi,

I am just beginning to use Avro, so I apologize if this is a silly question.

I would like to set a field of type "bytes" in Java. I am assuming that all
I need to do is wrap a byte[] in a ByteBuffer to set the value.
Unfortunately that does not seem to work. I am using a BinaryEncoder and
looking at its output, it has not written any of the bytes that were in the
array. The first four values of the array are 0, -128, -128, -128.

Is it because Java uses 8-bit signed bytes while the Avro spec calls for
8-bit unsigned bytes in a field of type "bytes"? If so, how does one convert
Java bytes to the kind accepted by Avro?

Thanks in advance.

Dev
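
On the signed versus unsigned question: a Java byte and an "unsigned" byte hold the same 8 bits, so wrapping the array should not require any conversion. The value -128 is simply the bit pattern 0x80. The sketch below uses only the JDK; the class and method names are made up for illustration and are not part of Avro.

```java
import java.nio.ByteBuffer;

class SignedBytesDemo {
    /** Returns the unsigned value (0..255) of a Java byte. */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    /** Copies the remaining contents of a buffer without moving its position. */
    static byte[] contents(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate() shares data but has its own position
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {0, -128, -128, -128};
        ByteBuffer wrapped = ByteBuffer.wrap(raw);
        // The wrapped buffer carries exactly the same bits as the array.
        for (byte b : contents(wrapped)) {
            System.out.printf("signed=%d unsigned=%d hex=%02x%n",
                    b, toUnsigned(b), toUnsigned(b));
        }
    }
}
```

Masking with 0xFF only changes how a byte is displayed or compared as an int; the bytes written to a stream are identical either way.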

Re: Setting bytes in Java

Posted by Doug Cutting <cu...@apache.org>.
Dev,

What you describe should work.  Can you perhaps provide a simple code 
example to illustrate the problem you are having?

Thanks,

Doug

On 01/17/2011 04:53 AM, Devajyoti Sarkar wrote:
> Hi,
>
> I am just beginning to use Avro, so I apologize if this is a silly question.
>
> I would like to set a field of type "bytes" in Java. I am assuming that
> all I need to do is wrap a byte[] in a ByteBuffer to set the value.
> Unfortunately that does not seem to work. I am using a BinaryEncoder and
> looking at its output, it has not written any of the bytes that were in the
> array. The first four values of the array are 0, -128, -128, -128.
>
> Is it because Java uses 8-bit signed bytes while the Avro spec calls for
> 8-bit unsigned bytes in a field of type "bytes"? If so, how does one
> convert Java bytes to the kind accepted by Avro?
>
> Thanks in advance.
>
> Dev

Re: Setting bytes in Java

Posted by Devajyoti Sarkar <ds...@q-kk.com>.
It has been filed as AVRO-738.

Thanks for the links.

Dev

On Wed, Jan 19, 2011 at 12:00 AM, Scott Carey <sc...@richrelevance.com> wrote:

> Please open a bug report in JIRA.  I don't have time to look at this now,
> but someone else might.
>
>
> On the topic of per record versioning and how to design a system that does
> not store schemas per record, there have been useful topics on this
> mailing list in the past:
>
>
> http://search-hadoop.com/m/66jvQoopYw/HAvroBase&subj=Re+question+about+completely+untagged+data+
>
> http://search-hadoop.com/m/q7lLU1GVhHd2/HAvroBase&subj=Re+Versioning+of+an+array+of+a+record

Re: Setting bytes in Java

Posted by Scott Carey <sc...@richrelevance.com>.
Please open a bug report in JIRA.  I don't have time to look at this now,
but someone else might.


On the topic of per record versioning and how to design a system that does
not store schemas per record, there have been useful topics on this
mailing list in the past:


http://search-hadoop.com/m/66jvQoopYw/HAvroBase&subj=Re+question+about+completely+untagged+data+

http://search-hadoop.com/m/q7lLU1GVhHd2/HAvroBase&subj=Re+Versioning+of+an+array+of+a+record



Re: Setting bytes in Java

Posted by David Rosenstrauch <da...@darose.net>.
I've also found this to be the case, and was wondering about it.  I also 
had thought that I could just re-init an existing BinaryEncoder, but 
found that I had to create a new one each time.  I didn't really think 
much of it at the time, but in retrospect it does sound like it might be 
a bug.  Perhaps one of the devs can comment more.  (And/or perhaps you 
might want to open a bug report about this.)

DR

On 01/18/2011 03:17 AM, Devajyoti Sarkar wrote:
> Let me first give some context, I would like to store a datum serialized
> with a BinaryEncoder without having to place a schema with it (as the
> DataFileWriter does). Instead I have created a container record that stores
> a unique id for the schema version and a payload field of type "bytes". This
> allows me to have a self-describing data object (for example, to place in a
> cell in HBase) without the overhead of a schema per object. (Perhaps there
> is a better way to do this, if so please let me know).
>
> The code looks something like this:
>
>      GenericRecord container = new GenericData.Record(containerSchema);
>      writer.setSchema(containerSchema);
>      container.put(CONTAINER_SCHEMA_ID_FIELD,
> datum.getSchema().getProp(SCHEMA_ID_PROPERTY));
>      container.put(CONTAINER_PAYLOAD_FIELD,
> ByteBuffer.wrap(datumBits.toByteArray()));
>      ByteArrayOutputStream containerBits = new ByteArrayOutputStream();
>      encoder.init(containerBits);
>      writer.write(container, encoder);
>      encoder.flush();
>      containerBits.flush();
>      containerBits.close();
>
> I am trying to reuse an encoder by calling init() to re-initialize it.
> Perhaps this is what creates the problem. If I create a new encoder each
> time everything works fine. However, if I just use init, then the
> OutputStream for the encoder is reset but the OutputStream for the
> SimpleByteWriter within the encoder is not. This seems to be causing the
> problem because when the encoder is flushed, it does not write the bytes in
> the ByteWriter. Perhaps the init() method is not supposed to be used this
> way. But it would be nice to not have to create a new encoder each time.
>
> Can you please let me know if the above looks right and advise me as to what
> is the best way to do the serialization.
>
> Thanks,
> Dev
>
>
>
> On Tue, Jan 18, 2011 at 4:14 AM, Scott Carey<sc...@richrelevance.com>wrote:
>
>> BinaryEncoder buffers data, you may have to call flush() to see it in the
>> output stream.
>>
>>
>> On 1/17/11 4:53 AM, "Devajyoti Sarkar"<ds...@q-kk.com>  wrote:
>>
>> Hi,
>>
>> I am just beginning to use Avro, so I apologize if this is a silly
>> question.
>>
>> I would like to set a field of type "bytes" in Java. I am assuming that all
>> I need to do is wrap a byte[] in a ByteBuffer to set the value.
>> Unfortunately that does not seem to work. I am using a BinaryEncoder and
>> looking at its output, it has not written any of the bytes that were in the
>> array. The first four values of the array are 0, -128, -128, -128.
>>
>> Is it because Java uses 8-bit signed bytes while the Avro spec calls for
>> 8-bit unsigned bytes in a field of type "bytes"? If so, how does one convert
>> Java bytes to the kind accepted by Avro?
>>
>> Thanks in advance.
>>
>> Dev
>>
>>
>


Re: Setting bytes in Java

Posted by Devajyoti Sarkar <ds...@q-kk.com>.
Let me first give some context. I would like to store a datum serialized
with a BinaryEncoder without having to place a schema with it (as the
DataFileWriter does). Instead, I have created a container record that stores
a unique id for the schema version and a payload field of type "bytes". This
allows me to have a self-describing data object (for example, to place in a
cell in HBase) without the overhead of a schema per object. (Perhaps there
is a better way to do this; if so, please let me know.)

The code looks something like this:

    // Build the container record: schema id plus the serialized datum as "bytes".
    GenericRecord container = new GenericData.Record(containerSchema);
    writer.setSchema(containerSchema);
    container.put(CONTAINER_SCHEMA_ID_FIELD,
        datum.getSchema().getProp(SCHEMA_ID_PROPERTY));
    container.put(CONTAINER_PAYLOAD_FIELD,
        ByteBuffer.wrap(datumBits.toByteArray()));

    // Re-point the existing encoder at a fresh stream, then serialize.
    ByteArrayOutputStream containerBits = new ByteArrayOutputStream();
    encoder.init(containerBits);
    writer.write(container, encoder);
    encoder.flush();
    containerBits.flush();
    containerBits.close();

I am trying to reuse an encoder by calling init() to re-initialize it.
Perhaps this is what creates the problem. If I create a new encoder each
time, everything works fine. However, if I just call init(), the
OutputStream for the encoder is reset but the OutputStream for the
SimpleByteWriter within the encoder is not. This seems to be causing the
problem: when the encoder is flushed, it does not write the bytes in
the ByteWriter. Perhaps the init() method is not supposed to be used this
way, but it would be nice not to have to create a new encoder each time.

Can you please let me know if the above looks right and advise me as to what
is the best way to do the serialization?

Thanks,
Dev
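
The symptom described above (init() re-points the encoder's stream but not the stream held by an inner writer) can be reproduced with a toy model. Everything in this sketch is hypothetical: ToyEncoder and ToyByteWriter are stand-ins written for illustration and are not Avro's classes or its actual implementation. It only shows why a stale inner reference would make the payload bytes go missing from the new stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class ToyEncoder {
    /** Hypothetical inner helper that keeps its own reference to the stream. */
    static final class ToyByteWriter {
        OutputStream out;
        ToyByteWriter(OutputStream out) { this.out = out; }
        void write(byte[] bytes) throws IOException { out.write(bytes); }
    }

    private OutputStream out;
    private final ToyByteWriter byteWriter;

    ToyEncoder(OutputStream out) {
        this.out = out;
        this.byteWriter = new ToyByteWriter(out);
    }

    /** Buggy re-init: only the encoder's own reference is updated. */
    void initBuggy(OutputStream newOut) {
        this.out = newOut;
    }

    /** Re-init that also re-points the inner writer. */
    void initFixed(OutputStream newOut) {
        this.out = newOut;
        this.byteWriter.out = newOut;
    }

    /** Writes a one-byte length directly, then the payload via the inner writer. */
    void writeBytes(byte[] payload) throws IOException {
        out.write(payload.length);
        byteWriter.write(payload);
    }

    /** Re-initializes onto a second stream, writes, and returns what that stream received. */
    static byte[] encodeAfterReinit(boolean fixed, byte[] payload) {
        ByteArrayOutputStream first = new ByteArrayOutputStream();
        ByteArrayOutputStream second = new ByteArrayOutputStream();
        ToyEncoder encoder = new ToyEncoder(first);
        if (fixed) {
            encoder.initFixed(second);
        } else {
            encoder.initBuggy(second);
        }
        try {
            encoder.writeBytes(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return second.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = {0, -128, -128, -128};
        System.out.println("buggy init, bytes in new stream: "
                + encodeAfterReinit(false, payload).length);
        System.out.println("fixed init, bytes in new stream: "
                + encodeAfterReinit(true, payload).length);
    }
}
```

In this model the buggy path leaves only the length byte in the new stream, while the payload lands in the old one, which matches the observation that creating a new encoder each time avoids the problem.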



On Tue, Jan 18, 2011 at 4:14 AM, Scott Carey <sc...@richrelevance.com> wrote:

> BinaryEncoder buffers data, you may have to call flush() to see it in the
> output stream.
>
>
> On 1/17/11 4:53 AM, "Devajyoti Sarkar" <ds...@q-kk.com> wrote:
>
> Hi,
>
> I am just beginning to use Avro, so I apologize if this is a silly
> question.
>
> I would like to set a field of type "bytes" in Java. I am assuming that all
> I need to do is wrap a byte[] in a ByteBuffer to set the value.
> Unfortunately that does not seem to work. I am using a BinaryEncoder and
> looking at its output, it has not written any of the bytes that were in the
> array. The first four values of the array are 0, -128, -128, -128.
>
> Is it because Java uses 8-bit signed bytes while the Avro spec calls for
> 8-bit unsigned bytes in a field of type "bytes"? If so, how does one convert
> Java bytes to the kind accepted by Avro?
>
> Thanks in advance.
>
> Dev
>
>

Re: Setting bytes in Java

Posted by Scott Carey <sc...@richrelevance.com>.
BinaryEncoder buffers data; you may have to call flush() to see it in the output stream.
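
The buffering effect is easy to see with the JDK's own BufferedOutputStream, used here as an analogy rather than as Avro code. In this sketch (the class and method names are invented for illustration), a small payload stays in the buffer and the underlying stream is empty until flush() is called.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FlushDemo {
    /**
     * Returns {bytes visible before flush, bytes visible after flush}.
     * Assumes the payload is smaller than the 8192-byte buffer; larger
     * writes bypass the buffer and reach the sink immediately.
     */
    static int[] visibleBytes(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 8192);
        try {
            buffered.write(payload, 0, payload.length);
            int before = sink.size(); // data is still held in the buffer
            buffered.flush();
            int after = sink.size();  // data has now reached the sink
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = visibleBytes(new byte[] {0, -128, -128, -128});
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

If the output stream still looks empty after a flush, buffering is probably not the cause and the problem lies elsewhere.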


On 1/17/11 4:53 AM, "Devajyoti Sarkar" <ds...@q-kk.com> wrote:

Hi,

I am just beginning to use Avro, so I apologize if this is a silly question.

I would like to set a field of type "bytes" in Java. I am assuming that all I need to do is wrap a byte[] in a ByteBuffer to set the value. Unfortunately that does not seem to work. I am using a BinaryEncoder and looking at its output, it has not written any of the bytes that were in the array. The first four values of the array are 0, -128, -128, -128.

Is it because Java uses 8-bit signed bytes while the Avro spec calls for 8-bit unsigned bytes in a field of type "bytes"? If so, how does one convert Java bytes to the kind accepted by Avro?

Thanks in advance.

Dev