Posted to user@avro.apache.org by Arunasalam G <ze...@gmail.com> on 2015/02/11 07:48:05 UTC

Doubt in a AVRO scenario

Hi,

I am new to Avro and have a question about a scenario. I would appreciate
your help with it.

    1. An Avro object is constructed, serialized, and stored as a value in
an HBase table.
    2. When reading it back, we retrieve the stored byte array value.

Is it possible to retrieve the schema directly from the byte array value,
just as we retrieve the schema from a file?
We would prefer not to initialize the DatumReader object with a schema.
I see that Avro supports retrieving the schema from an Avro file using
DataFileReader, so the DatumReader can be initialized without a schema.
In our case, too, we need to initialize the reader without a schema. Is
there any way to retrieve the schema stored in a serialized byte array
object?

I would be really grateful if you could take a look at this scenario.

Thanks in advance.

Regards,
Arun G

Re: Doubt in a AVRO scenario

Posted by Sean Busbey <bu...@cloudera.com>.
Hi!

DatumWriter doesn't serialize the schema when it writes an individual datum.
If you look at your byte array contents, I believe you'll find that it just
contains the binary representation of the record.

Your use case sounds very similar to a recent question on the list on
storing records in byte arrays without the original writer schema[1]. The
guidance on that thread about using a schema id should work for your
scenario as well. Note that it will be up to you to handle serialization of
the id and lookup of the schema.

[1]: http://s.apache.org/2IM
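
As a rough illustration of the schema id approach (a sketch under stated
assumptions, not code from the linked thread): the stored value could carry a
small id in front of the Avro bytes, and the reader could look the schema up
by that id before decoding. The class name, the 4-byte big-endian id layout,
and the in-memory map are all hypothetical choices; only the JDK is used.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: prefixes an Avro binary payload with a schema id.
final class SchemaIdFraming {
    // Assumed layout: 4-byte big-endian schema id, then the Avro bytes.
    static final int ID_BYTES = Integer.BYTES;

    public static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(ID_BYTES + avroPayload.length)
                .putInt(schemaId)
                .put(avroPayload)
                .array();
    }

    public static int schemaIdOf(byte[] framed) {
        return ByteBuffer.wrap(framed).getInt();
    }

    public static byte[] payloadOf(byte[] framed) {
        return Arrays.copyOfRange(framed, ID_BYTES, framed.length);
    }

    public static void main(String[] args) {
        // Assumption: the application maintains its own id -> schema JSON mapping.
        Map<Integer, String> schemasById = new HashMap<>();
        schemasById.put(7, "{\"type\":\"record\",\"name\":\"User\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

        byte[] avroPayload = {0x36}; // stands in for the bytes the DatumWriter produced
        byte[] cell = frame(7, avroPayload);

        // On read: recover the id, look up the schema, then decode the payload.
        String schemaJson = schemasById.get(schemaIdOf(cell));
        byte[] payload = payloadOf(cell);
        // With Avro on the classpath, schemaJson would be parsed into a Schema
        // and passed to the DatumReader together with the payload bytes.
        System.out.println("schema id " + schemaIdOf(cell) + ", "
                + payload.length + " payload byte(s), schema: " + schemaJson);
    }
}
```

The id width and byte order are arbitrary here; what matters is that writers
and readers agree on them and that the id-to-schema mapping outlives the data.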



On Thu, Feb 12, 2015 at 1:50 AM, Arunasalam G <ze...@gmail.com> wrote:

> Hi,
>
> Is there any way to retrieve the schema from the encoded data without
> knowing the schema prior to deserialization?
> As requested, we have described the steps we used to serialize the data
> and schema.
>
> Please help us resolve this scenario.
> Looking forward to hearing from you soon.
>
> Thanks and Regards,
> Arun G.


-- 
Sean

Re: Doubt in a AVRO scenario

Posted by Arunasalam G <ze...@gmail.com>.
Hi,

Is there any way to retrieve the schema from the encoded data without
knowing the schema prior to deserialization?
As requested, we have described the steps we used to serialize the data
and schema.

Please help us resolve this scenario.
Looking forward to hearing from you soon.

Thanks and Regards,
Arun G.

On Wed, Feb 11, 2015 at 2:56 PM, Arunasalam G <ze...@gmail.com> wrote:

> Hi,
>
>     We serialized the schema using the following code.
>
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
> DatumWriter<Record> writer = new SpecificDatumWriter<Record>(schema);
>
> writer.write(record, encoder);
> encoder.flush();
> out.close();
>
> Here, record is of type org.apache.avro.generic.GenericData.Record.
>
> Thanks and Regards,
> Arun G

Re: Doubt in a AVRO scenario

Posted by Arunasalam G <ze...@gmail.com>.
Hi,

    We serialized the schema using the following code.

ByteArrayOutputStream out = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
DatumWriter<Record> writer = new SpecificDatumWriter<Record>(schema);

writer.write(record, encoder);
encoder.flush();
out.close();

Here, record is of type org.apache.avro.generic.GenericData.Record.
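
For reference, the Avro specification's binary encoding writes a record as
the concatenation of its field values, with no field names and no schema.
The hand-rolled sketch below (JDK only; the class and helper names are made
up, and it covers just long and string) encodes a record with a long field
27 and a string field "foo", the same example the specification uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, minimal re-implementation of two Avro binary primitives.
final class AvroBinarySketch {
    // Longs are zig-zag encoded, then written as a variable-length integer.
    public static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Strings are a long byte count followed by the UTF-8 bytes.
    public static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is just its fields, in schema order, back to back.
    public static byte[] encodeRecord(long a, String b) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, a);
        writeString(out, b);
        return out.toByteArray();
    }

    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeRecord(27L, "foo"))); // prints "36 06 66 6f 6f"
    }
}
```

Five bytes carry the whole record, which is why a reader cannot recover
field names or types from such a byte array alone.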

Thanks and Regards,
Arun G


On Wed, Feb 11, 2015 at 2:08 PM, Sean Busbey <bu...@cloudera.com> wrote:

> It depends entirely on how you serialized the schema + binary into the
> byte array. Did you use some library or can you briefly describe the method
> used?
>
> --
> Sean
>

Re: Doubt in a AVRO scenario

Posted by Sean Busbey <bu...@cloudera.com>.
On Wed, Feb 11, 2015 at 1:24 AM, Arunasalam G <ze...@gmail.com> wrote:

>
> Our scenario is that we have stored the data with the schema added to it.
>
> To keep things simple, let me leave HBase out of consideration.
>
> We have an Avro data object that has both data and schema and is
> serialized to a byte array.
> Is there any way to retrieve the schema from this byte array?
>
> Let's assume that we don't know which schema the incoming object uses.
> I found that for an Avro data file it's possible to retrieve the schema
> from the file. Similarly, is there any way to retrieve the schema from a
> serialized byte array object?
>
>
It depends entirely on how you serialized the schema + binary into the byte
array. Did you use some library or can you briefly describe the method
used?

-- 
Sean

Re: Doubt in a AVRO scenario

Posted by Arunasalam G <ze...@gmail.com>.
Hi Arvind,

Thank you very much for the timely response.

Our scenario is that we have stored the data with the schema added to it.

To keep things simple, let me leave HBase out of consideration.

We have an Avro data object that has both data and schema and is
serialized to a byte array.
Is there any way to retrieve the schema from this byte array?

Let's assume that we don't know which schema the incoming object uses.
I found that for an Avro data file it's possible to retrieve the schema
from the file. Similarly, is there any way to retrieve the schema from a
serialized byte array object?

Thanks in advance.

Regards,
Arun G



On Wed, Feb 11, 2015 at 12:35 PM, Arvind Kalyan <ba...@gmail.com> wrote:

> The schema is not stored along with the data. You either need to store it
> as part of the data (HBase value: schema_id + avrodata, with schema_id
> mapped to a schema somehow in your own code) if you have different schemas
> for different rows/cells, or you can keep the schema separately for the
> whole HBase database if all rows are expected to have the same schema.

Re: Doubt in a AVRO scenario

Posted by Arvind Kalyan <ba...@gmail.com>.
The schema is not stored along with the data. You either need to store it
as part of the data (HBase value: schema_id + avrodata, with schema_id
mapped to a schema somehow in your own code) if you have different schemas
for different rows/cells, or you can keep the schema separately for the
whole HBase database if all rows are expected to have the same schema.
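
One possible shape for the "map schema_id to a schema in your own code" part
is sketched below, assuming an in-memory registry that hands out sequential
ids. The class and method names are hypothetical, and a real deployment would
need to persist the mapping somewhere durable (for example in its own table)
so that ids stay valid across restarts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory registry: schema JSON text <-> small integer id.
final class SchemaRegistrySketch {
    private final List<String> schemasById = new ArrayList<>();
    private final Map<String, Integer> idsBySchema = new HashMap<>();

    // Returns the existing id for a known schema text, or assigns the next one.
    public synchronized int register(String schemaJson) {
        Integer existing = idsBySchema.get(schemaJson);
        if (existing != null) {
            return existing;
        }
        int id = schemasById.size();
        schemasById.add(schemaJson);
        idsBySchema.put(schemaJson, id);
        return id;
    }

    // Returns the schema text for an id read back from a stored value.
    public synchronized String lookup(int id) {
        if (id < 0 || id >= schemasById.size()) {
            throw new IllegalArgumentException("Unknown schema id: " + id);
        }
        return schemasById.get(id);
    }

    public static void main(String[] args) {
        SchemaRegistrySketch registry = new SchemaRegistrySketch();
        int id = registry.register("{\"type\":\"string\"}");
        System.out.println(id + " -> " + registry.lookup(id));
    }
}
```

Because this sketch compares the raw JSON text, two differently formatted
copies of the same schema would get different ids; normalizing the text
before registering it is one way to avoid that.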






-- 
Arvind Kalyan
http://www.linkedin.com/in/base16
cell: (408) 761-2030