Posted to user@avro.apache.org by "W.P. McNeill" <bi...@gmail.com> on 2011/05/17 00:10:11 UTC

Can I nest schema?

I am writing a Hadoop application whose values are objects called Records
which are serialized using Avro.  (I specify a Serialization class for the
Records via the io.serializations property.)

I now need to expand my application so that instead of just a Record I have
a more complicated data structure, call it an Augmented Record.  Say that an
Augmented Record contains an integer N in addition to the Record, so now the
value looks like (N, Record).  Adding an integer field to the Record schema
just to support this one Hadoop process would be a hack, but I also can't
create a Writable (IntWritable, Record) object because Record uses its own
Avro serialization scheme and so is not Writable.  What I want to do is
basically create a new schema of the form [Integer: N, Record: R], where the
Record schema is read in dynamically.  Can I dynamically nest schemas in this
manner?  If not, what is the best approach to serializing an Augmented
Record?

Thanks.

Re: Can I nest schema?

Posted by Sudharsan Sampath <su...@gmail.com>.
Hi,

You can create a record with the integer and the record itself as fields and
use this as the record for the job. Your schema would look something like the
following.

{
    "name" : "augmentedRecord",
    "type" : "record",
    "fields" : [{
        "name" : "index",
        "type" : "int"
        },{
        "name" : "actualRecord",
        "type" : <<your original schema>>
        }]
}

- Sudhan S
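
Because the Record schema is only known at run time, one way to produce a schema of this shape is to splice the Record schema's JSON text into the outer record as the type of the nested field. The following is a minimal sketch in plain Java under that assumption. The class name AugmentedSchemaJson, the method augment, and the sample Record schema are hypothetical; the names augmentedRecord, index, and actualRecord follow the schema above.

```java
class AugmentedSchemaJson {
    // Wraps an existing record schema (given as JSON text) in an outer
    // record that adds an int field next to it.
    static String augment(String recordSchemaJson) {
        return "{\n"
            + "  \"name\": \"augmentedRecord\",\n"
            + "  \"type\": \"record\",\n"
            + "  \"fields\": [\n"
            + "    {\"name\": \"index\", \"type\": \"int\"},\n"
            + "    {\"name\": \"actualRecord\", \"type\": " + recordSchemaJson.trim() + "}\n"
            + "  ]\n"
            + "}";
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the Record schema loaded at run time.
        String recordSchema =
            "{\"name\": \"Record\", \"type\": \"record\", "
            + "\"fields\": [{\"name\": \"text\", \"type\": \"string\"}]}";
        System.out.println(augment(recordSchema));
    }
}
```

The resulting string should be parseable by Avro's own schema parser (for example new Schema.Parser().parse(...) in Avro 1.5 and later). Note that the nested schema must be a complete named record definition, not just a list of fields.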

Re: Can I nest schema?

Posted by Scott Carey <sc...@richrelevance.com>.
You can dynamically create a record for this job:

call Schema.createRecord( … ) to create the outer record,
create a field with the int,
create a field with the Record,
put these fields in a List,
call setFields() with that List on the outer record.

Use that record for the job.

The result is a record with two fields, the int and the nested Record.
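
The steps above could be sketched in Java roughly as follows. This is an illustration, not the poster's code: it assumes Avro 1.8 or later on the classpath (for the Schema.Field constructor that takes an Object default), and the class name AugmentedSchemaBuilder, the method augment, the schema and field names, and the sample Record schema are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;

class AugmentedSchemaBuilder {
    // Builds an outer record schema holding an int and the nested Record schema.
    static Schema augment(Schema recordSchema) {
        // Arguments: name, doc, namespace, isError.
        Schema augmented = Schema.createRecord("augmentedRecord", null, null, false);

        List<Schema.Field> fields = new ArrayList<Schema.Field>();
        // A null default means "no default value" for the field.
        fields.add(new Schema.Field("index", Schema.create(Schema.Type.INT), null, (Object) null));
        fields.add(new Schema.Field("actualRecord", recordSchema, null, (Object) null));

        // setFields() may be called only once per record schema.
        augmented.setFields(fields);
        return augmented;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the Record schema read in at run time.
        Schema record = Schema.createRecord("Record", null, null, false);
        List<Schema.Field> recordFields = new ArrayList<Schema.Field>();
        recordFields.add(new Schema.Field("text", Schema.create(Schema.Type.STRING), null, (Object) null));
        record.setFields(recordFields);

        // Pretty-print the combined schema as JSON.
        System.out.println(augment(record).toString(true));
    }
}
```

Building the schema through the API rather than by editing JSON text means the Record schema can be passed in as an already-parsed Schema object, whatever its source.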
