Posted to user@avro.apache.org by "W.P. McNeill" <bi...@gmail.com> on 2011/05/17 00:10:11 UTC
Can I nest schema?
I am writing a Hadoop application whose values are objects called Records
which are serialized using Avro. (I specify a Serialization class for the
Records via the io.serializations property.)
I now need to expand my application so that instead of just a Record I need
to have a more complicated data structure, call it an Augmented Record. Say
that an Augmented Record contains integer N in addition to the record, so
now the value looks like (N, Record). Adding an integer field to the Record
schema just to support this one Hadoop process would be a hack, but I also
can't create a Writable (WritableInt, Record) object because Record uses its
own Avro serialization scheme and so is not Writable. What I want to do is
basically create a new schema of the form [Integer: N, Record: R], where the
Record schema is read in dynamically. Can I dynamically nest schema in this
manner? If not, what is the best approach to serializing an Augmented
Record?
Thanks.
Re: Can I nest schema?
Posted by Sudharsan Sampath <su...@gmail.com>.
Hi,
You can create a record with the integer and the record itself as fields and
use this as the record for the job. Your schema would look something as
follows.
{
  "name" : "augmentedRecord",
  "type" : "record",
  "fields" : [
    {
      "name" : "index",
      "type" : "int"
    },
    {
      "name" : "actualRecord",
      "type" : {
        "type" : "record",
        "name" : "actualRecord",
        "fields" : [ <<your original schema>> ]
      }
    }
  ]
}
Note that the nested record's "type" must be a complete schema object with its own "name"; a bare "type" : "record" on the field itself is not valid Avro.
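For illustration, here is a sketch of building such an augmented record with Avro's generic API. The inner "text" field and the values are made-up stand-ins for your real schema:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class BuildAugmented {
    // Builds an augmentedRecord instance with the generic API.
    // The inner "text" field is a hypothetical stand-in for the real schema.
    static GenericRecord build() {
        String innerJson =
            "{\"type\":\"record\",\"name\":\"actualRecord\",\"fields\":"
            + "[{\"name\":\"text\",\"type\":\"string\"}]}";
        String augmentedJson =
            "{\"type\":\"record\",\"name\":\"augmentedRecord\",\"fields\":["
            + "{\"name\":\"index\",\"type\":\"int\"},"
            + "{\"name\":\"actualRecord\",\"type\":" + innerJson + "}]}";
        Schema augmented = new Schema.Parser().parse(augmentedJson);

        // Populate the nested record first, then the outer one.
        GenericRecord inner =
            new GenericData.Record(augmented.getField("actualRecord").schema());
        inner.put("text", "hello");

        GenericRecord outer = new GenericData.Record(augmented);
        outer.put("index", 7);
        outer.put("actualRecord", inner);
        return outer;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```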
- Sudhan S
Re: Can I nest schema?
Posted by Scott Carey <sc...@richrelevance.com>.
You can dynamically create a record for this job:
Schema.createRecord( … )
create a field with the int,
create a field with the Record,
put these in a List,
call setFields() on the record.
Use that record for the job.
The result is a record with two fields, the int and the nested Record.
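The steps above could be sketched as follows (the class and field names "AugmentedSchema", "n", and "record" are made up for the example, and this assumes a recent Avro release where Schema.Field takes an Object default value):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;

public class AugmentedSchema {
    // Wraps an existing record schema in a new two-field record:
    // an int plus the original Record schema, unchanged.
    public static Schema augment(Schema recordSchema) {
        Schema augmented =
            Schema.createRecord("AugmentedRecord", null, null, false);
        List<Schema.Field> fields = new ArrayList<Schema.Field>();
        fields.add(new Schema.Field(
            "n", Schema.create(Schema.Type.INT), null, (Object) null));
        fields.add(new Schema.Field(
            "record", recordSchema, null, (Object) null));
        augmented.setFields(fields);
        return augmented;
    }
}
```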