Posted to user@avro.apache.org by Echo Li <ec...@gmail.com> on 2014/07/22 21:06:36 UTC

How to deserialize avro file with union/many schemas?

Hello,

I'm new here and hope I can get some help from you guys. Basically, I have
an Avro file with a union of many schemas and mixed records. I need to split
it into many Avro files, one schema per file. Everything I've read covers
serializing and deserializing an Avro file with a single schema, which is
pretty straightforward, but in my case I have no clue. Any ideas?

Re: How to deserialize avro file with union/many schemas?

Posted by Sachin Goyal <sg...@walmartlabs.com>.
Hi Echo,

Can you share the code that you used to create the below schema?
How are you appending the schemas into one object?
And how is the data being appended to the same object?

Wouldn’t it be simpler to segregate the objects for different schemas such that
one group of objects contains only one schema and its related data objects?

-Sachin

From: Echo Li <ec...@gmail.com>
Reply-To: "user@avro.apache.org" <us...@avro.apache.org>
Date: Wednesday, July 23, 2014 at 7:50 PM
To: "user@avro.apache.org" <us...@avro.apache.org>
Subject: Re: How to deserialize avro file with union/many schemas?

Thanks Sachin,

My schema is more like:
[ {schema-one with type="record"}, {schema-two with type="record"}, ... ]

followed by datums, each pertaining to one of the schemas, and each schema will map to one class.


On Wed, Jul 23, 2014 at 3:42 PM, Sachin Goyal <sg...@walmartlabs.com> wrote:

To see a union schema, do the following:
System.out.println(ReflectData.AllowNull.get().getSchema(YourClass.class));

And then do the following:
System.out.println(ReflectData.get().getSchema(YourClass.class));

Diff the two outputs.
The first one generates a UNION of each and every field with null.

Hope that helps.
Sachin


From: Echo Li <ec...@gmail.com>
Reply-To: "user@avro.apache.org" <us...@avro.apache.org>
Date: Wednesday, July 23, 2014 at 3:09 PM
To: "user@avro.apache.org" <us...@avro.apache.org>
Subject: Re: How to deserialize avro file with union/many schemas?

Hi Mike,

I read through most of the docs on the Avro site and don't see anything about a "union schema". Mike, would you mind giving me an example of how a union schema is defined? Also, which package/method can retrieve the master schema from an Avro file? Should "getSchema()" work? And how do I read each Avro datum without knowing its corresponding schema?

I very much appreciate your help!


On Tue, Jul 22, 2014 at 10:25 PM, Michael Pigott <mp...@gmail.com> wrote:

It's just a regular Union :-) http://avro.apache.org/docs/1.7.6/spec.html#Unions

Regards,
Mike

On Jul 23, 2014 1:22 AM, "Echo" <ec...@gmail.com> wrote:
Thanks Mike, that makes sense. Is there any doc I can read about union schemas?

On Jul 22, 2014, at 2:32 PM, Michael Pigott <mp...@gmail.com> wrote:

Echo,
    Just to make sure I understand you correctly - do you have a file with multiple Avro datums in it, each one following a separate schema?  And are all of these schemas unioned together in a file-level "master schema?"  (As far as I know, Avro file readers and writers only support one schema per file, so this is the only way your question makes sense to me.)
    If that's the case, then you can get the file's "master schema" and determine what all of the different types are:

List<Schema> allTypes = masterSchema.getTypes(); // Assumes masterSchema is of Type.UNION

Then when you read each Avro datum in the file, you can check which of the schemas it conforms to, and write a new file with just that sub-schema and the one datum in it.

Does that make sense?
Mike


On Tue, Jul 22, 2014 at 3:22 PM, Lewis John Mcgibbney <le...@gmail.com> wrote:
For the purpose of others on this list, can you please provide an example of your schema?
Thanks
Lewis


On Tue, Jul 22, 2014 at 12:06 PM, Echo Li <ec...@gmail.com> wrote:
Hello,

I'm new here and hope I can get some help from you guys. Basically, I have an Avro file with a union of many schemas and mixed records. I need to split it into many Avro files, one schema per file. Everything I've read covers serializing and deserializing an Avro file with a single schema, which is pretty straightforward, but in my case I have no clue. Any ideas?



--
Lewis




Re: How to deserialize avro file with union/many schemas?

Posted by Echo Li <ec...@gmail.com>.
Thanks Sachin,

My schema is more like:
[ {schema-one with type="record"}, {schema-two with type="record"}, ... ]

followed by datums, each pertaining to one of the schemas, and each
schema will map to one class.





Re: How to deserialize avro file with union/many schemas?

Posted by Sachin Goyal <sg...@walmartlabs.com>.
To see a union schema, do the following:
System.out.println(ReflectData.AllowNull.get().getSchema(YourClass.class));

And then do the following:
System.out.println(ReflectData.get().getSchema(YourClass.class));

Diff the two outputs.
The first one generates a UNION of each and every field with null.

Hope that helps.
Sachin
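
For readers who want to try the comparison above, here is a minimal sketch. It assumes the Avro Java library is on the classpath; Customer is a hypothetical stand-in for YourClass and is not from this thread.

```java
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

/** Hypothetical sample class standing in for YourClass. */
class Customer {
    String name;
    String email;
}

final class ReflectSchemaDemo {

    /** Schema derived by plain reflection: the String fields are plain "string". */
    static Schema plainSchema() {
        return ReflectData.get().getSchema(Customer.class);
    }

    /** Schema derived with AllowNull: the String fields become unions with "null". */
    static Schema nullableSchema() {
        return ReflectData.AllowNull.get().getSchema(Customer.class);
    }

    public static void main(String[] args) {
        // Print both schemas as pretty JSON so they can be diffed by eye.
        System.out.println(nullableSchema().toString(true));
        System.out.println(plainSchema().toString(true));
    }
}
```

Note that this shows unions at the field level (a field that may be null). The file described in the original question has a union at the top level, which is a different use of the same schema type.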



Re: How to deserialize avro file with union/many schemas?

Posted by Echo Li <ec...@gmail.com>.
Hi Mike,

I read through most of the docs on the Avro site and don't see anything about
a "union schema". Mike, would you mind giving me an example of how a union
schema is defined? Also, which package/method can retrieve the master schema
from an Avro file? Should "getSchema()" work? And how do I read each Avro
datum without knowing its corresponding schema?

I very much appreciate your help!



Re: How to deserialize avro file with union/many schemas?

Posted by Michael Pigott <mp...@gmail.com>.
It's just a regular Union :-)
http://avro.apache.org/docs/1.7.6/spec.html#Unions

Regards,
Mike
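
As an illustration of what the linked section of the spec describes, a union is written as a JSON array of schemas. The sketch below is a top-level union of two records; the names SchemaOne, SchemaTwo, the namespace, and the fields are hypothetical placeholders, not the schema from the original question.

```json
[
  {
    "type": "record",
    "name": "SchemaOne",
    "namespace": "example",
    "fields": [ {"name": "id", "type": "long"} ]
  },
  {
    "type": "record",
    "name": "SchemaTwo",
    "namespace": "example",
    "fields": [ {"name": "message", "type": "string"} ]
  }
]
```

A file written with a schema like this can hold a mix of SchemaOne and SchemaTwo datums, because each datum is stored together with the position of its branch in the union.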

Re: How to deserialize avro file with union/many schemas?

Posted by Echo <ec...@gmail.com>.
Thanks Mike, that makes sense. Is there any doc I can read about union schemas?


Re: How to deserialize avro file with union/many schemas?

Posted by Echo <ec...@gmail.com>.
Also, does the Avro command-line tool work with union schemas?


Re: How to deserialize avro file with union/many schemas?

Posted by Michael Pigott <mp...@gmail.com>.
Echo,
    Just to make sure I understand you correctly - do you have a file with
multiple Avro datums in it, each one following a separate schema?  And are
all of these schemas unioned together in a file-level "master schema?"  (As
far as I know, Avro file readers and writers only support one schema per
file, so this is the only way your question makes sense to me.)
    If that's the case, then you can get the file's "master schema" and
determine what all of the different types are:

List<Schema> allTypes = masterSchema.getTypes(); // Assumes masterSchema is of Type.UNION

Then when you read each Avro datum in the file, you can check which of the
schemas it conforms to, and write a new file with just that sub-schema and
the one datum in it.

Does that make sense?
Mike
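
The approach described above could be sketched roughly as follows. This is only an illustration, under the assumptions that the Avro Java library is on the classpath, that the file-level schema is a union, and that every branch of that union is a record. The class name UnionFileSplitter and the output file naming are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

final class UnionFileSplitter {

    /**
     * Splits inputFile into one Avro file per record schema in its top-level union.
     * Returns a map from each record's full name to the file written for it.
     */
    static Map<String, File> split(File inputFile, File outputDir) throws IOException {
        Map<String, DataFileWriter<GenericRecord>> writers = new HashMap<>();
        Map<String, File> outputs = new HashMap<>();
        try (DataFileReader<GenericRecord> reader =
                new DataFileReader<>(inputFile, new GenericDatumReader<GenericRecord>())) {
            // The schema stored in the file header is the "master schema".
            Schema masterSchema = reader.getSchema();
            if (masterSchema.getType() != Schema.Type.UNION) {
                throw new IllegalArgumentException("Expected a union schema, got " + masterSchema.getType());
            }
            // Open one writer per branch, keyed by the record's full name.
            for (Schema branch : masterSchema.getTypes()) {
                if (branch.getType() != Schema.Type.RECORD) {
                    throw new IllegalArgumentException("This sketch only handles record branches");
                }
                File out = new File(outputDir, branch.getFullName() + ".avro");
                DataFileWriter<GenericRecord> writer =
                        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(branch));
                writer.create(branch, out);
                writers.put(branch.getFullName(), writer);
                outputs.put(branch.getFullName(), out);
            }
            // Each datum comes back as a record that knows its own schema,
            // so it can be routed to the matching writer.
            for (GenericRecord record : reader) {
                writers.get(record.getSchema().getFullName()).append(record);
            }
        } finally {
            for (DataFileWriter<GenericRecord> writer : writers.values()) {
                writer.close();
            }
        }
        return outputs;
    }
}
```

A call such as UnionFileSplitter.split(new File("mixed.avro"), new File("out")) would then leave one file per branch in the output directory. Opening all writers before reading means a branch with no datums still gets a valid, empty Avro file.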



Re: How to deserialize avro file with union/many schemas?

Posted by Lewis John Mcgibbney <le...@gmail.com>.
For the purpose of others on this list, can you please provide an example
of your schema?
Thanks
Lewis

