Posted to user@avro.apache.org by Motoko Kusanagi <ma...@outlook.com> on 2018/05/25 21:08:31 UTC

Avro Schema Question

Hi,


I have read the specification several times. In the Schema Declaration section, it says "A Schema is represented in JSON <http://www.json.org/> by one of:". The "one" confuses me, as I am interpreting it to mean exactly one of the three forms listed.


In short, can I do this as a single schema?

{type : int},

{type : string},

{type : int},


Or do the following as a single schema?

{type : int},

{type : record ....},

{type : record ....}, // Not the same as the previous.

{type : string},


Or do I have to "embed" the above under a complex type like a record if I want a complex schema? Or does "one of" mean I have to choose exactly one for the top-most level of the schema?


Thanks!!



Re: Avro Schema Question

Posted by Motoko Kusanagi <ma...@outlook.com>.
Hi Elliot,

Thanks for that bit of info. It is helpful. Where do you draw the line between complex unions and simple unions? In other words, what criteria do you use to decide that a union is too complex?

Thanks,

Scott

Re: Avro Schema Question

Posted by Elliot West <te...@gmail.com>.
A word of caution on the union type. You may find support for unions very
patchy if you are hoping to process records using well-known data
processing engines. For example, we’ve been unable to usefully read union
types in either Apache Spark or Hive. The simple null union construct is the
exception: [null, typeA], as it is usually represented by a nullable
column of typeA. We’ve resorted to prohibiting schemas with complex unions
so that our producers can’t create data that is not fully readable by our
consumers.
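
As an illustration only, a rule like this could be checked mechanically before a schema is accepted. The sketch below is not the tooling described above; it assumes that "simple" means a two-branch union with exactly one null branch, and the function names and the sample record are hypothetical. It works on a schema that has already been parsed from JSON into Python objects, and it does not resolve named type references.

```python
def is_null(branch):
    """True if a union branch is the null type in either spelling."""
    return branch == "null" or (
        isinstance(branch, dict) and branch.get("type") == "null"
    )


def is_simple_null_union(branches):
    """Assumed rule: exactly two branches, exactly one of them null."""
    return len(branches) == 2 and sum(is_null(b) for b in branches) == 1


def find_complex_unions(schema, path="$"):
    """Return the paths of all unions that are not simple null unions."""
    found = []
    if isinstance(schema, list):  # a JSON array is an Avro union
        if not is_simple_null_union(schema):
            found.append(path)
        for i, branch in enumerate(schema):
            found.extend(find_complex_unions(branch, f"{path}[{i}]"))
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                field_path = f"{path}.{field['name']}"
                found.extend(find_complex_unions(field["type"], field_path))
        elif kind == "array":
            found.extend(find_complex_unions(schema["items"], f"{path}.items"))
        elif kind == "map":
            found.extend(find_complex_unions(schema["values"], f"{path}.values"))
        elif isinstance(kind, (list, dict)):
            # "type" may itself hold a nested schema
            found.extend(find_complex_unions(kind, path))
    return found


# Hypothetical record used only to exercise the check.
schema = {
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "note", "type": ["null", "string"]},
        {"name": "value", "type": ["int", "string"]},
    ],
}

print(find_complex_unions(schema))  # ['$.value']
```

Under this assumed rule, the nullable `note` field passes and the `value` field is reported, so a producer could be rejected before it publishes data that downstream engines cannot read.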

Elliot.


Re: Avro Schema Question

Posted by Motoko Kusanagi <ma...@outlook.com>.
Hi Michael,

Thanks!! Yes, it does.

Scott

Re: Avro Schema Question

Posted by Michael Smith <mi...@syapse.com>.
{"type": "int"}, {"type": "string"} is not valid JSON, so you definitely
can't do that. But
 
[{"type": "int"}, {"type": "string"}] is a valid schema -- it can encode a
single value that is either an int or a string. At the highest level, your
schema can only be one type, but that type may be (and in fact probably
will be) a complex type -- a union of records or a single record.
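
The JSON half of this can be checked with nothing more than a JSON parser. The sketch below uses Python's standard `json` module; it does not call an Avro library, so it only shows whether each text is a single well-formed JSON document, not whether Avro accepts it. The variable and function names are illustrative.

```python
import json

# Two objects separated by a comma: not a single JSON document.
not_one_document = '{"type": "int"}, {"type": "string"}'

# The same two objects inside an array: one JSON document, which Avro
# reads as a union of int and string.
union_schema = '[{"type": "int"}, {"type": "string"}]'


def is_single_json_document(text):
    """Return True if text parses as exactly one JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_single_json_document(not_one_document))  # False
print(is_single_json_document(union_schema))      # True
```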

Does that answer your question?

Michael A. Smith — Senior Systems Engineer
------------------------------

michaels@syapse.com
syapse.com
100 Matsonford Road
Five Radnor Corporate Center
Suite 444
Radnor, PA 19087
https://www.linkedin.com/in/michaelalexandersmith