Posted to user@avro.apache.org by Francis Galiegue <fg...@gmail.com> on 2013/02/22 22:45:15 UTC

Experiment: a JSON Schema describing Avro schemas

Hello,

I have written a JSON Schema describing Avro schemas (in their JSON
form), at least to the best of my knowledge -- that is, I read the
spec, adapted, injected some sample data, and so far it seems to
work:

https://github.com/fge/sample-json-schemas/blob/master/avro/avro-schema.json

Note that this is a structural description only; as such, it won't
express constraints such as "this or that namespace must exist", etc.

Comments welcome! In particular, data that fails validation, and why -- I
could not find much data so far...

Have fun,
--
Francis Galiegue, fgaliegue@gmail.com
Try out your JSON Schemas: http://json-schema-validator.herokuapp.com

Re: Experiment: a JSON Schema describing Avro schemas

Posted by Francis Galiegue <fg...@gmail.com>.
On Fri, Feb 22, 2013 at 11:24 PM, Francis Galiegue <fg...@gmail.com> wrote:
[...]
>
> Hmm, interesting... I'll test them on my site one by one, I am sure I
> will find some bugs in the schema! Thanks for the resources!
>

OK, I have tested them all and all pass now! Thanks for the resource!

Next step is Avro schema <-> JSON Schema conversion...

-- 
Francis Galiegue, fgaliegue@gmail.com
Try out your JSON Schemas: http://json-schema-validator.herokuapp.com

Re: Experiment: a JSON Schema describing Avro schemas

Posted by Francis Galiegue <fg...@gmail.com>.
On Fri, Feb 22, 2013 at 11:12 PM, Doug Cutting <cu...@apache.org> wrote:
[...]
>
>>>  - a schema can be the name of a schema defined earlier, e.g.
>>> {"type":"record", "name":"List", "fields":[{"name":"next",
>>> "type":["null", "List"]}]}
>>>
>>
>> I don't believe the schema forbids that at this point, but I guess I'll
>> have to check.
>
> No, but you permit any string as a schema, rather than just primitives
> and names.
>

Yes, that is true. But on the other hand, JSON Schema only does
structural validation; functional checks are still up to the
application. The same goes for JSON Schemas themselves: you can
have a "dangling" JSON Reference, for instance, and you cannot know
unless you try to dereference it.
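
To make the structural/functional split concrete, here is a minimal
sketch in Python of the kind of check the application would still have
to run after structural validation: every string used as a schema must
be a primitive or a name defined earlier. The function name
undefined_names is made up for illustration, namespaces are ignored,
and the input is assumed to have already passed structural validation.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def undefined_names(schema, defined=None):
    """List string schemas that are neither primitives nor names defined earlier."""
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        return [] if schema in PRIMITIVES or schema in defined else [schema]
    if isinstance(schema, list):  # a union: check each branch in order
        return [n for branch in schema for n in undefined_names(branch, defined)]
    if not isinstance(schema, dict):
        return []  # wrong shapes are left to the structural validator
    kind = schema.get("type")
    if isinstance(kind, str) and kind in NAMED:
        # Register the name first so that self-references resolve.
        # Simplification: namespaces are ignored here.
        defined.add(schema["name"])
        return [n for field in schema.get("fields", [])
                for n in undefined_names(field["type"], defined)]
    if kind == "array":
        return undefined_names(schema["items"], defined)
    if kind == "map":
        return undefined_names(schema["values"], defined)
    # {"type": "int", "java": "short"} and similar wrappers
    return undefined_names(kind, defined)
```

With the recursive List record quoted above (its union written as
["null", "List"]), this returns an empty list; a misspelled reference
such as "Lisst" would be reported even though it is structurally just
a string.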

> Schemas in Avro's source to test this on are:
>
> ./share/test/schemas/weather.avsc
> ./share/test/schemas/interop.avsc
> ./share/schemas/org/apache/avro/ipc/HandshakeResponse.avsc
> ./share/schemas/org/apache/avro/ipc/HandshakeRequest.avsc
> ./share/schemas/org/apache/avro/data/Json.avsc
> ./doc/examples/user.avsc
> ./lang/java/trevni/avro/src/test/cases/dremel/sub1/sub.avsc
> ./lang/java/trevni/avro/src/test/cases/dremel/input.avsc
> ./lang/java/compiler/src/test/idl/putOnClassPath/OnTheClasspath.avsc
> ./lang/java/compiler/src/test/idl/input/player.avsc
> ./lang/java/compiler/src/test/idl/input/position.avsc
> ./lang/java/compiler/src/test/idl/input/foo.avsc
> ./lang/java/compiler/src/test/resources/simple_record.avsc
> ./lang/java/maven-plugin/src/test/avro/User.avsc
> ./lang/java/maven-plugin/src/test/avro/directImport/PrivacyDirectImport.avsc
> ./lang/java/maven-plugin/src/test/avro/imports/PrivacyImport.avsc
> ./lang/java/tools/src/test/compiler/input/player.avsc
> ./lang/java/tools/src/test/compiler/input/position.avsc
> ./lang/java/mapred/src/test/java/org/apache/avro/mapred/tether/WordCount.avsc
> ./lang/java/mapred/src/test/avro/TextStats.avsc
> ./lang/java/avro/src/test/resources/FooBarSpecificRecord.avsc
> ./lang/c/tests/schema_tests/pass/interop.avsc
>
> Additionally, one might alter the Schema parser & printer to log each
> schema, then run unit tests and collect these, since there are many
> more schemas that are constructed by the unit tests.  If you're
> interested, I could try to construct a file of valid Avro schemas for
> such testing.

Hmm, interesting... I'll test them on my site one by one, I am sure I
will find some bugs in the schema! Thanks for the resources!

-- 
Francis Galiegue, fgaliegue@gmail.com
Try out your JSON Schemas: http://json-schema-validator.herokuapp.com

Re: Experiment: a JSON Schema describing Avro schemas

Posted by Doug Cutting <cu...@apache.org>.
On Fri, Feb 22, 2013 at 2:04 PM, Francis Galiegue <fg...@gmail.com> wrote:
> By default, JSON Schema allows additional members; it only forbids
> them if additionalProperties is false (or constrains what they can be
> if additionalProperties is a schema)

I didn't realize that.  Thanks for clarifying.

> so the question would rather be:
> are there scenarios where they are not allowed?

No, I think they should be allowed anywhere.

> Ah, OK. So you can have either a "full" name, a short name and a
> namespace, or even a "full" name and a namespace?

Yes, any of those three are valid.  (When no namespace is specified,
the namespace of the containing schema is implied.)
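
A small Python sketch of the three cases, following the name rules in
the Avro spec (a name containing a dot is already a full name, and any
namespace given alongside it is ignored). The function full_name and
its parameter names are made up for illustration.

```python
def full_name(name, namespace=None, enclosing_namespace=None):
    """Resolve an Avro name to its full name."""
    if "." in name:
        # Already a full name; a "namespace" attribute, if any, is ignored.
        return name
    # Fall back to the namespace of the containing schema when none is given.
    ns = namespace if namespace is not None else enclosing_namespace
    return f"{ns}.{name}" if ns else name
```

So full_name("foo.Bar"), full_name("Bar", "foo") and
full_name("foo.Bar", "other") all give "foo.Bar", and so does
full_name("Bar", None, "foo") for a type nested in a schema whose
namespace is "foo".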

>>  - a schema can be the name of a schema defined earlier, e.g.
>> {"type":"record", "name":"List", "fields":[{"name":"next",
>> "type":["null", "List"]}]}
>>
>
> I don't believe the schema forbids that at this point, but I guess I'll
> have to check.

No, but you permit any string as a schema, rather than just primitives
and names.

Schemas in Avro's source to test this on are:

./share/test/schemas/weather.avsc
./share/test/schemas/interop.avsc
./share/schemas/org/apache/avro/ipc/HandshakeResponse.avsc
./share/schemas/org/apache/avro/ipc/HandshakeRequest.avsc
./share/schemas/org/apache/avro/data/Json.avsc
./doc/examples/user.avsc
./lang/java/trevni/avro/src/test/cases/dremel/sub1/sub.avsc
./lang/java/trevni/avro/src/test/cases/dremel/input.avsc
./lang/java/compiler/src/test/idl/putOnClassPath/OnTheClasspath.avsc
./lang/java/compiler/src/test/idl/input/player.avsc
./lang/java/compiler/src/test/idl/input/position.avsc
./lang/java/compiler/src/test/idl/input/foo.avsc
./lang/java/compiler/src/test/resources/simple_record.avsc
./lang/java/maven-plugin/src/test/avro/User.avsc
./lang/java/maven-plugin/src/test/avro/directImport/PrivacyDirectImport.avsc
./lang/java/maven-plugin/src/test/avro/imports/PrivacyImport.avsc
./lang/java/tools/src/test/compiler/input/player.avsc
./lang/java/tools/src/test/compiler/input/position.avsc
./lang/java/mapred/src/test/java/org/apache/avro/mapred/tether/WordCount.avsc
./lang/java/mapred/src/test/avro/TextStats.avsc
./lang/java/avro/src/test/resources/FooBarSpecificRecord.avsc
./lang/c/tests/schema_tests/pass/interop.avsc

Additionally, one might alter the Schema parser & printer to log each
schema, then run unit tests and collect these, since there are many
more schemas that are constructed by the unit tests.  If you're
interested, I could try to construct a file of valid Avro schemas for
such testing.
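
For the on-disk files listed above, a minimal Python sketch that
gathers every .avsc file under a checkout and parses it as JSON, ready
to be fed to a validator. This only covers files in the tree, not the
schemas built by the unit tests; load_avsc_files is a made-up name and
the root directory is whatever path the checkout lives at.

```python
import json
from pathlib import Path


def load_avsc_files(root):
    """Parse every .avsc file under root; return (schemas, failures) keyed by path."""
    schemas, failures = {}, {}
    for path in sorted(Path(root).rglob("*.avsc")):
        try:
            schemas[str(path)] = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:
            # ValueError covers both malformed JSON and bad encodings.
            failures[str(path)] = str(exc)
    return schemas, failures
```

Calling load_avsc_files(".") from the top of the source tree should
pick up the files named in the list, plus any other .avsc files that
happen to be there.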

Re: Experiment: a JSON Schema describing Avro schemas

Posted by Francis Galiegue <fg...@gmail.com>.
Hello,

On Fri, Feb 22, 2013 at 11:01 PM, Doug Cutting <cu...@apache.org> wrote:
> A few quick comments:
>  - properties besides those mentioned in the spec are permitted as
> metadata, e.g., {"type":"int", "java":"short"}.

By default, JSON Schema allows additional members; it only forbids
them if additionalProperties is false (or constrains what they can be
if additionalProperties is a schema), so the question would rather be:
are there scenarios where they are not allowed?
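
A rough Python sketch of that rule, for illustration only: it handles
"properties" and "additionalProperties" but leaves out
patternProperties, which a real validator also takes into account. The
names extra_members_ok and value_ok are made up, and value_ok stands in
for whatever validates a value against a subschema.

```python
def extra_members_ok(instance, schema, value_ok):
    """Apply the additionalProperties rule to one JSON object."""
    declared = set(schema.get("properties", {}))
    extras = {k: v for k, v in instance.items() if k not in declared}
    rule = schema.get("additionalProperties", True)  # absent means allowed
    if rule is True:
        return True
    if rule is False:
        return not extras  # any undeclared member is an error
    # Otherwise the rule is a schema that every extra value must satisfy.
    return all(value_ok(v, rule) for v in extras.values())


def value_ok(value, subschema):
    """Toy subschema check: only understands {"type": "string"}."""
    return subschema.get("type") != "string" or isinstance(value, str)
```

With {"type":"int", "java":"short"} as the instance and only "type"
declared under "properties", the extra "java" member passes by default,
fails once additionalProperties is false, and passes again if
additionalProperties is {"type": "string"}.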

>  - a name can be prefixed by a namespace, e.g., {"type":"record",
> "name":"foo.Bar"}

Ah, OK. So you can have either a "full" name, a short name and a
namespace, or even a "full" name and a namespace?

>  - a schema can be the name of a schema defined earlier, e.g.
> {"type":"record", "name":"List", "fields":[{"name":"next",
> "type":["null", "List"]}]}
>

I don't believe the schema forbids that at this point, but I guess I'll
have to check.

Thanks!

-- 
Francis Galiegue, fgaliegue@gmail.com
Try out your JSON Schemas: http://json-schema-validator.herokuapp.com

Re: Experiment: a JSON Schema describing Avro schemas

Posted by Doug Cutting <cu...@apache.org>.
A few quick comments:
 - properties besides those mentioned in the spec are permitted as
metadata, e.g., {"type":"int", "java":"short"}.
 - a name can be prefixed by a namespace, e.g., {"type":"record",
"name":"foo.Bar"}
 - a schema can be the name of a schema defined earlier, e.g.
{"type":"record", "name":"List", "fields":[{"name":"next",
"type":["null", "List"]}]}

Doug
