Posted to user@avro.apache.org by Elliot West <te...@gmail.com> on 2016/08/11 10:23:06 UTC

Avro as a foundation of a JSON based system

Hello,

We are building a data processing system that has the following required
properties:

   - Data is produced/consumed in JSON format
   - These JSON documents must always adhere to a schema
   - The schema must be defined in JSON also
   - It should be possible to evolve schemas and verify schema compatibility

I initially started looking at Avro, not as a solution, but to understand
how its schema evolution can be managed. However, I quickly discovered that
with its JSON support it is able to meet all of my requirements.

I am now considering a system where the data structure is defined using the
Avro JSON schema, data is submitted as JSON that is internally decoded into
Avro records, and these records are eventually encoded back into JSON at the
point of consumption. It seems to me that I can then take advantage of
Avro’s schema evolution features, while only ever exposing JSON to consumers
and producers. Aside from the dependency on Avro’s JSON schema syntax, the
use of Avro then becomes an internal implementation detail.
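
To make the round trip concrete, here is a minimal sketch using Avro's stock
JSON coders from org.apache.avro.io; schemaJson, inputJson and the processing
step are placeholders, and I have not tested this:

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

Schema schema = new Schema.Parser().parse(schemaJson);

// JSON in: decode the submitted document into an Avro record.
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, inputJson);
GenericRecord record =
    new GenericDatumReader<GenericRecord>(schema).read(null, decoder);

// ... internal processing on the GenericRecord ...

// JSON out: encode the record back to JSON at the point of consumption.
ByteArrayOutputStream out = new ByteArrayOutputStream();
JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
encoder.flush();
String outputJson = out.toString("UTF-8");

(Note that Avro's standard JSON encoding is not always the plain JSON
consumers might expect; for example, union values are wrapped in a type
object.)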

As I am completely new to Avro, I was wondering if this is a credible idea,
or if anyone would care to share their experiences of similar systems that
they have built?

Many thanks,

Elliot.

Re: Avro as a foundation of a JSON based system

Posted by Piotr Wikieł <pi...@gmail.com>.
Take a look at Hermes pub-sub.
http://hermes-pubsub.readthedocs.io/en/latest/
It changes Kafka's consumption model from pull to push and allows consumers
and producers to use Avro and JSON interchangeably (you choose the format
used on Kafka on a per-topic basis). Consumers are microservices; we use it
in production with >100 microservices. Hermes currently uses schema-repo for
schema storage.


RE: Avro as a foundation of a JSON based system

Posted by "Jarrad, Ken " <ke...@citi.com>.
On my project we initially used the JSON representation of Avro objects to allow human readability of messages on Kafka topics. We abandoned this when we encountered the inability to represent NaN using the standard (Java) tool chain, and we now use the binary representation/encoding. There is a JSON parser factory property that can be relaxed to allow NaN, but we did not attempt this. Beware that NaN is sometimes encoded to JSON with quotes around it.
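
For what it's worth, Avro's JSON decoder is built on Jackson, so the parser
factory property above is presumably Jackson's ALLOW_NON_NUMERIC_NUMBERS. A
standalone Jackson sketch of the relaxed behaviour, not wired into Avro:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.ObjectMapper;

JsonFactory factory = new JsonFactory();
// Accept the non-standard tokens NaN, Infinity and -Infinity when parsing.
factory.enable(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS);
ObjectMapper mapper = new ObjectMapper(factory);
double d = mapper.readValue("NaN", Double.class); // yields Double.NaN

// On the writing side, Jackson's QUOTE_NON_NUMERIC_NUMBERS generator feature
// (on by default) emits "NaN" with quotes, which matches the quoting noted
// above.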


Re: Avro as a foundation of a JSON based system

Posted by Josh <jo...@gmail.com>.
I looked into this further and found that this behaviour actually only
happens in one specific case.

If we have a field x defined at the top level of the Avro schema with type
'record' (i.e. x is a nested record), and there are no other fields after
the nested record, then additional fields in the JSON version of x will be
ignored and the Avro record will be read successfully. However, if there
are any other fields defined in the schema after x, then we get an
AvroTypeException when there are unexpected fields in the JSON.

Not sure why this happens, but my fix for now is to just add a nullable
dummy field at the end of the schema.
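
For illustration, a hypothetical schema with the workaround applied; the
field names are made up, and the trailing nullable dummy field after the
nested record is what restores the strict behaviour (unexpected JSON fields
then raise AvroTypeException instead of being silently dropped):

import org.apache.avro.Schema;

String schemaJson =
      "{\"type\": \"record\", \"name\": \"Outer\", \"fields\": ["
    + " {\"name\": \"x\", \"type\": {\"type\": \"record\", \"name\": \"Inner\","
    + "   \"fields\": [{\"name\": \"id\", \"type\": \"long\"}]}},"
    + " {\"name\": \"dummy\", \"type\": [\"null\", \"string\"], \"default\": null}"
    + "]}";
Schema schema = new Schema.Parser().parse(schemaJson);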


Re: Avro as a foundation of a JSON based system

Posted by Josh <jo...@gmail.com>.
For me, it appears to completely ignore fields in the JSON that aren't
defined in the reader schema. The reader succeeds and builds a generic
record (which excludes any additional fields in the JSON).

Thanks for looking into it!

Josh


Re: Avro as a foundation of a JSON based system

Posted by Zoltan Farkas <zo...@yahoo.com>.
I recall that it would fail if you have extra fields in the JSON that are
defined in neither the reader schema nor the writer schema.
Let me look into it and I will get back to you.

—Z




Re: Avro as a foundation of a JSON based system

Posted by Josh <jo...@gmail.com>.
Hi Zoltan,

Your ExtendedJsonDecoder / Encoder looks really useful for doing the
conversions between JSON and Avro.

I just have a quick question: when I use the ExtendedJsonDecoder with a
GenericDatumReader, I get an AvroTypeException whenever the JSON doesn't
conform to the Avro schema (as expected). However, if the JSON has some
additional fields (i.e. fields that are present in the JSON, but not
present in the Avro schema), then the reader ignores those extra fields and
converts the JSON to Avro successfully. Do you know if there's a simple way
to make the reader detect these extra fields, and throw an exception in
that case?
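
One possible pre-check (sketched as an assumption on my part, not something
Avro or your fork provides) would be to validate the top-level JSON keys
against the schema's declared fields with Jackson before decoding; nested
records would need a recursive walk:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.avro.AvroTypeException;
import org.apache.avro.Schema;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

JsonNode root = new ObjectMapper().readTree(json);

// Collect the field names declared in the schema.
Set<String> declared = new HashSet<>();
for (Schema.Field f : schema.getFields()) {
  declared.add(f.name());
}

// Fail on any top-level JSON key the schema does not declare.
for (Iterator<String> it = root.fieldNames(); it.hasNext();) {
  String name = it.next();
  if (!declared.contains(name)) {
    throw new AvroTypeException("Unexpected field in JSON: " + name);
  }
}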

Thanks,
Josh


Re: Avro as a foundation of a JSON based system

Posted by Zoltan Farkas <zo...@yahoo.com>.
We are doing the same successfully so far… here is some detail:

We do not use the standard JSON encoders/decoders from the Avro project; we have our own, which provide a more “natural” JSON encoding that implements:

https://issues.apache.org/jira/browse/AVRO-1582
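
(Concretely: for a field f declared with the union type ["null", "string"],
the stock JSON encoding writes {"f": {"string": "foo"}}, while the AVRO-1582
style writes the more natural {"f": "foo"}; f is a made-up field name.)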

For us it was also important to fix:

https://issues.apache.org/jira/browse/AVRO-1723

We had to use our own fork to be able to fix/implement our needs faster; you can look at it here: https://github.com/zolyfarkas/avro

Here is how we use the avro schemas:

We develop our Avro schemas in separate “schema projects”.

These projects are standard Maven projects, stored in version control, built with CI, and published to a Maven repo with the following artifacts:
1) Avro-generated Java objects, sources and Javadoc.
2) C#-generated objects (accessible to everybody via NuGet).
3) a zip package containing all schemas.

We use Avro IDL to define the schemas in the project; the avsc JSON format is difficult to read and maintain, so the schema JSON is only a wire format for us.

We see these advantages:

1) Building/releasing a schema project is identical to releasing any Maven project (Jenkins, maven release plugin, ...).
2) We can take advantage of the Maven dependency system and reuse schemas; it is as simple as adding a <dependency> in your POM and an import statement in your IDL. (C# uses NuGet.)
3) As a side effect, our Maven repo becomes a schema repo, and so far we see no reason to use a dedicated schema repo like https://issues.apache.org/jira/browse/AVRO-1124
4) The schema owner publishes not only the schemas but also all DTOs for Java and .NET; this way any team that needs to use a schema has no need to generate code, all they need to do is add a package dependency to their project.
5) During the build we also validate compatibility with the previously released schemas (see the sketch after this list).
6) During the build we also validate schema quality (comments on fields, naming, ...). We are planning to make this Maven plugin open source.
7) Maven dependencies give you all the data needed to figure out which apps use a schema like group:myschema:3.0
8) A REST service that uses an Avro object for its payload can serve/accept data as application/octet-stream;fmt=avro (Avro binary), application/json;fmt=avro (classic JSON encoding), or application/json;fmt=enhanced (AVRO-1582), allowing us to pick the right format for each use case. (AVRO-1582 JSON can be significantly smaller than binary for certain kinds of data.)
9) During the build we generate improved HTML docs for the Avro objects, like: http://zolyfarkas.github.io/spf4j/spf4j-core/avrodoc.html#/
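
For reference, Avro itself ships a SchemaCompatibility helper (since 1.7.7)
that can back the kind of build-time check in point 5. A minimal sketch,
where previousJson and currentJson stand in for the last released and the
candidate schema (this is the stock helper, not our plugin):

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

Schema previous = new Schema.Parser().parse(previousJson); // last released
Schema current = new Schema.Parser().parse(currentJson);   // candidate

// Backward compatibility: can data written with the previous schema be read
// with the current (reader) schema?
SchemaPairCompatibility result =
    SchemaCompatibility.checkReaderWriterCompatibility(current, previous);
if (result.getType() != SchemaCompatibilityType.COMPATIBLE) {
  throw new IllegalStateException(
      "Incompatible schema change: " + result.getDescription());
}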

The more we leverage Avro, the more use cases we find, like:

1) a config discovery plugin that scans code for uses of System.getProperty… and generates an Avro IDL: http://zolyfarkas.github.io/spf4j/spf4j-config-discovery-maven-plugin/index.html
2) generating Avro IDL from JDBC metadata...

hope it helps!

cheers

—Z

