Posted to user@avro.apache.org by Peter Amstutz <pe...@curoverse.com> on 2015/08/28 15:25:36 UTC

handling fields with "any" structure

Hello everyone,

I am using Avro to load and validate JSON documents.  Mostly this
works very well and it is straightforward to express the structure of
my document using Avro schema. However, I have a few fields which can
have "any" content.  It is impossible to declare all possible
structures in advance, and I can't use a union type of primitives
because the fields may also contain complex types (nested lists/maps)
and Avro doesn't allow named unions.

So far as I have been able to determine, this is impossible with
standard Avro schema, so I am curious if anyone else has dealt with
this problem and can suggest any workarounds.  Currently my best
(least bad) idea is to preprocess the JSON to pull out the "any"
fields and store them on the side before handing the document to Avro
for loading.  This is awkward so I would love to hear if anyone has
any other ideas.
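
For reference, the preprocessing workaround described above might look roughly like the following Python sketch. It uses only the standard library and leaves out the Avro loading step itself. The function names (split_any_fields, restore_any_fields), the tuple-based field paths, and the sample document are all made up for illustration.

```python
import copy
import json


def split_any_fields(document, any_paths):
    """Return (stripped_document, side_store) with the "any" fields pulled out.

    any_paths is a list of tuples of keys, e.g. ("meta", "extra").
    """
    stripped = copy.deepcopy(document)
    side_store = {}
    for path in any_paths:
        parent = stripped
        for key in path[:-1]:
            parent = parent.get(key)
            if not isinstance(parent, dict):
                break  # path does not exist in this document
        else:
            if path[-1] in parent:
                side_store[path] = parent.pop(path[-1])
    return stripped, side_store


def restore_any_fields(validated, side_store):
    """Put the stored "any" values back after schema validation."""
    restored = copy.deepcopy(validated)
    for path, value in side_store.items():
        parent = restored
        for key in path[:-1]:
            parent = parent[key]
        parent[path[-1]] = value
    return restored


# Hypothetical document: "params" and "meta.extra" may hold arbitrary content.
raw = json.loads(
    '{"name": "job1", "params": {"x": [1, {"y": 2}]}, "meta": {"extra": {"a": null}}}'
)
ANY_PATHS = [("params",), ("meta", "extra")]

stripped, side = split_any_fields(raw, ANY_PATHS)
# "stripped" is what would be handed to Avro for loading and validation.
restored = restore_any_fields(stripped, side)
```

The schema then only needs to describe the stripped document, and the free-form values are merged back in once validation has passed.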

Thanks,
Peter

RE: handling fields with "any" structure

Posted by "Farkas, Zoltan" <Zo...@pimco.com>.
Yes, it is not released yet, but if you get the latest official Avro repo and build it yourself, you can play around with it...


--Z


Re: handling fields with "any" structure

Posted by Peter Amstutz <pe...@curoverse.com>.
Thanks!  I'm not familiar with logical types (sounds like this is a
new, unreleased feature?) so I'll have to look into it.

- Peter


RE: handling fields with "any" structure

Posted by "Farkas, Zoltan" <Zo...@pimco.com>.
Hi Peter,

I recently implemented this using the logicalType concept that was recently added to Avro.
(I use my own fork (https://github.com/zolyfarkas/avro ) until I find some time to merge Ryan's implementation, but I also rely on other improvements in it, such as IDL forward declarations and improved JSON encoding...)

Here is how I implemented the any type:

    /** An unknown serialized Java object */
    @logicalType("unknown")
    record Unknown {
        /** Maven schema ID (optional, for future extension with different ID types) */
        union {null, MavenSchemaId} mavenSchemaId = null;

        /** The Avro-serialized object */
        union {null, string, bytes} serObj;
    }

The Maven schema ID contains enough information to retrieve the schema that the object in the serObj field was serialized with.
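
To make the envelope idea concrete, here is a minimal Python sketch of the same shape: a schema ID stored next to a serialized payload. It is only an illustration, not the Avro implementation. JSON text stands in for the Avro encoding, the schema ID is treated as an opaque value, and the names wrap and unwrap are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class Unknown:
    """Python stand-in for the Unknown record: a schema ID plus a serialized payload."""
    schema_id: Optional[Any]          # plays the role of mavenSchemaId
    ser_obj: Union[None, str, bytes]  # plays the role of serObj


def wrap(value: Any, schema_id: Any, as_bytes: bool = False) -> Unknown:
    """Serialize value and store it in an envelope next to its schema ID."""
    # Assumption: JSON text is used here instead of a real Avro encoding.
    text = json.dumps(value, sort_keys=True)
    return Unknown(schema_id, text.encode("utf-8") if as_bytes else text)


def unwrap(envelope: Unknown) -> Any:
    """Decode the payload of an envelope; a null payload stays None."""
    payload = envelope.ser_obj
    if payload is None:
        return None
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    return json.loads(payload)


# Hypothetical usage: the schema ID is just a record name here.
envelope = wrap({"x": [1, {"y": 2}]}, schema_id="org.example.Point")
roundtrip = unwrap(envelope)
```

The outer record keeps a fixed, declarable structure, while the arbitrary content travels as an opaque string or byte payload.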

In my case I store all schemas in a maven repo, and my MavenSchemaId looks like:


    /** A maven artifact ID */
    record MavenArtifactId {

        /** The maven group id */
        string groupId;

        /** The maven artifactId */
        string artifactId;

        /** The schema version */
        string version;
    }

    /** A maven schema ID */
    record MavenSchemaId {

        /** The maven artifact */
        MavenArtifactId artifactId;

        /** The record name (namespace + name) */
        string recordName;
    }

But a schema ID can really be anything (a number, a string...), as long as you have a system or service to resolve it. You can even put the schema itself in the Unknown record if that works for you...
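
As a rough sketch of what "a system or service to resolve it" could mean, the Python below uses an in-memory registry keyed by any hashable ID, with a fallback for envelopes that embed the schema directly. Everything here is hypothetical: the SchemaRegistry class, the schema_for helper, the "schemaId" and "schema" keys, and the sample Point schema are assumptions for illustration only.

```python
import json


class SchemaRegistry:
    """In-memory stand-in for a schema lookup service (e.g. backed by a Maven repo)."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema_id, schema):
        self._schemas[schema_id] = schema

    def resolve(self, schema_id):
        try:
            return self._schemas[schema_id]
        except KeyError:
            raise LookupError(f"no schema registered for {schema_id!r}") from None


def schema_for(envelope, registry):
    """Prefer a schema embedded in the envelope, otherwise resolve its ID."""
    embedded = envelope.get("schema")
    if embedded is not None:
        return json.loads(embedded)
    return registry.resolve(envelope["schemaId"])


# Hypothetical schema, registered under three different kinds of ID.
POINT_SCHEMA = {
    "type": "record",
    "name": "Point",
    "namespace": "org.example",
    "fields": [{"name": "x", "type": "int"}, {"name": "y", "type": "int"}],
}

registry = SchemaRegistry()
maven_style_id = ("org.example", "common-schemas", "1.0.0", "org.example.Point")
for schema_id in (maven_style_id, 42, "point-v1"):
    registry.register(schema_id, POINT_SCHEMA)

by_id = schema_for({"schemaId": 42, "schema": None}, registry)
by_embedding = schema_for(
    {"schemaId": None, "schema": json.dumps(POINT_SCHEMA)}, registry
)
```

The only requirement on the ID is that the reader can turn it back into a schema, which is why a tuple of Maven coordinates, a number, and a string all work equally well in this sketch.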

So every time I need an "Any" (Unknown) field, I use it like this:

    import idl "common.avdl";

    record MyRecord {
        ...
        Unknown any;
        ...
    }

The generated DTOs set and get an Object (just like unions). When you deserialize, you will get either a SpecificRecord (if you have a generated DTO) or a GenericRecord...
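
The specific-versus-generic fallback can be sketched in Python as follows. This is only an analogy for the behaviour described above and does not use the Avro API: the Point dataclass stands in for a generated DTO, a plain dict stands in for a GenericRecord, and the SPECIFIC_CLASSES table and materialize function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Stand-in for a generated DTO (the SpecificRecord case)."""
    x: int
    y: int


# Hypothetical lookup table from full record name to a generated class.
SPECIFIC_CLASSES = {"org.example.Point": Point}


def materialize(record_name, fields):
    """Build a typed object when a class is known, else a generic mapping."""
    cls = SPECIFIC_CLASSES.get(record_name)
    if cls is None:
        return dict(fields)  # generic fallback (the GenericRecord case)
    return cls(**fields)


specific = materialize("org.example.Point", {"x": 1, "y": 2})
generic = materialize("org.example.Other", {"a": 1})
```

Callers that receive the value as a plain object can then check its type, much as they would for a union field.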

Let me know if you have any questions...
(I would be interested to know if you encounter any issues implementing this with the official Avro logical type implementation...)

cheers

--Z

This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you should not disseminate, distribute, alter or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmissions cannot be guaranteed to be secure or without error as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender, therefore, does not accept liability for any errors or omissions in the contents of this message which arise during or as a result of e-mail transmission. If verification is required, please request a hard-copy version. This message is provided for information purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments in any jurisdiction.  Securities are offered in the U.S. through PIMCO Investments LLC, distributor and a company of PIMCO LLC.