Posted to dev@avro.apache.org by "Paul Mazak (JIRA)" <ji...@apache.org> on 2015/07/10 19:34:05 UTC

[jira] [Commented] (AVRO-1699) AutoMap field values between Avro objects with different schemas

    [ https://issues.apache.org/jira/browse/AVRO-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622629#comment-14622629 ] 

Paul Mazak commented on AVRO-1699:
----------------------------------

I'll attach an example of how we've been doing this in production.

> AutoMap field values between Avro objects with different schemas
> ----------------------------------------------------------------
>
>                 Key: AVRO-1699
>                 URL: https://issues.apache.org/jira/browse/AVRO-1699
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Paul Mazak
>
> There are a few use cases for this:
> *Various Avro input data to one common output*
> You want to pick up Avro files written with different schemas and normalize them into one schema, for example the superset of the input schemas.
> *Aggregating Raw Data*
> You want to rewrite data grouped by some fields and aggregated. The output schema in this case would be a subset of the input schema, with at least the group-by fields present in both.
> *Alternate Views*
> You have Avro data that you want to trim in different ways, creating subsets that are useful as views in Hive or as exports to SQL tables.
> *Schema Migration*
> You've added fields to a schema and you are storing data in both the old and new schemas. You can't process the old-schema data together with the new-schema data (using Pig or Java MapReduce). AutoMapping would up-convert your old data by setting the newly added fields to null, so that all data ends up in the new schema.
> _Considerations:_
>  * Loop over the source schema fields available to automap and return any that could not be mapped.
>  * Allow mappings between compatible types, for example integers to longs or floats to strings.
>  * Match field names case-sensitively.
>  * Make use of aliases in the schema when considering fields to automap.
>  * Deep copy nested structures like arrays and maps.
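
The considerations above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the production code mentioned in the comment and not an Avro API: records are modeled as plain java.util.Map instances, and AutoMapSketch, TargetField, convert and autoMap are hypothetical names invented for this example. A real version would presumably work against Avro's Schema and record classes instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: plain Java collections stand in for Avro records and schemas.
class AutoMapSketch {

    // Stand-in for a field of the target schema: name, expected Java type, aliases.
    static final class TargetField {
        final String name;
        final Class<?> type;
        final List<String> aliases;

        TargetField(String name, Class<?> type, String... aliases) {
            this.name = name;
            this.type = type;
            this.aliases = Arrays.asList(aliases);
        }
    }

    // Recursively copies lists and maps; any other value is returned as-is.
    static Object deepCopy(Object value) {
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) copy.add(deepCopy(item));
            return copy;
        }
        if (value instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                copy.put(e.getKey(), deepCopy(e.getValue()));
            }
            return copy;
        }
        return value;
    }

    // Converts a non-null value to the target type; empty when the types are incompatible.
    // Only the two promotions named in the issue are covered: int to long, float to string.
    static Optional<Object> convert(Object value, Class<?> target) {
        if (target.isInstance(value)) return Optional.of(deepCopy(value));
        if (value instanceof Integer && target == Long.class) {
            return Optional.of(((Integer) value).longValue());
        }
        if (value instanceof Float && target == String.class) {
            return Optional.of(value.toString());
        }
        return Optional.empty();
    }

    // Copies matching fields from source into target and returns the names of the
    // source fields that could not be mapped.
    static List<String> autoMap(Map<String, Object> source,
                                List<TargetField> targetFields,
                                Map<String, Object> target) {
        // Every target field starts as null, so fields missing from the source stay null.
        for (TargetField field : targetFields) target.put(field.name, null);

        List<String> unmapped = new ArrayList<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String sourceName = entry.getKey();
            TargetField match = null;
            for (TargetField field : targetFields) {
                // Case-sensitive match on the field name or on one of its aliases.
                if (field.name.equals(sourceName) || field.aliases.contains(sourceName)) {
                    match = field;
                    break;
                }
            }
            if (match == null) {
                unmapped.add(sourceName);
                continue;
            }
            Object value = entry.getValue();
            if (value == null) continue; // a null source value leaves the target field null
            Optional<Object> converted = convert(value, match.type);
            if (converted.isPresent()) {
                target.put(match.name, converted.get());
            } else {
                unmapped.add(sourceName);
            }
        }
        return unmapped;
    }
}
```

With a source record {id=7, user_name="ann", Score=1.5f, extra="x"} and target fields id (Long), name (String, alias user_name), score (String) and added (String), autoMap would fill id with 7L and name with "ann", leave score and added as null, and return [Score, extra]. Score is reported as unmapped because the name comparison is case-sensitive.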



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)