Posted to dev@avro.apache.org by "Oscar Westra van Holthe - Kind (Jira)" <ji...@apache.org> on 2021/11/23 11:12:00 UTC

[jira] [Resolved] (AVRO-3026) Allow custom annotations in IDL files and support translating them to AVSC Avro.

     [ https://issues.apache.org/jira/browse/AVRO-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oscar Westra van Holthe - Kind resolved AVRO-3026.
--------------------------------------------------
    Resolution: Fixed

Marking as resolved, because the original question has been answered.

> Allow custom annotations in IDL files and support translating them to AVSC Avro.
> --------------------------------------------------------------------------------
>
>                 Key: AVRO-3026
>                 URL: https://issues.apache.org/jira/browse/AVRO-3026
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: spec
>    Affects Versions: 1.9.0, 1.9.1, 1.9.2, 1.10.1
>            Reporter: Feroze Daud
>            Priority: Major
>
> h2. Introduction
> Our company has standardized on Avro schemas for all data ingestion and storage. As part of this, and to satisfy CCPA, we need to be able to tag records and fields according to whether they contain PI, non-PI information, etc.
> Avro AVSC files, being valid JSON, can easily be modified to add tags that are used by downstream processors and won't interfere with Avro itself (generating POJOs, serialization, deserialization, etc.).
> One such key we chose is simply called *tags*. Its example usage is shown below.
> {code:java}
> {
>    "type": "record",
>    "name": "PropertyOwner",
>    "namespace": "com.acme.Property",
>    "tags": ["PI", "PII"],
>    "fields": [
>    {
>       "name": "FullName",
>       "type": "string",
>       "tags": ["Name"]
>    },
>    {
>       "name": "PhoneNumber",
>       "type": "string",
>       "tags": ["Phone"]
>    }]
> }{code}
>  
> These tags can be processed by downstream processors, so that data landing in a data lake or database can be tagged appropriately.
>  
> h2. Problem Description
> Tagging works fine for AVSC because adding extra JSON properties doesn't make a schema invalid, but we have a problem when using IDL to author schemas. The IDL spec does not provide a way to add extra tags that are copied over to the Avro schema.
>  
> h2. Proposal
> I propose that we allow a special *@annotation* tag that can be applied to records and fields. Whatever is in this annotation should be copied verbatim to the output AVSC.
> For example:
> {code:java}
> @annotation("tags", "[\"PI\", \"Non PI\"]")
> record Employee {
>   @annotation("tags", "[\"Name\"]")
>   string fullName;
>   boolean active = true;
>   long salary;
>   @annotation("tags", "[\"Phone\"]")
>   string phone;
> } {code}
>  
> would generate an Avro schema as follows:
>  
> {code:java}
> {
>   "type": "record",
>   "name": "Employee",
>   "tags": ["PI", "Non PI"],
>   "fields": [
>   {
>     "name": "fullName",
>     "type": "string",
>     "tags": ["Name"]
>   },
>   {
>     "name": "active",
>     "type": "boolean",
>     "default": true
>   },
>   {
>     "name": "salary",
>     "type": "long"
>   },
>   {
>     "name": "phone",
>     "type": "string",
>     "tags": ["Phone"]
>   }]
> }{code}
>  
> As you can see, the *@annotation* does not need any structured JSON support of its own. It just takes a string, and we render that string as-is into the output JSON under the given key.
> @annotation("foo", "[\"bar\"]") -> "foo": ["bar"]
> @annotation("foo", "{\"bar\": \"jar\"}") -> "foo": {"bar": "jar"}
>  
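
The rendering rule described in the proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Avro IDL compiler works: `apply_annotation` is a hypothetical helper, schema nodes are modelled as plain dicts, and the annotation string is parsed with `json.loads` so that the output stays valid JSON (the proposal itself only asks for the string to be copied verbatim).

```python
import json


def apply_annotation(node, key, raw_value):
    """Return a copy of a schema node (record or field dict) with one extra property.

    Hypothetical helper: raw_value is the string passed to the proposed
    @annotation tag. Parsing it as JSON is an assumption made here so that
    a malformed value fails early instead of corrupting the generated AVSC.
    """
    annotated = dict(node)  # shallow copy, the input dict is left untouched
    annotated[key] = json.loads(raw_value)
    return annotated


# Hypothetical in-memory form of part of the Employee record from the example.
phone_field = apply_annotation(
    {"name": "phone", "type": "string"}, "tags", '["Phone"]'
)
employee = apply_annotation(
    {"type": "record", "name": "Employee", "fields": [phone_field]},
    "tags",
    '["PI", "Non PI"]',
)

# Serialize the annotated schema the way an AVSC file would be written.
print(json.dumps(employee, indent=2))
```

Parsing the value instead of splicing the raw string into the output is a deliberate deviation from "copy verbatim": a value that is not valid JSON raises `json.JSONDecodeError` at this point rather than producing an unreadable schema file.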


