Posted to dev@avro.apache.org by "Ryan Skraba (Jira)" <ji...@apache.org> on 2020/06/17 14:42:00 UTC
[jira] [Commented] (AVRO-2299) Get Plain Schema

    [ https://issues.apache.org/jira/browse/AVRO-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138507#comment-17138507 ] 

Ryan Skraba commented on AVRO-2299:
-----------------------------------

Hello -- once again, my apologies for arriving so late to this JIRA!  I have some specific feedback for the JIRA that might make it easier to merge quickly!

Everyone seems to agree that the Parsing Canonical Form for schemas is not sufficient for advanced "storage and comparison" of schemas, especially around long-lived Schema Registries and versioning/resolving schemas as artifacts.  Fair enough, that's not what it's for!

The requirement is to be able to identify the "same" schemas regardless of any custom annotations (a.k.a. user JsonProperties in the Java SDK) that might be present.

If I understand your use case correctly, you'd like to be able to store a "cleaned" schema for describing and versioning your persistence, but also allow the devs to add/remove useful custom annotations (such as GDPR info) during processing.  It should be easy for the dev to find the cleaned schema from the annotated schema, or to determine that two differently annotated schemas are the "same".

[~tjwp]'s resolution canonical form is slightly different, excluding some reserved attributes in the spec that aren't used in resolution either (notably doc, order, and logicalType).

I'd like to propose **not** adding new canonical forms to the spec, but simply adding the tools to "normalize" any schema according to the same rules as the existing Parsing Canonical Form, but with an allowlist/blocklist for reserved and user properties.  And, of course, if logicalType is included, all of its sub-attributes should be included (for user-defined types).

It seems to me that this would be a less constraining and more generally useful strategy, providing a useful schema transformation tool for some language SDKs without multiplying the number of "Canonical Forms" supported or making them an obligatory part of a language SDK.
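
To make the allowlist idea concrete, here is a minimal sketch that works on the parsed JSON of a schema rather than on any Avro SDK class. The names normalize, same_schema and RESERVED_KEYS are hypothetical and not part of any Avro API. The key list is the one from the issue description plus "order" and "logicalType", and keeping every attribute of a node that carries a logicalType is only one possible reading of the sub-attribute rule above. It does not reproduce the other Parsing Canonical Form steps (full names, attribute ordering, whitespace).

```python
import json

# Reserved keys from the issue description, plus "order" and "logicalType"
# (mentioned in the comment). This list is an assumption, not the spec.
RESERVED_KEYS = frozenset({
    "doc", "fields", "items", "name", "namespace", "size",
    "symbols", "values", "type", "aliases", "default",
    "order", "logicalType",
})


def normalize(schema, keep=RESERVED_KEYS):
    """Return a copy of a parsed schema with only the attributes in `keep`."""
    if isinstance(schema, list):       # a union: normalize each branch
        return [normalize(branch, keep) for branch in schema]
    if not isinstance(schema, dict):   # a primitive or a named reference
        return schema
    # Assumption: if logicalType is kept, keep all attributes of that node,
    # since sub-attributes of user-defined logical types are not known here.
    keep_all = "logicalType" in keep and "logicalType" in schema
    out = {}
    for key, value in schema.items():
        if not keep_all and key not in keep:
            continue                   # drop the non-allowlisted property
        if key in ("type", "items", "values"):
            out[key] = normalize(value, keep)    # nested schema
        elif key == "fields":
            out[key] = [normalize(field, keep) for field in value]
        else:
            out[key] = value           # name, doc, default, ... copied as-is
    return out


def same_schema(a, b, keep=RESERVED_KEYS):
    """True if two parsed schemas are equal once normalized."""
    return normalize(a, keep) == normalize(b, keep)


annotated = json.loads("""
{
  "name": "testSchema",
  "namespace": "com.avro",
  "type": "record",
  "fields": [
    {"name": "email", "type": "string",
     "doc": "email id", "user_field_prop": "xxxxx"}
  ],
  "user_schema_prop": "xxxxxx"
}
""")

plain = normalize(annotated)
print(json.dumps(plain, indent=2))
```

Passing a smaller allowlist, for example RESERVED_KEYS - {"doc", "order", "logicalType"}, would approximate the resolution-oriented variant mentioned above, again only as an illustration.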

> Get Plain Schema
> ----------------
>
>                 Key: AVRO-2299
>                 URL: https://issues.apache.org/jira/browse/AVRO-2299
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.9.0, 1.8.2, 1.9.1
>            Reporter: Rumeshkrishnan Mohan
>            Assignee: Doug Cutting
>            Priority: Major
>              Labels: features
>
> {panel:title=Avro Schema Reserved Keys:}
> "doc", "fields", "items", "name", "namespace",
>  "size", "symbols", "values", "type", "aliases", "default"
> {panel}
> Avro also supports user-defined properties for both Schema and Field.
> Is there a way to get the schema with only the reserved properties (key, value)?
> Input Schema: 
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id",
>       "user_field_prop": "xxxxx"
>     }
>   ],
>   "user_schema_prop": "xxxxxx"
> }{code}
> Expected Plain Schema:
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id"
>     }
>   ]
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)