Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2020/04/29 02:18:04 UTC

[GitHub] [avro] rayokota opened a new pull request #869: AVRO-2822: Add toString that doesn't inline referenced schemas

rayokota opened a new pull request #869:
URL: https://github.com/apache/avro/pull/869


   I need a method to reconstruct the original schema string as it was before the Parser resolved and inlined the referenced schemas. This is a small enhancement that addresses https://issues.apache.org/jira/browse/AVRO-2822 by exposing a `toString()` overload that takes the referenced schemas as an argument.
   
   This also addresses https://github.com/confluentinc/schema-registry/issues/1432


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org






[GitHub] [avro] rayokota commented on pull request #869: AVRO-2822: Add toString that doesn't inline referenced schemas

Posted by GitBox <gi...@apache.org>.
rayokota commented on pull request #869:
URL: https://github.com/apache/avro/pull/869#issuecomment-623780611


   Thanks @RyanSkraba !  I added a unit test.  :)












[GitHub] [avro] RyanSkraba commented on pull request #869: AVRO-2822: Add toString that doesn't inline referenced schemas

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on pull request #869:
URL: https://github.com/apache/avro/pull/869#issuecomment-623567079


   Hello!  I'm OK with doing this for the next release if we're OK with deprecating it in a later release.
   
   I'll create and link a JIRA.
   
   A unit test would be welcome!  There's [TestNestedRecords.java](https://github.com/apache/avro/blob/master/lang/java/avro/src/test/java/org/apache/avro/TestNestedRecords.java) with a nested record example, or [TestSchema.java](https://github.com/apache/avro/blob/master/lang/java/avro/src/test/java/org/apache/avro/TestSchema.java) with other usages of toString(...).



[GitHub] [avro] rayokota commented on pull request #869: AVRO-2822: Add toString that doesn't inline referenced schemas

Posted by GitBox <gi...@apache.org>.
rayokota commented on pull request #869:
URL: https://github.com/apache/avro/pull/869#issuecomment-623190520


   @RyanSkraba , is it OK if we add this `toString` method to 1.10.0, and someone can look into whether a more general Formatter should be added later, as you suggested?  Thanks!



[GitHub] [avro] RyanSkraba commented on pull request #869: AVRO-2822: Add toString that doesn't inline referenced schemas

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on pull request #869:
URL: https://github.com/apache/avro/pull/869#issuecomment-621171717


   Hello and thanks for the contribution!  I've got no problem with adding another toString(...) method for 1.10.x, but I like the idea of the Formatter for a long-term solution.  I added to the JIRA:
   
   > I think we should probably get away from using toString(...) to generate JSON.  It's probably time to add a configurable Formatter (like we replaced .parse(...) with a configurable Parser a while ago.)
   
   Is this too much to do for 1.10.x?  I can create a new JIRA if we want to do this in a later release.
   
   Just to clarify: the goal here is to avoid writing out some nested schemas again, and to refer to them by name instead?  In that case, we would expect parsing this schema string to fail when using a brand-new parser, right?
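
The failure mode asked about here can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyParserSketch`, `define` and `resolve` are hypothetical names invented for illustration and are not Avro's `Schema.Parser` API. The model only captures the idea that a by-name type reference resolves if the name is a primitive or was defined earlier on the same parser instance, and fails otherwise.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model only: NOT Avro's Schema.Parser. All names here are hypothetical. */
final class ToyParserSketch {
    private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
        "null", "boolean", "int", "long", "float", "double", "bytes", "string"));

    // Names defined so far; kept across calls, as on a parser instance that is reused.
    private final Set<String> definedNames = new HashSet<>();

    /** Registers a named type, as if its full definition had been parsed earlier. */
    void define(String fullName) {
        definedNames.add(fullName);
    }

    /** Returns the type name if it can be resolved, otherwise throws. */
    String resolve(String typeName) {
        if (PRIMITIVES.contains(typeName) || definedNames.contains(typeName)) {
            return typeName;
        }
        throw new IllegalArgumentException("\"" + typeName + "\" is not a defined name");
    }

    public static void main(String[] args) {
        // A brand-new parser has never seen acme.Product, so the reference fails.
        ToyParserSketch fresh = new ToyParserSketch();
        try {
            fresh.resolve("acme.Product");
        } catch (IllegalArgumentException e) {
            System.out.println("fresh parser: " + e.getMessage());
        }

        // A parser that was given acme.Product first can resolve the reference.
        ToyParserSketch primed = new ToyParserSketch();
        primed.define("acme.Product");
        System.out.println("primed parser: " + primed.resolve("acme.Product"));
    }
}
```

In this model the by-name schema string is only usable together with a parser that already knows the referenced names, which is the trade-off the question points at.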



[GitHub] [avro] zolyfarkas commented on pull request #869: AVRO-2822: Add toString that doesn't inline referenced schemas

Posted by GitBox <gi...@apache.org>.
zolyfarkas commented on pull request #869:
URL: https://github.com/apache/avro/pull/869#issuecomment-621802979


   It would be nice to expose Schema.Names for the purpose of customizing schema reference resolution.
   
   Here are the use cases I have:
   
   1) Avsc files referencing other schemas by name that are declared in other avsc files, without relying on any specific processing order. I have implemented this in a [custom maven plugin](http://www.spf4j.org/spf4j-avro-components/maven-avro-schema-plugin/index.html) and had to use various hacks to get it done ([see](https://github.com/zolyfarkas/spf4j/blob/master/spf4j-avro-components/maven-avro-schema-plugin/src/main/java/org/spf4j/maven/plugin/avro/avscp/SchemaCompileMojo.java#L350)); it basically uses the same resolution mechanism used by avdl and avpr.
   
   2) Custom references: the ability to refer to a schema published to a repo, like {"$ref":"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.3:2"}. For a detailed description, [see](https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences).
   
   Probably worthwhile to create a new JIRA for this, or an AEP?



[GitHub] [avro] rayokota edited a comment on pull request #869: AVRO-2822: Add toString that doesn't inline referenced schemas

Posted by GitBox <gi...@apache.org>.
rayokota edited a comment on pull request #869:
URL: https://github.com/apache/avro/pull/869#issuecomment-621271223


   @RyanSkraba , yes, it is to be able to write nested schemas only by name, instead of inlining them.   Currently in https://github.com/confluentinc/schema-registry/issues/1432 I am having to place my code in the `org.apache.avro` package because this functionality is available, but is package-private.
   
   The motivation behind this is that with Schema Registry, we have a very large customer who wants to be able to evolve and version nested schemas independent of the root schema.  In order to do this, the latest versions of nested schemas will be added to the `Parser` dynamically.  So the string representation of the root schema (which we reuse) should not contain the inline nested schemas, but only refer to them by name.
   
   Below is an example of the output of this method, given a nested or referenced schema with name `acme.Product`.  Note that only the name is generated rather than the `acme.Product` schema being generated inline.
   
   ```
   {
    "type": "record",
    "namespace": "acme",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "order_date", "type": "string"},
        {
           "name": "product",
           "type": "acme.Product"
        }
    ]
   }
   ```
   
   So given a `Schema` and its referenced or nested `Schema` instances, this lets us reconstruct the original root schema string given to the `Parser`, so that we can reuse it later.
   
   I believe the code for protocols and imports makes use of this same functionality (which is package private), by not generating the imports inline when printing out the protocol definition.
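
The behaviour described in this comment can be sketched with a small, dependency-free model. This is an illustration under stated assumptions, not Avro's implementation: `RefByNameSketch`, `Rec` and `render` are hypothetical names, the `sku` field on `acme.Product` is invented for the example, and the hand-rolled JSON writer does no escaping. The idea shown is that a record whose full name is in the set of referenced schemas is written as its name only, while any other record is written inline.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: these types are NOT Avro's classes. All names are hypothetical. */
final class RefByNameSketch {

    /** A named record with ordered fields; a field type is a primitive name (String) or a Rec. */
    static final class Rec {
        final String namespace;
        final String name;
        final Map<String, Object> fields = new LinkedHashMap<>();

        Rec(String namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }

        String fullName() {
            return namespace + "." + name;
        }

        Rec field(String fieldName, Object type) {
            fields.put(fieldName, type);
            return this;
        }
    }

    /** Renders rec as JSON; a record whose full name is in referenced is written by name only. */
    static String render(Rec rec, Set<String> referenced) {
        if (referenced.contains(rec.fullName())) {
            return quote(rec.fullName());
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Object> f : rec.fields.entrySet()) {
            Object t = f.getValue();
            String type = (t instanceof Rec) ? render((Rec) t, referenced) : quote((String) t);
            parts.add("{\"name\":" + quote(f.getKey()) + ",\"type\":" + type + "}");
        }
        return "{\"type\":\"record\",\"namespace\":" + quote(rec.namespace)
            + ",\"name\":" + quote(rec.name)
            + ",\"fields\":[" + String.join(",", parts) + "]}";
    }

    // No escaping: the sketch assumes names without quotes or backslashes.
    private static String quote(String s) {
        return "\"" + s + "\"";
    }

    public static void main(String[] args) {
        Rec product = new Rec("acme", "Product").field("sku", "string"); // "sku" is made up
        Rec order = new Rec("acme", "Order")
            .field("order_id", "string")
            .field("order_date", "string")
            .field("product", product);

        // With no referenced schemas, acme.Product is written inline.
        System.out.println(render(order, Collections.<String>emptySet()));
        // With acme.Product passed as referenced, only its name is written.
        System.out.println(render(order, Collections.singleton("acme.Product")));
    }
}
```

The second call produces the same shape as the JSON example in the comment: the `product` field carries only the type name `acme.Product`.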







