You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2021/03/30 22:13:05 UTC

[GitHub] [avro] subhashb opened a new pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

subhashb opened a new pull request #1167:
URL: https://github.com/apache/avro/pull/1167


   This PR adds support for generating Parsing Canonical Forms of Avro Schemas to the main `avro` package.
   
   The bulk of the work was done by @kojiromike and @forsberg. This PR cleans up code where necessary, adds more test cases, and clarifies on transformations wherever not applicable in Python (ex. Transformation of integers with leading zeros)
   
   Closes: https://issues.apache.org/jira/browse/AVRO-1938
   Original PR: https://github.com/apache/avro/pull/700/
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them in the PR title.
     - https://issues.apache.org/jira/browse/AVRO-1938
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
     Test cases in `test/test_schema.py::CanonicalFormTestCase`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     Test cases act as stand-ins for the feature for now. Comprehensive documentation needs to be added later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on a change in pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on a change in pull request #1167:
URL: https://github.com/apache/avro/pull/1167#discussion_r605794047



##########
File path: lang/py/avro/schema.py
##########
@@ -469,10 +511,19 @@ def __str__(self):
         return json.dumps(self.to_json())
 
     def to_json(self, names=None):
-        if names is None:
-            names = Names()
+        names = names or Names()
+
         to_dump = self.props.copy()
         to_dump['type'] = self.type.to_json(names)
+
+        return to_dump
+
+    def to_canonical_json(self, names=None):
+        names = names or Names()

Review comment:
       Amen. 👍 
   There are more things I have noted, but I want to get the main package to be on par with the specification first before I go after them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on pull request #1167:
URL: https://github.com/apache/avro/pull/1167#issuecomment-812030918


   @kojiromike Addressed the latest review comments. PR is good to go.
   I will push fingerprinting changes in a subsequent PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on pull request #1167:
URL: https://github.com/apache/avro/pull/1167#issuecomment-811244499


   @kojiromike Would you prefer adding fingerprinting support in a subsequent PR, or should it be tacked to this PR itself?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on a change in pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on a change in pull request #1167:
URL: https://github.com/apache/avro/pull/1167#discussion_r605797065



##########
File path: lang/py/avro/schema.py
##########
@@ -291,22 +331,24 @@ def has_name(self, name_attr, space_attr):
 
     def get_name(self, name_attr, space_attr):
         test = Name(name_attr, space_attr, self.default_namespace).fullname
-        if test not in self.names:
-            return None
-        return self.names[test]
+
+        return None if test not in self.names else self.names[test]

Review comment:
       Changed in https://github.com/apache/avro/pull/1167/commits/3a250d8686b0b8e3098a1dc1da0be3839ecc3bb9




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb edited a comment on pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb edited a comment on pull request #1167:
URL: https://github.com/apache/avro/pull/1167#issuecomment-811244499


   @kojiromike Would you prefer adding fingerprinting support in a subsequent PR, or should it be tacked to this PR itself?
   
   I am almost ready with fingerprinting changes. If we would like to let this PR bake for a bit, I can add fingerprinting support to the same PR. Or we can pull this in, and I can raise a separate PR. 
   
   Please let me know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] kojiromike commented on a change in pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
kojiromike commented on a change in pull request #1167:
URL: https://github.com/apache/avro/pull/1167#discussion_r604545992



##########
File path: lang/py/avro/schema.py
##########
@@ -214,6 +236,19 @@ def validate(self, datum):
         """
         raise Exception("Must be implemented by subclasses.")
 
+    @abc.abstractmethod
+    def to_canonical_json(self, names):
+        """
+        Converts the schema object into its Canonical Form
+        http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
+        """
+        raise NotImplementedError("to_canonical_json must be implemented by subclasses")

Review comment:
       Doesn't the abc take care of this?

##########
File path: lang/py/avro/schema.py
##########
@@ -601,6 +646,12 @@ def to_json(self, names=None):
             names.names[self.fullname] = self
             return names.prune_namespace(self.props)
 
+    def to_canonical_json(self, names):

Review comment:
       Should names be optional?

##########
File path: lang/py/avro/schema.py
##########
@@ -836,6 +912,11 @@ def to_json(self, names=None):
             to_dump.append(schema.to_json(names))
         return to_dump
 
+    def to_canonical_json(self, names):

Review comment:
       …

##########
File path: lang/py/avro/schema.py
##########
@@ -774,6 +843,13 @@ def to_json(self, names=None):
         to_dump['values'] = self.get_prop('values').to_json(names)
         return to_dump
 
+    def to_canonical_json(self, names):

Review comment:
       Should names be optional?

##########
File path: lang/py/avro/schema.py
##########
@@ -963,6 +1044,25 @@ def to_json(self, names=None):
         to_dump['fields'] = [f.to_json(names) for f in self.fields]
         return to_dump
 
+    def to_canonical_json(self, names):
+        if names is None:
+            names = Names()
+
+        if self.type == 'request':
+            raise NotImplementedError("Canonical form (probably) does not make sense on type request")
+
+        to_dump = self.canonical_properties
+        to_dump["name"] = self.fullname
+
+        if self.fullname in names.names:
+            return self.name_ref(names)
+        else:

Review comment:
       No need for else after return

##########
File path: lang/py/avro/schema.py
##########
@@ -963,6 +1044,25 @@ def to_json(self, names=None):
         to_dump['fields'] = [f.to_json(names) for f in self.fields]
         return to_dump
 
+    def to_canonical_json(self, names):

Review comment:
       …




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on a change in pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on a change in pull request #1167:
URL: https://github.com/apache/avro/pull/1167#discussion_r605058416



##########
File path: lang/py/avro/schema.py
##########
@@ -601,6 +646,12 @@ def to_json(self, names=None):
             names.names[self.fullname] = self
             return names.prune_namespace(self.props)
 
+    def to_canonical_json(self, names):

Review comment:
       Yes. Addressed in eacfdd6a0cd6de9b4e15701d08d9e994f09be310

##########
File path: lang/py/avro/schema.py
##########
@@ -836,6 +912,11 @@ def to_json(self, names=None):
             to_dump.append(schema.to_json(names))
         return to_dump
 
+    def to_canonical_json(self, names):

Review comment:
       Yes. Addressed in eacfdd6a0cd6de9b4e15701d08d9e994f09be310

##########
File path: lang/py/avro/schema.py
##########
@@ -963,6 +1044,25 @@ def to_json(self, names=None):
         to_dump['fields'] = [f.to_json(names) for f in self.fields]
         return to_dump
 
+    def to_canonical_json(self, names):

Review comment:
       Addressed in eacfdd6a0cd6de9b4e15701d08d9e994f09be310

##########
File path: lang/py/avro/schema.py
##########
@@ -774,6 +843,13 @@ def to_json(self, names=None):
         to_dump['values'] = self.get_prop('values').to_json(names)
         return to_dump
 
+    def to_canonical_json(self, names):

Review comment:
       Yes. Addressed in eacfdd6a0cd6de9b4e15701d08d9e994f09be310




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on a change in pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on a change in pull request #1167:
URL: https://github.com/apache/avro/pull/1167#discussion_r605790522



##########
File path: lang/py/avro/schema.py
##########
@@ -136,7 +147,18 @@ def _is_timezone_aware_datetime(dt):
 # Base Classes
 #
 
-class Schema(abc.ABC):
+class CanonicalPropertiesMixin(object):
+    """A Mixin that provides canonical properties to Schema and Field types."""
+    @property
+    def canonical_properties(self):
+        props = self.props
+        return collections.OrderedDict(

Review comment:
       While keys are inserted in the order specified by the canonical form, dictionary order is guaranteed only in Python 3.7 and above. So I believe we still need this to be `OrderedDict`. 
   
   If we were to stop supporting 3.6 someday, we could revert to using a dictionary; I believe it would work without any changes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] kojiromike commented on pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
kojiromike commented on pull request #1167:
URL: https://github.com/apache/avro/pull/1167#issuecomment-811964043


   Please put "closes #700" in the pr description to automatically close that pr on merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on a change in pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on a change in pull request #1167:
URL: https://github.com/apache/avro/pull/1167#discussion_r605058846



##########
File path: lang/py/avro/schema.py
##########
@@ -963,6 +1044,25 @@ def to_json(self, names=None):
         to_dump['fields'] = [f.to_json(names) for f in self.fields]
         return to_dump
 
+    def to_canonical_json(self, names):
+        if names is None:
+            names = Names()
+
+        if self.type == 'request':
+            raise NotImplementedError("Canonical form (probably) does not make sense on type request")
+
+        to_dump = self.canonical_properties
+        to_dump["name"] = self.fullname
+
+        if self.fullname in names.names:
+            return self.name_ref(names)
+        else:

Review comment:
       👍 Addressed in eacfdd6a0cd6de9b4e15701d08d9e994f09be310




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] kojiromike merged pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
kojiromike merged pull request #1167:
URL: https://github.com/apache/avro/pull/1167


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on a change in pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on a change in pull request #1167:
URL: https://github.com/apache/avro/pull/1167#discussion_r605058223



##########
File path: lang/py/avro/schema.py
##########
@@ -214,6 +236,19 @@ def validate(self, datum):
         """
         raise Exception("Must be implemented by subclasses.")
 
+    @abc.abstractmethod
+    def to_canonical_json(self, names):
+        """
+        Converts the schema object into its Canonical Form
+        http://avro.apache.org/docs/current/spec.html#Parsing+Canonical+Form+for+Schemas
+        """
+        raise NotImplementedError("to_canonical_json must be implemented by subclasses")

Review comment:
       It does 👍 
   Addressed in eacfdd6a0cd6de9b4e15701d08d9e994f09be310




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] subhashb commented on pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
subhashb commented on pull request #1167:
URL: https://github.com/apache/avro/pull/1167#issuecomment-811518892


   Fingerprinting support is now done and ready to push. 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] kojiromike commented on a change in pull request #1167: AVRO-1938: Generate Parsing Canonical Forms of Schema

Posted by GitBox <gi...@apache.org>.
kojiromike commented on a change in pull request #1167:
URL: https://github.com/apache/avro/pull/1167#discussion_r605599414



##########
File path: lang/py/avro/schema.py
##########
@@ -136,7 +147,18 @@ def _is_timezone_aware_datetime(dt):
 # Base Classes
 #
 
-class Schema(abc.ABC):
+class CanonicalPropertiesMixin(object):
+    """A Mixin that provides canonical properties to Schema and Field types."""
+    @property
+    def canonical_properties(self):
+        props = self.props
+        return collections.OrderedDict(

Review comment:
       Should we just use a dictionary here?

##########
File path: lang/py/avro/schema.py
##########
@@ -469,10 +511,19 @@ def __str__(self):
         return json.dumps(self.to_json())
 
     def to_json(self, names=None):
-        if names is None:
-            names = Names()
+        names = names or Names()
+
         to_dump = self.props.copy()
         to_dump['type'] = self.type.to_json(names)
+
+        return to_dump
+
+    def to_canonical_json(self, names=None):
+        names = names or Names()

Review comment:
       At some point (not here) we should consider tightening up these interfaces so we don't have to copy patterns like this into every implementation.

##########
File path: lang/py/avro/schema.py
##########
@@ -291,22 +331,24 @@ def has_name(self, name_attr, space_attr):
 
     def get_name(self, name_attr, space_attr):
         test = Name(name_attr, space_attr, self.default_namespace).fullname
-        if test not in self.names:
-            return None
-        return self.names[test]
+
+        return None if test not in self.names else self.names[test]

Review comment:
       Seems like `self.names.get(test)` should be equivalent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org