You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2022/07/11 09:55:18 UTC

[GitHub] [avro] izveigor opened a new pull request, #1759: Duplicate symbols

izveigor opened a new pull request, #1759:
URL: https://github.com/apache/avro/pull/1759

   If EnumSchema has duplicate symbols, an error will raise. Instead of a list of duplicate symbols or a value of duplicate symbol, error shows all list of symbols. Improvement removes this defect and shows a message "Duplicate symbol" with the symbol, if it is one, or "Duplicates symbols" with the list of duplicate symbols, if there are more than one symbol.
   
   P.S. Tests do not check error's message. Try to write a test for checking a message of an error can take a long time.
   
   ### Jira
   
   - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them in the PR title. For example, "AVRO-1234: My Avro PR"
     - https://issues.apache.org/jira/browse/AVRO-XXX
     - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] izveigor commented on pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
izveigor commented on PR #1759:
URL: https://github.com/apache/avro/pull/1759#issuecomment-1180370645

   Hello, @martin-g!
   Thanks to the response!
   
   I added two test cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1759:
URL: https://github.com/apache/avro/pull/1759#discussion_r920060990


##########
lang/py/avro/schema.py:
##########
@@ -570,7 +570,12 @@ def __init__(
                     raise avro.errors.InvalidName("An enum symbol must be a valid schema name.")
 
         if len(set(symbols)) < len(symbols):
-            raise avro.errors.AvroException(f"Duplicate symbol: {symbols}")
+            duplicate_symbols = {symbol for symbol in symbols if symbols.count(symbol) > 1}
+
+            if len(duplicate_symbols) == 1:

Review Comment:
   Sorry, i didn't see you were in case of existing duplicates symbols `if len(set(symbols)) < len(symbols):`, so else part can't be for 0. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] izveigor commented on pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
izveigor commented on PR #1759:
URL: https://github.com/apache/avro/pull/1759#issuecomment-1180443784

   Hello, @martin-g!
   
   I fixed test by case intersection. I think escape should be in the code because it is more correct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] jjaakola-aiven commented on a diff in pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
jjaakola-aiven commented on code in PR #1759:
URL: https://github.com/apache/avro/pull/1759#discussion_r1003412161


##########
lang/py/avro/test/test_schema.py:
##########
@@ -713,6 +713,22 @@ def test_parse_invalid_symbol(self):
         except avro.errors.InvalidName:  # pragma: no coverage
             self.fail("When enum symbol validation is disabled, an invalid symbol should not raise InvalidName.")
 
+    def test_parse_duplicate_symbol(self):
+        duplicate_symbol_schema = json.dumps({"type": "enum", "name": "AVRO3573", "symbols": ["A", "A", "B", "C", "D"]})
+        with self.assertRaisesRegex(
+            avro.errors.AvroException, r"Duplicate symbol: \[\'A\'\]", msg="When enum symbol has a duplicate, AvroException should raise."
+        ):
+            avro.schema.parse(duplicate_symbol_schema)
+
+    def test_parse_duplicate_symbols(self):
+        duplicate_symbols_schema = json.dumps({"type": "enum", "name": "AVRO3573", "symbols": ["A", "A", "B", "C", "C", "D"]})
+        with self.assertRaisesRegex(
+            avro.errors.AvroException,
+            r"Duplicate symbols: (\[\'A\', \'C\'\])|(\[\'C\', \'A\'\])",

Review Comment:
   Is there a case where the resulting order in error message would be backwards?
   If not I'd prefer for readability that error message and assertion would be exact. E.g.
   ```python
       with self.assertRaises(avro.errors.AvroException, msg="Should raise AvroException") as exc_info:
               avro.schema.parse(duplicate_symbols_schema)
       assert str(exc_info.exception) == "Duplicate symbols: (['A', 'C'])"
   ```
   



##########
lang/py/avro/schema.py:
##########
@@ -570,7 +570,12 @@ def __init__(
                     raise avro.errors.InvalidName("An enum symbol must be a valid schema name.")
 
         if len(set(symbols)) < len(symbols):
-            raise avro.errors.AvroException(f"Duplicate symbol: {symbols}")
+            duplicate_symbols = {symbol for symbol in symbols if symbols.count(symbol) > 1}
+
+            if len(duplicate_symbols) == 1:
+                raise avro.errors.AvroException(f"Duplicate symbol: {list(duplicate_symbols)}")
+            else:
+                raise avro.errors.AvroException(f"Duplicate symbols: {list(duplicate_symbols)}")

Review Comment:
   Java gives an error message of `Duplicate enum symbol: A` and reports only single duplicate. I am not sure if the messages should be aligned, just noting the difference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] izveigor commented on pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
izveigor commented on PR #1759:
URL: https://github.com/apache/avro/pull/1759#issuecomment-1262048415

   Are you still interesting in these changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] izveigor commented on a diff in pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
izveigor commented on code in PR #1759:
URL: https://github.com/apache/avro/pull/1759#discussion_r919851566


##########
lang/py/avro/schema.py:
##########
@@ -570,7 +570,12 @@ def __init__(
                     raise avro.errors.InvalidName("An enum symbol must be a valid schema name.")
 
         if len(set(symbols)) < len(symbols):
-            raise avro.errors.AvroException(f"Duplicate symbol: {symbols}")
+            duplicate_symbols = {symbol for symbol in symbols if symbols.count(symbol) > 1}
+
+            if len(duplicate_symbols) == 1:

Review Comment:
   Hello, @clesaec!
   
   I do not really understand what you want to say. if we have 0 duplicates symbols, a length of set will less than a length of list and an error will not raise. if we have 1 duplicate symbol, the application shows the list with only one symbol. If we have more than 1 duplicate symbol, the application shows the list of duplicates symbols.
   
   The message for 1 duplicate symbol differs from the message for more 1 duplicates symbols only the grammar of the language.
   
   If you worry about format of output (list), then many libraries use this format both for one symbol and for many symbols. For example, marshmallow: https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/schema.py#L1009-L1018



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] izveigor commented on pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
izveigor commented on PR #1759:
URL: https://github.com/apache/avro/pull/1759#issuecomment-1291675519

   Because I deleted my fork, I can not change this pull request. I created new pull request to this problem: https://github.com/apache/avro/pull/1933.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] izveigor closed pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
izveigor closed pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)
URL: https://github.com/apache/avro/pull/1759


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on a diff in pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
martin-g commented on code in PR #1759:
URL: https://github.com/apache/avro/pull/1759#discussion_r917905621


##########
lang/py/avro/test/test_schema.py:
##########
@@ -713,6 +713,20 @@ def test_parse_invalid_symbol(self):
         except avro.errors.InvalidName:  # pragma: no coverage
             self.fail("When enum symbol validation is disabled, an invalid symbol should not raise InvalidName.")
 
+    def test_parse_duplicate_symbol(self):
+        duplicate_symbol_schema = json.dumps({"type": "enum", "name": "AVRO3573", "symbols": ["A", "A", "B", "C", "D"]})
+        with self.assertRaisesRegex(
+            avro.errors.AvroException, r"Duplicate symbol: \[\'A\'\]", msg="When enum symbol has a duplicate, AvroException should raise."

Review Comment:
   It seems the escaping is not needed here



##########
lang/py/avro/test/test_schema.py:
##########
@@ -713,6 +713,20 @@ def test_parse_invalid_symbol(self):
         except avro.errors.InvalidName:  # pragma: no coverage
             self.fail("When enum symbol validation is disabled, an invalid symbol should not raise InvalidName.")
 
+    def test_parse_duplicate_symbol(self):
+        duplicate_symbol_schema = json.dumps({"type": "enum", "name": "AVRO3573", "symbols": ["A", "A", "B", "C", "D"]})
+        with self.assertRaisesRegex(
+            avro.errors.AvroException, r"Duplicate symbol: \[\'A\'\]", msg="When enum symbol has a duplicate, AvroException should raise."
+        ):
+            avro.schema.parse(duplicate_symbol_schema)
+
+    def test_parse_duplicate_symbols(self):
+        duplicate_symbols_schema = json.dumps({"type": "enum", "name": "AVRO3573", "symbols": ["A", "A", "B", "C", "C", "D"]})
+        with self.assertRaisesRegex(
+            avro.errors.AvroException, r"Duplicate symbols: \[\'A\', \'C\'\]", msg="When enum symbols have duplicates, AvroException should raise."

Review Comment:
   and here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1759:
URL: https://github.com/apache/avro/pull/1759#discussion_r919760019


##########
lang/py/avro/schema.py:
##########
@@ -570,7 +570,12 @@ def __init__(
                     raise avro.errors.InvalidName("An enum symbol must be a valid schema name.")
 
         if len(set(symbols)) < len(symbols):
-            raise avro.errors.AvroException(f"Duplicate symbol: {symbols}")
+            duplicate_symbols = {symbol for symbol in symbols if symbols.count(symbol) > 1}
+
+            if len(duplicate_symbols) == 1:

Review Comment:
   What if there are 2 duplicate symbols ["A", "A", "B", "B"]
   shouldn't it be : 
   `if len(duplicate_symbols) > 0:`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] izveigor closed pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

Posted by GitBox <gi...@apache.org>.
izveigor closed pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)
URL: https://github.com/apache/avro/pull/1759


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org