You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@avro.apache.org by GitBox <gi...@apache.org> on 2022/10/24 14:51:34 UTC

[GitHub] [avro] jjaakola-aiven commented on a diff in pull request #1759: AVRO-3573: Duplicate symbols (EnumSchema)

jjaakola-aiven commented on code in PR #1759:
URL: https://github.com/apache/avro/pull/1759#discussion_r1003412161


##########
lang/py/avro/test/test_schema.py:
##########
@@ -713,6 +713,22 @@ def test_parse_invalid_symbol(self):
         except avro.errors.InvalidName:  # pragma: no coverage
             self.fail("When enum symbol validation is disabled, an invalid symbol should not raise InvalidName.")
 
+    def test_parse_duplicate_symbol(self):
+        duplicate_symbol_schema = json.dumps({"type": "enum", "name": "AVRO3573", "symbols": ["A", "A", "B", "C", "D"]})
+        with self.assertRaisesRegex(
+            avro.errors.AvroException, r"Duplicate symbol: \[\'A\'\]", msg="When enum symbol has a duplicate, AvroException should raise."
+        ):
+            avro.schema.parse(duplicate_symbol_schema)
+
+    def test_parse_duplicate_symbols(self):
+        duplicate_symbols_schema = json.dumps({"type": "enum", "name": "AVRO3573", "symbols": ["A", "A", "B", "C", "C", "D"]})
+        with self.assertRaisesRegex(
+            avro.errors.AvroException,
+            r"Duplicate symbols: (\[\'A\', \'C\'\])|(\[\'C\', \'A\'\])",

Review Comment:
   Is there a case where the resulting order in error message would be backwards?
   If not I'd prefer for readability that error message and assertion would be exact. E.g.
   ```python
       with self.assertRaises(avro.errors.AvroException, msg="Should raise AvroException") as exc_info:
               avro.schema.parse(duplicate_symbols_schema)
       assert str(exc_info.exception) == "Duplicate symbols: (['A', 'C'])"
   ```
   



##########
lang/py/avro/schema.py:
##########
@@ -570,7 +570,12 @@ def __init__(
                     raise avro.errors.InvalidName("An enum symbol must be a valid schema name.")
 
         if len(set(symbols)) < len(symbols):
-            raise avro.errors.AvroException(f"Duplicate symbol: {symbols}")
+            duplicate_symbols = {symbol for symbol in symbols if symbols.count(symbol) > 1}
+
+            if len(duplicate_symbols) == 1:
+                raise avro.errors.AvroException(f"Duplicate symbol: {list(duplicate_symbols)}")
+            else:
+                raise avro.errors.AvroException(f"Duplicate symbols: {list(duplicate_symbols)}")

Review Comment:
   Java gives an error message of `Duplicate enum symbol: A` and reports only single duplicate. I am not sure if the messages should be aligned, just noting the difference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org