Posted to dev@avro.apache.org by "Oleksii Shevtsov (Jira)" <ji...@apache.org> on 2022/12/21 13:30:00 UTC

[jira] [Created] (AVRO-3694) Correlate messages with locations in reader/writer schema compatibility check results

Oleksii Shevtsov created AVRO-3694:
--------------------------------------

             Summary: Correlate messages with locations in reader/writer schema compatibility check results
                 Key: AVRO-3694
                 URL: https://issues.apache.org/jira/browse/AVRO-3694
             Project: Apache Avro
          Issue Type: Improvement
          Components: python
            Reporter: Oleksii Shevtsov


There is an issue with the class {*}SchemaCompatibilityResult{*}, defined in {*}compatibility.py{*}:
{code:python}
class SchemaCompatibilityResult:
    def __init__(
        self,
        compatibility: SchemaCompatibilityType = SchemaCompatibilityType.recursion_in_progress,
        incompatibilities: Optional[List[SchemaIncompatibilityType]] = None,
        messages: Optional[Set[str]] = None,
        locations: Optional[Set[str]] = None,
    ):
        self.locations = locations or {"/"}
        self.messages = messages or set()
        self.compatibility = compatibility
        self.incompatibilities = incompatibilities or []{code}
Here, *locations* and *messages* are defined as Python sets and are therefore unordered. The compatibility check between a reader schema and a writer schema is recursive, and results of the above type are merged for each incompatibility found. Each message belongs to a specific location, but the two are stored as separate attributes and are currently merged independently of each other, see {*}compatibility.py{*}:
{code:python}
def merge(this: SchemaCompatibilityResult, that: SchemaCompatibilityResult) -> SchemaCompatibilityResult:
    ...
        messages = this.messages.union(that.messages)
        locations = this.locations.union(that.locations)
    ...{code}
Since Python sets are unordered, the merged *messages* can no longer be reliably matched to their {*}locations{*}.
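To illustrate the problem, here is a minimal sketch that mimics the two union calls quoted above. The helper name merge_sets, the message text and the location paths are made up for the example and are not taken from the Avro code base. Two incompatibilities that happen to share the same message text collapse into one message while both locations survive, so the two sets do not even have the same size, let alone a usable pairing.

```python
from typing import Set, Tuple


def merge_sets(
    this_messages: Set[str],
    this_locations: Set[str],
    that_messages: Set[str],
    that_locations: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical stand-in for the merge step: each set is unioned independently."""
    return this_messages.union(that_messages), this_locations.union(that_locations)


# Two separate incompatibilities with identical (made-up) message text.
messages, locations = merge_sets(
    {"type mismatch"}, {"/fields/0/type"},
    {"type mismatch"}, {"/fields/1/type"},
)

# The duplicate message was collapsed, both locations were kept,
# and nothing records which message belonged to which location.
print(len(messages), len(locations))
```

Even when the message texts differ, iteration order over the two sets is unrelated, so pairing them positionally is not safe either.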
h2. Proposed solution
Encapsulate {{location}} and {{message}} in a simple data class so that each message stays attached to the location it refers to.
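One possible shape for such a data class is sketched below. The names Incompatibility and merge_details are assumptions made for illustration only; they are not an existing Avro API, and the actual change may look different. The class is frozen so instances are hashable, which lets the merge drop exact duplicates while keeping each message tied to its location and preserving insertion order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Incompatibility:
    """Hypothetical pair of a schema location and the message describing the problem there."""

    location: str
    message: str


def merge_details(this: List[Incompatibility], that: List[Incompatibility]) -> List[Incompatibility]:
    """Hypothetical merge: concatenate both lists, dropping exact duplicates, keeping order."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return list(dict.fromkeys([*this, *that]))


# Made-up locations and message text, for illustration only.
first = Incompatibility(location="/fields/0/type", message="type mismatch")
second = Incompatibility(location="/fields/1/type", message="type mismatch")

merged = merge_details([first], [second, first])
for item in merged:
    print(f"{item.location}: {item.message}")
```

With this layout the same message text at two different locations yields two distinct entries, and a consumer never has to guess which location a message belongs to.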



--
This message was sent by Atlassian Jira
(v8.20.10#820010)