Posted to dev@avro.apache.org by "Felix GV (JIRA)" <ji...@apache.org> on 2017/09/02 14:53:00 UTC

[jira] [Comment Edited] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

    [ https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151515#comment-16151515 ] 

Felix GV edited comment on AVRO-1340 at 9/2/17 2:52 PM:
--------------------------------------------------------

It seems to me that symbolAliases overlap considerably with the functionality of fallbackSymbol. If the desired long-term direction is to have symbolAliases (which I'm still not convinced are useful, though I wouldn't mind having them), then it may be less confusing overall to have JUST symbolAliases and not fallbackSymbol as well.

Otherwise, having both creates yet another set of edge cases stemming from the combination of the two concepts. While it is definitely possible to define appropriate policies for all such edge cases and make the implementation comply with them, I am still wary of the cognitive burden this places on developers. A simple API is a very valuable asset.

To illustrate, take the HTTP response code logging example from above, with [~zolyfarkas]'s V3 as the starting point.

V3:
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300"],
"fallbackSymbol": "UNKNOWN"

V4:
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300"],
"symbolAliases":{"300":["301", "302"]},
"fallbackSymbol": "UNKNOWN"

V5:
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300", "301", "302"],
"symbolAliases":{"300":["301", "302"]},
"fallbackSymbol": "UNKNOWN"

V6:
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300", "301", "302"],
"fallbackSymbol": "UNKNOWN"

Which of these schemas are supposed to be compatible with one another, and what is the outcome of sending 301 between the various combinations? Let's take a stab at it (reading "Vx->Vy" as writer Vx, reader Vy):

V5->V3: Do I use the writer's translation rule or the reader's? If I use the writer's, my 301 will be read as 300; if I use the reader's, it will be read as UNKNOWN.
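A minimal Python sketch of the V5->V3 question. It assumes the schemas are modelled as plain dicts carrying the fallbackSymbol and symbolAliases attributes proposed in this thread (neither is part of the Avro spec), and uses a hypothetical resolve_using() helper that applies only one side's rules. Feeding it the writer's rules and then the reader's rules for the same 301 gives two different answers:

```python
# Hypothetical sketch: fallbackSymbol and symbolAliases are the attributes
# proposed in this thread, modelled as plain dicts (not parsed Avro schemas).
V3 = {
    "symbols": ["UNKNOWN", "200", "404", "500", "300"],
    "fallbackSymbol": "UNKNOWN",
}
V5 = {
    "symbols": ["UNKNOWN", "200", "404", "500", "300", "301", "302"],
    "symbolAliases": {"300": ["301", "302"]},
    "fallbackSymbol": "UNKNOWN",
}


def alias_target(schema, symbol):
    """Return the symbol that `schema` lists `symbol` as an alias of, or None."""
    for target, aliases in schema.get("symbolAliases", {}).items():
        if symbol in aliases:
            return target
    return None


def resolve_using(rules, reader, symbol):
    """Map `symbol` onto `reader`'s symbols using only the rules declared in `rules`."""
    if symbol in reader["symbols"]:
        return symbol  # direct match, no rule needed
    target = alias_target(rules, symbol)
    if target in reader["symbols"]:
        return target  # the alias rule fired
    fallback = rules.get("fallbackSymbol")
    return fallback if fallback in reader["symbols"] else None


print(resolve_using(V5, V3, "301"))  # writer's rules: 300
print(resolve_using(V3, V3, "301"))  # reader's rules: UNKNOWN
```

Both answers are "valid" under some reading of the proposal, which is exactly the ambiguity.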

V5->V4: The reader and writer declare the same fallback and the same aliases. Which of the two rules takes precedence over the other?
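A hypothetical Python sketch of the V5->V4 case, again modelling the proposed fallbackSymbol and symbolAliases attributes as plain dicts. Since both sides agree on the rules, the only open question is ordering, captured here by an illustrative aliases_first flag; flipping it changes the result for 301:

```python
# Hypothetical sketch: V5 writer -> V4 reader, where both sides declare the
# same symbolAliases and fallbackSymbol (attributes proposed in this thread).
V4 = {
    "symbols": ["UNKNOWN", "200", "404", "500", "300"],
    "symbolAliases": {"300": ["301", "302"]},
    "fallbackSymbol": "UNKNOWN",
}
V5 = {
    "symbols": ["UNKNOWN", "200", "404", "500", "300", "301", "302"],
    "symbolAliases": {"300": ["301", "302"]},
    "fallbackSymbol": "UNKNOWN",
}


def alias_target(schema, symbol):
    """Return the symbol that `schema` lists `symbol` as an alias of, or None."""
    for target, aliases in schema.get("symbolAliases", {}).items():
        if symbol in aliases:
            return target
    return None


def resolve(writer, reader, symbol, aliases_first):
    """Resolve `symbol`, trying aliases before the fallback or the other way round."""
    if symbol in reader["symbols"]:
        return symbol
    # Either side may supply each rule; here they agree, so only the order matters.
    by_alias = alias_target(reader, symbol) or alias_target(writer, symbol)
    by_fallback = reader.get("fallbackSymbol") or writer.get("fallbackSymbol")
    order = (by_alias, by_fallback) if aliases_first else (by_fallback, by_alias)
    for candidate in order:
        if candidate in reader["symbols"]:
            return candidate
    raise ValueError(f"cannot resolve {symbol!r}")


print(resolve(V5, V4, "301", aliases_first=True))   # 300
print(resolve(V5, V4, "301", aliases_first=False))  # UNKNOWN
```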

V5->V6: Both have 301 defined, so no special rules come into play.

V6->V3: 301 is definitely UNKNOWN.

V6->V4: Do I do the opposite of the V5->V3 translation, i.e., the writer's rule changes 301 to UNKNOWN and the reader's rule changes 301 to 300? Or do we instead disregard reader versus writer as the criterion and give one type of rule precedence over the other (e.g., aliases take precedence over the fallback no matter which side defines them, or vice versa)? Moving on.
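The two candidate policies for V6->V4 can be sketched in Python as follows. This is a hypothetical model, not an existing Avro API: schemas are plain dicts with the proposed fallbackSymbol and symbolAliases attributes, side_based() lets all of the writer's rules win, and type_based() lets any alias win over any fallback regardless of which side declares it. They disagree on 301:

```python
# Hypothetical sketch comparing two precedence policies for V6 -> V4.
# symbolAliases / fallbackSymbol are the attributes proposed in this thread.
V4 = {
    "symbols": ["UNKNOWN", "200", "404", "500", "300"],
    "symbolAliases": {"300": ["301", "302"]},
    "fallbackSymbol": "UNKNOWN",
}
V6 = {
    "symbols": ["UNKNOWN", "200", "404", "500", "300", "301", "302"],
    "fallbackSymbol": "UNKNOWN",
}


def alias_target(schema, symbol):
    """Return the symbol that `schema` lists `symbol` as an alias of, or None."""
    for target, aliases in schema.get("symbolAliases", {}).items():
        if symbol in aliases:
            return target
    return None


def apply_rules(rules, reader, symbol):
    """Apply one schema's alias rule, then its fallback; None if neither fires."""
    target = alias_target(rules, symbol)
    if target in reader["symbols"]:
        return target
    fallback = rules.get("fallbackSymbol")
    return fallback if fallback in reader["symbols"] else None


def side_based(writer, reader, symbol):
    """Policy A: all of the writer's rules win over all of the reader's rules."""
    if symbol in reader["symbols"]:
        return symbol
    return apply_rules(writer, reader, symbol) or apply_rules(reader, reader, symbol)


def type_based(writer, reader, symbol):
    """Policy B: any alias (from either side) wins over any fallback."""
    if symbol in reader["symbols"]:
        return symbol
    for schema in (writer, reader):
        target = alias_target(schema, symbol)
        if target in reader["symbols"]:
            return target
    for schema in (writer, reader):
        fallback = schema.get("fallbackSymbol")
        if fallback in reader["symbols"]:
            return fallback
    return None


print(side_based(V6, V4, "301"))  # UNKNOWN
print(type_based(V6, V4, "301"))  # 300
```

Whichever policy is chosen has to be spelled out in the spec, since a reader can't infer it from the schemas alone.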

V6->V5: Both have 301 defined, so no special rules come into play.

--

Three of the above translations (V5->V3, V5->V4, V6->V4) have ambiguous behaviour. Do we pick translation rules for each of them and mark them as compatible? Or do we mark them as incompatible? Either option leaves a bitter taste.
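A hypothetical Python sketch that reproduces this count mechanically. It models V3 to V6 as plain dicts with the proposed fallbackSymbol and symbolAliases attributes, collects every outcome that any rule (alias or fallback, writer or reader side) could produce for 301 in each of the six writer->reader pairs, and flags pairs with more than one candidate:

```python
# Hypothetical sketch: flag writer->reader pairs where the proposed rules
# (symbolAliases, fallbackSymbol) can map "301" to more than one symbol.
BASE = ["UNKNOWN", "200", "404", "500", "300"]
ALIASES = {"300": ["301", "302"]}
VERSIONS = {
    "V3": {"symbols": BASE, "fallbackSymbol": "UNKNOWN"},
    "V4": {"symbols": BASE, "symbolAliases": ALIASES, "fallbackSymbol": "UNKNOWN"},
    "V5": {"symbols": BASE + ["301", "302"], "symbolAliases": ALIASES,
           "fallbackSymbol": "UNKNOWN"},
    "V6": {"symbols": BASE + ["301", "302"], "fallbackSymbol": "UNKNOWN"},
}
PAIRS = [("V5", "V3"), ("V5", "V4"), ("V5", "V6"),
         ("V6", "V3"), ("V6", "V4"), ("V6", "V5")]


def alias_target(schema, symbol):
    """Return the symbol that `schema` lists `symbol` as an alias of, or None."""
    for target, aliases in schema.get("symbolAliases", {}).items():
        if symbol in aliases:
            return target
    return None


def candidate_outcomes(writer, reader, symbol):
    """Every reader symbol that at least one applicable rule maps `symbol` to."""
    if symbol in reader["symbols"]:
        return {symbol}  # direct match: no special rules come into play
    outcomes = set()
    for schema in (writer, reader):
        target = alias_target(schema, symbol)
        if target in reader["symbols"]:
            outcomes.add(target)
        fallback = schema.get("fallbackSymbol")
        if fallback in reader["symbols"]:
            outcomes.add(fallback)
    return outcomes


ambiguous = [
    (w, r) for w, r in PAIRS
    if len(candidate_outcomes(VERSIONS[w], VERSIONS[r], "301")) > 1
]
print(ambiguous)  # [('V5', 'V3'), ('V5', 'V4'), ('V6', 'V4')]
```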

I would much rather have a single enum evolution mechanism, either fallback or aliases, but not both.
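A hypothetical Python sketch supporting this on the example above. It models V3 to V6 as plain dicts with the proposed fallbackSymbol and symbolAliases attributes, strips one of the two attributes from every version via an illustrative without() helper, and reports, for each single-mechanism variant, which of the six writer->reader pairs are ambiguous or unresolvable for 301:

```python
# Hypothetical sketch: keep only one of the proposed mechanisms and check
# whether any writer->reader pair still has more than one outcome for "301".
BASE = ["UNKNOWN", "200", "404", "500", "300"]
ALIASES = {"300": ["301", "302"]}
VERSIONS = {
    "V3": {"symbols": BASE, "fallbackSymbol": "UNKNOWN"},
    "V4": {"symbols": BASE, "symbolAliases": ALIASES, "fallbackSymbol": "UNKNOWN"},
    "V5": {"symbols": BASE + ["301", "302"], "symbolAliases": ALIASES,
           "fallbackSymbol": "UNKNOWN"},
    "V6": {"symbols": BASE + ["301", "302"], "fallbackSymbol": "UNKNOWN"},
}
PAIRS = [("V5", "V3"), ("V5", "V4"), ("V5", "V6"),
         ("V6", "V3"), ("V6", "V4"), ("V6", "V5")]


def alias_target(schema, symbol):
    """Return the symbol that `schema` lists `symbol` as an alias of, or None."""
    for target, aliases in schema.get("symbolAliases", {}).items():
        if symbol in aliases:
            return target
    return None


def candidate_outcomes(writer, reader, symbol):
    """Every reader symbol that at least one applicable rule maps `symbol` to."""
    if symbol in reader["symbols"]:
        return {symbol}
    outcomes = set()
    for schema in (writer, reader):
        target = alias_target(schema, symbol)
        if target in reader["symbols"]:
            outcomes.add(target)
        fallback = schema.get("fallbackSymbol")
        if fallback in reader["symbols"]:
            outcomes.add(fallback)
    return outcomes


def without(attr):
    """Copy of every version with one proposed attribute removed."""
    return {name: {k: v for k, v in s.items() if k != attr}
            for name, s in VERSIONS.items()}


for attr in ("symbolAliases", "fallbackSymbol"):
    versions = without(attr)
    found = {(w, r): candidate_outcomes(versions[w], versions[r], "301") for w, r in PAIRS}
    ambiguous = [pair for pair, out in found.items() if len(out) > 1]
    unresolved = [pair for pair, out in found.items() if not out]
    print(f"without {attr}: ambiguous={ambiguous} unresolved={unresolved}")
```

In this model neither single-mechanism variant has an ambiguous pair; with aliases only, V6->V3 has no rule at all and would simply fail, as it does today.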

In that regard, treating the two as separate issues may prevent us from making a holistic design choice.



> use default to allow old readers to specify default enum value when encountering new enum symbols
> -------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1340
>                 URL: https://issues.apache.org/jira/browse/AVRO-1340
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>         Environment: N/A
>            Reporter: Jim Donofrio
>            Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an error is signalled.
> This makes it difficult to use enums, because you can never add an enum value and keep old readers compatible. Why not use the default option to refer to one of the enum values, so that when an old reader encounters an enum ordinal it does not recognize, it can fall back to the value given as the (optional) default in its schema? If the old schema does not provide a default, the old reader can continue to fail as it does today.
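A minimal Python sketch of the behaviour requested in this issue, for illustration only. It relies on the fact that Avro encodes an enum as the zero-based position of the symbol in the writer's schema; resolve_enum(), its reader_default parameter, and SchemaResolutionError are hypothetical names, not Avro API. Without a default it fails as the quoted spec text describes; with one, an unknown symbol maps to the default:

```python
class SchemaResolutionError(Exception):
    """Raised when a writer's enum symbol cannot be mapped onto the reader's enum."""


def resolve_enum(writer_symbols, reader_symbols, index, reader_default=None):
    """Decode an enum value written as `index` into the writer's symbol list.

    `reader_default` is a hypothetical stand-in for the default proposed in
    this issue; when it is None, behaviour matches the quoted spec text.
    """
    symbol = writer_symbols[index]  # Avro writes the symbol's zero-based position
    if symbol in reader_symbols:
        return symbol
    if reader_default is None:
        raise SchemaResolutionError(f"{symbol!r} is not in the reader's enum")
    if reader_default not in reader_symbols:
        raise ValueError(f"default {reader_default!r} is not one of the reader's symbols")
    return reader_default


old_reader = ["UNKNOWN", "200", "404", "500"]
new_writer = ["UNKNOWN", "200", "404", "500", "300"]
print(resolve_enum(new_writer, old_reader, 4, reader_default="UNKNOWN"))  # UNKNOWN
print(resolve_enum(new_writer, old_reader, 1))  # 200
```

Because the default lives only in the reader's schema, there is exactly one rule to consult and no writer/reader precedence question arises.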



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)