Posted to issues@avro.apache.org by "Ryan Skraba (Jira)" <ji...@apache.org> on 2023/06/15 17:21:00 UTC

[jira] [Commented] (AVRO-3760) Using enum with default symbol, cannot parse future value

    [ https://issues.apache.org/jira/browse/AVRO-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733178#comment-17733178 ] 

Ryan Skraba commented on AVRO-3760:
-----------------------------------

Hello!  I think that your example code might be wrong, but the bug exists regardless.

If I understand the intention of AVRO-1340 correctly, the test code must have both the reader and writer schema present in order to use the default:

{code}
    # Writer's schema first, reader's schema second.
    reader = DatumReader(future_schema, current_schema)
{code}

In this case you still get an exception: **avro.errors.SchemaResolutionException: Symbol crc32_be not present in Reader's Schema**

It's a bit vague to me, but my understanding is that the default in an enum is meant to serve as the "fail safely" value when a symbol is removed during schema evolution, not when the data is corrupted or unexpected (like an enum index out of bounds).  Would it suit your needs if the default symbol were only used when both schemas are known?

To be clear, the bug is still valid, just the implementation would change.
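
As a rough illustration of the behaviour described above, here is a minimal pure-Python sketch of how the enum default could be applied when both schemas are known. This is not the avro library's code: the function name {{resolve_enum_symbol}} and the exception class {{EnumResolutionError}} are hypothetical stand-ins, and the symbol lists are copied from the schemas in the report.

```python
from typing import Optional, Sequence


class EnumResolutionError(Exception):
    """Hypothetical stand-in for avro.errors.SchemaResolutionException."""


def resolve_enum_symbol(
    index: int,
    writer_symbols: Sequence[str],
    reader_symbols: Sequence[str],
    reader_default: Optional[str] = None,
) -> str:
    """Map an encoded enum index to a symbol the reader understands."""
    # An index outside the writer's symbols means corrupted or unexpected
    # data, so the default is deliberately not applied here.
    if not 0 <= index < len(writer_symbols):
        raise EnumResolutionError(
            f"Can't access enum index {index} for enum with "
            f"{len(writer_symbols)} symbols"
        )
    symbol = writer_symbols[index]
    if symbol in reader_symbols:
        return symbol
    # Schema evolution case: the writer knows a symbol the reader does not,
    # so fall back to the reader's default if one is declared.
    if reader_default is not None:
        return reader_default
    raise EnumResolutionError(f"Symbol {symbol} not present in Reader's Schema")


writer_symbols = ["unknown", "xxhash3_64_be", "crc32_be"]  # future schema
reader_symbols = ["unknown", "xxhash3_64_be"]              # current schema

# Index 2 is "crc32_be" in the writer's schema; the reader falls back.
print(resolve_enum_symbol(2, writer_symbols, reader_symbols, "unknown"))
```

With these inputs the sketch prints {{unknown}}, while an out-of-range index still raises, which matches the "only when both schemas are known" behaviour proposed above.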

> Using enum with default symbol, cannot parse future value
> ---------------------------------------------------------
>
>                 Key: AVRO-3760
>                 URL: https://issues.apache.org/jira/browse/AVRO-3760
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.11.1
>         Environment: {code}
> $ pip freeze | grep -i avro
> avro==1.11.1
> $ python --version
> Python 3.8.16
> {code}
>            Reporter: Anton Agestam
>            Assignee: Anton Agestam
>            Priority: Major
>             Fix For: 1.11.2
>
>
> It seems like support for default symbols is broken. In the example below, since I'm using default symbols, I expected to be able to add new values to the enum and see the default value when parsing using the old schema.
> {code:python}
> import io
> from avro.io import DatumReader, DatumWriter, BinaryDecoder, BinaryEncoder
> import avro.schema
> current_schema = avro.schema.parse("""
> {
>     "fields": [
>         {
>             "default": "unknown",
>             "name": "checksum_algorithm",
>             "type": {
>                 "name": "ChecksumAlgorithm",
>                 "symbols": [
>                     "unknown",
>                     "xxhash3_64_be"
>                 ],
>                 "type": "enum",
>                 "default": "unknown"
>             }
>         }
>     ],
>     "name": "Metadata",
>     "type": "record"
> }
> """)
> # Future schema adds the "crc32_be" symbol.
> future_schema = avro.schema.parse("""
> {
>     "fields": [
>         {
>             "default": "unknown",
>             "name": "checksum_algorithm",
>             "type": {
>                 "name": "ChecksumAlgorithm",
>                 "symbols": [
>                     "unknown",
>                     "xxhash3_64_be",
>                     "crc32_be"
>                 ],
>                 "type": "enum",
>                 "default": "unknown"
>             }
>         }
>     ],
>     "name": "Metadata",
>     "type": "record"
> }
> """)
> with io.BytesIO() as buffer:
>     writer = DatumWriter(future_schema)
>     encoder = BinaryEncoder(buffer)
>     writer.write({"checksum_algorithm": "crc32_be"}, encoder)
>     buffer.seek(0)
>     reader = DatumReader(current_schema)
>     decoder = BinaryDecoder(buffer)
>     decoded = reader.read(decoder)
> print(decoded)
> {code}
> Instead, this results in an exception:
> {code}
> Traceback (most recent call last):
>   File "reproduce-avro.py", line 58, in <module>
>     decoded = reader.read(decoder)
>   File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 649, in read
>     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>   File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 727, in read_data
>     return self.read_record(writers_schema, readers_schema, decoder)
>   File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 922, in read_record
>     field_val = self.read_data(field.type, readers_field.type, decoder)
>   File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 720, in read_data
>     return self.read_enum(writers_schema, readers_schema, decoder)
>   File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 779, in read_enum
>     raise avro.errors.SchemaResolutionException(
> avro.errors.SchemaResolutionException: Can't access enum index 2 for enum with 2 symbols
> Writer's Schema: {
>   "type": "enum",
>   "default": "unknown",
>   "name": "ChecksumAlgorithm",
>   "symbols": [
>     "unknown",
>     "xxhash3_64_be"
>   ]
> }
> Reader's Schema: {
>   "type": "enum",
>   "default": "unknown",
>   "name": "ChecksumAlgorithm",
>   "symbols": [
>     "unknown",
>     "xxhash3_64_be"
>   ]
> }
> {code}


