Posted to dev@avro.apache.org by "John Patrick Boueri (JIRA)" <ji...@apache.org> on 2019/01/31 05:13:00 UTC

[jira] [Commented] (AVRO-1968) Python DatumWriter seems to evaluate union types in reverse order

    [ https://issues.apache.org/jira/browse/AVRO-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756877#comment-16756877 ] 

John Patrick Boueri commented on AVRO-1968:
-------------------------------------------

Hello [~neilf]

I would like to fix this issue by breaking out of this loop on the first match:

[https://github.com/apache/avro/blob/master/lang/py/src/avro/io.py#L872-L876]
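
A minimal, self-contained sketch of the idea, not the actual code in io.py: the `matches` helper and both `resolve_union_*` functions are hypothetical names used only for illustration, and the sketch assumes that validation against "double" also accepts Python ints (and therefore bools, since bool is a subclass of int).

```python
def matches(type_name, datum):
    """Hypothetical, simplified stand-in for Avro's schema validation."""
    if type_name == "boolean":
        return isinstance(datum, bool)
    if type_name == "double":
        # Assumption: ints are accepted as doubles. bool is a subclass of
        # int in Python, so True/False pass this check as well.
        return isinstance(datum, (int, float))
    return False


def resolve_union_last_match(branches, datum):
    """No break: a later matching branch overwrites an earlier one."""
    index = -1
    for i, branch in enumerate(branches):
        if matches(branch, datum):
            index = i
    return index


def resolve_union_first_match(branches, datum):
    """Proposed behaviour: stop at the first branch that matches."""
    for i, branch in enumerate(branches):
        if matches(branch, datum):
            return i
    return -1
```

With the branches ["boolean", "double"] and a datum of True, the first function picks the "double" branch and the second picks "boolean". With the reversed union the first-match version would pick "double", which matches the fallback expectation in the issue description.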

If you agree I can go ahead and make the patch.

 

Thanks!

> Python DatumWriter seems to evaluate union types in reverse order 
> ------------------------------------------------------------------
>
>                 Key: AVRO-1968
>                 URL: https://issues.apache.org/jira/browse/AVRO-1968
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.8.1
>            Reporter: Neil Ferguson
>            Priority: Major
>         Attachments: avro_test.py
>
>
> The Python DatumWriter seems to evaluate types in a union in reverse order. For example, with the following schema:
> {noformat}
>         {
>             "type": "record",
>             "name": "MyRecord",
>             "fields": [
>                 {"name": "my_field", "type": ["boolean", "double"]}
>             ]
>         }
> {noformat}
> If I set my_field to a boolean in my data, it seems to be encoded as a double. However, if I reverse the order of the types in my union ({{["double", "boolean"]}}) it seems to be encoded as a boolean.
> This seems unintuitive for a couple of reasons:
>  * I'd expect the types in the union to be evaluated in the order they are specified, but they seem to be evaluated in reverse order
>  * Encoding a boolean as a double is a bit weird
> I'm not sure if this is a bug or expected behaviour though. If this is the expected behaviour (or it can't be changed without breaking things) then it would be nice if this was documented somewhere (I searched but couldn't find anything), as it's pretty unintuitive.
> I've attached a full test case. The test case encodes and then decodes the data with both the original schema and the reversed version. For me it prints:
> {noformat}
> Type: <type 'float'>
> Type from reversed schema: <type 'bool'>
> {noformat}
> Ideally I'd expect the type to be 'bool' both times, but failing that I'd expect the type to be 'bool' the first time, and 'float' the second time. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)