Posted to dev@avro.apache.org by "Sean Busbey (JIRA)" <ji...@apache.org> on 2012/10/08 16:28:03 UTC

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471575#comment-13471575 ] 

Sean Busbey commented on AVRO-997:
----------------------------------

I just ran into this as part of Hive's integration with Avro 1.7.1 ([HIVE-3538|https://issues.apache.org/jira/browse/HIVE-3538]). AFAICT, Hive only makes use of the Generic API.

GenericData.validate returns true for a record with an Avro enum field, or a union field that contains an Avro enum, so long as the result of datum.toString is one of the enum's symbols. Actual serialization works fine for arbitrary incoming values when the field is a plain Avro enum, but fails in the union case if the value isn't a GenericEnumSymbol. In addition to Java enums, this seems likely to come up for GenericData users when they attempt to use Strings.
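
To make the mismatch concrete, here is a minimal sketch in plain Java. It is a simplified model of the behavior described above, not Avro's actual code: the class EnumUnionSketch, its Symbol wrapper, and the validateUnion and resolveUnion methods are all hypothetical stand-ins, and it assumes the union is ["Gender", "null"] with symbols M and F as in the report.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the inconsistency; NOT Avro's source. All names are hypothetical.
public class EnumUnionSketch {

    /** Stand-in for a generic enum symbol wrapper (assumption, not the Avro class). */
    public static final class Symbol {
        private final String name;
        public Symbol(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final List<String> SYMBOLS = Arrays.asList("M", "F");

    /** Lenient check: accepts any object whose toString() names a symbol. */
    public static boolean validateEnum(Object datum) {
        return datum != null && SYMBOLS.contains(datum.toString());
    }

    /** Validation of the union ["Gender", "null"]: true if either branch accepts. */
    public static boolean validateUnion(Object datum) {
        return datum == null || validateEnum(datum);
    }

    /** Strict, type-based branch selection, as on the write path. Returns the branch index. */
    public static int resolveUnion(Object datum) {
        if (datum instanceof Symbol) return 0; // enum branch, chosen by type only
        if (datum == null) return 1;           // null branch
        throw new IllegalArgumentException("Not in union: " + datum);
    }

    public static void main(String[] args) {
        Object asString = "M";
        System.out.println(validateUnion(asString)); // true: the toString() matches a symbol
        try {
            resolveUnion(asString);                  // rejected: a String is not a Symbol
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());      // Not in union: M
        }
        System.out.println(resolveUnion(new Symbol("M"))); // 0
    }
}
```

In this model the same String passes validation and then fails branch resolution, which is the shape of the failure in the stack trace quoted below.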

This seems like a bug in GenericData. I could provide a patch that makes GenericDatumWriter.write consistent with GenericData.validate for the union case, if there's interest. Alternatively, I could provide one that makes the validate and write calls stricter wrt the plain enum case, which I think would help avoid user confusion if enums are only supposed to be GenericEnumSymbol.

Thoughts?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}
