You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Aaron Kimball (Created) (JIRA)" <ji...@apache.org> on 2012/01/18 17:42:41 UTC

[jira] [Created] (AVRO-997) Union of enum and null cannot be serialized

Union of enum and null cannot be serialized
-------------------------------------------

                 Key: AVRO-997
                 URL: https://issues.apache.org/jira/browse/AVRO-997
             Project: Avro
          Issue Type: Bug
    Affects Versions: 1.5.1
            Reporter: Aaron Kimball


I have a schema like:

{code}
[
{
  "type": "enum",
  "name": "Gender",
  "symbols": ["M", "F"]
},

{
  "type" : "record",
  "name" : "Foo",
  "fields" : [
    { "type" : ["Gender", "null"], "name" : "gender" },
    ...
  ]
}
]
{code}

I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}

When I go to serialize this, I get:
{code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Scott Carey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188896#comment-13188896 ] 

Scott Carey commented on AVRO-997:
----------------------------------

As a work-around, you may be able to wrap the Enum inside another record.

{code}
[
{
  "type": "enum",
  "name": "Gender",
  "symbols": ["M", "F"]
},

{ "type": "record",
  "name": "GenderWrapper",
  "fields": [
  {"name": "gender", "type":"Gender"}
  ]
},

{
  "type" : "record",
  "name" : "Foo",
  "fields" : [
    { "type" : ["GenderWrapper", "null"], "name" : "gender" },
    ...
  ]
}
]
{code}
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Scott Carey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188894#comment-13188894 ] 

Scott Carey commented on AVRO-997:
----------------------------------

If that is the case, then this is a bug.  GenericData.resolveUnion() only operates on 'instanceof GenericEnumSymbol' for enums.

This means that an Enum in a union of any kind cannot be serialized in a SpecificRecord.  This is either a bug where SpecificData should override GenericData's behavior and handle the Java enum, or GenericData should handle java Enums as well.

It also means we need better test coverage on SpecificRecords.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Sean Busbey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474548#comment-13474548 ] 

Sean Busbey commented on AVRO-997:
----------------------------------

Scott, I could write a patch that maintains the current behavior for unions of "things that look like enum members when toString is called" and enums, which is to say the non-enum type gets picked if it is present. eg ["string", "long", {"type":"enum", "symbols":["ONE", "TWO", "42"]}] says that "ONE" is a string and 42 is a long. Since it would be consistent with current behavior, a warning in the API about the coercion might be sufficient. It would be less efficient than picking the enum member, but doing that would be just as breaking as reworking the generic api to be strict.

The only time I've seen this come up is when someone has a union field and then wants to mark it as optional so they move it to a union with null. While their original use of Java String for field values was incorrect, the generic api as is would have worked fine up until the addition of the union. In general I'm a fan of "just works" behavior from APIs, but I can understand Doug's desire to favor consistent strict application as a way of steering users away from behavior that will lead to confusing non-roundtrip-able edge cases.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192672#comment-13192672 ] 

Doug Cutting commented on AVRO-997:
-----------------------------------

I think in current trunk you'd get "Unknown datum type: M" which is somewhat more informative than the 1.5.1 message.  Maybe this could be improved to something like, "Unknown datum for GenericData: M" to hint that specific, reflect, thrift, protobuf or some other data model might be better?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Scott Carey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189311#comment-13189311 ] 

Scott Carey commented on AVRO-997:
----------------------------------

Ah, I missed that test.  

In that case, a GenericDatumWriter may be being used to serialize a SpecificRecord, which won't work.  SpecificRecord however is an instance of IndexedRecord, so the API won't complain and it will work for 'most' operations.


                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Scott Carey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188873#comment-13188873 ] 

Scott Carey commented on AVRO-997:
----------------------------------

The following works in Avro 1.6.1, not sure about Avro 1.5.1 (it would need to be modified since it uses Schema.Parser)
This does not test an object created by the Specific Record API.

{code}
  @Test
  public void testWriteNullableEnum() throws IOException {
    Schema.Parser p = new Schema.Parser();
    Schema genderSchema = p.parse(
        "{"
        + "\"type\": \"enum\","
        + "\"name\": \"Gender\","
        + "\"symbols\": [\"M\", \"F\"]"
        +"}");
    Schema fooSchema = p.parse(
        "{"
        + "\"type\" : \"record\","
        + "\"name\" : \"Foo\","
        + "\"fields\" : ["
        + "{\"type\" : [\"Gender\", \"null\"], \"name\" : \"gender\" }"
        +"]}");
    
    GenericData.Record foo = new GenericData.Record(fooSchema);
    
    foo.put(0, new GenericData.EnumSymbol(genderSchema, "M"));
    
    GenericDatumWriter<GenericData.Record> w = new GenericDatumWriter<GenericData.Record>(fooSchema);
    
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    Encoder e = EncoderFactory.get().binaryEncoder(baos, null);
    
    w.write(foo, e);
    
  }
{code}


Note that with a SpecificRecord, the enum must be a GenericData.EnumSymbol, it does not currently support a Java enum.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Aaron Kimball (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188900#comment-13188900 ] 

Aaron Kimball commented on AVRO-997:
------------------------------------

For now I'm working around this by adding a third {{Unspecified}} symbol to my enum, and removing the union with null entirely. Thanks for taking a look.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Aaron Kimball (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189382#comment-13189382 ] 

Aaron Kimball commented on AVRO-997:
------------------------------------

Doug, I'm using Avro 1.5.1 as noted in the affectsVersion field of this ticket.

Scott, you're correct: our system uses a single codepath to serialize all values passed in from the client/data layer to the underlying storage. This codepath accepts generic and specific values, and we currently always allocate a GenericDatumWriter for each schema. At least until this particular schema, this has always worked.

Is this an illegal use of GenericDatumWriter? If so, maybe its write() method could warn (or more strongly, throw an exception) if an instance of SpecificRecord is passed as an argument?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474376#comment-13474376 ] 

Doug Cutting commented on AVRO-997:
-----------------------------------

Representations of the different types must be distinguishable at runtime to correctly write unions.  If you have a union like ["string", {"type":"enum", ...}] and an enum symbol could be represented as a string then you wouldn't know which branch of the union was intended.  (The implementation of union writing uses a hash-table keyed by type name, so types are not tried in order.)

Currently generic accepts strings when writing enums outside of unions, but when writing a union an enum must be represented as a GenericEnumSymbol.  Enum values are always read by generic as GenericEnumSymbol.  The patch I attached changes generic to be consistent, so that strings are not acceptable as enum symbols anywhere, a GenericEnumSymbol must always be used.  With specific & reflect this is extended to include Java Enum types.

                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471802#comment-13471802 ] 

Doug Cutting commented on AVRO-997:
-----------------------------------

> I can add some tests that fail suitably.

Great!  Thanks!

> would it be fine if I added a constructor to GenericData.EnumSymbol that allowed for Java Enums for the passed in symbol?

I don't see a problem with that.  We might have a constructor that accepts Object and calls toString()?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473831#comment-13473831 ] 

Scott Carey commented on AVRO-997:
----------------------------------

Without looking at this in depth:

Why is it not a good idea to let the Generic API use a String or Java Enum?  It would not break compatibility.  What is the downside of expanding what GenericDatumWriter can handle for Enums?

                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Sean Busbey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474573#comment-13474573 ] 

Sean Busbey commented on AVRO-997:
----------------------------------

Oh no, I definitely think the current behavior is a bug. Specifically having validate pass but write fail is incorrect. I also think that my enum shouldn't stop working just because it's in a union, omitting ambiguities such as the string-enum union.

Making everything consistent by checking to see if the toString is in the enum member of a union after [GenericDate.resolveUnion has done the existing membership checks (~line 574)|http://svn.apache.org/viewvc/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java?view=markup&pathrev=1384500] would solve the part of this that I consider a bug, while making things easier for existing users (like the Hive issue that brought me here). I'll post a patch tomorrow showing how this would work (and an alternate set of tests that would correspond to the failing prior to the fix).

I think in any case there'll be a documentation component, either calling out the use of GenericEnumSymbol or warning users about the possibly surprising case of enum members ending up as avro strings in serialized data.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Sean Busbey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey updated AVRO-997:
-----------------------------

    Attachment: AVRO-997.patch

Updated patch with tests that fail if GenericData.validate or GenericDatumWriter.write allow the use of String/Java Enum with an Avro enum schema.

Also adds a constructor to GenericData.EnumSymbol to ease proper use of the generic api.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-997:
------------------------------

    Attachment: AVRO-997.patch

I think GenericData#validate() and GenericDatumWriter#writeEnum() could both be stricter, failing when isEnum(datum) is false.  That would be an incompatible change though, possibly breaking applications.

Here's the patch I used to try this.  Tests pass.  We should probably add some tests that fail without this change too before committing something like this.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Sean Busbey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494555#comment-13494555 ] 

Sean Busbey commented on AVRO-997:
----------------------------------

The only reason I think enums are different is because I've seen people use them differently to date. Going with a consistent, stict enforcement would be fine; the only down side is pain to existing users. I only included the patch so we have a clearer view of the additional maintenance cost of maintaining compatibility.

I'm more concerned that _one_ of the patches get applied then I am with which one it is. I'm fine with fixing Hive downstream whichever way this goes.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch, AVRO-997.permissive-generic-api.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189446#comment-13189446 ] 

Doug Cutting commented on AVRO-997:
-----------------------------------

SpecificDatumWriter can write both generic and specific instances, while GenericDatumWriter is only intended to correctly write generic instances.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-997:
------------------------------

    Assignee: Sean Busbey
      Status: Patch Available  (was: Open)

Thanks, Sean.  The tests look great.

I'll commit this soon unless there are objections.

I think the minor incompatible runtime change here should cause our next release to be called 1.8.0 rather than 1.7.3.  But I'll delay changing the Fix-for on other 1.7.3 issues for now though in case we decide to do a compatible 1.7.3 bugfix release.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474561#comment-13474561 ] 

Doug Cutting commented on AVRO-997:
-----------------------------------

Sean: To be clear, I won't be upset if we don't commit this.  I don't feel very strongly one way or the other.  I thought that you felt the current implementation was a bug.  I wrote the patch to see how hard this would be to change, and, having it written it, posted it.  Right now the implementation is lenient, only requiring GenericEnumSymbol when writing a union containing an enum, otherwise accepting anything whose toString() returns a matching symbol.  That can be confusing, but it also simplifies many programs.

Maybe this is just a documentation issue?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Aaron Kimball (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188881#comment-13188881 ] 

Aaron Kimball commented on AVRO-997:
------------------------------------

{{Foo}} is a SpecificRecord. The {{Gender}} type generated by Avro is a Java {{enum}}; is this a bug in the Avro code generator, or am I just misusing it?

                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Sean Busbey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471791#comment-13471791 ] 

Sean Busbey commented on AVRO-997:
----------------------------------

I can add some tests that fail suitably. To ease fixing existing users of the generic api, would it be fine if I added a constructor to GenericData.EnumSymbol that allowed for Java Enums for the passed in symbol?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Sean Busbey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey updated AVRO-997:
-----------------------------

    Attachment: AVRO-997.permissive-generic-api.patch

Alternative non-breaking patch. Makes the generic api more permissive about use of things other than GenericEnumSymbol with enum schemas. See [review board|https://reviews.apache.org/r/7606/] for details.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch, AVRO-997.permissive-generic-api.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Aaron Kimball (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189451#comment-13189451 ] 

Aaron Kimball commented on AVRO-997:
------------------------------------

Thanks. In that case, this might be resolved as Invalid, unless you think it merits an exception, documentation fix, etc. as I suggest above.
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189289#comment-13189289 ] 

Doug Cutting commented on AVRO-997:
-----------------------------------

SpecificData#isEnum() overrides GenericData#isEnum() and returns true for instances of Enum.  So this should work.  Also, TestSpecificDatumWriter#testResolveUnion writes and reads an instance of a record with a field whose type is a union of "null" and an enum using generated, specific code.  So this appears to be tested.

Aaron, can you provide a simple, complete test that fails?  Also, what version of Avro are you using?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492693#comment-13492693 ] 

Doug Cutting commented on AVRO-997:
-----------------------------------

Sean, I prefer the simpler more consistent approach of my patch.  All other schemas use distinct types in Java, why shouldn't enums?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>            Assignee: Sean Busbey
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch, AVRO-997.patch, AVRO-997.permissive-generic-api.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Sean Busbey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471575#comment-13471575 ] 

Sean Busbey commented on AVRO-997:
----------------------------------

I just ran into this as part of Hive's integration with Avro 1.7.1 ([HIVE-3538|https://issues.apache.org/jira/browse/HIVE-3538]). AFAICT, Hive only makes use of the Generic API.

GenericData.validate returns true for a record with an Avro enum or a union that contains an Avro enum so long as the result of datum.toString is in the set of elements. Actually serializing works fine for arbitrary incoming values when the field is an Avro enum, but fails in the union case if the value isn't GenericEnumSymbol. In addition to Java enums, this seems likely to come up for GenericData users when they attempt to use Strings.

This seems like a bug for GenericData. I could provide a patch that makes GenericDatumWriter.write consistent with GenericData.validate for the union case, if there's interest. Alternatively, I could provide one that causes the validate and write calls to be stricter wrt to the plain enum case, which I think would help avoid user confusion if enums are only supposed to be GenericEnumSybmol.

Thoughts?
                
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (AVRO-997) Union of enum and null cannot be serialized

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-997:
------------------------------

    Fix Version/s: 1.8.0
     Hadoop Flags: Incompatible change
    
> Union of enum and null cannot be serialized
> -------------------------------------------
>
>                 Key: AVRO-997
>                 URL: https://issues.apache.org/jira/browse/AVRO-997
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Aaron Kimball
>             Fix For: 1.8.0
>
>         Attachments: AVRO-997.patch
>
>
> I have a schema like:
> {code}
> [
> {
>   "type": "enum",
>   "name": "Gender",
>   "symbols": ["M", "F"]
> },
> {
>   "type" : "record",
>   "name" : "Foo",
>   "fields" : [
>     { "type" : ["Gender", "null"], "name" : "gender" },
>     ...
>   ]
> }
> ]
> {code}
> I build a record like {{Foo foo = new Foo(); foo.gender = Gender.M;}}
> When I go to serialize this, I get:
> {code}Not in union [{"type":"enum","name":"Gender","symbols":["M","F"]},"null"]: M
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:482)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:70)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira