Posted to dev@avro.apache.org by "Alexandre Normand (JIRA)" <ji...@apache.org> on 2012/08/31 20:31:07 UTC
[jira] [Commented] (AVRO-1145) Can't read union of null and
primitive from value written with schema as primitive
[ https://issues.apache.org/jira/browse/AVRO-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446211#comment-13446211 ]
Alexandre Normand commented on AVRO-1145:
-----------------------------------------
The documentation in [parsing.html|http://svn.apache.org/viewvc/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/io/parsing/doc-files/parsing.html?view=log] seems to suggest that we should have a {{ReaderUnionAction}} to handle this case. I naively expected to find an action class called {{ReaderUnionAction}} much in the same way there is a {{WriterUnionAction}}.
Is that something that's simply missing from the current implementation?
> Can't read union of null and primitive from value written with schema as primitive
> ----------------------------------------------------------------------------------
>
> Key: AVRO-1145
> URL: https://issues.apache.org/jira/browse/AVRO-1145
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.0
> Reporter: Alexandre Normand
> Attachments: TestPrimitiveToUnionResolution.java
>
>
> I'm using Avro's Java generic representation API and I have a problem dealing with our current case of schema evolution. The scenario we're dealing with here is making a primitive-type field optional by changing the field to be a {{union}} of {{null}} and that primitive type.
> I'm going to use a simple example. Basically, our schemas are:
> Initial: A record with one field of type {{int}}
> Second version: Same record, same field name but the type is now a union of {{null}} and {{int}}
> According to the [schema resolution|http://avro.apache.org/docs/current/spec.html#Schema+Resolution] chapter of Avro's spec, the resolution for such a case should be:
> {code}
> if reader's is a union, but writer's is not
> The first schema in the reader's union that matches
> the writer's schema is recursively resolved against
> it. If none match, an error is signalled.
> {code}
> My interpretation is that we should resolve data serialized with the initial schema properly as int is part of the union in the reader's schema.
> However, when running a test that reads back a record serialized with version 1 using version 2 as the reader's schema, I get
> *{{org.apache.avro.AvroTypeException: Attempt to process a int when a union was expected.}}*
> Here's a test that shows exactly this:
> {code}
> @Test
> public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
>   Schema writerSchema = new Schema.Parser().parse("{\n" +
>       " \"type\":\"record\",\n" +
>       " \"name\":\"NeighborComparisons\",\n" +
>       " \"fields\": [\n" +
>       " {\"name\": \"test\",\n" +
>       " \"type\": \"int\" }]} ");
>   Schema readersSchema = new Schema.Parser().parse(" {\n" +
>       " \"type\":\"record\",\n" +
>       " \"name\":\"NeighborComparisons\",\n" +
>       " \"fields\": [ {\n" +
>       " \"name\": \"test\",\n" +
>       " \"type\": [\"null\", \"int\"],\n" +
>       " \"default\": null } ] }");
>
>   // Writing a record using the initial schema with the
>   // test field defined as an int
>   GenericData.Record record = new GenericData.Record(writerSchema);
>   record.put("test", Integer.valueOf(10));
>   ByteArrayOutputStream output = new ByteArrayOutputStream();
>   JsonEncoder jsonEncoder = EncoderFactory.get().
>       jsonEncoder(writerSchema, output);
>   GenericDatumWriter<GenericData.Record> writer = new
>       GenericDatumWriter<GenericData.Record>(writerSchema);
>   writer.write(record, jsonEncoder);
>   jsonEncoder.flush();
>   output.flush();
>   System.out.println(output.toString());
>
>   // We try reading it back using the second schema
>   // version where the test field is defined as a union of null and int
>   JsonDecoder jsonDecoder = DecoderFactory.get().
>       jsonDecoder(readersSchema, output.toString());
>   GenericDatumReader<GenericData.Record> reader =
>       new GenericDatumReader<GenericData.Record>(writerSchema,
>           readersSchema);
>   GenericData.Record read = reader.read(null, jsonDecoder);
>
>   // We should be able to assert that the value is 10 but it
>   // fails on reading the record before getting here
>   assertEquals(10, read.get("test"));
> }
> {code}
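
For reference, here is a minimal sketch of how I read the resolution rule quoted above from the spec (reader's schema is a union, writer's is not). It is not Avro code: {{UnionResolutionSketch}} and {{firstMatchingBranch}} are hypothetical names, schemas are reduced to plain type-name strings, and "matches" is simplified to exact name equality, whereas the spec also allows type promotions such as {{int}} to {{long}}.

```java
import java.util.Arrays;
import java.util.List;

public class UnionResolutionSketch {

    // Hypothetical helper, not part of Avro: returns the index of the first
    // branch in the reader's union whose type name equals the writer's type
    // name. Simplification: exact name match only, no type promotion.
    static int firstMatchingBranch(List<String> readerUnion, String writerType) {
        for (int i = 0; i < readerUnion.size(); i++) {
            if (readerUnion.get(i).equals(writerType)) {
                return i;
            }
        }
        // "If none match, an error is signalled."
        throw new IllegalArgumentException(
            "No branch in reader's union " + readerUnion
                + " matches writer's type " + writerType);
    }

    public static void main(String[] args) {
        // Reader's schema for the field: ["null", "int"]; writer's schema: "int"
        List<String> readerUnion = Arrays.asList("null", "int");
        System.out.println(firstMatchingBranch(readerUnion, "int")); // prints 1
    }
}
```

Under that reading, a value written as {{int}} would resolve against the second branch of the reader's union, which is the behaviour I expected from the test above.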