Posted to dev@avro.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2016/09/05 16:42:21 UTC

[jira] [Comment Edited] (AVRO-1891) Generated Java code fails with union containing logical type

    [ https://issues.apache.org/jira/browse/AVRO-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15463687#comment-15463687 ] 

Ryan Blue edited comment on AVRO-1891 at 9/5/16 4:42 PM:
---------------------------------------------------------

I think all this requires is keeping a set of conversions that should be applied when reading or writing a specific class. Unlike generic, where the conversions are determined by the data model at runtime, the conversions that should be used for a specific class are determined at compile time. We have the benefit of knowing that the compiler either added conversions for all instances of a logical type, or for none of them. So we only need to know the set of conversions the compiler had set up when a class was compiled.

Rather than relying on the set of conversions the SpecificData instance has configured, I think we should keep the set of conversions for the class being written or read. So we don't need to change how SpecificData looks up conversions, only how SpecificDatumReader/Writer look them up, so that they avoid going through the data model. (I agree with Doug and don't see an advantage of adding a conversion resolver.)

What about this:

* Maintain a thread-local reference to the current specific record class in SpecificDatumReader
* Add a static conversion map to each specific class with its conversions (generated code)
* Add conversion lookup methods to GenericDatumReader that delegate to GenericData
* Override the conversion lookup methods in SpecificDatumReader to use the current record class's set of conversions instead.

This way, there are no changes to how the data model lookups work, little generated code (just adding a conversion map), and few changes to the datum readers and writers. A rough sketch of what I mean is below. What do you guys think? I think this would be a bit smaller patch. I'll try to put it together tomorrow if I have time.
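To make that concrete, here is a rough, compilable Java sketch of the shape I have in mind. All of the names in it (ConversionLookupSketch, GeneratedClassSketch, ReaderSketch, CONVERSIONS, conversionFor, CURRENT_CLASS) are invented for illustration and are not existing Avro APIs; the real patch would hang these pieces off the generated classes and SpecificDatumReader/Writer.

{code}
import org.apache.avro.Conversion;
import org.apache.avro.data.TimeConversions;

// Illustrative sketch only: none of these names exist in Avro.
public class ConversionLookupSketch {

  // (1) What the compiler could emit into each specific class: the conversions
  //     configured at compile time, indexed by field position (null entries for
  //     fields without a logical type).
  static class GeneratedClassSketch {
    static final Conversion<?>[] CONVERSIONS = new Conversion<?>[] {
        new TimeConversions.TimestampConversion()   // the union field "first"
    };
  }

  // (2) Reader side: a thread-local remembers the specific class currently being
  //     read, and conversion lookups are answered from that class's static
  //     conversions rather than from the SpecificData model.
  static class ReaderSketch {
    private static final ThreadLocal<Class<?>> CURRENT_CLASS = new ThreadLocal<>();

    static Conversion<?> conversionFor(int fieldIndex) {
      // In the real reader this would be an overridden lookup hook; the dispatch
      // here is hard-coded to one class just to show the idea.
      if (GeneratedClassSketch.class.equals(CURRENT_CLASS.get())) {
        return GeneratedClassSketch.CONVERSIONS[fieldIndex];
      }
      return null;   // unknown class: fall back to the raw (unconverted) type
    }

    static void readRecord(Class<?> recordClass) {
      Class<?> previous = CURRENT_CLASS.get();
      CURRENT_CLASS.set(recordClass);    // remember the class for per-field lookups
      try {
        // ... decode each field, calling conversionFor(fieldIndex) as needed ...
      } finally {
        CURRENT_CLASS.set(previous);     // restore so nested records don't clobber it
      }
    }
  }
}
{code}

The save/restore around readRecord is what keeps nested specific records from stepping on each other's conversions; everything else lives in the generated class, so SpecificData itself doesn't change.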


was (Author: rdblue):
I think all this requires is keeping a set of conversions that should be applied when reading or writing a specific class. Unlike generic where the conversions are determined by the data model at runtime, the conversions that should be used for a specific class are determined at compile time. We have the benefit of knowing that the compiler either added conversions for all instances of a logical type, or for none of them. So we only need to know the set of conversions the compiler had set up when a class was compiled.

Rather than relying on the set of conversions the SpecificData instance has configured, I think we should keep the set of conversions for the class being written or read. So we don't need to change how SpecificData looks up conversions, just the way the SpecificDatumReader/Writer does to avoid looking them up in the data model. (I agree with Doug and don't see an advantage of adding a conversion resolver.)

What about this:

* Maintain a thread-local reference to the current specific record class in GenericDatumReader
* Add a static conversion map to each specific class with its conversions (generated code)
* Add conversion lookup methods to GenericDatumReader that delegate to GenericData
* Override the conversion lookup methods in SpecificDatumReader that use the current record class's set of conversions instead.

This way, there are no changes to how the data model lookups work, little generated code (just an annotation to conversion map), and few changes to the datum reader and writers. What do you guys think? I think this would be a bit smaller patch. I'll try to put it together tomorrow if I have time.

> Generated Java code fails with union containing logical type
> ------------------------------------------------------------
>
>                 Key: AVRO-1891
>                 URL: https://issues.apache.org/jira/browse/AVRO-1891
>             Project: Avro
>          Issue Type: Bug
>          Components: java, logical types
>    Affects Versions: 1.8.1
>            Reporter: Ross Black
>            Priority: Blocker
>             Fix For: 1.8.3
>
>         Attachments: AVRO-1891.patch, AVRO-1891.yshi.1.patch, AVRO-1891.yshi.2.patch, AVRO-1891.yshi.3.patch, AVRO-1891.yshi.4.patch
>
>
> Example schema:
> {code}
>     {
>       "type": "record",
>       "name": "RecordV1",
>       "namespace": "org.brasslock.event",
>       "fields": [
>         { "name": "first", "type": ["null", {"type": "long", "logicalType":"timestamp-millis"}]}
>       ]
>     }
> {code}
> The Avro compiler generates a field using the relevant Joda-Time class:
> {code}
>     public org.joda.time.DateTime first;
> {code}
> Running the following code to perform encoding:
> {code}
>         final RecordV1 record = new RecordV1(DateTime.parse("2016-07-29T10:15:30.00Z"));
>         final DatumWriter<RecordV1> datumWriter = new SpecificDatumWriter<>(record.getSchema());
>         final ByteArrayOutputStream stream = new ByteArrayOutputStream(8192);
>         final BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(stream, null);
>         datumWriter.write(record, encoder);
>         encoder.flush();
>         final byte[] bytes = stream.toByteArray();
> {code}
> fails with the exception stacktrace:
> {code}
>  org.apache.avro.AvroRuntimeException: Unknown datum type org.joda.time.DateTime: 2016-07-29T10:15:30.000Z
>     at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:741)
>     at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:293)
>     at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:706)
>     at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:192)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:110)
>     at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:87)
>     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
>     at org.brasslock.avro.compiler.GeneratedRecordTest.shouldEncodeLogicalTypeInUnion(GeneratedRecordTest.java:82)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
>     at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:253)
>     at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:84)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> {code}
> The failure can be fixed by explicitly adding the relevant conversion(s) to DatumWriter / SpecificData:
> {code}
>         final RecordV1 record = new RecordV1(DateTime.parse("2007-12-03T10:15:30.00Z"));
>         final SpecificData specificData = new SpecificData();
>         specificData.addLogicalTypeConversion(new TimeConversions.TimestampConversion());
>         final DatumWriter<RecordV1> datumWriter = new SpecificDatumWriter<>(record.getSchema(), specificData);
>         final ByteArrayOutputStream stream = new ByteArrayOutputStream(AvroUtil.DEFAULT_BUFFER_SIZE);
>         final BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(stream, null);
>         datumWriter.write(record, encoder);
>         encoder.flush();
>         final byte[] bytes = stream.toByteArray();
> {code}
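A similar explicit registration should also work as a workaround on the read side. Below is a minimal sketch, assuming the same generated RecordV1 class (with its standard getClassSchema() accessor) and the bytes produced by the writer above:

{code}
final SpecificData specificData = new SpecificData();
specificData.addLogicalTypeConversion(new TimeConversions.TimestampConversion());
final DatumReader<RecordV1> datumReader =
    new SpecificDatumReader<>(RecordV1.getClassSchema(), RecordV1.getClassSchema(), specificData);
final BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
final RecordV1 decoded = datumReader.read(null, decoder);
{code}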



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)