Posted to dev@avro.apache.org by "Laymain (JIRA)" <ji...@apache.org> on 2018/07/30 13:50:00 UTC
[jira] [Commented] (AVRO-1811) SpecificData.deepCopy() cannot be
used if schema compiler generated Java objects with Strings instead of UTF8
[ https://issues.apache.org/jira/browse/AVRO-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561940#comment-16561940 ]
Laymain commented on AVRO-1811:
-------------------------------
This issue also affects version 1.8.2.
> SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with Strings instead of UTF8
> -------------------------------------------------------------------------------------------------------------
>
> Key: AVRO-1811
> URL: https://issues.apache.org/jira/browse/AVRO-1811
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.0, 1.8.1
> Reporter: Ryon Day
> Assignee: Yibing Shi
> Priority: Critical
> Attachments: AVRO-1811.1.patch
>
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> When the Avro compiler creates Java objects, you have the option to have them generate fields of type {{string}} with the Java standard {{String}} type, for wide interoperability with existing Java applications and APIs.
> By default, however, the compiler outputs these fields in the Avro-specific {{Utf8}} type, requiring frequent usage of the {{toString()}} method in order for default domain objects to be used with the majority of Java libraries.
> There are two ways to get around this. The first is to annotate every {{string}} field in a schema like so:
> {code}
> {
>   "name": "some_string",
>   "doc": "a field that is guaranteed to compile to java.lang.String",
>   "type": [
>     "null",
>     {
>       "type": "string",
>       "avro.java.string": "String"
>     }
>   ]
> },
> {code}
> Unfortunately, long schemas containing many string fields can be dominated by this annotation by volume. Teams using heterogeneous clients may want to avoid Java-specific annotations in their schema files, or may not think to use them unless Java consumers of the schema exist at the time the schema is proposed and written.
> The other solution to the problem is to compile the schema into Java objects using the {{SpecificCompiler}}'s string type selection. This option actually alters the schema carried by the object's {{SCHEMA$}} field to have the above annotation in it, ensuring that when used by the Java API, the String type will be used.
> Unfortunately, this method is not interoperable with GenericRecords created by libraries that use the _original_ schema.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> # Create a schema with several {{string}} fields.
> # Parse the schema using the standard Avro schema parser.
> # Create Java domain objects for that schema, ensuring usage of the {{java.lang.String}} string type.
> # Create a message of some sort that ends up as a {{GenericRecord}} of the original schema.
> # Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out of the {{GenericRecord}}.
> There is a unit test that demonstrates this [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java]
> {panel}
> {panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
> As the schemas are literally identical aside from string type, the conversion should work (and does work for schemas that are exactly identical).
> {panel}
> {panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
> {{ClassCastException}} with the message {{org.apache.avro.util.Utf8 cannot be cast to java.lang.String}}
> {panel}
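
The failure mode reported above can be illustrated without Avro on the classpath. The sketch below is only a model of the problem, not Avro's actual code: FakeUtf8 is a hypothetical stand-in for org.apache.avro.util.Utf8 (which, like this stand-in, is a CharSequence but not a java.lang.String), and unsafeRead/safeRead are made-up names representing code that expects String-typed fields.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.apache.avro.util.Utf8:
// a CharSequence backed by UTF-8 bytes that is NOT a java.lang.String.
final class FakeUtf8 implements CharSequence {
    private final byte[] bytes;

    FakeUtf8(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override public int length() { return toString().length(); }
    @Override public char charAt(int index) { return toString().charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }
    @Override public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

final class Utf8CastDemo {
    // Models code that assumes the field value is already a String.
    // Throws ClassCastException when handed any other CharSequence.
    static String unsafeRead(Object fieldValue) {
        return (String) fieldValue;
    }

    // Tolerant variant (an assumption, not Avro's fix): accept any
    // CharSequence and convert it explicitly with toString().
    static String safeRead(Object fieldValue) {
        if (fieldValue == null) {
            return null;
        }
        if (fieldValue instanceof CharSequence) {
            return fieldValue.toString();
        }
        throw new IllegalArgumentException(
            "not a string-like value: " + fieldValue.getClass().getName());
    }

    public static void main(String[] args) {
        Object value = new FakeUtf8("hello");
        try {
            unsafeRead(value);
        } catch (ClassCastException e) {
            System.out.println("unsafeRead failed with " + e.getClass().getSimpleName());
        }
        System.out.println("safeRead returned " + safeRead(value));
    }
}
```

The point of the sketch is that the two representations carry the same characters, so an explicit toString() conversion is enough to bridge them; a direct cast is what fails.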