Posted to dev@avro.apache.org by "Laymain (JIRA)" <ji...@apache.org> on 2018/07/30 13:50:00 UTC
[jira] [Commented] (AVRO-1811) SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with Strings instead of UTF8

    [ https://issues.apache.org/jira/browse/AVRO-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561940#comment-16561940 ] 

Laymain commented on AVRO-1811:
-------------------------------

This issue also affects version 1.8.2.

> SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with Strings instead of UTF8
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1811
>                 URL: https://issues.apache.org/jira/browse/AVRO-1811
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.0, 1.8.1
>            Reporter: Ryon Day
>            Assignee: Yibing Shi
>            Priority: Critical
>         Attachments: AVRO-1811.1.patch
>
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> When the Avro compiler creates Java objects, you can have it generate fields of type {{string}} as the standard Java {{String}} type, for wide interoperability with existing Java applications and APIs.
> By default, however, the compiler outputs these fields in the Avro-specific {{Utf8}} type, requiring frequent usage of the {{toString()}} method in order for default domain objects to be used with the majority of Java libraries.
> There are two ways to get around this. The first is to annotate every {{string}} field in a schema like so:
> {code}
>     {
>       "name": "some_string",
>       "doc": "a field that is guaranteed to compile to java.lang.String",
>       "type": [
>         "null",
>         {
>           "type": "string",
>           "avro.java.string": "String"
>         }
>       ]
>     },
> {code}
> Unfortunately, long schemas containing many string fields can be dominated by this annotation by volume. Teams with heterogeneous clients may want to avoid Java-specific annotations in their schema files, or may not think to use them unless Java consumers of the schema exist at the time the schema is proposed and written.
> The other solution to the problem is to compile the schema into Java objects using the {{SpecificCompiler}}'s string type selection. This option actually alters the schema carried by the object's {{SCHEMA$}} field to have the above annotation in it, ensuring that when used by the Java API, the String type will be used.
> Unfortunately, this method is not interoperable with GenericRecords created by libraries that use the _original_ schema.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> # Create a schema with several {{string}} fields.
> # Parse the schema using the standard Avro schema parser.
> # Create Java domain objects for that schema, ensuring usage of the {{java.lang.String}} string type.
> # Create a message of some sort that ends up as a {{GenericRecord}} of the original schema.
> # Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out of the {{GenericRecord}}.
> There is a unit test that demonstrates this [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java].
> {panel}
> {panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
> As the schemas are identical aside from the string type, the conversion should work (and it does work for schemas that are exactly identical).
> {panel}
> {panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
> {{ClassCastException}} with the message {{org.apache.avro.util.Utf8 cannot be cast to java.lang.String}}
> {panel}
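
The reported {{ClassCastException}} can be illustrated without Avro on the classpath. The sketch below is an assumption about the mechanism, not Avro's actual code: `StringRecord` is a hypothetical stand-in for a generated class whose string field is `java.lang.String`, and a `StringBuilder` stands in for `org.apache.avro.util.Utf8` (both are `CharSequence` implementations that are not `String`). It shows that handing a non-`String` `CharSequence` to a setter that casts to `String` fails, while normalizing with `toString()` first does not.

```java
public class DeepCopyCastSketch {

    /** Hypothetical stand-in for a generated record whose string field is java.lang.String. */
    static final class StringRecord {
        private String name;

        // Assumed shape of a generated put(int, Object): an unchecked cast to String.
        void put(int field, Object value) {
            if (field != 0) {
                throw new IndexOutOfBoundsException("Bad index: " + field);
            }
            this.name = (String) value;
        }

        String getName() {
            return name;
        }
    }

    /** Passes the value through unchanged and reports whether the cast in put() failed. */
    public static boolean naiveCopyFails(CharSequence source) {
        try {
            new StringRecord().put(0, source);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** Converts any CharSequence to String before handing it to the record. */
    public static String normalizingCopy(CharSequence source) {
        StringRecord target = new StringRecord();
        target.put(0, source == null ? null : source.toString());
        return target.getName();
    }

    public static void main(String[] args) {
        // StringBuilder is only a stand-in for Utf8 in this sketch.
        CharSequence utf8Like = new StringBuilder("hello");
        System.out.println("naive copy fails: " + naiveCopyFails(utf8Like));
        System.out.println("normalized value: " + normalizingCopy(utf8Like));
    }
}
```

Under these assumptions, a possible workaround on the caller's side is to convert every string value of the {{GenericRecord}} with `toString()` before populating the specific record, at the cost of walking the record manually instead of relying on {{SpecificData.deepCopy()}}.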


