Posted to dev@avro.apache.org by "Alexander Malyshevskiy (JIRA)" <ji...@apache.org> on 2015/07/07 14:50:06 UTC

[jira] [Commented] (AVRO-1456) AvroAsTextInputFormat is inconsistent with the Avro JSON Encoding described in the Avro Specification

    [ https://issues.apache.org/jira/browse/AVRO-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616622#comment-14616622 ] 

Alexander Malyshevskiy commented on AVRO-1456:
----------------------------------------------

This is definitely a bug in my case. I have Avro files in HDFS and use Hadoop Pipes to process the entities in my C++ code. My input format is AvroAsTextInputFormat, so that I can read the entities in C++, and I use ordinary JSON deserialization methods to turn those strings into my C++ objects. As a result I cannot use unions: the strings I get are inconsistent with the Avro JSON encoding, so I cannot deserialize them.
Could you point me to another way to get those entities into my C++ code with a schema that uses unions?

> AvroAsTextInputFormat is inconsistent with the Avro JSON Encoding described in the Avro Specification
> -----------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1456
>                 URL: https://issues.apache.org/jira/browse/AVRO-1456
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.6
>            Reporter: Jamie Olson
>
> org.apache.avro.mapred.AvroAsTextInputFormat relies on the toString() method rather than using org.apache.avro.generic.GenericDatumWriter.write() and org.apache.avro.io.JsonEncoder, as org.apache.avro.tool.DataFileReadTool does. This results in a serialization of the data element without the fully qualified name required by the JSON Encoding section of the Avro Specification: http://avro.apache.org/docs/1.7.6/spec.html#json_encoding
> The specification indicates that for a union type: ["null","string","Foo"], data should be serialized with:
> * null as null;
> * the string "a" as {"string": "a"}; and
> * a Foo instance as {"Foo": {...}}, where {...} indicates the JSON encoding of a Foo instance.
> Instead, AvroAsTextInputFormat is serializing these values as
> * null as null;
> * the string "a" as "a"; and
> * a Foo instance as {...}, where {...} indicates the JSON encoding of a Foo instance.
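
The difference between the two encodings listed above can be sketched in plain Java without the Avro library. This is only an illustration of the wrapping rule quoted from the issue, not Avro's implementation: the class UnionJsonSketch, both method names, and the Foo field "x" are hypothetical, and each value is assumed to be already rendered as a JSON string.

```java
public class UnionJsonSketch {

    // Encoding described in the issue as the specified one:
    // null stays null, any other branch becomes {"<branch name>": <value>}.
    static String specEncode(String branchName, String valueJson) {
        if (branchName.equals("null")) {
            return "null";
        }
        return "{\"" + branchName + "\":" + valueJson + "}";
    }

    // Encoding the issue reports from AvroAsTextInputFormat:
    // the bare value, with no branch name around it.
    static String reportedEncode(String branchName, String valueJson) {
        if (branchName.equals("null")) {
            return "null";
        }
        return valueJson;
    }

    public static void main(String[] args) {
        // Branches of the union ["null","string","Foo"]; the Foo body is made up.
        String[][] cases = {
            {"null", "null"},
            {"string", "\"a\""},
            {"Foo", "{\"x\":1}"},
        };
        for (String[] c : cases) {
            System.out.println(c[0]
                + ": specified=" + specEncode(c[0], c[1])
                + " reported=" + reportedEncode(c[0], c[1]));
        }
    }
}
```

A reader that expects the specified form looks for a single-key object naming the branch, which is why the bare values in the reported form cannot be mapped back to a union branch without extra knowledge of the schema.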



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)