Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2010/10/06 00:08:32 UTC

[jira] Updated: (AVRO-672) Convert JSON Text Input to Avro Tool

     [ https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-672:
------------------------------

    Attachment: AVRO-672.patch

It might be confusing to provide two different JSON encodings for Avro data.  Also, the encoding in your patch is indeed simpler, but can lose information.  For example, a string that looks like base64-encoded binary data would be assumed by Jackson to be binary data, which might not always be the case.  Schemas that include fixed or enum values are not supported by this encoding, nor are many unions.
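
To illustrate the information-loss concern, here is a minimal Python sketch. It is not taken from the patch or from Jackson; the helper name looks_like_base64 is hypothetical. It only shows that an ordinary string can also be well-formed base64, so the value alone cannot tell a reader whether string or bytes was intended.

```python
import base64
import binascii


def looks_like_base64(text):
    """Return True if text decodes cleanly as base64 (hypothetical helper)."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# An ordinary word that also happens to be well-formed base64.
ambiguous = looks_like_base64("hello123")
# Punctuation and spaces rule this one out, so it can only be a string.
unambiguous = looks_like_base64("hello, there!!")
```

Without a type tag or a schema lookup, a value like "hello123" could be read either way, which is the ambiguity that the tagged encoding avoids.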

If reading and writing arbitrary JSON is a priority, then the approach taken in AVRO-251 might be of interest.  Here's a patch that provides a DatumReader and DatumWriter for Jackson's JsonNode.  This uses a schema that permits arbitrary JSON data.  Would this be useful to you?  If so, we could provide it as a tool.
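
As a rough sketch of what "a schema that permits arbitrary JSON data" could look like, the Python below builds a recursive union schema and maps each parsed JSON value to a branch. The schema shape, the record name Json, and the function branch_for are assumptions made for illustration; they are not copied from the attached patch or from AVRO-251.

```python
import json

# Hypothetical recursive schema: one field whose type is a union covering
# every kind of JSON value, with arrays and maps referring back to "Json".
JSON_SCHEMA = {
    "type": "record",
    "name": "Json",
    "fields": [{
        "name": "value",
        "type": [
            "long", "double", "string", "boolean", "null",
            {"type": "array", "items": "Json"},
            {"type": "map", "values": "Json"},
        ],
    }],
}


def branch_for(value):
    """Name the union branch that a parsed JSON value would fall into."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"  # assumes the integer fits in 64 bits
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "map"
    raise TypeError("not a JSON value: %r" % (value,))


doc = json.loads('{"intval": -73, "tags": ["a", null, 2.5]}')
top = branch_for(doc)
inner = [branch_for(v) for v in doc["tags"]]
```

Because every JSON value lands in exactly one branch, no per-dataset schema is needed, at the cost of losing the field-level typing that a record schema such as TestRecord provides.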

> Convert JSON Text Input to Avro Tool
> ------------------------------------
>
>                 Key: AVRO-672
>                 URL: https://issues.apache.org/jira/browse/AVRO-672
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Ron Bodkin
>         Attachments: AVRO-672.patch, AVRO-672.patch
>
>
> The attached patch allows reading in a JSON-formatted text file and converting it to a conforming Avro text file, emitting one record per line. For example, it can read this input file:
> {"intval":12}
> {"intval":-73,"strval":"hello, there!!"}
> with this schema:
> { "type":"record", "name":"TestRecord", "fields": [ {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]}
> returning valid Avro. This differs from the DataFileWriteTool, which would read in the following internal encoding:
> {"intval":12,"strval":null}
> {"intval":-73,"strval":{"string":"hello, there!!"}}
> In general, the internal encodings used by Avro aren't natural when reading in JSON text that appears in the wild. Likewise, this utility can change characters that are invalid in Avro identifiers into underscores, again to tolerate JSON that wasn't designed to be readable by Avro.
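
The difference between the two encodings in the description can be sketched in Python. This is an illustration only, not the code in the patch: the names matches, to_avro_json and sanitize_name are hypothetical, only primitive types and unions of primitives are handled, a missing field is assumed to mean null, and replacing a leading digit with an underscore is likewise an assumption.

```python
import json
import re

# Schema quoted in the issue description.
SCHEMA = json.loads(
    '{ "type":"record", "name":"TestRecord", "fields": ['
    ' {"name":"intval","type":"int"},'
    ' {"name":"strval","type":["string", "null"]}]}'
)


def matches(value, type_name):
    """True if a plain JSON value fits the named primitive Avro type."""
    if type_name == "null":
        return value is None
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    return False


def to_avro_json(value, schema):
    """Rewrite 'natural' JSON into the tagged form shown in the description."""
    if isinstance(schema, list):  # union: tag the value with its branch name
        for branch in schema:
            if isinstance(branch, str) and matches(value, branch):
                return None if branch == "null" else {branch: value}
        raise ValueError("no union branch matches %r" % (value,))
    if isinstance(schema, dict) and schema.get("type") == "record":
        # Assumption: a field absent from the input is treated as null.
        return {f["name"]: to_avro_json(value.get(f["name"]), f["type"])
                for f in schema["fields"]}
    if isinstance(schema, str) and matches(value, schema):
        return value
    raise ValueError("unsupported schema or value: %r" % (schema,))


def sanitize_name(name):
    """Replace characters that Avro names do not allow with underscores."""
    return re.sub(r"^[^A-Za-z_]|[^A-Za-z0-9_]", "_", name)


line = '{"intval":-73,"strval":"hello, there!!"}'
tagged = json.dumps(to_avro_json(json.loads(line), SCHEMA),
                    separators=(",", ":"))
```

Here tagged holds the second line of the internal encoding quoted above, and sanitize_name("user-id") yields "user_id".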
