Posted to dev@avro.apache.org by "Ron Bodkin (JIRA)" <ji...@apache.org> on 2010/10/06 01:35:32 UTC

[jira] Commented: (AVRO-672) Convert JSON Text Input to Avro Tool

    [ https://issues.apache.org/jira/browse/AVRO-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918322#action_12918322 ] 

Ron Bodkin commented on AVRO-672:
---------------------------------

The use case I'm most interested in supporting is converting JSON data to a previously defined Avro schema, either as a batch file conversion or in memory (for use with map-reduce).

This newer patch emits its output in a standard but different schema, so converting to a previously defined (custom) schema still seems to require code like what I wrote in my patch. Also, it'd be nice to be able to read a value like "1" into a double or a long field, even though it'd be parsed as a JSON integer node.
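
To make the numeric case concrete, here is a minimal Python sketch of that kind of lenient coercion. It is not code from either patch: the function name coerce and the set of accepted conversions are assumptions made for illustration, and 32-bit/64-bit range checks are left out.

```python
import json


def coerce(value, avro_type):
    """Coerce a parsed JSON value to a primitive Avro type (hypothetical helper)."""
    # bool is a subclass of int in Python, so rule it out explicitly.
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    if avro_type in ("int", "long") and is_integer:
        return value
    if avro_type in ("float", "double") and (is_integer or isinstance(value, float)):
        # A JSON integer node such as 1 is widened to a floating-point value here.
        return float(value)
    if avro_type == "string" and isinstance(value, str):
        return value
    if avro_type == "boolean" and isinstance(value, bool):
        return value
    if avro_type == "null" and value is None:
        return None
    raise ValueError(f"cannot read {value!r} as Avro type {avro_type!r}")


print(coerce(json.loads("1"), "double"))  # 1.0
print(coerce(json.loads("1"), "long"))    # 1
```

The sketch only widens integers to floating point; it deliberately refuses to narrow a fractional value into a long, since that would silently lose data.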

I have also found it valuable to transform names that contain invalid characters, since there's lots of valid JSON with identifiers that don't conform to the Avro identifier grammar. That would be pretty easy to add to this patch (the regexp I used before was way too slow, so I have a newer version that's efficient).
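
As an illustration of the name transformation (not the code from the patch), the following Python sketch does a single pass over the name without a regexp. It assumes the Avro name grammar of a leading [A-Za-z_] followed by [A-Za-z0-9_]; the function name sanitize_name and the choice to replace a leading digit with an underscore are assumptions.

```python
def sanitize_name(name: str) -> str:
    """Replace characters that are invalid in an Avro name with '_' (hypothetical helper)."""
    out = []
    for i, ch in enumerate(name):
        is_letter = ("a" <= ch <= "z") or ("A" <= ch <= "Z")
        is_digit = "0" <= ch <= "9"
        # Digits are allowed everywhere except the first position.
        valid = is_letter or ch == "_" or (is_digit and i > 0)
        out.append(ch if valid else "_")
    # An empty name is not valid either, so fall back to a single underscore.
    return "".join(out) or "_"


print(sanitize_name("user-id"))  # user_id
print(sanitize_name("9lives"))   # _lives
```

Note that distinct JSON names can collide after this mapping (for example "a-b" and "a_b"), which a real tool would need to detect or resolve.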

To allow reading JSON text and creating in-memory objects that conform to a previously defined schema, I think it'd be necessary to have hints for the type of data that arrays contain (e.g., in generated code, or in runtime annotations if using a reflective style). I already ran into that in trying to get the reflection reader to work with specific data (on AVRO-669).


> Convert JSON Text Input to Avro Tool
> ------------------------------------
>
>                 Key: AVRO-672
>                 URL: https://issues.apache.org/jira/browse/AVRO-672
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Ron Bodkin
>         Attachments: AVRO-672.patch, AVRO-672.patch
>
>
> The attached patch allows reading in a JSON-formatted text file and converting it to a conforming Avro text file, emitting one record per line. For example, it can read this input file:
> {"intval":12}
> {"intval":-73,"strval":"hello, there!!"}
> with this schema:
> { "type":"record", "name":"TestRecord", "fields": [ {"name":"intval","type":"int"}, {"name":"strval","type":["string", "null"]}]}
> returning valid Avro. This is different from the DataFileWriteTool, which would read in the following internal encoding:
> {"intval":12,"strval":null}
> {"intval":-73,"strval":{"string":"hello, there!!"}}
> In general, the internal encodings used by Avro aren't natural when reading in JSON text that appears in the wild. Likewise, this utility can change characters that are invalid in Avro identifiers into underscores, again to tolerate JSON that wasn't designed to be readable by Avro.
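
The difference between the two encodings in the issue description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patch itself: it handles flat records with primitive fields and unions of primitives, it treats a missing field as null (as the first example line suggests), and the helper names matches, encode_field and to_avro_json are made up here.

```python
import json

SCHEMA = json.loads(
    '{ "type":"record", "name":"TestRecord", "fields": ['
    ' {"name":"intval","type":"int"},'
    ' {"name":"strval","type":["string", "null"]}]}'
)


def matches(value, branch):
    """Return True if a parsed JSON value fits a primitive Avro type name."""
    if branch == "null":
        return value is None
    if branch == "boolean":
        return isinstance(value, bool)
    if branch in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if branch in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if branch == "string":
        return isinstance(value, str)
    return False


def encode_field(value, avro_type):
    """Wrap a natural JSON value in the union form used by Avro's JSON encoding."""
    if isinstance(avro_type, list):  # a union is written as a JSON array of branches
        for branch in avro_type:
            if matches(value, branch):
                # Null is written bare; any other branch is tagged with its type name.
                return None if branch == "null" else {branch: value}
        raise ValueError(f"{value!r} matches no branch of {avro_type!r}")
    if not matches(value, avro_type):
        raise ValueError(f"{value!r} is not a valid {avro_type!r}")
    return value


def to_avro_json(line, schema):
    """Convert one line of natural JSON into the internal encoding for `schema`."""
    record = json.loads(line)
    encoded = {
        field["name"]: encode_field(record.get(field["name"]), field["type"])
        for field in schema["fields"]
    }
    return json.dumps(encoded, separators=(",", ":"))


print(to_avro_json('{"intval":12}', SCHEMA))
# {"intval":12,"strval":null}
print(to_avro_json('{"intval":-73,"strval":"hello, there!!"}', SCHEMA))
# {"intval":-73,"strval":{"string":"hello, there!!"}}
```

Picking the first matching union branch is a simplification; nested records, arrays and maps would need a recursive version of encode_field.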

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.