Posted to dev@avro.apache.org by "Bram Biesbrouck (JIRA)" <ji...@apache.org> on 2016/01/18 22:03:39 UTC

[jira] [Commented] (AVRO-457) add tools that read/write xml records from/to avro data files

    [ https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105832#comment-15105832 ] 

Bram Biesbrouck commented on AVRO-457:
--------------------------------------

Please allow me to comment on this after having used Michael's project (https://github.com/mikepigott/xml-to-avro) on the official (and fairly complex) ebucore.xsd schema, version 1.6 (see https://tech.ebu.ch/MetadataEbuCore and https://www.ebu.ch/metadata/schemas/EBUCore/ebucore.zip).

To me, from a developer's point of view, the need for the tool Michael has written is very high: nearly all official ontologies release their versions as XML Schema (XSD) files. As with XJC (and by extension JAXB), it's important to have de facto standard projects that convert them to working memory models. Having a reliable XSD->AVSC converter would be awesome.

I've played around with Michael's code and got it to successfully generate an Avro schema from the ebucore.xsd file. However, I had to make a lot of modifications to the original file because xml-to-avro does not implement all of the XSD standard (for one, elements with default, empty types crash the converter).
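
For anyone who wants to check their own schema for this before running a converter, here is a rough sketch. It assumes that "default, empty types" means xs:element declarations with neither a type attribute nor an inline type definition (XSD then defaults them to xs:anyType). The function name and the sample schema are made up for illustration and are not part of xml-to-avro or ebucore.xsd.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def untyped_elements(xsd_text):
    """Return the names of xs:element declarations that have no explicit type.

    Hypothetical helper: it only looks at the type attribute and at inline
    xs:complexType / xs:simpleType children, and ignores substitution groups.
    """
    root = ET.fromstring(xsd_text)
    names = []
    for el in root.iter(f"{{{XS}}}element"):
        if "name" not in el.attrib:
            # Skip references such as <xs:element ref="..."/>.
            continue
        has_type_attr = "type" in el.attrib
        has_inline_type = (
            el.find(f"{{{XS}}}complexType") is not None
            or el.find(f"{{{XS}}}simpleType") is not None
        )
        if not has_type_attr and not has_inline_type:
            names.append(el.get("name"))
    return names


# Made-up sample schema, not taken from ebucore.xsd.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="note"/>
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="payload"/>
        <xs:element ref="title"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

print(untyped_elements(SAMPLE))
```

The elements it lists are the ones I would give an explicit type (or remove) in a working copy of the schema before feeding it to the converter.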

After having tried four solutions:
1) https://github.com/stealthly/xml-avro
2) https://github.com/mikepigott/xml-to-avro
3) https://github.com/nokia/Avro-Schema-Generator
4) https://github.com/FasterXML/jackson-dataformat-avro

I conclude that solution 1 is the best for now, because it works out of the box without modifications and generates a more type-safe schema (than Michael's converter), although for complex schemas like ebucore, double types are introduced (e.g. Double1, Double2, ...).

All this to make a point: I, together with a lot of other developers, truly see the need for an official XSD->AVSC converter, so please consider it. I can help with testing, but I'm no XSD expert.
You might want to contact the folks at https://github.com/stealthly/xml-avro

bram

> add tools that read/write xml records from/to avro data files
> -------------------------------------------------------------
>
>                 Key: AVRO-457
>                 URL: https://issues.apache.org/jira/browse/AVRO-457
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.8
>            Reporter: Doug Cutting
>              Labels: gsoc
>         Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, AVRO-457.patch
>
>
> It might be useful to have command-line tools that can read & write arbitrary XML data from & to Avro data files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)