Posted to odf-dev@incubator.apache.org by "Svante Schubert (JIRA)" <ji...@apache.org> on 2017/09/05 19:58:00 UTC
[jira] [Commented] (ODFTOOLKIT-458) Map the ODF XML RelaxNG schema into a GraphDB for Analysis

    [ https://issues.apache.org/jira/browse/ODFTOOLKIT-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154196#comment-16154196 ] 

Svante Schubert commented on ODFTOOLKIT-458:
--------------------------------------------

The first part, creating a graph database filled with the ODF RelaxNG grammar, works in general and is ready for review:

a) The ANTLR4 grammar that parses the MSV memory dump text file can be found here:
https://github.com/svanteschubert/gumtree/tree/msv/gen.antlr4-msv/src/main/antlr/com/github/gumtreediff/gen/antlr4/msv

b) The Java class that runs the parser and creates two property files with the vertex and edge data can be found here:
https://github.com/svanteschubert/gumtree/blob/msv/gen.antlr4-msv/src/main/java/com/github/gumtreediff/gen/antlr4/msv/MsvFileMapper.java

c) The Java class that reads the two property files from the previous step, fills a graph database and exports the new graph into a graphXML file (which can then be loaded into an empty graph database) can be found here: https://github.com/svanteschubert/gumtree/blob/msv/gen.antlr4-msv/src/main/java/com/github/gumtreediff/gen/antlr4/msv/MsvTreeGenerator.java
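
Steps b) and c) can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code of MsvFileMapper or MsvTreeGenerator: the class MsvGraphSketch, the records Vertex and Edge, and the property layout ("id=kind:name" for vertices, "id=sourceId targetId" for edges) are all assumptions. It also assumes that "graphXML" means GraphML, and it writes GraphML text directly instead of going through a graph database. Records require Java 16 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch of steps b) and c). The property layout is an assumption:
// vertices as "id=kind:name", edges as "id=sourceId targetId".
public class MsvGraphSketch {

    // A grammar node, e.g. kind "element" and name "text:p" (assumed model).
    public record Vertex(String id, String kind, String name) {}

    // A directed edge from a grammar node to a node it may contain (assumed model).
    public record Edge(String id, String source, String target) {}

    // Step b): vertex data for the first property file.
    public static Properties vertexProperties(List<Vertex> vertices) {
        Properties props = new Properties();
        for (Vertex v : vertices) props.setProperty(v.id(), v.kind() + ":" + v.name());
        return props;
    }

    // Step b): edge data for the second property file.
    public static Properties edgeProperties(List<Edge> edges) {
        Properties props = new Properties();
        for (Edge e : edges) props.setProperty(e.id(), e.source() + " " + e.target());
        return props;
    }

    // Serialises in .properties syntax; store() escapes ':' inside values.
    public static String toText(Properties props, String comment) {
        try {
            StringWriter out = new StringWriter();
            props.store(out, comment);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses text produced by toText(); load() undoes the escaping.
    public static Properties fromText(String text) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(text));
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String xml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Sorted copy so that the GraphML output order is stable.
    private static Map<String, String> sorted(Properties props) {
        Map<String, String> map = new TreeMap<>();
        for (String key : props.stringPropertyNames()) map.put(key, props.getProperty(key));
        return map;
    }

    // Step c): builds a GraphML document from the two property sets.
    public static String toGraphMl(Properties vertices, Properties edges) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
            .append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n")
            .append("  <key id=\"label\" for=\"node\" attr.name=\"label\" attr.type=\"string\"/>\n")
            .append("  <graph id=\"G\" edgedefault=\"directed\">\n");
        for (Map.Entry<String, String> v : sorted(vertices).entrySet()) {
            sb.append("    <node id=\"").append(xml(v.getKey())).append("\"><data key=\"label\">")
              .append(xml(v.getValue())).append("</data></node>\n");
        }
        for (Map.Entry<String, String> e : sorted(edges).entrySet()) {
            String[] ends = e.getValue().trim().split("\\s+");
            // Reject malformed edges and edges pointing at unknown vertices.
            if (ends.length != 2 || vertices.getProperty(ends[0]) == null
                    || vertices.getProperty(ends[1]) == null) {
                throw new IllegalArgumentException("Invalid edge " + e.getKey() + ": " + e.getValue());
            }
            sb.append("    <edge id=\"").append(xml(e.getKey())).append("\" source=\"")
              .append(xml(ends[0])).append("\" target=\"").append(xml(ends[1])).append("\"/>\n");
        }
        return sb.append("  </graph>\n</graphml>\n").toString();
    }

    public static void main(String[] args) {
        Properties vertices = fromText(toText(vertexProperties(List.of(
            new Vertex("v0", "element", "text:p"), new Vertex("v1", "element", "text:span"))), "vertices"));
        Properties edges = fromText(toText(edgeProperties(List.of(new Edge("e0", "v0", "v1"))), "edges"));
        System.out.print(toGraphMl(vertices, edges));
    }
}
```

To write real files, a Writer from Files.newBufferedWriter(Path.of("vertices.properties")) could be passed to Properties.store in place of the StringWriter. The round trip through store and load keeps values such as "element:text:p" intact, because the escaping of ':' is reversed on loading.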

I will attach the two property files and the graphXML file to this issue.

NOTE: 
Currently there is an issue with building the ANTLR 4 grammar with package names and a split lexer and parser on Unix/Linux. I could only make it run on Windows.

> Map the ODF XML RelaxNG schema into a GraphDB for Analysis
> ----------------------------------------------------------
>
>                 Key: ODFTOOLKIT-458
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-458
>             Project: ODF Toolkit
>          Issue Type: Wish
>            Reporter: Svante Schubert
>            Assignee: Svante Schubert
>
> *PROBLEM*
> The ODF XML (RelaxNG) schema is too big to be easily read or analysed by humans.
> In ODF 1.2 it has 598 elements and 1300 attributes.
> *SOLUTION*
> Therefore I would love to load the ODF XML RelaxNG schema into a GraphDB (for instance Neo4j) and do some basic analysis (sanity checks) on it.
> For instance, I am curious about queries such as:
> a) can a certain ODF element be nested within itself (e.g. <text:p>)?
> b) is every ODF element with an ID allowed to occur more than once? (this problem has already occurred)
> c) what is the minimal mandatory ODF XML document?
> etc.
> These queries could help a lot to understand and test the XML schema.
> Afterwards, I would certainly love to have more tooling.
> For instance, being able to add metadata to the nodes to categorise them (which are meant for metadata, styles or text containers, and which are just plain boilerplate, e.g. office:body).
> The idea is to improve the generation of the ODFDOM source code to make it easier to maintain.
> *DESIGN IDEA*
> Instead of reading plain RelaxNG, I thought it might be a better idea to read an already 'normalised' document: the dumped internal model from MSV. The dump for each ODF version can be found as a test reference in
> <ODFTOOLKIT_ROOT>/generator/schema2template/src/test/resources/examples/odf
> e.g. http://svn.apache.org/viewvc/incubator/odf/trunk/generator/schema2template/src/test/resources/examples/odf/odf12-msvtree.ref?revision=1167972&view=co 
> NOTE: 
> More information on the dump and the MSV model can be found in:
> <ODFTOOLKIT_ROOT>/generator/schema2template/src/main/java/schema2template/example/odf/OdfHelper.java
> and
> <ODFTOOLKIT_ROOT>/generator/schema2template/target/apidocs/index.html
> https://incubator.apache.org/odftoolkit/0.6.2-incubating/schema2template/
> I would love to discuss your further thoughts on the list.
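
Query a) from the issue description can be phrased as a reachability problem: an element can be nested within itself if it is reachable again from one of its own successors, i.e. if it lies on a cycle. The following is a minimal sketch under stated assumptions: the class NestingQuery and its canNest method are hypothetical, the graph is assumed to be an in-memory adjacency map from a node to the nodes it may contain (for example built from the edge property file), and the sample graph is a small hand-written excerpt, not data extracted from the ODF schema. In a graph database such as Neo4j, the same check would typically be expressed as a variable-length path query.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for query a): "can this element be nested within itself?"
// Assumes edges point from a grammar node to the nodes it may contain; non-element
// nodes (e.g. choices or references) may sit on the path and are simply traversed.
public class NestingQuery {

    // Returns true if 'start' is reachable again from any of its successors.
    // Breadth-first search with a visited set, so cycles elsewhere terminate.
    public static boolean canNest(Map<String, List<String>> children, String start) {
        Deque<String> queue = new ArrayDeque<>(children.getOrDefault(start, List.of()));
        Set<String> visited = new HashSet<>();
        while (!queue.isEmpty()) {
            String node = queue.removeFirst();
            if (node.equals(start)) {
                return true;
            }
            if (visited.add(node)) {
                queue.addAll(children.getOrDefault(node, List.of()));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hand-written excerpt for illustration only, not generated from the schema.
        Map<String, List<String>> children = Map.of(
            "text:p", List.of("text:s", "draw:frame"),
            "draw:frame", List.of("draw:text-box"),
            "draw:text-box", List.of("text:p"),
            "text:s", List.of());
        System.out.println("text:p can nest: " + canNest(children, "text:p"));
        System.out.println("text:s can nest: " + canNest(children, "text:s"));
    }
}
```

Here text:p is nestable because of the path text:p → draw:frame → draw:text-box → text:p, while text:s has no children and therefore cannot contain itself.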


