Posted to odf-dev@incubator.apache.org by "Svante Schubert (JIRA)" <ji...@apache.org> on 2017/06/20 12:27:00 UTC

[jira] [Updated] (ODFTOOLKIT-458) Map the ODF XML RelaxNG schema into a GraphDB for Analysis

     [ https://issues.apache.org/jira/browse/ODFTOOLKIT-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Svante Schubert updated ODFTOOLKIT-458:
---------------------------------------
    Description: 
*PROBLEM*
The ODF XML (RelaxNG) schema is too big for humans to read or analyse easily.
In version ODF 1.2 it has 598 elements and 1300 attributes.


*SOLUTION*
Therefore, I would love to load the ODF XML RelaxNG schema into a GraphDB (for instance Neo4j) and run some basic analysis (sanity checks) on it.
For instance, I am curious about queries such as:
a) Can a certain ODF element (e.g. <text:p>) be nested?
b) Is every ODF element with an ID allowed to exist more than once? (This issue has occurred.)
c) What is the minimum mandatory ODF XML document?
etc.

These queries could help a lot to understand and test the XML schema.
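
As a minimal sketch of question (a), assuming "nested" means that an element can appear somewhere inside itself, the schema can be treated as a directed graph of "allows as child" edges and the question becomes a reachability check. The edges below are a hypothetical toy excerpt, not taken from the real ODF schema, and the names ALLOWED_CHILDREN, reachable_from and can_nest_in_itself are made up for illustration:

```python
from collections import deque

# Hypothetical toy excerpt: parent element -> elements allowed as children.
# These edges are illustrative only and are NOT read from the real ODF schema.
ALLOWED_CHILDREN = {
    "office:body": ["office:text"],
    "office:text": ["text:p", "text:section"],
    "text:section": ["text:p", "text:section"],
    "text:p": ["text:span"],
    "text:span": ["text:span"],
}


def reachable_from(graph, start):
    """Return every element that can appear somewhere below `start`."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def can_nest_in_itself(graph, element):
    """True if `element` is reachable from itself, i.e. it lies on a cycle."""
    return element in reachable_from(graph, element)


if __name__ == "__main__":
    for name in ("text:section", "text:p"):
        print(name, can_nest_in_itself(ALLOWED_CHILDREN, name))
```

In a graph database the same question would probably be a variable-length path query rather than hand-written traversal code; the sketch only shows the shape of the check.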

Certainly, I would love to have more tooling afterwards.
For instance, I would like to be able to add metadata to the nodes to categorise them (which are meant for metadata, styles or text containers, and which are just plain boilerplate, e.g. office:body).

The idea is to improve the generation of the ODFDOM source code so that it is easier to maintain.


*DESIGN IDEA*
Instead of reading plain RelaxNG, I thought it might be a better idea to read an already 'normalised' document: the dumped internal model from MSV. You may find the dump for each ODF version among the test references in
<ODFTOOLKIT_ROOT>/generator/schema2template/src/test/resources/examples/odf
e.g. http://svn.apache.org/viewvc/incubator/odf/trunk/generator/schema2template/src/test/resources/examples/odf/odf12-msvtree.ref?revision=1167972&view=co 

NOTE: 
You may find more information about the dump and the MSV model in:
<ODFTOOLKIT_ROOT>/generator/schema2template/src/main/java/schema2template/example/odf/OdfHelper.java
and
<ODFTOOLKIT_ROOT>/generator/schema2template/target/apidocs/index.html
https://incubator.apache.org/odftoolkit/0.6.2-incubating/schema2template/

I would love to discuss your further thoughts on the list.



> Map the ODF XML RelaxNG schema into a GraphDB for Analysis
> ----------------------------------------------------------
>
>                 Key: ODFTOOLKIT-458
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-458
>             Project: ODF Toolkit
>          Issue Type: Wish
>            Reporter: Svante Schubert
>            Assignee: Svante Schubert



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Re: [jira] [Updated] (ODFTOOLKIT-458) Map the ODF XML RelaxNG schema into a GraphDB for Analysis

Posted by Svante Schubert <sv...@gmail.com>.

Hi Ian,

Thanks for the pointer. I will dive into your ODF Explorer references in
the next few days. This graph analysis of the ODF XML schema is more of a fun
project of mine and is therefore moved to the end of the working day. ;)

Currently, I am quite impressed with the Neo4j browser front-end (you will
see it, along with tutorials, once you have installed and started an instance).

In addition, for mapping the "RelaxNG memory dump" to "GraphTree creation
operations", I need to parse the memory dump. Instead of writing a parser
manually, it seems appropriate to generate it, for instance using ANTLR 4,
which I have just started to test by walking through examples and the
PDF book written about it.
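
To make the "creation operations" half of that mapping concrete, here is a rough sketch that assumes the dump has already been parsed into (parent, child) element pairs; the parsing itself is left out because the dump format is not shown in this thread. The node label Element, the relationship type ALLOWS_CHILD and the helper names are my own assumptions, not an existing ODF Toolkit API:

```python
def cypher_string(value):
    """Quote a value as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_cypher(edges):
    """Turn (parent, child) pairs into one Cypher MERGE statement per edge.

    MERGE is used instead of CREATE so that an element that occurs in
    several pairs still ends up as a single node.
    """
    statements = []
    for parent, child in edges:
        statements.append(
            "MERGE (p:Element {name: " + cypher_string(parent) + "}) "
            "MERGE (c:Element {name: " + cypher_string(child) + "}) "
            "MERGE (p)-[:ALLOWS_CHILD]->(c)"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical sample pairs, not read from the real MSV dump.
    for statement in to_cypher([("office:text", "text:p"), ("text:p", "text:span")]):
        print(statement + ";")
```

The printed statements could be pasted into the Neo4j browser; for a full schema, query parameters or a bulk import would likely be a better fit than string concatenation.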

As I said, it is a fun project of mine, so there is no hurry. But it brings
me to new fields of software tooling that I have not used before. I would be
happy if you joined in.

Cheers,
Svante



2017-06-21 5:13 GMT+02:00 Ian C <ia...@amham.net>:

> Hi Svante,
>
> As you know, some time back I created something similar with my ODF Explorer.
> Its goals were a little more than just straight visualisation.
>
> You get to see things like
>
> http://hammyau.github.io/ODFExplorer/XPathGraphSingle.html
>
> Then edit and see the differences:
>
> http://hammyau.github.io/ODFExplorer/XPathGraphCompare.html
>
>
> I don't know GraphDB, but I am happy to donate the code to the project
> and also to try to make it use GraphDB instead of GraphViz.
>
> Cheers,
>
> Ian

Re: [jira] [Updated] (ODFTOOLKIT-458) Map the ODF XML RelaxNG schema into a GraphDB for Analysis

Posted by Ian C <ia...@amham.net>.

Hi Svante,

As you know, some time back I created something similar with my ODF Explorer.
Its goals were a little more than just straight visualisation.

You get to see things like

http://hammyau.github.io/ODFExplorer/XPathGraphSingle.html

Then edit and see the differences:

http://hammyau.github.io/ODFExplorer/XPathGraphCompare.html


I don't know GraphDB, but I am happy to donate the code to the project
and also to try to make it use GraphDB instead of GraphViz.

Cheers,

Ian
