Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2012/01/20 20:47:13 UTC
preserving comments when parsing XML UIMA descriptors: support checked in
Here's a brief description of the change.
The XML parser methods already take a parameter, an instance of ParsingOptions.
This class now has one additional boolean, preserveComments (defaults to
false).
If it is not set, the parser works as before. No lexical handler is installed,
so it should operate as fast as before. There *is* one extra slot in the Java
objects that represent some of the elements in the XML (not all elements have
their own Java object class); this slot is set to null in this case.
When preserveComments is true, the slot is set to reference the DOM Element
node corresponding to that object. As a result, the DOM, which previously was
a *temporary* object, is retained for as long as the Java objects corresponding
to it are retained. This will increase the "footprint" of a parsed UIMA
Descriptor, of course.
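To illustrate the "no lexical handler unless asked" idea, here is a minimal
sketch using only the JDK's SAX classes. It is not the UIMA code: the Options
class and the parse method are hypothetical stand-ins, and only the flag name
preserveComments comes from the description above. The handler is registered
as a lexical handler only when the flag is set, so comment events are never
delivered otherwise.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

class CommentAwareParse {

  /** Hypothetical stand-in for an options object; not the UIMA class. */
  static class Options {
    boolean preserveComments = false; // same default as described above
  }

  /** Parses the XML and returns the comments seen (empty if the flag is off). */
  static List<String> parse(String xml, Options options) throws Exception {
    final List<String> comments = new ArrayList<>();

    // DefaultHandler2 implements both ContentHandler and LexicalHandler.
    DefaultHandler2 handler = new DefaultHandler2() {
      @Override
      public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
      }
    };

    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    if (options.preserveComments) {
      // Standard SAX2 property; without it no comment events are reported.
      parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
    }
    parser.parse(new InputSource(new StringReader(xml)), handler);
    return comments;
  }
}
```

With the flag off, the returned list stays empty and the parse takes the same
path as a comment-unaware parser would.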
The *toXML* method was modified to check this slot. If it is not null, the
DOM in the vicinity of the element is scanned for comment and whitespace
nodes, and the appropriate ones are used in the output. An attempt is made to
stay heuristically close to the original, even in the presence of some editing
(adding / deleting nodes). See the bottom of the MetaDataObject_impl class for
some details of this.
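As a rough picture of what "scanning the DOM in the vicinity of the element"
can look like, here is a small sketch using the JDK's DOM API. It is an
assumption-laden simplification, not the heuristic in MetaDataObject_impl: the
class and method names are hypothetical, and it only collects the comments
that directly precede an element, skipping whitespace-only text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LeadingCommentScan {

  /** Parses XML text into a DOM; comments are kept by default. */
  static Document parse(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
  }

  /** Returns the comments immediately before the element, in document order. */
  static List<String> leadingComments(Element element) {
    List<String> comments = new ArrayList<>();
    Node n = element.getPreviousSibling();
    while (n != null) {
      if (n.getNodeType() == Node.COMMENT_NODE) {
        comments.add(0, n.getNodeValue()); // walking backwards, so prepend
      } else if (n.getNodeType() == Node.TEXT_NODE
          && n.getNodeValue().trim().isEmpty()) {
        // whitespace-only text between comments: keep scanning
      } else {
        break; // another element or real text: the comments belong elsewhere
      }
      n = n.getPreviousSibling();
    }
    return comments;
  }
}
```

Because a comment placed before the root element is a child of the Document
node, the same walk started from the document element also finds a license
header at the top of the file.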
The Component Descriptor Editor has been modified to preserve comments (only for
those XML pieces which it is editing and might be writing out).
So, the good news is, if you edit a descriptor with the CDE and it has an Apache
license header at the top, it will no longer be deleted... :-)
All the test cases pass, and I did some amount of manual editing / testing; more
testing welcome.
-Marshall