Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2012/01/20 20:47:13 UTC

Support for preserving comments when parsing XML UIMA descriptors checked in

Here's a brief description of this.

The XML parser methods already take a parameter that is an instance of 
ParsingOptions.  That class now has one additional boolean, preserveComments, 
which defaults to false.

If preserveComments is false, the parser works as before.  No lexical handler is 
installed, so it should operate as fast as before.  There *is* one extra slot in 
the Java object representation corresponding to some of the elements in the XML 
(not all elements have their own Java object class); in this case the slot is 
set to null.
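
For readers unfamiliar with lexical handlers, here is a minimal sketch using only 
the JDK's SAX classes.  It is not the UIMA code; the class name CommentCapture 
and the parse helper are made up for illustration.  It shows why comments are 
only seen when a lexical handler is registered with the reader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical illustration, not part of UIMA.
class CommentCapture {

    /** Records comment text; comment() is only called if this object
     *  has been registered as the reader's lexical handler. */
    static class Collector extends DefaultHandler2 {
        final List<String> comments = new ArrayList<>();

        @Override
        public void comment(char[] ch, int start, int length) {
            comments.add(new String(ch, start, length));
        }
    }

    /** Parses xml and returns the comments seen.  The preserveComments
     *  flag only mirrors the option name from the mail above. */
    static List<String> parse(String xml, boolean preserveComments) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Collector collector = new Collector();
        reader.setContentHandler(collector);
        if (preserveComments) {
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", collector);
        }
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.comments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!-- license header --><desc><!-- inner --><name>x</name></desc>";
        System.out.println(parse(xml, false)); // no lexical handler installed
        System.out.println(parse(xml, true));  // lexical handler installed
    }
}
```

Without the setProperty call the reader never reports comments to the 
application, which is why the default path does no extra work.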

When preserveComments is true, the slot is set to a reference to the DOM 
Element node corresponding to that object.  As a result, the DOM, which 
previously was a *temporary* object, is retained for as long as the Java objects 
corresponding to it are retained.  This will increase the memory footprint of a 
parsed UIMA descriptor, of course.

The *toXML* method was modified to check this slot.  If it is not null, the DOM 
in the vicinity of the element is scanned for comment and whitespace nodes, and 
the appropriate ones are written out.  The heuristics try to stay close to the 
original layout even after some editing (adding / deleting nodes).  See the 
bottom of the MetaDataObject_impl class for some details of this.
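
To give a rough idea of what scanning the DOM near an element can look like, 
here is a small sketch using only the JDK's DOM API.  It is not the code in 
MetaDataObject_impl; the class NearbyComments and its methods are invented for 
this example, and the real heuristics handle more cases (for instance, comments 
after an element).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration, not part of UIMA.
class NearbyComments {

    /** Returns the comments that directly precede element, in document
     *  order.  Whitespace-only text nodes are skipped; any other node
     *  (another element, non-blank text) ends the scan. */
    static List<String> commentsBefore(Element element) {
        List<String> found = new ArrayList<>();
        for (Node n = element.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.COMMENT_NODE) {
                found.add(n.getNodeValue());
            } else if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) {
                continue; // whitespace between the comment and the element
            } else {
                break;    // reached a sibling that is neither comment nor whitespace
            }
        }
        Collections.reverse(found); // collected backwards, so restore document order
        return found;
    }

    /** Parses a string into a DOM; the default builder keeps comment nodes. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<!-- license --><desc>\n  <!-- about name -->\n"
                + "  <name>x</name>\n  <version>1</version>\n</desc>");
        Element name = (Element) doc.getElementsByTagName("name").item(0);
        System.out.println(commentsBefore(doc.getDocumentElement()));
        System.out.println(commentsBefore(name));
    }
}
```

Because the scan starts from the retained Element and looks at its siblings, it 
keeps working when unrelated nodes elsewhere in the descriptor are added or 
removed.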

The Component Descriptor Editor (CDE) has been modified to preserve comments 
(only for those XML pieces which it is editing and might be writing out).

So, the good news is, if you edit a descriptor with the CDE and it has an Apache 
license header at the top, it will no longer be deleted... :-)

All the test cases pass, and I did some amount of manual editing / testing; more 
testing welcome.

-Marshall