Posted to c-dev@xerces.apache.org by "Arnold, Curt" <Cu...@hyprotech.com> on 2001/04/20 21:29:39 UTC

Andy Heninger's new DOM [may be of interest to you deleting char* guys]

I've been laying the groundwork for taking a shot at completing
Andy Heninger's "new" DOM for Xerces-C++ and supporting
DOM 2 events.  Since there has been
a lot of discussion about the DOM recently, I thought that I would
let you know my intentions and current thoughts.  I need this for
my own internal development needs, so I'm going to do this anyway
and will submit it for consideration when actually fleshed out.
There is no certainty that what I will be doing will ever make it
within Xerces-C.

The first major need was a testing framework for the DOM that I 
could use to check the Xerces-C implementation against Xerces-J.  
NIST developed a Java DOM test suite and is cooperating with the
W3C to develop a NIST/W3C DOM test suite.  To get something that
I could use for this effort, I reimplemented the NIST test suite
using JUnit and have subsequently ported the tests to JSUnit and 
am in the process of getting the CppUnit variant running.  The
code for this is in the "domunit" module of the 
http://xmlconf.sourceforge.net project.  I've invited the W3C
DOM testing working group to play in my sandbox, but if they don't
I'll try to shadow their tests in "domunit".  I'll also be adding
DOM event tests in an "unofficial" package as needed.  If
you are interested in more information, check the recent postings
on the www-dom-ts@w3.org mailing list (http://lists.w3.org/Archives/Public/www-dom-ts)

Andy's DOM does a couple of very good things that I want to preserve,
but there are also a few things that I'll need to change.

First, it is a very good thing that tag and attribute names are pooled
and that all memory allocated when a document is loaded is allocated
using an unsynchronized allocator that doesn't deallocate individual
memory blocks but releases the whole chunk when the document goes
out of scope.
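
As a rough illustration of the scheme described above (this is not Andy's
code; DocumentArena and NamePool are hypothetical names, and XMLCh is
assumed to be a 16-bit code unit), an unsynchronized bump allocator with
pooled names might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not the Xerces-C API.
// Hands out memory from large chunks; nothing is freed individually,
// and everything is released when the arena (the document) is destroyed.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t chunkSize = 4096)
        : chunkSize_(chunkSize), capacity_(0), used_(0) {}

    // Bump-pointer allocation: no locking and no per-block bookkeeping.
    void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;  // keep blocks aligned
        if (chunks_.empty() || used_ + bytes > capacity_) {
            capacity_ = bytes > chunkSize_ ? bytes : chunkSize_;
            chunks_.push_back(
                std::unique_ptr<unsigned char[]>(new unsigned char[capacity_]));
            used_ = 0;
        }
        void* p = chunks_.back().get() + used_;
        used_ += bytes;
        return p;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::size_t capacity_;  // size of the current chunk
    std::size_t used_;      // bytes consumed in the current chunk
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};

// Pools tag and attribute names so each distinct name is stored once.
// The lookup table keeps its own key copies; a real pool would avoid that.
class NamePool {
public:
    explicit NamePool(DocumentArena& arena) : arena_(arena) {}

    const char16_t* intern(const std::u16string& name) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;  // already pooled
        const std::size_t n = name.size() + 1;      // include terminator
        char16_t* copy =
            static_cast<char16_t*>(arena_.allocate(n * sizeof(char16_t)));
        std::memcpy(copy, name.c_str(), n * sizeof(char16_t));
        index_.emplace(name, copy);
        return copy;
    }

private:
    DocumentArena& arena_;
    std::unordered_map<std::u16string, const char16_t*> index_;
};
```

Interning the same name twice returns the same pointer, so name comparisons
can be pointer comparisons, and destroying the arena releases every pooled
name at once.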

However, it isn't acceptable (to me at least) that memory use would
grow if you made repeated modifications to the DOM.  That requires a
distinction between memory allocated during document load and memory
allocated during DOM manipulation, which in my mind requires
something just a little smarter than a "const XMLCh*" as the return value
from most of the DOM methods.  I think you can be a
little bit smarter without compromising the performance that
Andy was getting, and anything that I do will be benchmarked and
profiled mercilessly against Andy's implementation.

In addition, I'd like to support optionally using UTF-8 or another 
"compressed" internal representation to minimize the memory footprint.
Since there would not necessarily be an XMLCh* buffer that could be
directly returned and I definitely don't want to allocate one on demand,
returning a "const XMLCh*" is not workable.
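
To make the point concrete, here is a minimal sketch (hypothetical names,
assuming XMLCh is a 16-bit UTF-16 code unit and the stored UTF-8 is
well-formed) of reading XMLCh values straight out of UTF-8 storage one at
a time, without ever materializing an XMLCh* buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef char16_t XMLCh;  // assumption: XMLCh is a UTF-16 code unit

// Hypothetical sketch: walks UTF-8 bytes and yields UTF-16 code units.
// Assumes well-formed input; no validation is attempted.
// The cursor only refers to the string, so the string must outlive it.
class Utf8Cursor {
public:
    explicit Utf8Cursor(const std::string& utf8)
        : s_(utf8), pos_(0), pendingLow_(0) {}

    bool done() const { return pendingLow_ == 0 && pos_ >= s_.size(); }

    XMLCh next() {
        if (pendingLow_ != 0) {  // second half of a surrogate pair
            XMLCh low = pendingLow_;
            pendingLow_ = 0;
            return low;
        }
        unsigned char b0 = static_cast<unsigned char>(s_[pos_++]);
        unsigned long cp;
        int extra;  // number of continuation bytes that follow
        if (b0 < 0x80)      { cp = b0;        extra = 0; }
        else if (b0 < 0xE0) { cp = b0 & 0x1F; extra = 1; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; extra = 2; }
        else                { cp = b0 & 0x07; extra = 3; }
        while (extra-- > 0 && pos_ < s_.size())
            cp = (cp << 6) | (static_cast<unsigned char>(s_[pos_++]) & 0x3F);
        if (cp < 0x10000) return static_cast<XMLCh>(cp);
        cp -= 0x10000;  // split into a surrogate pair
        pendingLow_ = static_cast<XMLCh>(0xDC00 + (cp & 0x3FF));
        return static_cast<XMLCh>(0xD800 + (cp >> 10));
    }

private:
    const std::string& s_;
    std::size_t pos_;
    XMLCh pendingLow_;
};
```

A caller that needs a contiguous buffer can copy from the cursor into its
own string; the DOM itself never has to allocate one.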

Basically, this requires a "new" DOMString, but it would not have 
many similarities to the existing DOMString.  The "new" DOMString would:

1. Be immutable
2. Expose a subset of the const methods of std::basic_string
3. Be assignable to std::basic_string<char>, std::basic_string<wchar_t>
and std::basic_string<XMLCh>.

I think the last can be done with a templated overload of
operator= that would get the length of the DOMString, reserve the
appropriate size for the basic_string and then iterate through the
DOMString's content.  The template for <char> and <wchar_t> would
use a transcoder that you registered with the DOM. 
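
A minimal sketch of that idea follows.  All names here are hypothetical,
XMLCh is assumed to be a 16-bit code unit, and the Transcoder template
stands in for whatever transcoder would be registered with the DOM.  Since
C++ does not allow operator= to be added to std::basic_string from outside
the class, the sketch uses a named member template, assignTo, in its place:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef char16_t XMLCh;  // assumption: XMLCh is a UTF-16 code unit

// Hypothetical stand-in for a registered transcoder.
// The default simply widens or copies each code unit.
template <class Ch>
struct Transcoder {
    static Ch fromXMLCh(XMLCh c) { return static_cast<Ch>(c); }
};

// Placeholder narrow transcoder: ASCII passes through, the rest become '?'.
template <>
struct Transcoder<char> {
    static char fromXMLCh(XMLCh c) {
        return c < 0x80 ? static_cast<char>(c) : '?';
    }
};

// Immutable string exposing a few const methods in the style of
// std::basic_string.
class DomString {
public:
    explicit DomString(const std::u16string& text) : text_(text) {}

    std::size_t length() const { return text_.length(); }
    bool empty() const { return text_.empty(); }
    XMLCh at(std::size_t i) const { return text_.at(i); }

    // Get the length, reserve, then iterate through the content.
    template <class Ch>
    void assignTo(std::basic_string<Ch>& dst) const {
        dst.clear();
        dst.reserve(length());
        for (std::size_t i = 0; i < length(); ++i)
            dst.push_back(Transcoder<Ch>::fromXMLCh(at(i)));
    }

private:
    const std::u16string text_;  // never modified after construction
};
```

With this shape, the same call fills a std::string, a std::wstring, or a
std::basic_string<XMLCh>, and only the char and wchar_t cases go through a
transcoder that does real work.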

If you wanted to pass a raw wchar_t* or raw char*, you would have to
construct the appropriate basic_string and then access its raw buffer.

In addition, I would like to preserve usage compatibility with
the Java DOM, so that Java client code could be used with Xerces-C
with appropriate typedef-ing.

I'd also like to get some of the inherent capabilities of Object
in Java that haven't been replicated in Xerces-C.  The most important
is the ability to synchronize on a node, but toString() and possibly
some others would be useful as well.

How successful I'll be with all of that, I don't yet know.  But I
thought that I would at least let the list know my intentions.
I'm not particularly interested in discussing particulars
(such as not using const XMLCh*) until I get some code working
so that we don't debate in a vacuum.
