Posted to c-dev@xerces.apache.org by Tinny Ng <tn...@ca.ibm.com> on 2002/04/29 17:08:14 UTC

Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Hi everyone,

I've reviewed Andy's design objectives for IDOM, Lenny's view of the old DOM and his redesign proposal, and some user feedback.  Here is a "quick" summary, and I would like to call for a VOTE on the fate of these two interfaces.

1.0 Objective
==========
1.  Define the strategy for the Xerces-C++ public DOM interface: decide which one to keep, the old DOM interface or the new IDOM interface.


2.0 Motivation
===========
1. As a long-term strategy, Xerces-C++ shouldn't define two W3C DOM interfaces, which simply confuses users.
    => We've already had many questions from users about what the difference is, which one to use, etc.
2. With limited resources, we should focus our development on ONE stream, with no more duplicated effort.
    => New DOM Level 3 development should be done on one interface, not both.
    => No more dual maintenance: two sets of samples (e.g. DOMPrint vs IDOMPrint), two parsers (DOMParser vs IDOMParser)
3. To better place Apache Xerces-C++ in the market, we should have our Apache Recommended DOM C++ Binding listed at http://www.w3.org/DOM/Bindings
    => To encourage more users to develop DOM applications AND implementations based on this binding.
    => Such a binding should just define a set of abstract base classes (similar to Java interfaces) where no implementation model is assumed


3.0 History
=========
'DOM' was the initial "W3C DOM interface" developed by Xerces-C++.  However, the performance of its implementation is not quite satisfactory.

Last year, Andy Heninger came up with a new design with faster performance, and that implementation came with a new set of interfaces => 'IDOM'.

Currently both 'DOM' and 'IDOM' are shipped with Xerces-C++.  'IDOM' is labelled experimental (like a prototype) and is subject to change.

More information can be found at:
http://xml.apache.org/xerces-c/program.html
http://www.apache.org/~andyh/
http://marc.theaimsgroup.com/?t=101650188300002&r=1&w=2
http://marc.theaimsgroup.com/?w=2&r=1&s=Proposal%3A+C%2B%2B+Language+Binding+for+DOM+L&q=t



4.0 IDOM
=========
4.1 Interface
==========

4.1.1 Features of IDOM Interface
--------------------------------------------------
e.g. virtual IDOM_Element* IDOM_Document::createElement(const XMLCh* tagName) = 0;

1. Defined as abstract base classes.
2. Uses normal C++ pointers.
    => So that an abstract base class is possible.
    => Makes it more C++-like, less Java-like.
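
To make the two features above concrete, here is a minimal sketch of an interface declared as abstract base classes and used through plain pointers. All names (Ch, SketchElement, SketchDocument, SimpleElement, SimpleDocument, tagNameIs) are hypothetical stand-ins and not the actual Xerces-C++ headers; Ch stands in for XMLCh, and the sketch assumes a C++11 compiler.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical names throughout; these are NOT the Xerces-C++ classes.
typedef char16_t Ch;  // stand-in for XMLCh (one UTF-16 code unit)

// Interface side: pure virtual functions only, no data members.
class SketchElement {
public:
    virtual ~SketchElement() {}
    virtual const Ch* getTagName() const = 0;
};

class SketchDocument {
public:
    virtual ~SketchDocument() {}
    // Returns a plain pointer; in this sketch the document keeps ownership.
    virtual SketchElement* createElement(const Ch* tagName) = 0;
};

// One possible implementation, invisible to code written against the interface.
class SimpleElement : public SketchElement {
public:
    explicit SimpleElement(const Ch* name) : fName(name) {}
    const Ch* getTagName() const override { return fName.c_str(); }
private:
    std::u16string fName;
};

class SimpleDocument : public SketchDocument {
public:
    SketchElement* createElement(const Ch* tagName) override {
        assert(tagName != nullptr);
        fNodes.push_back(std::unique_ptr<SketchElement>(new SimpleElement(tagName)));
        return fNodes.back().get();
    }
private:
    std::vector<std::unique_ptr<SketchElement> > fNodes;  // freed with the document
};

// Client code sees only the abstract types.
inline bool tagNameIs(SketchDocument& doc, const Ch* name) {
    SketchElement* e = doc.createElement(name);
    return std::u16string(e->getTagName()) == name;
}
```

Because the client function takes a SketchDocument&, a different implementation could be substituted without touching it, which is the property that makes such a set of classes usable as a binding.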


4.1.2 Pros and Cons of IDOM Interface
----------------------------------------------------------
Pros:
1. Abstract base classes that correspond to the W3C DOM interfaces
    => Can be recommended as Apache DOM C++ Binding
    => More standard-like: no implementation is assumed, as they are just abstract interfaces using pure virtual functions
2. (Depends on users' preference)
    - some prefer a C++-like style

Cons:
1. IDOM_XXX - weird prefix 'I'
    Solution:
        - Proposal: rename to DOMXXXX, which also matches the DOM Level 3 naming convention
2. (Depends on users' preference)
    - some do not like pointers and want a Java-like interface for ease of use, ease of learning and ease of porting (from Java).
3. As the old DOM interface has been around for a long time, the majority of current Xerces-C++ users still use it, so the migration impact is significant
    Solution:
        - Announce the deprecation of old DOM interface for a couple of releases before removal
    
4.2 Implementation
===============
4.2.1 Features of IDOM Implementation
-----------------------------------------------------------
1. Uses an independent storage allocator per document. The advantage here is that allocation requires no synchronization.
    => Fast, good scalability, reduced memory footprint
2. Uses plain, null-terminated (XMLCh *) UTF-16 strings.
    => No DOMString class overhead, which is another contributor to IDOM being faster
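
As an illustration of the per-document allocator idea, the following is a minimal bump-allocator sketch. It is not the IDOM code; the class name DocumentArena, its block size, and the member functions are assumptions made for the example (C++11 assumed).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical sketch of a per-document bump allocator.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t blockSize = 4096)
        : fBlockSize(blockSize), fCurrentSize(0), fUsed(0), fTotal(0) {}

    // Hand out 'bytes' from the current block; start a new block when full.
    // No lock is taken: the arena belongs to exactly one document.
    void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;  // keep results aligned
        if (fBlocks.empty() || fUsed + bytes > fCurrentSize) {
            fCurrentSize = bytes > fBlockSize ? bytes : fBlockSize;
            fBlocks.push_back(std::unique_ptr<char[]>(new char[fCurrentSize]));
            fUsed = 0;
        }
        void* p = fBlocks.back().get() + fUsed;
        fUsed += bytes;
        fTotal += bytes;
        return p;
    }

    // Copy a null-terminated UTF-16 string into arena storage.
    const char16_t* cloneString(const char16_t* s) {
        assert(s != nullptr);
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        const std::size_t size = (n + 1) * sizeof(char16_t);
        char16_t* copy = static_cast<char16_t*>(allocate(size));
        std::memcpy(copy, s, size);
        return copy;
    }

    // There is deliberately no per-object free: everything is returned at
    // once when the arena (that is, the document) is destroyed.
    std::size_t bytesHandedOut() const { return fTotal; }

private:
    std::vector<std::unique_ptr<char[]> > fBlocks;
    std::size_t fBlockSize, fCurrentSize, fUsed, fTotal;
};
```

Allocation is a pointer bump with no locking, which is where the speed comes from. The same design is also why replaced values are never reclaimed individually: bytesHandedOut() only ever grows.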


4.2.2 Downside of IDOM Implementation
-------------------------------------------------------------
1. Manual memory management 
    - If the document comes from a parser, the parser owns the document.  If the document comes from DOMImplementation, users are responsible for deleting it.
    Solution:
        - Provide a means of disassociating a document from the parser
        - Add a function "Node::release()", similar in idea to "Range::detach", which lets users indicate that they are finished with the Node.
            - From the C++ Binding abstract interface perspective, it's up to the implementation how to handle this "release()" function.
            - With the Xerces-C++ IDOM implementation, the release() function will delete the 'this' pointer if it is a document, else it is a no-op.
2. Memory is retained until the document is deleted.
    - If you change the value of an attribute or call removeNode many times, the memory of the old value is not deallocated for reuse, and the document grows and grows
    Solution:
        - This is in fact a tradeoff for the fast performance offered by the independent storage allocator.
        - There is no good immediate solution in place
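
The proposed release() behaviour (delete the object if it is a document, otherwise do nothing) could look roughly like the sketch below. The class names and the liveDocuments counter are hypothetical and exist only so the effect can be observed; this is not the Xerces-C++ implementation.

```cpp
#include <cassert>

// Hypothetical names; a sketch of the proposed release() contract only.
static int liveDocuments = 0;  // instrumentation for the example

class NodeSketch {
public:
    virtual void release() = 0;
protected:
    virtual ~NodeSketch() {}  // callers go through release(), not delete
};

class DocumentSketch : public NodeSketch {
public:
    DocumentSketch() { ++liveDocuments; }
    // A document owns all of its storage, so releasing it frees everything.
    void release() override { delete this; }
protected:
    ~DocumentSketch() override {
        assert(liveDocuments > 0);
        --liveDocuments;
    }
};

class ElementSketch : public NodeSketch {
public:
    // No-op: the element's storage belongs to its document.
    void release() override {}
};
```

With this shape, user code can call release() uniformly on any node, and a different implementation of the same interface remains free to reclaim individual nodes if it is able to.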


5.0 old DOM
==========
5.1 Interface
========== 

5.1.1 Features of old DOM Interface
-----------------------------------------------------
e.g. DOM_Element DOM_Document::createElement(const DOMString tagName);

1. Use smart pointers - Java-like


5.1.2 Pros and Cons of old DOM Interface
--------------------------------------------------------------
Pros:
1. DOM_XXX - reasonable name
2. (Depends on users' preference)
    - some want a Java-like interface for ease of use, ease of learning and ease of porting (from Java).
3. Not that many users have migrated to IDOM yet, so migration impact is minimal.

Cons:
1. Not an abstract base class
    - Cannot be recommended as Apache DOM C++ Binding
    - Implementation (smart pointer indirection) is assumed
    Solution:
        - This is in fact a tradeoff for the ease of use of the smart pointer design
        - No solution.
2. (Depends on users' preference)
    - some want a C++-like style, as this is a C++ interface

    
5.2 Implementation
===============
5.2.1 Features of old DOM Implementation
----------------------------------------------------------------
1. Automatic memory management
    - Memory is released when there are no more handles pointing to it
    - Uses a reference count to keep track of handles
2. Uses a thread-safe DOMString class
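
The handle-plus-reference-count scheme described above can be sketched as follows. NodeImpl, NodeHandle and the liveImpls counter are hypothetical names, not the real DOM_Node classes, and the count here is a plain int, so unlike the real DOMString this sketch makes no attempt at thread safety.

```cpp
#include <cassert>
#include <string>

static int liveImpls = 0;  // instrumentation for the example

// Hypothetical implementation object that a handle points at.
struct NodeImpl {
    explicit NodeImpl(const std::string& n) : name(n), refCount(0) { ++liveImpls; }
    ~NodeImpl() { --liveImpls; }
    std::string name;
    int refCount;
};

// Handle passed around by value; copying it only adjusts the count.
class NodeHandle {
public:
    NodeHandle() : fImpl(nullptr) {}
    explicit NodeHandle(NodeImpl* impl) : fImpl(impl) { addRef(); }
    NodeHandle(const NodeHandle& other) : fImpl(other.fImpl) { addRef(); }
    NodeHandle& operator=(const NodeHandle& other) {
        NodeImpl* incoming = other.fImpl;
        if (incoming) ++incoming->refCount;  // add first: safe for self-assignment
        removeRef();
        fImpl = incoming;
        return *this;
    }
    ~NodeHandle() { removeRef(); }

    // Java-like use: methods are called on the handle itself with '.'.
    std::string getNodeName() const { assert(fImpl); return fImpl->name; }
    bool isNull() const { return fImpl == nullptr; }
    int useCount() const { return fImpl ? fImpl->refCount : 0; }

private:
    void addRef() { if (fImpl) ++fImpl->refCount; }
    void removeRef() {
        if (fImpl && --fImpl->refCount == 0) delete fImpl;  // last handle frees
        fImpl = nullptr;
    }
    NodeImpl* fImpl;
};
```

Every interface method has to be repeated on the handle class and forwarded to the implementation object, which is the "implementation is assumed" point listed among the cons in 5.1.2.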


5.2.2 Downside of old DOM Implementation
--------------------------------------------------------------------
1. Performance is slow
    - Memory management is the biggest time consumer, and it also produces a large memory footprint.
    - A whole lot of blocks are allocated when creating a document and then freed when finished with it. Each and every node requires at least one and sometimes several separately allocated blocks. A DOMString takes three. It adds up.
    Solution:
        - Lenny suggests using the IDOM interface internally in the DOM implementation (patch in Bugzilla 5967)
        - The performance benefits of IDOM are then gained, but the memory-retention problem in the IDOM implementation still remains to be addressed.
        - And internally we will have a dual-interface maintenance model, as the IDOM interface is then used by DOM internally.


Vote Question:
============
I would like to call for a vote:

    ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM Interface, DOM or IDOM? <===

Note:  
1. The question asks which "interface" is to be officially supported.  Once the interface is chosen, we can discuss how to solve the implementation downsides as the next topic.
2. The one voted for will become the ONLY Xerces-C++ supported public W3C DOM Interface, and is where DOM Level 3 will be implemented.
3. The API of the other interface will be deprecated, and its samples and associated parser will eventually be removed from the distribution.


Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Rhys Black <pd...@yahoo.com>.
--- Tinny Ng <tn...@ca.ibm.com> wrote:
> Based on all the discussion and votes above, here is the count:
>
> Vote Question:
> ============
>     ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM
> Interface, DOM or IDOM? <===
>
> Vote Count:
> =========
> 14 people have joined the discussion, 3 of them didn't cast a vote
> explicitly so I didn't count them in, and the result is
>
> IDOM         9 votes where 2 of them vote with a condition (provided that
> memory retained issue can be fixed)
> old DOM     2 votes
>
> Based on above votes result, here is my proposal:
>
> Since most users like IDOM interface, and in fact the 2 votes for the old
> DOM are suggesting to use IDOM interface internally which sounds like just
> another implementation of IDOM.
>
> So I would like to move on to post IDOM interface as the Apache Recommended
> C++ DOM Bindings for implementation and as the *RAW* user interface.
>
> But to satisfy those who really like smart pointer approach, the old DOM
> will be kept as a viable alternative user interface in addition to the
> recommended "RAW" user interface.   With Lenny's patch, this old DOM is also
> based on the IDOM interface and thus does not "violate" our Apache C++ DOM
> Bindings for implementation recommendation.   The old DOM interface will
> not be deprecated, but will not be promoted as the primary DOM user
> interface.   The samples will be removed, but the programming guide will
> remain and will be documented as a viable alternative user interface for
> particular users' preference.   And for new development (like DOM Level 3),
> IDOM is the first priority.
>
> As for implementation detail, Lenny's patch against the old DOM will be
> reviewed and applied (provided the *RAW* IDOM interface users are not
> penalized by any performance degradation with this patch, but even if it
> does, we will then guard the code with build variation)
>
> And for IDOM memory retained issue, we will review all the alternatives to
> see what's the best solution.   Also if no one objects, then we will go
> ahead and rename IDOM_XXX to DOMXXX as well.
>
> Any comment with this approach?
>
> Tinny

Excellent

I would have voted for IDOM as well, had I not been
inundated with programming classwork (taking double
time :) ).  I don't have a lot of experience with
IDOM, but I have to say that the lighter memory
footprint is highly desirable in a server setting, and
as many of those using this software seem to be in
that setting, it should be made a priority.

Thank you
Rhys Black

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Radovan Chytracek <Ra...@cern.ch>.
Hi,

    would the following proposal work for most of the developers and users of
Xerces-C?

DOM_XXXXX (original DOM interface)
----------------------------------
This original interface will be adapted on top of Lenny's patched IDOM,
implicitly using reference counting, so for its current users nothing will
change. This interface gets frozen, and developers have the freedom to play
with the IDOM implementation further.

DOMXXXXX
---------
The "official" Xerces-C DOM C++ interface will use the IDOM implementation
without implicit use of reference counting, thus not forcing it on everybody
who doesn't or can't use it. If needed, one can still call the AddRef(),
RemoveRef() and Release() methods in whatever way fits his/her purpose.

In both cases one will use DOM handles; the difference is that in the former
case the reference counting operations are invoked automatically by handles,
while in the latter the reference counting operations must be invoked
explicitly.

Cheers

    Radovan



RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Jason,

Looks like you should avoid the handles, stick to the raw pointer interface,
and expose release through your interface.

> Sorry, I guess I wasn't clear, what I mean is that there will be a
> documented way for people to use the smart pointer interface to DOM
> that does automatic memory collection. If Xerces-Perl is unable to
> support that, then people will try it anyway and get confused.

I think Tinny was suggesting that the primary documented interface would be
the updated IDOM.  I would expect some limited verbiage on how to use
handles in your application as a way to gain automated memory management,
but all of the examples would be without handles.  Those examples would have
to make it clear in which circumstances calls to release are beneficial in
manually managing resources.  So long as you note to users of your binding
that handles are not used, they should not be confused.

BTW, Tinny has expressed that it is more important for the handles to be
backwards compatible with the original DOM than to be a simpler
implementation, which the smart pointer approach would have provided.  So
the handles will not be smart pointers after all.  Not that you care for
your purpose, just that we have been talking about them as smart pointers
and I just wanted to make that clear.

Lenny


Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Lenny Hoffman" <le...@earthlink.net> writes:

> > The issue is what I would need to do to keep a smart pointer.
> 
> Sorry, I haven't used SWIG. Does SWIG only work with pointers?  

It isn't a limitation of SWIG, it is an issue of how Perl wraps C/C++
objects/structs, Perl demands a pointer.

> Specifically can perl_sv_setref_pv be modified to take nodes by
> value instead of by pointer? For example would the following work?:
> 
> some_wrapper_method ()
> {
>   DOM_Element element = ...;
>   SV *perl_object = perl_sv_setref_pv('DOM_Element', element);
>   ... // return perl_object
> }

No, Perl demands a pointer. 

I don't see how you could do it otherwise. Perl knows nothing about
Xerces or DOM_Element, so how could I have a function that took an
object of type DOM_Element and get it to compile, unless it was cast
to a void*? Maybe I'm just lacking creativity, but I don't see how
else it could be done.

If you can suggest an alternative, I'd be extremely interested to hear
it. 

> By passing element by value, its behavior as a smart pointer is to not copy
> the actual node implementation, but properly update its reference
> count.

What I could do, I suppose is invoke the copy constructor:

some_wrapper_method ()
{
  DOM_Element element = ...;
  SV *perl_object = perl_sv_setref_pv('DOM_Element', (void*) new DOM_Element(element));
  ... // return perl_object
}
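
The copy-constructor idea above can be sketched in isolation. This is only an illustration under stated assumptions: std::shared_ptr stands in for a reference-counted DOM_Element handle, a bare void* stands in for the pointer that Perl stores, and the function names are hypothetical. No Perl API is called here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins: Handle plays the role of a reference-counted
// DOM_Element; 'void*' plays the role of the pointer the Perl side keeps.
struct ElementImpl { std::string tag; };
typedef std::shared_ptr<ElementImpl> Handle;

// Copy the handle onto the heap so the copy outlives the wrapper's scope.
void* wrapForScriptingLayer(const Handle& element) {
    return new Handle(element);  // copy constructor bumps the count
}

// Recover the handle from the opaque pointer.
Handle& unwrap(void* opaque) {
    assert(opaque != nullptr);
    return *static_cast<Handle*>(opaque);
}

// Must be called when the script-side object is destroyed
// (for Perl, that would be from the class's DESTROY method).
void destroyWrapped(void* opaque) {
    delete static_cast<Handle*>(opaque);  // drops the count again
}

void* someWrapperMethod() {
    Handle element = std::make_shared<ElementImpl>();
    element->tag = "root";
    return wrapForScriptingLayer(element);
}  // the local handle dies here; the heap copy keeps the node alive
```

Taking the address of the local handle instead (the &element variant) would leave a dangling pointer once the wrapper returns. The heap copy avoids that at the cost of one extra allocation per returned handle, plus an obligation to delete it later.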

> > It seems I will *have* to force users to explicitly call
> > release(), and that will mean the Xerces-C DOM API has
> > functionality not replicated in Xerces-Perl, which is likely to
> > lead to user confusion for me.
> 
> Since release would be part of the Xerces-C DOM API, your replicating it
> should lead to no user confusion.

Sorry, I guess I wasn't clear, what I mean is that there will be a
documented way for people to use the smart pointer interface to DOM
that does automatic memory collection. If Xerces-Perl is unable to
support that, then people will try it anyway and get confused. 

> How did you get SWIG to work with the original DOM interface?  That
> interface is really a handle interface, so working with these
> handles (as smart pointers notwithstanding) should work in a
> similar way.  Then you can get automatic memory management and not
> bother your users with having to call release.

Yes, good point. Xerces-Perl handled it rather badly I'm afraid. I
never looked into it properly, because I knew that IDOM was coming,
but basically there was very poor performance, because any method that
returned a handle had to immediately invoke the copy constructor for
that class to get back a pointer, so there was a *lot* of unnecessary
copying. But also, the memory issues were pretty weird - I was
convinced that memory was not being properly released but never made
time to properly debug it (because IDOM was coming).

So raw pointers are much simpler for me, much less copying, and the
memory management is explicit. When I'm done with a node, I call
release.

jas.


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Jason,

> The issue is what I would need to do to keep a smart pointer.

Sorry, I haven't used SWIG. Does SWIG only work with pointers?  Specifically
can perl_sv_setref_pv be modified to take nodes by value instead of by
pointer? For example would the following work?:

some_wrapper_method ()
{
  DOM_Element element = ...;
  SV *perl_object = perl_sv_setref_pv('DOM_Element', element);
  ... // return perl_object
}

By passing element by value, its behavior as a smart pointer is to not copy
the actual node implementation, but properly update its reference count.

I figured that using smart pointers would mean fewer updates for you than
using handles that are not smart pointers, because current code you have
like this:

   IDOM_Element* element = ...;
   IDOM_Node* node = element->getFirstChild();

would become (assuming a different naming convention for handles as smart
pointers):

   DOMElementH element = ...;
   DOMNodeH node = element->getFirstChild();

as compared to (handles not as smart pointers):

   DOM_Element element = ...;
   DOM_Node node = element.getFirstChild();

Note that the handle as a smart pointer allows you to continue to use the ->
operator as if you were still working with raw pointers.  This means that
going to handles as smart pointers requires only one type of change, where
going to handles as full interface (backwards compatible with original DOM)
would require two types of change.
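
To show why a smart-pointer handle keeps the -> syntax, here is a minimal sketch. Node and NodeH are hypothetical illustrations, not the proposed DOMNodeH class, and the addUser()/release() pair is an assumed stand-in for whatever "in use" bookkeeping the implementation does.

```cpp
#include <cassert>

// Hypothetical node type with an "in use" counter for the example.
class Node {
public:
    Node() : fFirstChild(nullptr), fUsers(0) {}
    virtual ~Node() {}
    virtual Node* getFirstChild() { return fFirstChild; }
    void setFirstChild(Node* c) { fFirstChild = c; }
    void addUser() { ++fUsers; }
    void release() { --fUsers; }
    int users() const { return fUsers; }
private:
    Node* fFirstChild;
    int fUsers;
};

// Handle as smart pointer: it wraps a raw Node* and forwards '->' to it, so
// the interface's methods are not duplicated on the handle class.
class NodeH {
public:
    NodeH(Node* n = nullptr) : fNode(n) { if (fNode) fNode->addUser(); }  // implicit on purpose
    NodeH(const NodeH& o) : fNode(o.fNode) { if (fNode) fNode->addUser(); }
    NodeH& operator=(const NodeH& o) {
        if (o.fNode) o.fNode->addUser();  // add first: safe for self-assignment
        if (fNode) fNode->release();
        fNode = o.fNode;
        return *this;
    }
    ~NodeH() { if (fNode) fNode->release(); }  // automatic release at scope exit
    Node* operator->() const { assert(fNode != nullptr); return fNode; }
private:
    Node* fNode;
};
```

Code written as "Node* node = element->getFirstChild();" only needs its declared type changed to NodeH. With a full-interface handle, both the type and every "->" (which becomes ".") would have to change.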

> It seems I will *have* to force
> users to explicitly call release(), and that will mean the Xerces-C
> DOM API has functionality not replicated in Xerces-Perl, which is
> likely to lead to user confusion for me.

Since release would be part of the Xerces-C DOM API, your replicating it
should lead to no user confusion.

How did you get SWIG to work with the original DOM interface?  That
interface is really a handle interface, so working with these handles (as
smart pointers notwithstanding) should work in a similar way.  Then you can
get automatic memory management and not bother your users with having to
call release.

Lenny


Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Lenny Hoffman" <le...@earthlink.net> writes:

> Am I missing something here?

Probably, not ;-)

It's usually my lack of C++ knowledge.

> > To track the underlying C++ objects, Xerces-Perl keeps a pointer
> > to the object, so all my method calls will be pointer invocations.
> 
> I don't understand what the problem is.  You can continue to keep a pointer,
> or you can keep a smart pointer.  

The issue is what I would need to do to keep a smart pointer. 

I use SWIG to generate all my wrapper code - so in some sense I am
constrained by how SWIG generates wrappers. Let's for a moment forget
this, and say I was doing it by hand.

If my wrapper code generated smart pointers like so:

some_wrapper_method ()
{
  DOM_Element element = ...;
  SV *perl_object = perl_sv_setref_pv('DOM_Element', &element);
  ... // return perl_object
}

how will the smart pointer know that it hasn't gone out of scope and
release itself automatically, thus leaving me with a bad pointer?

> I thought you said you currently support the IDOM.  If you are happy with
> that, then all you need to do is rename IDOM_* to DOM*.  

already done.

> You don't have to use the smart pointer handles if you don't want
> the benefit of automatic calls to release, either because you will
> be explicitly calling release when needed or you don't care about
> memory growth.  

;-)

Believe me I would like automatic memory release, I'm just at a loss
to figure out how I can get to it. It seems I will *have* to force
users to explicitly call release(), and that will mean the Xerces-C
DOM API has functionality not replicated in Xerces-Perl, which is
likely to lead to user confusion for me.

> If you did decide to gain the benefit of handles, then their being
> smart pointers would mean fewer updates for you, not more, as the
> handles as smart pointers act quite a bit like the raw pointers you
> are using today.

If you can help me understand how to use them, then I will be very
happy.

Pardon my denseness,
jas.


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Jason,

> To track the underlying C++ objects, Xerces-Perl keeps a pointer
> to the object, so all my method calls will be pointer invocations.

I don't understand what the problem is.  You can continue to keep a pointer,
or you can keep a smart pointer.  I have never done any Perl wrappers, but I
have done significant Python wrapping (not Xerces), and I know that I could
do it either way.

I thought you said you currently support the IDOM.  If you are happy with
that, then all you need to do is rename IDOM_* to DOM*.  You don't have to
use the smart pointer handles if you don't want the benefit of automatic
calls to release, either because you will be explicitly calling release when
needed or you don't care about memory growth.  If you did decide to gain the
benefit of handles, then their being smart pointers would mean fewer updates
for you, not more, as the handles as smart pointers act quite a bit like the
raw pointers you are using today.

Am I missing something here?

Lenny


Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Lenny Hoffman" <le...@earthlink.net> writes:

> The smart pointer approach should also be easier to explain in the
> programmers guide.  For example it can be explained that when wishing to
> have automated "in use" indication assign returned nodes to handles:
> 
>    {
>       DOM_Element element = ...;
>       DOM_Node node = ...;
>       element->removeChild(node);
>       ...
>    } // release on removed child called here automatically
> 
> Or if wishing to avoid handles assign to pointers and explicitly call
> release:
> 
>    {
>       DOMElement* element = ...;
>       DOMNode* node = ...;
>       element->removeChild(node);
>       ...
>       node->release();
>    }

The smart pointer approach sounds clever. It has a serious drawback,
for me anyway, which is that it cannot work for the Perl API (or the
Python API most likely), at least not without significant hackery on my
part. To track the underlying C++ objects, Xerces-Perl keeps a pointer
to the object, so all my method calls will be pointer invocations. 

Also, it will most likely mean the Xerces-C documentation becomes
significantly out of line with Xerces-Perl.

Smart pointers are also a bitch for people like me, who only program in
C++ when we *have* to, when trying to figure out what other people's code is
doing.

my $0.05,
jas.


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Tinny,

We have a choice here in how we implement our handles: keep backwards
compatibility with the original DOM interfaces at the cost of maintaining a
full dual interface, or sacrifice backwards compatibility with the original
DOM, but gain simplified maintenance, and possibly an easier interface to
understand.  Let me explain --

I had originally kept the DOM_ interfaces, but had them delegate to the IDOM
for anything that they did (this gives us the greatest backwards
compatibility with the original DOM).  But in the recent discussion I
thought you made the point that having such duplication of interfaces
represented increased maintenance, so I proposed changing the DOM_
interfaces so that they are smart pointers, thus eliminating method
duplication (this loses backwards compatibility with the original DOM).  For
example, instead of:

      DOM_Element element = ...;
      DOM_Node first = element.getFirstChild();

where there is both a DOM_Element::getFirstChild and a
DOMElement::getFirstChild, you would have:

      DOM_Element element = ...;
      DOM_Node first = element->getFirstChild();

where there is only DOMElement::getFirstChild.  The DOM_ interfaces become
simple smart pointers that delegate through a single overloaded -> operator
instead of replicating every method of a node interface.  In fact, in my
prototype I was able to do in a single header file what previously took 14.
With main development going forward on the renamed IDOM interfaces, this
approach offers the advantage of quick and easy turnaround on adding handle
support to new interfaces.

The smart pointer approach should also be easier to explain in the
programmer's guide.  For example, it can explain that if you want
automated "in use" indication, you assign returned nodes to handles:

   {
      DOM_Element element = ...;
      DOM_Node node = ...;
      element->removeChild(node);
      ...
   } // release on removed child called here automatically

Or, if you wish to avoid handles, assign to pointers and call release
explicitly:

   {
      DOMElement* element = ...;
      DOMNode* node = ...;
      element->removeChild(node);
      ...
      node->release();
   }

However release is called on a node, whether by an implementation's reference
counting with the help of handles or explicitly, it indicates to the
implementation that the node no longer has any clients and that its
resources can be released or recycled.  The memory growth fix I came up
with for the IDOM, for example, recycles the memory for the node and its
strings upon a call to release if that node is no longer part of the
document.
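
As a minimal sketch of the two usage styles above, a handle could count its copies and call release() when the last one goes out of scope. Node and NodeHandle are hypothetical names used only for illustration; they are not the classes in the patch, and this release() merely sets a flag where a real implementation might recycle memory.

```cpp
#include <cassert>

// Hypothetical node body; stands in for an IDOM-style node, not a Xerces class.
struct Node {
    int  refCount = 0;      // number of handles currently pointing at this node
    bool released = false;  // set once release() has been called
    int  value    = 0;

    // Tells the implementation that no client uses this node any more.
    // A real implementation could recycle the node's memory here.
    void release() { released = true; }
};

// Hypothetical handle: forwards calls through operator-> and calls
// release() on the node when the last handle to it is destroyed.
class NodeHandle {
public:
    explicit NodeHandle(Node* node = nullptr) : node_(node) { acquire(); }
    NodeHandle(const NodeHandle& other) : node_(other.node_) { acquire(); }
    NodeHandle& operator=(const NodeHandle& other) {
        if (node_ != other.node_) {
            drop();
            node_ = other.node_;
            acquire();
        }
        return *this;
    }
    ~NodeHandle() { drop(); }

    Node* operator->() const {
        assert(node_ != nullptr);  // dereferencing an empty handle is a bug
        return node_;
    }

private:
    void acquire() { if (node_) ++node_->refCount; }
    void drop() {
        if (node_ && --node_->refCount == 0) node_->release();
        node_ = nullptr;
    }
    Node* node_;
};
```

Code that prefers raw pointers would skip NodeHandle entirely and call node->release() itself, as in the second example above.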

Since you and others were willing to forgo the original DOM altogether, I
got the impression that backwards compatibility with it was not too
important.  As you can see, the methods accessed via the smart pointer's ->
operator are really on the new IDOM interfaces and thus are not backwards
compatible with the original DOM.

Although I lean toward the simpler smart pointer approach, I can go either
way, just let me know.

Lenny

-----Original Message-----
From: Tinny Ng [mailto:tng-xml@ca.ibm.com]
Sent: Monday, May 06, 2002 9:27 AM
To: xerces-c-dev@xml.apache.org; lenny.hoffman@objectivity.com
Subject: Re: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


>
> BTW, the handles are no longer backward compatible with the original DOM_
> classes, and thus should follow a different naming convention.  Since you
> are renaming the IDOM_* classes to DOM*, perhaps the handles could be named
> DOM*_h.  This is just my initial thought, you may have a better idea.
>

As indicated in my other post, I don't really want to rename the old DOM
interface.   In fact, how incompatible are the handles going to be?  Except
for the removal of "XML_DECL", I would prefer to keep as much compatibility as
possible in the old DOM interface after the patch.....

Tinny




Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Tinny Ng <tn...@ca.ibm.com>.
>
> BTW, the handles are no longer backward compatible with the original DOM_
> classes, and thus should follow a different naming convention.  Since you
> are renaming the IDOM_* classes to DOM*, perhaps the handles could be named
> DOM*_h.  This is just my initial thought, you may have a better idea.
>

As indicated in my other post, I don't really want to rename the old DOM
interface.   In fact, how incompatible are the handles going to be?  Except
for the removal of "XML_DECL", I would prefer to keep as much compatibility as
possible in the old DOM interface after the patch.....

Tinny




RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Tinny,

I have no problem with your direction.  I will rework my patch consistent
with the discussion (simplify handles to not replicate IDOM methods, make
reference counting optional, make thread safe reference counting optional)
and re-base it on the latest nightly build (2002-05-03) -- it was based on
1.6.  This should take me at least a few days (given my other
responsibilities). I will attach it to 5967 when done.

BTW, the handles are no longer backward compatible with the original DOM_
classes, and thus should follow a different naming convention.  Since you
are renaming the IDOM_* classes to DOM*, perhaps the handles could be named
DOM*_h.  This is just my initial thought, you may have a better idea.

Thanks,

Lenny

-----Original Message-----
From: Tinny Ng [mailto:tng-xml@ca.ibm.com]
Sent: Friday, May 03, 2002 3:06 PM
To: xerces-c-dev@xml.apache.org
Subject: Re: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


Based on all the discussion and votes above, here is the count:

Vote Question:
============
    ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM
Interface, DOM or IDOM? <===

Vote Count:
=========
14 people have joined the discussion. 3 of them didn't cast a vote explicitly,
so I didn't count them in, and the result is:

IDOM         9 votes, 2 of them with a condition (provided that the
memory retention issue can be fixed)
old DOM     2 votes

Based on the above vote result, here is my proposal:

Most users like the IDOM interface, and in fact the 2 votes for the old
DOM suggest using the IDOM interface internally, which sounds like just
another implementation of IDOM.

So I would like to move on and post the IDOM interface as the Apache Recommended
C++ DOM Bindings for implementation and as the *RAW* user interface.

But to satisfy those who really like the smart pointer approach, the old DOM
will be kept as a viable alternative user interface in addition to the
recommended "RAW" user interface.   With Lenny's patch, this old DOM is also
based on the IDOM interface and thus does not "violate" our Apache C++ DOM
Bindings for implementation recommendation.   The old DOM interface will
not be deprecated, but will not be promoted as the primary DOM user
interface.   The samples will be removed, but the programming guide will
remain and will document it as a viable alternative user interface for
users who prefer it.   And for new development (like DOM Level 3),
IDOM is the first priority.

As for implementation detail, Lenny's patch against the old DOM will be
reviewed and applied, provided the *RAW* IDOM interface users are not
penalized by any performance degradation with this patch.  Even if they
are, we will guard the code with a build variation.

And for the IDOM memory retention issue, we will review all the alternatives to
see what's the best solution.   Also, if no one objects, we will go
ahead and rename IDOM_XXX to DOMXXX as well.

Any comment with this approach?

Tinny




Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Tinny Ng <tn...@ca.ibm.com>.
Based on all the discussion and votes above, here is the count:

Vote Question:
============
    ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM
Interface, DOM or IDOM? <===

Vote Count:
=========
14 people have joined the discussion. 3 of them didn't cast a vote explicitly,
so I didn't count them in, and the result is:

IDOM         9 votes, 2 of them with a condition (provided that the
memory retention issue can be fixed)
old DOM     2 votes

Based on the above vote result, here is my proposal:

Most users like the IDOM interface, and in fact the 2 votes for the old
DOM suggest using the IDOM interface internally, which sounds like just
another implementation of IDOM.

So I would like to move on and post the IDOM interface as the Apache Recommended
C++ DOM Bindings for implementation and as the *RAW* user interface.

But to satisfy those who really like the smart pointer approach, the old DOM
will be kept as a viable alternative user interface in addition to the
recommended "RAW" user interface.   With Lenny's patch, this old DOM is also
based on the IDOM interface and thus does not "violate" our Apache C++ DOM
Bindings for implementation recommendation.   The old DOM interface will
not be deprecated, but will not be promoted as the primary DOM user
interface.   The samples will be removed, but the programming guide will
remain and will document it as a viable alternative user interface for
users who prefer it.   And for new development (like DOM Level 3),
IDOM is the first priority.

As for implementation detail, Lenny's patch against the old DOM will be
reviewed and applied, provided the *RAW* IDOM interface users are not
penalized by any performance degradation with this patch.  Even if they
are, we will guard the code with a build variation.

And for the IDOM memory retention issue, we will review all the alternatives to
see what's the best solution.   Also, if no one objects, we will go
ahead and rename IDOM_XXX to DOMXXX as well.

Any comment with this approach?

Tinny




RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Tinny,

How are you going to fix the IDOM memory leak?  How are you going to support
implementations with different memory models?  Why are you presenting the
IDOM as an effective alternative, one worthy of gambling future Xerces
development on without resolving these issues?

As far as standardizing is concerned, I don't see why the DOM interface
can't be standardized; doing so does not dictate any particular
implementation.  What is the difference between standardizing the DOM
handles (noting that they can be implemented any way desired, i.e. by
deriving from the IDOM abstract base classes using a handle/body
pattern) and standardizing the IDOM abstract base classes (noting that they
can be implemented any way desired, i.e. by deriving from the IDOM abstract
base classes)?

Lenny
  -----Original Message-----
  From: Tinny Ng [mailto:tng-xml@ca.ibm.com]
  Sent: Monday, April 29, 2002 11:16 AM
  To: xerces-c-dev@xml.apache.org
  Subject: Re: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


  > Vote Question

  > ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM
Interface, DOM or IDOM? <===

  My vote:  IDOM

  - I prefer IDOM over DOM. The smart pointer does not buy me much compared
to an official Apache DOM C++ Binding.  I agree with Joseph: I think abstract
base classes are more standard-like and are the right way to go.

  - With DOM-IDOM integration, although IDOM will be hidden, it is still
dual interface maintenance. From a Xerces-C++ active committer's perspective, I
would prefer to focus on IDOM only.   The old DOM interface can be kept for
compatibility, but we don't want to upgrade it with the DOM Level 3
interfaces, or to maintain both DOMParser and IDOMParser.


Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Tinny Ng <tn...@ca.ibm.com>.
> Vote Question

> ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM Interface, DOM or IDOM? <===

My vote:  IDOM

- I prefer IDOM over DOM. The smart pointer does not buy me much compared to an official Apache DOM C++ Binding.  I agree with Joseph: I think abstract base classes are more standard-like and are the right way to go.

- With DOM-IDOM integration, although IDOM will be hidden, it is still dual interface maintenance. From a Xerces-C++ active committer's perspective, I would prefer to focus on IDOM only.   The old DOM interface can be kept for compatibility, but we don't want to upgrade it with the DOM Level 3 interfaces, or to maintain both DOMParser and IDOMParser.


Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Alberto Massari <al...@exln.com>.
At 11.08 29/04/2002 -0400, you wrote:
>Vote Question:
>============
>I would like to call for a vote:
>
>     ==>  Which INTERFACE should be the Xerces-C++ public supported W3C 
> DOM Interface, DOM or IDOM? <===

I vote for IDOM.
I like:
- the speed
- the virtual interface that will allow plugging any custom DOM implementation.
But I fear that the fact that Xerces doesn't call the destructor for the 
single nodes (but only releases the memory underneath it) can pose a big 
limitation to the exploitation of the second advantage (because it will 
force the developer to include data members allocated only through the 
IDOM_Document::allocate function).

Alberto

>
>Note:
>1. The question is asking which "interface" to be officially 
>supported.  Once the choice of interface is chosen, we can discuss how to 
>solve the downside of implementation as the next topic.
>2. The one being voted will become the ONLY Xerces-C++ supported public 
>W3C DOM Interface, and is where the DOM Level 3 being implemented.
>3. The API of the other interface will be deprecated.  And its samples, 
>and associated Parser will eventually be removed from the distribution
>



RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Tinny,

 - Lenny suggests to use IDOM interface internally in DOM implementation,
patch in Bugzilla 5967
 - Then the performance benefits of IDOM is gained but the memory retained
problem in IDOM implementation still remains to address.

Actually, I have now addressed the memory retention problem.  I am running
some tests before adding the changes to patch 5967.  The solution is to
recycle memory that is allocated but no longer in use, and it is facilitated by
reference counting.  Here is a description of my approach.

The IDDocument storage allocator is only designed to hand out memory in
chunks, not release it in chunks, and updating it to support deletes would
greatly complicate its design, as well as diminish its performance.  This
means we need an alternative to deleting nodes once they are no longer
needed.
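
To make the constraint concrete, here is a minimal sketch of a chunk allocator of the general kind described: it only ever hands memory out and frees everything at once when it is destroyed. ChunkAllocator and its members are hypothetical names for illustration, not the IDDocument code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-document allocator: bump-allocates out of large chunks.
// There is deliberately no way to free a single allocation.
class ChunkAllocator {
public:
    explicit ChunkAllocator(std::size_t chunkSize = 4096) : chunkSize_(chunkSize) {}

    void* allocate(std::size_t bytes) {
        // Round the request up so every returned pointer stays aligned.
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;
        assert(bytes <= chunkSize_);  // oversized requests are out of scope here

        // Start a new chunk when the current one cannot hold the request.
        if (chunks_.empty() || used_ + bytes > chunkSize_) {
            chunks_.push_back(std::make_unique<unsigned char[]>(chunkSize_));
            used_ = 0;
        }
        void* result = chunks_.back().get() + used_;
        used_ += bytes;
        return result;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::size_t used_ = 0;  // bytes consumed in the newest chunk
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;  // freed together
};
```

Because allocation is just a pointer bump, it is fast and needs no bookkeeping per block, which is also why individual deletes do not fit the design.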



An alternative to deleting is recycling.  Recycling for nodes can be
accomplished by having IDDocument keep a free list for each type of node.
Nodes no longer needed are added to the free list for their node type, and
the document takes nodes from that list before creating a new one.  By
overloading the owning node pointer in free nodes to mean the next free
node, no additional data must be added to nodes to support free lists.
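
The free list idea can be sketched as follows, assuming a single node type for brevity (the description above keeps one list per node type). Node, NodePool, and the field name ownerOrNextFree are hypothetical and chosen only to show the pointer doing double duty.

```cpp
#include <memory>
#include <vector>

// Hypothetical node: one pointer field serves two purposes.
struct Node {
    // Live node: points at the owning node.
    // Free node: points at the next node on the free list.
    Node* ownerOrNextFree = nullptr;
    int   payload = 0;
};

// Hypothetical pool standing in for the document's node management.
class NodePool {
public:
    // Reuses a recycled node when one is available, else creates a new one.
    Node* create(Node* owner) {
        Node* node = freeHead_;
        if (node != nullptr) {
            freeHead_ = node->ownerOrNextFree;  // pop from the free list
            ++reusedCount_;
        } else {
            // A real document would use its own storage allocator here.
            storage_.push_back(std::make_unique<Node>());
            node = storage_.back().get();
        }
        node->ownerOrNextFree = owner;
        node->payload = 0;
        return node;
    }

    // Puts a node that is no longer needed onto the free list.
    void recycle(Node* node) {
        node->ownerOrNextFree = freeHead_;
        freeHead_ = node;
    }

    int reusedCount() const { return reusedCount_; }

private:
    Node* freeHead_ = nullptr;
    int   reusedCount_ = 0;
    std::vector<std::unique_ptr<Node>> storage_;  // owns every node ever made
};
```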



IDDocument’s storage allocator not only allocates storage for nodes, but
also strings.  Most of a document’s strings are organized into a string
pool, which uses a hash map to keep just one copy of any particular string
value.
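
A string pool of this kind can be sketched in a few lines. This is an illustration only: StringPool and intern are hypothetical names, char16_t stands in for XMLCh, and std::unordered_set is used as the hash container in place of whatever hash map the real pool uses.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical string pool: keeps one copy of each distinct string value.
class StringPool {
public:
    // Returns a pointer to the pooled copy of `text`, adding it on first use.
    // The pointer stays valid for the lifetime of the pool, because
    // unordered_set never moves its elements once they are inserted.
    const char16_t* intern(const std::u16string& text) {
        return pool_.insert(text).first->c_str();
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::u16string> pool_;
};
```

Two elements with the same tag name then share a single pooled string, so comparing names can be a pointer comparison.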



It is interesting to note that the original DOM also had a string pool, one
that its documentation states is for the purpose of recycling element and
attribute names, but it only uses it for non-namespace elements; namespace
elements and all attributes get their own name strings.



In the IDOM, IDElementImpl, IDElementNSImpl, IDAttrImp, and IDAttrNSImp do
use the string pool for their names, but so does IDCharacterDataImp for its
value.  A string pool is extremely useful for storing a limited number of
repeating string values like element and attribute names, but is less useful
for storing the value of text nodes, which will usually vary more in value
than element and attribute names do.  It is understandable that, without any
means of releasing memory for a string, the string pool was used for
text node values: a typical document will have some repeating text node
values, like “true” and “false” for Boolean attributes, so some memory will
be reused.  But this is not a complete solution, because it is likely that
many documents will have a large number of unique text node values, which
means that much memory will not get reused, and that the performance of the
string pool will diminish as its size increases.



To solve these problems, while keeping the string pool for element and
attribute names, I added a string heap for arbitrary strings.  The string
heap gets its memory from the document's storage allocator, but supports
returning strings to the heap.  I changed IDCharacterDataImp from getting
its string storage from the string pool to instead get it from the string
heap.  Also, upon recycling, I have it return its current storage to the
string heap and get its new required storage from the string heap.
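
One simple way such a string heap could work is to keep returned buffers in per-length buckets and hand them out again on the next request of that length. The sketch below assumes exact-length matching, which is a simplification; StringHeap, allocate, and giveBack are hypothetical names and not the ones in the patch.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical string heap: buffers can be handed back and reused later.
class StringHeap {
public:
    // Returns a buffer with room for `length` characters plus a terminator,
    // reusing a returned buffer of the same length when one is available.
    char16_t* allocate(std::size_t length) {
        std::vector<char16_t*>& bucket = free_[length];
        if (!bucket.empty()) {
            char16_t* reused = bucket.back();
            bucket.pop_back();
            return reused;
        }
        // A real document would carve this out of its own storage allocator.
        storage_.push_back(std::make_unique<char16_t[]>(length + 1));
        return storage_.back().get();
    }

    // Returns a buffer obtained from allocate(length) to the heap.
    void giveBack(char16_t* buffer, std::size_t length) {
        free_[length].push_back(buffer);
    }

private:
    std::map<std::size_t, std::vector<char16_t*>> free_;  // free buffers by length
    std::vector<std::unique_ptr<char16_t[]>> storage_;    // owns every buffer
};
```

A character data node that changes its value would call giveBack on the old buffer and allocate for the new one, so repeated edits stop growing the document.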



Lenny

  -----Original Message-----
  From: Tinny Ng [mailto:tng-xml@ca.ibm.com]
  Sent: Monday, April 29, 2002 10:08 AM
  To: xerces-c-dev@xml.apache.org
  Subject: Call for Vote: which one to be the Xerces-C++ public supported
W3C DOM interface


  Hi everyone,

  I've reviewed Andy's design objective of IDOM, Lenny's view of old DOM and
his proposal of redesign, and some users feedback.   Here is a "quick"
summary and I would like to call for a VOTE about the fate of these two
interfaces.

  1.0 Objective
  ==========
  1.  Define the strategy of Xerces-C++ public DOM interface.  Decide which
one to keep, old DOM interface or new IDOM interface


  2.0 Motivation
  ===========
  1. As a long term strategy, Xerces-C++ shouldn't define two W3C DOM
interfaces which simply confuses users.
      => We've already got many users' questions about what the difference,
which one to use ... etc.
  2. With limited resource, we should focus our development on ONE stream,
no more duplicate effort
      => New DOM Level 3 development should be done on one interface, not
both.
      => No more dual maintenance: two set of samples (e.g. DOMPrint vs
IDOMPrint), two parsers (DOMParser vs IDOMParser)
  3. To better place Apache Xerces-C++ in the market, we should have our
Apache Recommended DOM C++ Binding in http://www.w3.org/DOM/Bindings
      => To encourage more users to develop DOM application AND
implementation based on this binding.
      => Such binding should just define a set of abstract base classes
(similar to JAVA interface) where no implementation model is assumed


  3.0 History
  =========
  'DOM' was the initial "W3C DOM interface" developed by Xerces-C++.
However the performance of its implementation is not quite satisfactory.

  Last year, Andy Heninger came up with a new design with faster
performance, and such implementation came with a new set of interface =>
'IDOM'.

  Currently both 'DOM' and 'IDOM' are shipped with Xerces-C++.  'IDOM' is
claimed as experimental (like a prototype) and is subject to change.

  More information can be found in :
  http://xml.apache.org/xerces-c/program.html
  http://www.apache.org/~andyh/
  http://marc.theaimsgroup.com/?t=101650188300002&r=1&w=2

  http://marc.theaimsgroup.com/?w=2&r=1&s=Proposal%3A+C%2B%2B+Language+Binding+for+DOM+L&q=t



  4.0 IDOM
  =========
  4.1 Interface
  ==========

  4.1.1 Features of IDOM Interface
  --------------------------------------------------
  e.g. virtual IDOM_Element* IDOM_Document::createElement(const XMLCh*
tagName) = 0;

  1. Define as abstract base classes
  2. Use normal C++ pointers.
      => So that abstract base class is possible.
      => Make it more C++ like. Less Java like.


  4.1.2 Pros and Cons of IDOM Interface
  ----------------------------------------------------------
  Pros:
  1. Abstract base classes that correspond to the W3C DOM interfaces
      => Can be recommended as Apache DOM C++ Binding
      => More standard like, no implementation assumed as they are just
abstract interfaces using pure virtual functions
  2. (Depends on users' preference)
      - someone prefers C++ like style

  Cons:
  1. IDOM_XXX - weird prefix 'I'
      Solution:
          - Proposed to rename to DOMXXXX which also matches the DOM Level 3
naming convention
  2. (Depends on users' preference)
      - someone does not like pointers, and wants Java-like interface for
ease to use, ease to learn and ease to port (from Java).
  3. As the old DOM interface has been around for a long time, majority of
current Xerces-C++ still uses the old DOM interface, significant migration
impact
      Solution:
          - Announce the deprecation of old DOM interface for a couple of
releases before removal

  4.2 Implementation
  ===============
  4.2.1 Features of IDOM Implementation
  -----------------------------------------------------------
  1. Use an independent storage allocator per document. The advantage here
is that allocation would require no synchronization
      => Fast, good scalability, reduced memory footprint
  2. Use plain, null-terminated (XMLCh *) utf-16 strings.
      => No DOMString class overhead which is another performance
contributor that makes IDOM faster


  4.2.2 Downside of IDOM Implementation
  -------------------------------------------------------------
  1. Manual memory management
      - If document comes from parser, then parser owns the document.  If
document comes from DOMImplementation, then users are responsible to delete
it.
      Solution:
          - Provide a means of disassociating a document from the parser
          - Add a function "Node::release()", similar to the idea of
"Range::detach", which allows users to indicate the release of the Node.
              - From C++ Binding abstract interface perspective, it's up to
implementation how to handle this "release()" function.
              - With Xerces-C++ IDOM implementation, the release() function
will delete the 'this' pointer if it is a document, else no-op.
  2. Memory retained until the document is deleted.
      - If you change the value of an attribute or call removeNode many
times,  the memory of the old value is not deallocated for reuse and the
document grows and grows
      Solution:
          - This in fact is a tradeoff for the fast performance offered by
independent storage allocator.
          - There is no immediate good solution in place


  5.0 old DOM
  ==========
  5.1 Interface
  ==========

  5.1.1 Features of old DOM Interface
  -----------------------------------------------------
  e.g. DOM_Element DOM_Document::createElement(const DOMString tagName);

  1. Use smart pointers - Java-like


  5.1.2 Pros and Cons of old DOM Interface
  --------------------------------------------------------------
  Pros:
  1. DOM_XXX - reasonable name
  2. (Depends on users' preference)
      - someone wants Java-like interface for ease to use, ease to learn and
ease to port (from Java).
  3. Not that many users have migrated to IDOM yet, so migration impact is
minimal.

  Cons:
  1. Not abstract base class
      - Cannot be recommended as Apache DOM C++ Binding
      - Implementation (smart pointer indirection) is assumed
      Solution:
          - This in fact is a tradeoff for the ease of use of smart pointer
design
          - No solution.
  2. (Depends on users' preference)
      - someone wants C++-like as this is C++ interface


  5.2 Implementation
  ===============
  5.2.1 Features of old DOM Implementation
  ----------------------------------------------------------------
  1. Automatic memory management
      - Memory is released when there is no more handles pointing to it
      - Use reference count to keep track of handles
  2. Use thread-safe DOMString class


  5.2.2 Downside of old DOM Implementation
  --------------------------------------------------------------------
  1. Performance is slow
      - Memory management is the biggest time consumer, and a lot of memory
footprint.
      - There are a whole lot of blocks allocated when creating a document
and then freed when finished with it. Each and every node requires at least
one and sometimes several separately allocated blocks. DOMString take three.
It adds up.
      Solution:
          - Lenny suggests to use IDOM interface internally in DOM
implementation, patch in Bugzilla 5967
          - Then the performance benefits of IDOM is gained but the memory
retained problem in IDOM implementation still remains to address.
          - And internally, we will have dual interface maintenance model as
IDOM interface is then used by DOM internally.


  Vote Question:
  ============
  I would like to call for a vote:

      ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM
Interface, DOM or IDOM? <===

  Note:
  1. The question is asking which "interface" to be officially supported.
Once the choice of interface is chosen, we can discuss how to solve the
downside of implementation as the next topic.
  2. The one being voted will become the ONLY Xerces-C++ supported public
W3C DOM Interface, and is where the DOM Level 3 being implemented.
  3. The API of the other interface will be deprecated.  And its samples,
and associated Parser will eventually be removed from the distribution


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Jason,

> I would really hope to use the more straight-forward IDOM 
> interface - it is much simpler for Xerces-Perl to handle 
> (in fact, as of 1.7.0 it is all I support).

Can you please elaborate on what about the IDOM makes it easier for you?  

Thanks,

Lenny

-----Original Message-----
From: Jason E. Stewart [mailto:jason@openinformatics.com]
Sent: Monday, April 29, 2002 2:35 PM
To: xerces-c-dev@xml.apache.org
Cc: lenny.hoffman@objectivity.com; mf@gimbio.de
Subject: Re: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


"Lenny Hoffman" <le...@earthlink.net> writes:

> The memory management problem solved by recycling no longer used
> nodes and strings.  The only clean way I know to know when nodes and
> strings are being used is to use the handle/body pattern, which is
> what is used by the original DOM.  What I have done is use the
> original DOM handles and the IDOM implementation, but fixed the IDOM
> memory problem.

Hey Lenny and Tinny,

Just to be clear here:

* Lenny is saying that he is using the old DOM interface, but the
  underlying IDOM implementation - meaning the use of handle objects
  instead of using straight pointers to objects?

I would really hope to use the more straight-forward IDOM interface -
it is much simpler for Xerces-Perl to handle (in fact, as of 1.7.0 it
is all I support).

jas.


Re: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Lenny Hoffman" <le...@earthlink.net> writes:

> The memory management problem solved by recycling no longer used
> nodes and strings.  The only clean way I know to know when nodes and
> strings are being used is to use the handle/body pattern, which is
> what is used by the original DOM.  What I have done is use the
> original DOM handles and the IDOM implementation, but fixed the IDOM
> memory problem.

Hey Lenny and Tinny,

Just to be clear here:

* Lenny is saying that he is using the old DOM interface, but the
  underlying IDOM implementation - meaning the use of handle objects
  instead of using straight pointers to objects?

I would really hope to use the more straight-forward IDOM interface -
it is much simpler for Xerces-Perl to handle (in fact, as of 1.7.0 it
is all I support).

jas.


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Markus,

Thank you very much for the insight.

Note that simply accessing the IDOM implementation via handles does not
affect its thread safety, thus your application is safe.

> if (pm_Element)
>     pm_Element->getAttribute(...);
>
> How can I do this with references?

You do it with the current handles like this:

if (!pm_Element.isNull())
    pm_Element.getAttribute(...);

Adding an int conversion operator to DOM_Node would allow even friendlier
syntax; e.g.

if (pm_Element)
    pm_Element.getAttribute(...);

This could be easily added.

In fact, a -> operator could be added to the DOM_Node classes to get
this:

if (pm_Element)
    pm_Element->getAttribute(...);

This is now exactly what you started out with, thus is completely backward
compatible with your current use of the IDOM.
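
A minimal sketch of a handle that supports all three spellings above. ElementBody and ElementHandle are hypothetical stand-ins, and the sketch uses an explicit bool conversion rather than the int conversion mentioned above, since that is the usual way to allow if (handle) in current C++.

```cpp
#include <cassert>
#include <string>

// Hypothetical element body; stands in for an IDOM-style element.
struct ElementBody {
    std::string id;
    const std::string& getAttribute() const { return id; }
};

// Hypothetical handle offering isNull(), a boolean test, and operator->.
class ElementHandle {
public:
    ElementHandle() = default;
    explicit ElementHandle(ElementBody* body) : body_(body) {}

    // Existing style: if (!handle.isNull()) ...
    bool isNull() const { return body_ == nullptr; }

    // Allows: if (handle) ...
    explicit operator bool() const { return body_ != nullptr; }

    // Allows: handle->getAttribute(...), the same spelling as a raw pointer.
    ElementBody* operator->() const {
        assert(body_ != nullptr);  // dereferencing a null handle is a bug
        return body_;
    }

private:
    ElementBody* body_ = nullptr;
};
```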


> XMLCh* are easier to handle than DOMString objects in ATL:  CComBSTR cBstr =
> pm_Element->getAttribute(...);

Good point. The current DOMString class does not have an XMLCh* conversion
operator; if it did, that would solve your problem.  I pretty much gutted the
original DOMString class to make it a simple wrapper around an XMLCh* returned
from IDOM implementations, rather than pay the cost of the original DOM's
cross-document string management.  As far as I can tell, the only reason the
original DOMString did not have an XMLCh* operator was that there was no
guarantee that its internal XMLCh* was null terminated.  That guarantee now
exists, so the operator can be added, and I will do that.
So your example remains:

CComBSTR cBstr = pm_Element->getAttribute(...);

Note that string classes are a convenient way to perform various operations on
a string without using the static (read: functional) methods provided by
XMLString.  I even implemented COW (copy-on-write) behavior in the new
DOMString class, so you can freely modify a string returned from a node
without having to make a copy manually.
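
As an illustration of the two ideas above (an implicit conversion to a
null-terminated character pointer, plus copy-on-write), here is a minimal
sketch.  It is not the patched DOMString: CowString and XMLChLike are
hypothetical names, char16_t stands in for XMLCh, and unlike the wrapper
described above this sketch copies the text once at construction.  It is also
not thread safe.

```cpp
#include <cassert>
#include <memory>
#include <string>

typedef char16_t XMLChLike;  // assumption: stand-in for the XMLCh typedef

// Hypothetical copy-on-write string wrapper.
class CowString {
public:
    explicit CowString(const XMLChLike* text)
        : fData(std::make_shared<std::u16string>(text)) {}

    // Implicit conversion to a raw pointer. This is safe to offer because
    // the underlying buffer is always null terminated.
    operator const XMLChLike*() const { return fData->c_str(); }

    // Mutator: clone the buffer first if another CowString shares it.
    void append(const XMLChLike* tail) {
        if (fData.use_count() > 1)
            fData = std::make_shared<std::u16string>(*fData);
        fData->append(tail);
    }

    // Exposed only so the sharing behavior can be observed.
    bool sharesBufferWith(const CowString& other) const {
        return fData == other.fData;
    }

private:
    std::shared_ptr<std::u16string> fData;  // shared until someone writes
};
```

Copies are cheap because they share one buffer, and only the copy that is
modified pays for a private buffer.  A raw pointer obtained through the
conversion operator should not be kept across a later append(), since the
buffer may be reallocated.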

If folks don't find the DOMString wrapper to be that important, that frees
me up to simplify the handle classes and address one of Tinny's concerns.
Tinny pointed out that while the new design hides dual interfaces (DOM and
IDOM) from users, it does not hide them from DOM developers;  as DOM 3
support is added, each interface change would have to be made to both DOM
and IDOM classes.  The only reason I went with complete interface
replication instead of simple smart pointers for the handle classes was to
be able to translate XMLCh pointers returned from IDOM nodes into
DOMStrings.  If I am allowed to get rid of DOMString altogether I can make
the handle classes simple smart pointers that do not replicate IDOM
interfaces, and thus the duplication of effort is eliminated.
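
A minimal sketch of what "simple smart pointers that do not replicate IDOM
interfaces" could look like: one generic handle template forwards every call
through operator->, so a method added to an interface needs no matching change
in the handle.  All names here (NodeIface, ElementIface, Handle,
SimpleElement) are hypothetical, and reference counting is left out to keep
the forwarding idea visible.

```cpp
#include <cassert>
#include <string>

// Hypothetical IDOM-style abstract interfaces.
class NodeIface {
public:
    virtual ~NodeIface() {}
    virtual int childCount() const = 0;
};

class ElementIface : public NodeIface {
public:
    virtual std::string tagName() const = 0;
};

// One generic handle serves every interface. It declares none of the DOM
// methods itself; calls reach the body through operator->.
template <class T>
class Handle {
public:
    Handle() : fPtr(0) {}
    explicit Handle(T* p) : fPtr(p) {}

    bool isNull() const { return fPtr == 0; }

    T* operator->() const {
        assert(fPtr != 0);
        return fPtr;
    }

private:
    T* fPtr;  // not owned in this sketch
};

// A trivial concrete body, only so the handle can be exercised.
class SimpleElement : public ElementIface {
public:
    explicit SimpleElement(const std::string& tag) : fTag(tag) {}
    int childCount() const { return 0; }
    std::string tagName() const { return fTag; }

private:
    std::string fTag;
};
```

The trade-off is the one described above: methods reached this way return
whatever the body returns (raw XMLCh pointers in the IDOM case), so there is
no place left to translate results into DOMString objects.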

Lenny



AW: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Markus Fellner <fe...@gimbio.de>.
OK, the main reason for my IDOM flirtation is this:
I chose IDOM because of its thread safety, and I now have several thousand
lines of code using the IDOM interface.

Some other reasons:
I have many IDOM_Element* members (pm_Element) in my classes. After parsing,
they are assigned once and then checked many times to see whether they are
really assigned before being used to read and write attributes.

if (pm_Element)
    pm_Element->getAttribute(...);

How can I do this with references?

XMLCh* are easier to handle than DOMString objects in ATL:  CComBSTR cBstr =
pm_Element->getAttribute(...);
...

Sorry for my short answer. I go on holiday tomorrow and I have to pack!

I'm back in 2 weeks and looking forward to seeing the results of this vote.
It's a pity to leave during a hot discussion on this list.

Markus


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Markus,

To be clear, the fix I created for the IDOM was to recycle memory once a
node or string is no longer needed.  To know when a node is no longer
needed, I used the original DOM interface classes as handles wrapping the
IDOM implementation.  IDOM performance is maintained, but ease of use is
greatly improved.  Without the DOM handles to track whether an IDOM node is
in use, application code would have to state explicitly when a node is no
longer needed and can be recycled, which is yet another thing to document
and for application developers to get wrong and suffer the consequences of.

If you love and use the IDOM for its performance, and you want the memory
problem really fixed rather than worked around in a way that only works if
your application does everything right, then you will love what I have done
by combining DOM classes as handles with IDOM classes as bodies.
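
The handle/body recycling idea can be sketched as follows.  This is not the
code from my patch; NodeBody, NodePool and NodeHandle are hypothetical names,
the counter is not thread safe, and a real implementation would also have to
treat a node that is still attached to the document tree as in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical body: the node storage, plus a count of live handles.
struct NodeBody {
    int refCount;
    int value;
    NodeBody() : refCount(0), value(0) {}
};

// Hypothetical per-document pool that reuses bodies instead of freeing them.
class NodePool {
public:
    NodePool() {}
    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;
    ~NodePool() {
        for (std::size_t i = 0; i < fAll.size(); ++i) delete fAll[i];
    }

    NodeBody* acquire() {
        if (!fFree.empty()) {          // reuse a recycled body if one exists
            NodeBody* n = fFree.back();
            fFree.pop_back();
            return n;
        }
        NodeBody* n = new NodeBody();  // otherwise grow the pool
        fAll.push_back(n);
        return n;
    }

    void recycle(NodeBody* n) { n->value = 0; fFree.push_back(n); }

    std::size_t allocated() const { return fAll.size(); }
    std::size_t freeCount() const { return fFree.size(); }

private:
    std::vector<NodeBody*> fAll;   // every body ever allocated (owned)
    std::vector<NodeBody*> fFree;  // bodies available for reuse
};

// Handle: copying it is what tells the pool whether a body is still in use.
class NodeHandle {
public:
    explicit NodeHandle(NodePool& pool) : fPool(&pool), fBody(pool.acquire()) {
        ++fBody->refCount;
    }
    NodeHandle(const NodeHandle& o) : fPool(o.fPool), fBody(o.fBody) {
        ++fBody->refCount;
    }
    NodeHandle& operator=(const NodeHandle& o) {
        ++o.fBody->refCount;  // increment first so self-assignment is safe
        release();
        fPool = o.fPool;
        fBody = o.fBody;
        return *this;
    }
    ~NodeHandle() { release(); }

private:
    void release() {
        if (--fBody->refCount == 0) fPool->recycle(fBody);
    }
    NodePool* fPool;
    NodeBody* fBody;
};
```

The application never says "this node is done"; the last handle going out of
scope says it, and the next node created from the same pool reuses the body.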

If what you love is working with pointers instead of with objects, please
let me know why.

One thing I have found harder with objects than with pointers is down-casting
from a node to a derived object such as an element.  The syntax is a bit
cleaner with pointers; e.g.:

    DOM_Node node = ...
    DOM_Element elem = (const DOM_Element&)node;

vs:

    IDOM_Node* node = ...
    IDOM_Element* elem = (IDOM_Element*)node;

It is easy to forget to add the const in the first case, and the cast is
somewhat non-intuitive because slicing can happen, though it is not a
problem in this case.

To solve this problem I have thought of adding overloaded constructors and
assignment operators that take a DOM_Node to the DOM_Node-derived classes,
such as DOM_Element.  The first example then becomes:

    DOM_Node node = ...
    DOM_Element elem = node;

Not only is this code more succinct, but it is safer, as the overloaded
constructor and assignment operator can check for node compatibility via the
getNodeType call.
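
A rough sketch of what such a checked conversion could look like.  The
classes below (Node, Element, NodeImpl) are simplified, hypothetical
stand-ins rather than the real DOM_Node and DOM_Element, and throwing an
exception on a mismatch is only one assumed way to report the failure; the
proposal itself only says the node type can be checked.

```cpp
#include <cassert>
#include <stdexcept>

// Node type codes as defined by the W3C DOM (only two are needed here).
enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };

// Hypothetical body class holding the actual node data.
struct NodeImpl { NodeType type; };

// Hypothetical handle class, in the style of the old DOM interface.
class Node {
public:
    Node() : impl_(nullptr) {}
    explicit Node(NodeImpl* impl) : impl_(impl) {}
    bool isNull() const { return impl_ == nullptr; }
    NodeType getNodeType() const { assert(impl_); return impl_->type; }
protected:
    NodeImpl* impl_;
};

class Element : public Node {
public:
    Element() {}

    // Converting constructor: a checked down-cast from Node.
    Element(const Node& n) : Node(requireElement(n)) {}

    // Matching assignment operator; the check runs before any change.
    Element& operator=(const Node& n) {
        Node::operator=(requireElement(n));
        return *this;
    }

private:
    // Accepts null handles and element nodes; rejects everything else.
    static const Node& requireElement(const Node& n) {
        if (!n.isNull() && n.getNodeType() != ELEMENT_NODE)
            throw std::invalid_argument("node is not an element");
        return n;
    }
};
```

With this in place, "Element elem = node;" compiles without a cast, and a
text node passed by mistake is rejected at the point of conversion instead
of being silently treated as an element.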

Again, please let me know what other aspects of pointers make things easier
for you.

> Hope your fix has no effects on thread-safe-ness!

No effect whatsoever.

Lenny


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Markus Fellner <fe...@gimbio.de>.
Hi Lenny,

I hope your fix for the IDOM memory problem goes into the next official
release. But I use and love the IDOM interface.
It's really easier for an old C++ programmer like me! And I use IDOM because
of its threadsafe properties. Hope your fix has no effects on
thread-safe-ness!

Markus



RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Lenny Hoffman <le...@earthlink.net>.
Hi Markus,

The memory management problem is solved by recycling nodes and strings that
are no longer used.  The only clean way I know of to tell when nodes and
strings are in use is the handle/body pattern, which is what the original
DOM uses.  What I have done is use the original DOM handles with the IDOM
implementation, and fix the IDOM memory problem.

Lenny


RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Posted by Markus Fellner <fe...@gimbio.de>.
If the memory management problem is solved, I prefer IDOM!!!
  -----Original Message-----
  From: Tinny Ng [mailto:tng-xml@ca.ibm.com]
  Sent: Monday, April 29, 2002 17:08
  To: xerces-c-dev@xml.apache.org
  Subject: Call for Vote: which one to be the Xerces-C++ public supported
W3C DOM interface


  Hi everyone,

  I've reviewed Andy's design objectives for IDOM, Lenny's view of the old
DOM and his redesign proposal, and some user feedback.  Here is a "quick"
summary, and I would like to call for a VOTE on the fate of these two
interfaces.

  1.0 Objective
  ==========
  1.  Define the strategy for the Xerces-C++ public DOM interface.  Decide
which one to keep: the old DOM interface or the new IDOM interface.


  2.0 Motivation
  ===========
  1. As a long-term strategy, Xerces-C++ shouldn't define two W3C DOM
interfaces, which simply confuses users.
      => We've already received many user questions about what the
difference is, which one to use, etc.
  2. With limited resources, we should focus our development on ONE stream,
with no more duplicated effort.
      => New DOM Level 3 development should be done on one interface, not
both.
      => No more dual maintenance: two sets of samples (e.g. DOMPrint vs
IDOMPrint), two parsers (DOMParser vs IDOMParser)
  3. To better position Apache Xerces-C++ in the market, we should have our
Apache Recommended DOM C++ Binding listed at http://www.w3.org/DOM/Bindings
      => To encourage more users to develop DOM applications AND
implementations based on this binding.
      => Such a binding should just define a set of abstract base classes
(similar to Java interfaces) where no implementation model is assumed


  3.0 History
  =========
  'DOM' was the initial "W3C DOM interface" developed by Xerces-C++.
However, the performance of its implementation is not quite satisfactory.

  Last year, Andy Heninger came up with a new design with faster
performance, and that implementation came with a new set of interfaces =>
'IDOM'.

  Currently both 'DOM' and 'IDOM' are shipped with Xerces-C++.  'IDOM' is
labelled experimental (like a prototype) and is subject to change.

  More information can be found in :
  http://xml.apache.org/xerces-c/program.html
  http://www.apache.org/~andyh/
  http://marc.theaimsgroup.com/?t=101650188300002&r=1&w=2

  http://marc.theaimsgroup.com/?w=2&r=1&s=Proposal%3A+C%2B%2B+Language+Binding+for+DOM+L&q=t



  4.0 IDOM
  =========
  4.1 Interface
  ==========

  4.1.1 Features of IDOM Interface
  --------------------------------------------------
  e.g. virtual IDOM_Element* IDOM_Document::createElement(const XMLCh*
tagName) = 0;

  1. Defined as abstract base classes
  2. Use normal C++ pointers.
      => So that abstract base classes are possible.
      => Make it more C++-like, less Java-like.
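
A minimal sketch of what "abstract base classes plus normal C++ pointers" could look like. All names here (Element, Document, SimpleElement, SimpleDocument, tagOfNewElement) are hypothetical, and char16_t stands in for XMLCh; this is an illustration under those assumptions, not the Xerces-C++ source.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interfaces: pure abstract classes, no implementation assumed.
class Element {
public:
    virtual ~Element() = default;
    virtual const char16_t* getTagName() const = 0;
};

class Document {
public:
    virtual ~Document() = default;
    // Returns a plain pointer; in this sketch the node stays owned by the document.
    virtual Element* createElement(const char16_t* tagName) = 0;
};

// One possible implementation behind the interfaces (illustrative only).
class SimpleElement : public Element {
public:
    explicit SimpleElement(const char16_t* name) : name_(name) {}
    const char16_t* getTagName() const override { return name_.c_str(); }
private:
    std::u16string name_;
};

class SimpleDocument : public Document {
public:
    Element* createElement(const char16_t* tagName) override {
        nodes_.push_back(std::unique_ptr<Element>(new SimpleElement(tagName)));
        return nodes_.back().get();
    }
private:
    std::vector<std::unique_ptr<Element>> nodes_;  // document owns its nodes
};

// Application code depends only on the abstract interface.
std::u16string tagOfNewElement(Document& doc, const char16_t* name) {
    Element* e = doc.createElement(name);
    assert(e != nullptr);
    return e->getTagName();
}
```

Because callers only see Document and Element, a different implementation could be substituted without touching application code, which is the property the binding argument relies on.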


  4.1.2 Pros and Cons of IDOM Interface
  ----------------------------------------------------------
  Pros:
  1. Abstract base classes that correspond to the W3C DOM interfaces
      => Can be recommended as Apache DOM C++ Binding
      => More standard-like: no implementation is assumed, as they are just
abstract interfaces using pure virtual functions
  2. (Depends on users' preference)
      - some prefer a C++-like style

  Cons:
  1. IDOM_XXX - weird prefix 'I'
      Solution:
          - Proposed renaming to DOMXXXX, which also matches the DOM Level 3
naming convention
  2. (Depends on users' preference)
      - some do not like pointers and want a Java-like interface for ease
of use, ease of learning and ease of porting (from Java).
  3. As the old DOM interface has been around for a long time, the majority
of current Xerces-C++ users still use the old DOM interface, so the
migration impact is significant
      Solution:
          - Announce the deprecation of old DOM interface for a couple of
releases before removal

  4.2 Implementation
  ===============
  4.2.1 Features of IDOM Implementation
  -----------------------------------------------------------
  1. Use an independent storage allocator per document. The advantage here
is that allocation would require no synchronization
      => Fast, good scalability, reduced memory footprint
  2. Use plain, null-terminated (XMLCh *) UTF-16 strings.
      => No DOMString class overhead, which is another factor that makes
IDOM faster
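
A rough sketch of the "independent storage allocator per document" idea, written as a simple bump allocator. The class DocumentArena and its members are hypothetical and only illustrate the technique; the actual IDOM allocator may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-document arena. It is never shared between documents,
// so allocate() needs no locking.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t blockSize = 4096) : blockSize_(blockSize) {
        assert(blockSize > 0);
    }

    void* allocate(std::size_t bytes) {
        // Round up so every returned pointer stays suitably aligned.
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;
        if (blocks_.empty() || used_ + bytes > currentSize_) {
            // Start a new block; oversized requests get a block of their own.
            currentSize_ = bytes > blockSize_ ? bytes : blockSize_;
            blocks_.emplace_back(new unsigned char[currentSize_]);
            used_ = 0;
        }
        void* p = blocks_.back().get() + used_;
        used_ += bytes;
        return p;
    }

    std::size_t blockCount() const { return blocks_.size(); }

    // There is deliberately no per-object free: all memory is returned at
    // once when the arena (and with it the document) is destroyed.
private:
    std::size_t blockSize_;
    std::size_t currentSize_ = 0;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

Many small node and string allocations become pointer bumps inside a few large blocks, which is where the speed comes from. The missing per-object free is also the source of the retained-memory downside.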


  4.2.2 Downside of IDOM Implementation
  -------------------------------------------------------------
  1. Manual memory management
      - If the document comes from the parser, the parser owns the document.
If the document comes from DOMImplementation, users are responsible for
deleting it.
      Solution:
          - Provide a means of disassociating a document from the parser
          - Add a function "Node::release()", similar to the idea of
"Range::detach", which allows users to indicate the release of the Node.
              - From the C++ Binding abstract interface perspective, it's up
to the implementation how to handle this "release()" function.
              - With the Xerces-C++ IDOM implementation, the release()
function will delete the 'this' pointer if it is a document, else no-op.
  2. Memory retained until the document is deleted.
      - If you change the value of an attribute or call removeNode many
times, the memory of the old value is not deallocated for reuse and the
document grows and grows
      Solution:
          - This is in fact a tradeoff for the fast performance offered by
the independent storage allocator.
          - There is no good immediate solution in place
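
The proposed release() behaviour and the retained-memory effect can be sketched together as follows. Node, Attr, Document, intern(), retainedStrings() and retainedAfterUpdates() are hypothetical names chosen for this illustration, and the std::deque members merely stand in for a per-document pool.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

class Document;

class Node {
public:
    virtual ~Node() = default;
    virtual void release() = 0;  // caller signals it is done with the node
};

class Attr : public Node {
public:
    explicit Attr(Document& owner) : owner_(owner) {}
    void setValue(const std::u16string& v);  // defined after Document
    const char16_t* getValue() const { return value_; }
    void release() override {}  // no-op: storage belongs to the document
private:
    Document& owner_;
    const char16_t* value_ = u"";
};

class Document : public Node {
public:
    Attr* createAttribute() {
        attrs_.emplace_back(*this);
        return &attrs_.back();
    }
    // Stores a copy in the document's pool; old copies are never reclaimed.
    const char16_t* intern(const std::u16string& s) {
        strings_.push_back(s);
        return strings_.back().c_str();
    }
    std::size_t retainedStrings() const { return strings_.size(); }
    // Frees the document and everything it owns. Assumes heap allocation.
    void release() override { delete this; }
private:
    std::deque<Attr> attrs_;
    std::deque<std::u16string> strings_;
};

inline void Attr::setValue(const std::u16string& v) { value_ = owner_.intern(v); }

// Counts how many string copies the document holds after repeated updates.
std::size_t retainedAfterUpdates(int updates) {
    assert(updates >= 0);
    Document* doc = new Document;  // created directly, so the caller owns it
    Attr* a = doc->createAttribute();
    for (int i = 0; i < updates; ++i) a->setValue(u"v");
    const std::size_t retained = doc->retainedStrings();
    a->release();    // no effect on memory
    doc->release();  // everything is freed here, in one step
    return retained;
}
```

In this sketch every setValue() call leaves the previous value in the pool, so the count grows with the number of updates until the document itself is released.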


  5.0 old DOM
  ==========
  5.1 Interface
  ==========

  5.1.1 Features of old DOM Interface
  -----------------------------------------------------
  e.g. DOM_Element DOM_Document::createElement(const DOMString tagName);

  1. Use smart pointers - Java-like


  5.1.2 Pros and Cons of old DOM Interface
  --------------------------------------------------------------
  Pros:
  1. DOM_XXX - reasonable name
  2. (Depends on users' preference)
      - some want a Java-like interface for ease of use, ease of learning
and ease of porting (from Java).
  3. Not that many users have migrated to IDOM yet, so migration impact is
minimal.

  Cons:
  1. Not an abstract base class
      - Cannot be recommended as Apache DOM C++ Binding
      - Implementation (smart pointer indirection) is assumed
      Solution:
          - This is in fact a tradeoff for the ease of use of the smart
pointer design
          - No solution.
  2. (Depends on users' preference)
      - some want a C++-like style, as this is a C++ interface


  5.2 Implementation
  ===============
  5.2.1 Features of old DOM Implementation
  ----------------------------------------------------------------
  1. Automatic memory management
      - Memory is released when there are no more handles pointing to it
      - Use reference counting to keep track of handles
  2. Use thread-safe DOMString class
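
As an illustration of the handle-plus-reference-count model, here is a minimal sketch. ElementImpl, ElementHandle and this free-standing createElement() are hypothetical, the liveCount member exists only to make the effect observable, and the plain int counter is not thread-safe.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical implementation object hidden behind the handle.
struct ElementImpl {
    static int liveCount;  // instrumentation for this sketch only
    std::string tagName;
    int refCount = 0;      // not thread-safe; a real one would need more care
    explicit ElementImpl(std::string name) : tagName(std::move(name)) { ++liveCount; }
    ~ElementImpl() { --liveCount; }
};
int ElementImpl::liveCount = 0;

// Value-type handle, copied and returned by value in the Java-like style.
class ElementHandle {
public:
    ElementHandle() = default;
    explicit ElementHandle(ElementImpl* impl) : impl_(impl) { acquire(); }
    ElementHandle(const ElementHandle& other) : impl_(other.impl_) { acquire(); }
    ElementHandle& operator=(ElementHandle other) {  // copy-and-swap
        std::swap(impl_, other.impl_);
        return *this;
    }
    // The last handle to go away deletes the implementation object.
    ~ElementHandle() { if (impl_ && --impl_->refCount == 0) delete impl_; }

    bool isNull() const { return impl_ == nullptr; }
    const std::string& getTagName() const {
        assert(impl_ != nullptr);
        return impl_->tagName;
    }
private:
    void acquire() { if (impl_) ++impl_->refCount; }
    ElementImpl* impl_ = nullptr;
};

ElementHandle createElement(const std::string& tagName) {
    return ElementHandle(new ElementImpl(tagName));
}
```

Users never call delete, which is the ease-of-use benefit. The cost in this sketch is a separate heap allocation per node plus counter updates on every handle copy.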


  5.2.2 Downside of old DOM Implementation
  --------------------------------------------------------------------
  1. Performance is slow
      - Memory management is the biggest time consumer, and the memory
footprint is large.
      - There are a whole lot of blocks allocated when creating a document
and then freed when finished with it. Each and every node requires at least
one and sometimes several separately allocated blocks. A DOMString takes
three. It adds up.
      Solution:
          - Lenny suggests using the IDOM interface internally in the DOM
implementation (patch in Bugzilla 5967).
          - The performance benefits of IDOM are then gained, but the
retained-memory problem in the IDOM implementation still has to be
addressed.
          - Internally, we will then have a dual-interface maintenance
model, since the IDOM interface is used by DOM internally.


  Vote Question:
  ============
  I would like to call for a vote:

      ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM
Interface, DOM or IDOM? <===

  Note:
  1. The question asks which "interface" is to be officially supported.
Once the interface is chosen, we can discuss how to solve the downsides of
its implementation as the next topic.
  2. The interface that wins the vote will become the ONLY publicly
supported Xerces-C++ W3C DOM interface, and is where DOM Level 3 will be
implemented.
  3. The API of the other interface will be deprecated, and its samples
and associated parser will eventually be removed from the distribution.