Posted to c-dev@xerces.apache.org by Lenny Hoffman <le...@earthlink.net> on 2002/04/30 04:47:25 UTC

RE: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

I just had a new thought; if having a DOMString class is desired, for
functionality and/or DOM compliance, then the smart pointer approach can
still be used by updating the IDOM classes to return DOMString instances
instead of XMLCh*.  With smart pointers we would still have only one
set of interfaces to maintain, and performance would be negligibly affected:
as I pointed out earlier, I modified DOMString to simply wrap an alias
to the node-owned XMLCh* data, so it only makes a copy if modified.

Lenny
  -----Original Message-----
  From: Lenny Hoffman [mailto:lennyhoffman@earthlink.net]
  Sent: Monday, April 29, 2002 9:37 PM
  To: xerces-c-dev@xml.apache.org
  Subject: RE: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


  Hi Samar,

  You make good points.

  I would agree that it is reasonable to nix the DOMString, but does anyone
object to that given that DOMString is explicitly specified in the W3C DOM
specification?  Judging so far from the early responders to the vote, no, as
folks voting for the IDOM interface are also voting to nix the DOMString
class.

  (Tinny), do you anticipate the W3C to complain if the C++ binding does not
have a DOMString?  In other words, will we be able to call ourselves DOMx
compliant without it?

  One more consequence of using the smart pointer approach is that backwards
compatibility with the original DOM interfaces is sacrificed for backwards
compatibility with the IDOM interfaces.  I thought that with the original
DOM interfaces being officially supported and around longer that backwards
compatibility to it would be more important, but so far no one using the
original DOM interface has spoken up.  For my use cases it simply doesn't
matter, what matters most to me is functional behavior and ease of use.

  Just to make it easier to review, here is the earlier example following
your suggestion to avoid using an int operator on node for null comparison:

  if (!pm_Element.isNull())
      pm_Element->getAttribute(...);

  Lenny

    -----Original Message-----
    From: Samar Lotia [mailto:slotia@siebel.com]
    Sent: Monday, April 29, 2002 7:59 PM
    To: 'xerces-c-dev@xml.apache.org'
    Subject: RE: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


    If the desire is to maintain only one interface, then I would be of the
opinion that we should nix the DOMString class and use a 'smart pointer'
class to wrap the internal interfaces. In many cases, people will likely
have their own preferred string class which they use and will immediately
convert the value extracted from the DOM before passing into any other layer
of their code.

    If we keep DOMString around, I would recommend against having a (const
XMLCh *) operator as this can result in some incredibly hard to track
errors. Most C++ style guides recommend against implicit conversion
operators. Note the lack of such an operator in the C++ standard library
string, i.e. std::basic_string<T>. Having something like rawBuffer, or
XMLCh() would be clearer and lets one control lifetimes in a much clearer
way (IMHO).

    Also, I would recommend against adding an int operator on the smart
pointer class. It is not that much work to call isNull on the object, and is
much clearer from a readability perspective as well as helps catch silly
errors at compile time. If we must have such an operator then it may be
better to add a bool operator instead of int, as this will likely reduce the
number of places where the implicit conversion operator will be called.
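The trade-off described above can be sketched in modern C++. This is only an illustration under stated assumptions: `ElementImpl`, `ElementHandle` and the canned `getAttribute` result are hypothetical stand-ins, not Xerces-C++ classes, and the `explicit operator bool` used here is a C++11 feature that was not available when this thread was written. It shows a handle that offers `isNull()`, allows `if (handle)` checks, and still refuses to convert silently to an integer.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical body class standing in for an IDOM element implementation.
struct ElementImpl {
    std::string getAttribute(const std::string& name) const {
        return name == "id" ? "42" : "";   // canned data for the example
    }
};

// Hypothetical smart-pointer handle around the body.
class ElementHandle {
public:
    explicit ElementHandle(ElementImpl* impl = nullptr) : impl_(impl) {}

    // Named test: always unambiguous at the call site.
    bool isNull() const { return impl_ == nullptr; }

    // Explicit conversion: usable in `if (handle)`, but never in arithmetic
    // or in an accidental comparison with an integer.
    explicit operator bool() const { return impl_ != nullptr; }

    // Pointer-style access to the body.
    ElementImpl* operator->() const { return impl_; }

private:
    ElementImpl* impl_;
};

// With an implicit `operator int` this check would fail; here it holds.
static_assert(!std::is_convertible<ElementHandle, int>::value,
              "handle must not silently turn into an integer");
```

With an implicit `int` operator, expressions such as `handle + 1` or `handleA == handleB` (comparing two converted integers) would compile; the explicit form rejects them at compile time while keeping the short `if (handle)` syntax.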

    My two bits...

    Samar Lotia
      -----Original Message-----
      From: Lenny Hoffman [mailto:lennyhoffman@earthlink.net]
      Sent: Monday, April 29, 2002 19:38
      To: xerces-c-dev@xml.apache.org
      Subject: RE: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


      Hi Markus,

      Thank you very much for the insight.

      Note that simply accessing the IDOM implementation via handles does
not affect its thread safety, thus your application is safe.

      if (pm_Element)
          pm_Element->getAttribute(...);

      How can I do this with references?

      You do it with the current handles like this:

      if (!pm_Element.isNull())
          pm_Element.getAttribute(...);

      Adding an int operator to DOM_Node would allow even more friendly
syntax; e.g.

      if (pm_Element)
          pm_Element.getAttribute(...);

      This could be easily added.

      In fact, an -> operator could be added to the DOM_Node classes to
get this:

      if (pm_Element)
          pm_Element->getAttribute(...);

      This is now exactly what you started out with, thus is completely
backward compatible with your current use of the IDOM.


      XMLCh* are easier to handle than DOMString-Objects in ATL :  CComBSTR
cBstr = pm_Element->getAttribute(...);

      Good point, the current DOMString class does not have an XMLCh*
operator, which if it did would solve your problem.  I pretty much gutted
the original DOMString class to make it a simple wrapper around an XMLCh*
returned from IDOM implementations, rather than suffering the costs of the
cross-document string management of the original DOM.  As far as I can tell
the only reason the original DOMString did not have an XMLCh* operator was
because there was no guarantee that its internal XMLCh* was null terminated;
well, that guarantee does now exist and the operator can be added -- I will
do that.  So your example remains:

      CComBSTR cBstr = pm_Element->getAttribute(...);

      Note that string classes are a convenient way to perform various
operations on a string without using the static (read functional) methods
provided by XMLString.  I even implemented COW (copy on write) behavior in
the new DOMString class, so that you can feel free to modify a string
returned from a node without having to manually make a copy.
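A minimal sketch of the copy-on-write idea described above, under stated assumptions: `CowString` and its member names are hypothetical and are not the DOMString class from the patch, and `char16_t` stands in for the Xerces-C++ `XMLCh` type. The wrapper aliases the node-owned buffer for reads and makes a private copy only on the first modification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using XMLCh = char16_t;  // assumption: stand-in for the Xerces-C++ UTF-16 code unit

// Hypothetical copy-on-write wrapper around a null-terminated, node-owned string.
class CowString {
public:
    explicit CowString(const XMLCh* borrowed)
        : alias_(borrowed ? borrowed : u"") {}

    // Read access never copies: it returns either the alias or the private copy.
    const XMLCh* rawBuffer() const { return owned_ ? copy_.c_str() : alias_; }

    bool ownsCopy() const { return owned_; }

    std::size_t length() const {
        return std::char_traits<XMLCh>::length(rawBuffer());
    }

    // The first mutation detaches from the node's buffer, which stays untouched.
    void append(const XMLCh* suffix) {
        detach();
        copy_ += suffix;
    }

private:
    void detach() {
        if (!owned_) {
            copy_.assign(alias_);
            owned_ = true;
        }
    }

    const XMLCh* alias_;    // borrowed; must outlive this object while aliased
    std::u16string copy_;   // private copy, filled on first write
    bool owned_ = false;
};
```

The cost of a read-only use is one pointer copy. The alias is only valid while the owning node keeps its buffer alive, which is the lifetime question raised elsewhere in this thread.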

      If folks don't find the DOMString wrapper to be that important, that
frees me up to simplify the handle classes and address one of Tinny's
concerns.  Tinny pointed out that while the new design hides dual interfaces
(DOM and IDOM) from users, it does not hide them from DOM developers;  as
DOM 3 support is added, each interface change would have to be made to both
DOM and IDOM classes.  The only reason I went with complete interface
replication instead of simple smart pointers for the handle classes was to
be able to translate XMLCh pointers returned from IDOM nodes into
DOMStrings.  If I am allowed to get rid of DOMString altogether I can make
the handle classes simple smart pointers that do not replicate IDOM
interfaces, and thus the duplication of effort is eliminated.

      Lenny

       -----Original Message-----
      From: Markus Fellner [mailto:fellner@gimbio.de]
      Sent: Monday, April 29, 2002 6:17 PM
      To: xerces-c-dev@xml.apache.org; lenny.hoffman@objectivity.com
      Subject: AW: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


        OK, the main reason for my IDOM flirtation is...
        I chose IDOM because of its thread safety. And now I have
several thousand lines of code using the IDOM interface.

        Some other reasons are...
        I have many IDOM_Element* members (pm_Elem) in my classes. After
parsing, they are assigned once and then checked many times to see whether
they are really assigned before being used for reading and writing attributes.

        if (pm_Element)
            pm_Element->getAttribute(...);

        How can I do this with references?

        XMLCh* are easier to handle than DOMString-Objects in ATL :  CComBSTR
cBstr = pm_Element->getAttribute(...);
        ...

        Sorry for my short answer. I go on holiday tomorrow and I have to
pack!

        I'm back in 2 weeks and looking forward to seeing the results of this
vote.
        It's a pity to leave during a hot discussion on this list.

        Markus
          -----Original Message-----
          From: Lenny Hoffman [mailto:lennyhoffman@earthlink.net]
          Sent: Monday, April 29, 2002 23:54
          To: mf@gimbio.de; xerces-c-dev@xml.apache.org;
lenny.hoffman@objectivity.com
          Subject: RE: Call for Vote: which one to be the Xerces-C++ public
supported W3C DOM interface


          Hi Markus,

          To be clear, the fix I created for the IDOM was to recycle memory
once a node or string is no longer needed.   To know when a node is no
longer needed I used the original DOM interface classes, but have them wrap
the IDOM as the implementation.  IDOM performance is maintained, but ease of
use is greatly improved.  Without using the DOM handles to know when an IDOM
node is in use or not, application code will be drawn into explicitly
stating when a node is no longer needed and can be recycled, which is yet
another thing to be documented and for application developers to get
wrong and suffer consequences for.

          If you love and use the IDOM for its performance, and you want the
memory problem really fixed rather than covered by a workaround that only
works if your application does everything right, then you will love what I
have done with combining DOM classes as handles, and IDOM classes as bodies.

          If what you love is working with pointers instead of with objects,
please let me know why.

          One thing I have found harder with objects vs. pointers is
downcasting from node to derived objects like element.  The syntax is a bit
cleaner with pointers; e.g.:

              DOM_Node node = ...
              DOM_Element elem =  (const DOM_Element&)node;

          vs:

              IDOM_Node* node = ..
              IDOM_Element* elem = (IDOM_Element*)node;

          It is easy to forget to add the const in the first case, and the cast
is somewhat non-intuitive because slicing can happen, though it is not a
problem in this case.

          To solve this problem I have thought of adding overloaded
constructors and assignment operators that take a DOM_Node to DOM_Node
derived classes like DOM_Element.  Thus the first example becomes:

              DOM_Node node = ...
              DOM_Element elem =  node;

          Not only is this code more succinct, but it is safer, as the
overloaded constructor and assignment operator can check for node
compatibility via the getNodeType call.
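The checked conversion proposed above might look roughly like the following. This is a sketch under assumptions, not the patch itself: `NodeImpl`, `NodeHandle` and `ElementHandle` are hypothetical stand-ins for the IDOM body and the DOM_Node / DOM_Element handles, and the choice to produce a null handle on a type mismatch is an assumption, since the message does not say what should happen in that case. The node type values follow the W3C DOM constants.

```cpp
#include <cassert>

enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };  // values as in W3C DOM

// Hypothetical body class.
struct NodeImpl {
    explicit NodeImpl(NodeType t) : type(t) {}
    NodeType type;
};

// Hypothetical generic handle.
class NodeHandle {
public:
    NodeHandle() : impl_(nullptr) {}
    explicit NodeHandle(NodeImpl* impl) : impl_(impl) {}
    bool isNull() const { return impl_ == nullptr; }
    NodeType getNodeType() const { return impl_->type; }  // precondition: !isNull()
private:
    NodeImpl* impl_;
};

// Hypothetical derived handle with a checked converting constructor
// and a checked assignment operator.
class ElementHandle : public NodeHandle {
public:
    ElementHandle() {}
    ElementHandle(const NodeHandle& node) { assignChecked(node); }
    ElementHandle& operator=(const NodeHandle& node) {
        assignChecked(node);
        return *this;
    }
private:
    // Accept the node only if it really is an element; otherwise become null
    // (assumed policy; throwing an exception would be another option).
    void assignChecked(const NodeHandle& node) {
        if (!node.isNull() && node.getNodeType() == ELEMENT_NODE)
            NodeHandle::operator=(node);
        else
            NodeHandle::operator=(NodeHandle());
    }
};
```

The call site then needs no cast at all, for example `ElementHandle elem = node;`, and a text node assigned by mistake ends up as a null handle instead of being silently reinterpreted.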

          Again, please let me know what other aspects of pointers make things
easier for you.

          > Hope your fix has no effects on thread-safe-ness!

          No effect whatsoever.

          Lenny
            -----Original Message-----
            From: Markus Fellner [mailto:fellner@gimbio.de]
            Sent: Monday, April 29, 2002 4:15 PM
            To: xerces-c-dev@xml.apache.org; lenny.hoffman@objectivity.com
            Subject: AW: Call for Vote: which one to be the Xerces-C++
public supported W3C DOM interface


            Hi Lenny,

            I hope your fix of the IDOM memory problem goes into the next
official release. But I use and love the IDOM interface.
            It's really easier for an old C++ programmer like me! And I use
IDOM because of its thread-safe properties. Hope your fix has no effects on
thread-safe-ness!

            Markus

              -----Original Message-----
              From: Lenny Hoffman [mailto:lennyhoffman@earthlink.net]
              Sent: Monday, April 29, 2002 17:57
              To: xerces-c-dev@xml.apache.org; mf@gimbio.de
              Subject: RE: Call for Vote: which one to be the Xerces-C++
public supported W3C DOM interface


              Hi Markus,

              The memory management problem is solved by recycling nodes
and strings that are no longer used.  The only clean way I know to tell when
nodes and strings are in use is the handle/body pattern, which is what is
used by the original DOM.  What I have done is use the original DOM handles
and the IDOM implementation, but fixed the IDOM memory problem.

              Lenny
                -----Original Message-----
                From: Markus Fellner [mailto:fellner@gimbio.de]
                Sent: Monday, April 29, 2002 10:54 AM
                To: xerces-c-dev@xml.apache.org
                Subject: AW: Call for Vote: which one to be the Xerces-C++
public supported W3C DOM interface


                If the memory management problem is solved, I prefer IDOM!!!
                  -----Original Message-----
                  From: Tinny Ng [mailto:tng-xml@ca.ibm.com]
                  Sent: Monday, April 29, 2002 17:08
                  To: xerces-c-dev@xml.apache.org
                  Subject: Call for Vote: which one to be the Xerces-C++
public supported W3C DOM interface


                  Hi everyone,

                  I've reviewed Andy's design objective of IDOM, Lenny's
view of old DOM and his proposal of redesign, and some users' feedback.
Here is a "quick" summary and I would like to call for a VOTE about the fate
of these two interfaces.

                  1.0 Objective
                  ==========
                  1.  Define the strategy of Xerces-C++ public DOM
interface.  Decide which one to keep, old DOM interface or new IDOM
interface


                  2.0 Motivation
                  ===========
                  1. As a long-term strategy, Xerces-C++ shouldn't define
two W3C DOM interfaces, which simply confuses users.
                      => We've already received many user questions about
what the difference is, which one to use, etc.
                  2. With limited resources, we should focus our development
on ONE stream, with no more duplicated effort
                      => New DOM Level 3 development should be done on one
interface, not both.
                      => No more dual maintenance: two sets of samples (e.g.
DOMPrint vs IDOMPrint), two parsers (DOMParser vs IDOMParser)
                  3. To better place Apache Xerces-C++ in the market, we
should have our Apache Recommended DOM C++ Binding in
http://www.w3.org/DOM/Bindings
                      => To encourage more users to develop DOM applications
AND implementations based on this binding.
                      => Such binding should just define a set of abstract
base classes (similar to JAVA interface) where no implementation model is
assumed


                  3.0 History
                  =========
                  'DOM' was the initial "W3C DOM interface" developed by
Xerces-C++.  However the performance of its implementation is not quite
satisfactory.

                  Last year, Andy Heninger came up with a new design with
faster performance, and that implementation came with a new set of
interfaces => 'IDOM'.

                  Currently both 'DOM' and 'IDOM' are shipped with
Xerces-C++.  'IDOM' is claimed as experimental (like a prototype) and is
subject to change.

                  More information can be found in :
                  http://xml.apache.org/xerces-c/program.html
                  http://www.apache.org/~andyh/
                  http://marc.theaimsgroup.com/?t=101650188300002&r=1&w=2

                  http://marc.theaimsgroup.com/?w=2&r=1&s=Proposal%3A+C%2B%2B+Language+Binding+for+DOM+L&q=t



                  4.0 IDOM
                  =========
                  4.1 Interface
                  ==========

                  4.1.1 Features of IDOM Interface
                  --------------------------------------------------
                  e.g. virtual IDOM_Element*
IDOM_Document::createElement(const XMLCh* tagName) = 0;

                  1. Define as abstract base classes
                  2. Use normal C++ pointers.
                      => So that abstract base class is possible.
                      => Make it more C++ like. Less Java like.


                  4.1.2 Pros and Cons of IDOM Interface
                  ----------------------------------------------------------
                  Pros:
                  1. Abstract base classes that correspond to the W3C DOM
interfaces
                      => Can be recommended as Apache DOM C++ Binding
                      => More standard like, no implementation assumed as
they are just abstract interfaces using pure virtual functions
                  2. (Depends on users' preference)
                      - some prefer a C++-like style

                  Cons:
                  1. IDOM_XXX - weird prefix 'I'
                      Solution:
                          - Proposed to rename to DOMXXXX which also matches
the DOM Level 3 naming convention
                  2. (Depends on users' preference)
                      - some do not like pointers and want a Java-like
interface for ease of use, ease of learning and ease of porting (from Java).
                  3. As the old DOM interface has been around for a long
time, the majority of current Xerces-C++ users still use the old DOM
interface, so the migration impact is significant
                      Solution:
                          - Announce the deprecation of old DOM interface
for a couple of releases before removal

                  4.2 Implementation
                  ===============
                  4.2.1 Features of IDOM Implementation
                  -----------------------------------------------------------
                  1. Use an independent storage allocator per document. The
advantage here is that allocation would require no synchronization
                      => Fast, good scalability, reduced memory footprint
                  2. Use plain, null-terminated (XMLCh *) utf-16 strings.
                      => No DOMString class overhead which is another
performance contributor that makes IDOM faster


                  4.2.2 Downside of IDOM Implementation
                  -------------------------------------------------------------
                  1. Manual memory management
                      - If the document comes from the parser, then the parser
owns the document.  If the document comes from DOMImplementation, then users
are responsible for deleting it.
                      Solution:
                          - Provide a means of disassociating a document
from the parser
                          - Add a function "Node::release()", similar to the
idea of "Range::detach", which allows users to indicate the release of the
Node.
                              - From C++ Binding abstract interface
perspective, it's up to implementation how to handle this "release()"
function.
                              - With Xerces-C++ IDOM implementation, the
release() function will delete the 'this' pointer if it is a document, else
no-op.
                  2. Memory retained until the document is deleted.
                      - If you change the value of an attribute or call
removeNode many times,  the memory of the old value is not deallocated for
reuse and the document grows and grows
                      Solution:
                          - This in fact is a tradeoff for the fast
performance offered by independent storage allocator.
                          - There is no immediate good solution in place


                  5.0 old DOM
                  ==========
                  5.1 Interface
                  ==========

                  5.1.1 Features of old DOM Interface
                  -----------------------------------------------------
                  e.g. DOM_Element DOM_Document::createElement(const
DOMString tagName);

                  1. Use smart pointers - Java-like


                  5.1.2 Pros and Cons of old DOM Interface
                  --------------------------------------------------------------
                  Pros:
                  1. DOM_XXX - reasonable name
                  2. (Depends on users' preference)
                      - some want a Java-like interface for ease of use,
ease of learning and ease of porting (from Java).
                  3. Not that many users have migrated to IDOM yet, so
migration impact is minimal.

                  Cons:
                  1. Not abstract base class
                      - Cannot be recommended as Apache DOM C++ Binding
                      - Implementation (smart pointer indirection) is
assumed
                      Solution:
                          - This in fact is a tradeoff for the ease of use
of smart pointer design
                          - No solution.
                  2. (Depends on users' preference)
                      - some want a C++-like style, as this is a C++ interface


                  5.2 Implementation
                  ===============
                  5.2.1 Features of old DOM Implementation
                  ----------------------------------------------------------------
                  1. Automatic memory management
                      - Memory is released when there are no more handles
pointing to it
                      - Uses reference counting to keep track of handles
                  2. Use thread-safe DOMString class


                  5.2.2 Downside of old DOM Implementation
                  --------------------------------------------------------------------
                  1. Performance is slow
                      - Memory management is the biggest time consumer, and
it also causes a large memory footprint.
                      - There are a whole lot of blocks allocated when
creating a document and then freed when finished with it. Each and every
node requires at least one and sometimes several separately allocated
blocks. A DOMString takes three. It adds up.
                      Solution:
                          - Lenny suggests using the IDOM interface internally
in the DOM implementation (patch in Bugzilla 5967)
                          - Then the performance benefits of IDOM are gained,
but the memory retention problem in the IDOM implementation still remains to
be addressed.
                          - And internally, we will have a dual interface
maintenance model, as the IDOM interface is then used by DOM internally.


                  Vote Question:
                  ============
                  I would like to call for a vote:

                      ==>  Which INTERFACE should be the Xerces-C++ public
supported W3C DOM Interface, DOM or IDOM? <===

                  Note:
                  1. The question is asking which "interface" is to be
officially supported.  Once the interface is chosen, we can
discuss how to solve the downsides of its implementation as the next topic.
                  2. The one voted for will become the ONLY Xerces-C++
supported public W3C DOM Interface, and is where DOM Level 3 will be
implemented.
                  3. The API of the other interface will be deprecated, and
its samples and associated parser will eventually be removed from the
distribution.