Posted to c-dev@xerces.apache.org by Khaled Noaman <kn...@ca.ibm.com> on 2002/05/08 15:50:32 UTC

Design discussion - DOM L3

Hi Everyone,

DOM Level 3 has introduced two new interfaces, DOMInputSource and
DOMEntityResolver, that are similar to their SAX counterparts. A new DOM
parser, DOMBuilder, also introduced in DOM L3, depends on those two
interfaces. Currently, the Xerces-C internals (XMLScanner, Reader, etc.)
use the SAX InputSource to process the XML data. To add support for the
two new DOM interfaces, we need to make some design changes to Xerces-C.
I am asking for feedback/suggestions from the xerces-c community on the
best way to tackle that issue.

Issue:
------
Support for DOM L3 DOMInputSource and DOMEntityResolver requires some
design changes to the Xerces-C parser.

Background:
-----------
Currently, the Xerces-C internals (XMLScanner, Reader, etc.) use the SAX
InputSource to process the XML data.

The LocalFileInputSource, MemBufInputSource, etc. provided in the
framework are all derived from InputSource.

When resolving entities, the XMLEntityHandler uses a SAX EntityResolver
object to return a SAX InputSource to the XMLScanner.

Problem:
---------
In order to support DOM L3, both the XMLScanner and the XMLEntityHandler
must know how to talk to DOMInputSource and DOMEntityResolver.

Also, DOMInputSource users cannot directly use the LocalFileInputSource,
MemBufInputSource, etc. provided in the framework.

Possible solution scenarios:
----------------------------
1. Use a typedef
=================
Make DOMInputSource and DOMEntityResolver typedefs for InputSource and
EntityResolver (i.e. typedef InputSource DOMInputSource).

Pros:
- No change to the XMLScanner and others which still talk to SAX
Interface directly
- DOMInputSource users can use the framework inputsource directly

Cons:
- No actual interface for DOMEntityResolver/DOMInputSource, so the names
cannot be forward-declared as classes.
- The Xerces-C internal components (e.g. XMLScanner, XMLEntityHandler)
have a hard-coded dependency on the SAX interface.
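
As a rough illustration of solution 1, here is a minimal sketch. The
classes below are simplified stand-ins written for this example, not the
real Xerces-C headers (the real InputSource hands out a binary stream
rather than a string id), and scannerOpens is a hypothetical placeholder
for the scanner code.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the SAX InputSource (assumption: the real
// Xerces-C class is richer; only a system id is modelled here).
class InputSource {
public:
    explicit InputSource(const std::string& systemId) : fSystemId(systemId) {}
    virtual ~InputSource() {}
    const std::string& getSystemId() const { return fSystemId; }
private:
    std::string fSystemId;
};

// Simplified stand-in for the SAX EntityResolver.
class EntityResolver {
public:
    virtual ~EntityResolver() {}
    // Returns a new InputSource owned by the caller, or nullptr to decline.
    virtual InputSource* resolveEntity(const std::string& publicId,
                                       const std::string& systemId) = 0;
};

// Solution 1: the DOM names are only aliases for the SAX types.
typedef InputSource    DOMInputSource;
typedef EntityResolver DOMEntityResolver;

// A framework-style source derived from the SAX InputSource.
class LocalFileInputSource : public InputSource {
public:
    explicit LocalFileInputSource(const std::string& path) : InputSource(path) {}
};

// Hypothetical stand-in for scanner code, which only knows the SAX type.
std::string scannerOpens(const InputSource* src) {
    assert(src != nullptr);
    return src->getSystemId();
}

// Note: a header that later tried `class DOMInputSource;` would not
// compile, because DOMInputSource names a typedef, not a class.
```

With this, a LocalFileInputSource can be handed to any code that asks for
a DOMInputSource, since both names refer to the same type. The price is
the forward-declaration restriction noted in the last comment.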

2. InputSource Wrapper
======================
    InputSource
        |
        -- LocalFileInputSource
        |
        -- MemBufInputSource
        |
        -- etc.
        |
        -- DOMInputSourceWrapper(DOMInputSource*)

    DOMInputSource
        |
        -- InputSourceWrapper(InputSource*)

Implement two wrappers.

A DOMInputSourceWrapper, which presents a DOMInputSource as a SAX
InputSource before it is passed to the XMLScanner.

An InputSourceWrapper, which presents an InputSource as a DOMInputSource
so that the framework input sources (e.g. LocalFileInputSource,
MemBufInputSource) can be used.

Pros:
- No change to the XMLScanner and others which still talk to SAX
Interface directly
- DOMInputSource users can use the framework inputsource indirectly
through the wrapper
- Can have actual interface for DOMEntityResolver/DOMInputSource

Cons:
- The Xerces-C internal components (e.g. XMLScanner, XMLEntityHandler)
have a hard-coded dependency on the SAX interface
- Performance could be impacted if we have to create/delete all those
additional wrappers.
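
A minimal sketch of the DOMInputSourceWrapper direction of solution 2
follows. The two interfaces are simplified stand-ins for this example
(readAll returning a string is an assumption; the real interfaces hand
out a stream), and StringDOMInputSource and scanLength are hypothetical
names used only to exercise the wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for the SAX interface the scanner expects.
class InputSource {
public:
    virtual ~InputSource() {}
    virtual std::string getSystemId() const = 0;
    virtual std::string readAll() const = 0;
};

// Simplified stand-in for the DOM L3 interface; unrelated to InputSource.
class DOMInputSource {
public:
    virtual ~DOMInputSource() {}
    virtual std::string getSystemId() const = 0;
    virtual std::string readAll() const = 0;
};

// Adapter: presents a DOMInputSource through the SAX-side interface by
// delegating every call. It does not own the wrapped object.
class DOMInputSourceWrapper : public InputSource {
public:
    explicit DOMInputSourceWrapper(const DOMInputSource* wrapped)
        : fWrapped(wrapped) {
        assert(wrapped != nullptr);
    }
    std::string getSystemId() const override { return fWrapped->getSystemId(); }
    std::string readAll() const override { return fWrapped->readAll(); }
private:
    const DOMInputSource* fWrapped;
};

// Hypothetical user-supplied DOM source that keeps its data in memory.
class StringDOMInputSource : public DOMInputSource {
public:
    StringDOMInputSource(const std::string& id, const std::string& text)
        : fId(id), fText(text) {}
    std::string getSystemId() const override { return fId; }
    std::string readAll() const override { return fText; }
private:
    std::string fId;
    std::string fText;
};

// Hypothetical stand-in for the scanner, which still accepts only the
// SAX type and is left unchanged by this solution.
std::size_t scanLength(const InputSource& src) { return src.readAll().size(); }
```

The InputSourceWrapper for the opposite direction would mirror this
class. Each parse or entity resolution would allocate one such adapter,
which is the overhead the last con refers to.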

3. Use a generic XMLInputSource
================================
Use a generic input source class, XMLInputSource, and have InputSource
and DOMInputSource as children.

    XMLInputSource
        |
        -- InputSource
        |   |
        |   -- LocalFileInputSource
        |   |
        |   -- MemBufInputSource
        |   |
        |   -- etc.
        |
        -- DOMInputSource
            |
            -- InputSourceWrapper(InputSource*)

Modify the XMLScanner/XMLEntityHandler/etc. to talk to this generic
input source, XMLInputSource.

Implement one wrapper, InputSourceWrapper, which presents an InputSource
as a DOMInputSource so that the framework input sources (e.g.
LocalFileInputSource, MemBufInputSource) can be used.

Pros:
- The Xerces-C internal components (e.g. XMLScanner, XMLEntityHandler)
no longer have a hard-coded dependency on a particular interface.
- DOMInputSource users can use the framework inputsource indirectly
through the wrapper
- Can have actual interface for DOMEntityResolver/DOMInputSource
- No DOMInputSourceWrapper: one wrapper fewer than in solution 2

Cons:
- Significant changes to XMLScanner, XMLEntityHandler and internal
components
- Applications which use XMLEntityHandler externally will break and must
be modified to talk to XMLInputSource instead of InputSource.
- Does not conform to the specs (InputSource/DOMInputSource are
interfaces and do not inherit from other classes according to the spec).
- Performance could be impacted if we have to create/delete all those
additional wrappers.
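
A minimal sketch of the solution 3 hierarchy follows. All classes are
simplified stand-ins for this example (only a system id is modelled), and
scannerOpens is a hypothetical placeholder for the scanner code that
would be changed to take the generic type.

```cpp
#include <cassert>
#include <string>

// Generic base that the scanner would depend on under solution 3.
class XMLInputSource {
public:
    virtual ~XMLInputSource() {}
    virtual std::string getSystemId() const = 0;
};

// Both public interfaces derive from the generic base. This inheritance
// is the part flagged above as not conforming to the specs.
class InputSource : public XMLInputSource {};
class DOMInputSource : public XMLInputSource {};

// Framework source; stays a child of the SAX InputSource.
class LocalFileInputSource : public InputSource {
public:
    explicit LocalFileInputSource(const std::string& path) : fPath(path) {}
    std::string getSystemId() const override { return fPath; }
private:
    std::string fPath;
};

// The single wrapper: lets a framework source be used where a
// DOMInputSource is required. It does not own the wrapped object.
class InputSourceWrapper : public DOMInputSource {
public:
    explicit InputSourceWrapper(const InputSource* wrapped) : fWrapped(wrapped) {
        assert(wrapped != nullptr);
    }
    std::string getSystemId() const override { return fWrapped->getSystemId(); }
private:
    const InputSource* fWrapped;
};

// Hypothetical stand-in for the scanner: it accepts the generic type, so
// SAX and DOM sources reach it without any adapter on the way in.
std::string scannerOpens(const XMLInputSource& src) { return src.getSystemId(); }
```

Here both a LocalFileInputSource and an InputSourceWrapper can be passed
to scannerOpens directly, which is why no DOMInputSourceWrapper is
needed.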

4. A variation on solution 3
============================
Similar to solution 3, but InputSource and DOMInputSource are not
children of XMLInputSource, so that we remain conformant.
We still use a generic input source class, XMLInputSource.

    XMLInputSource
        |
        -- LocalFileInputSource
        |
        -- MemBufInputSource
        |
        -- etc.
        |
        -- X_InputSourceWrapper(InputSource*)
        |
        -- X_DOMInputSourceWrapper(DOMInputSource*)

    InputSource
        |
        -- I_XMLInputSourceWrapper(XMLInputSource*)

    DOMInputSource
        |
        -- D_XMLInputSourceWrapper(XMLInputSource*)

Modify the XMLScanner/XMLEntityHandler/etc. to talk to this generic
input source, XMLInputSource.

All the inputsource classes defined in the framework are changed to
derive from XMLInputSource.

InputSource and DOMInputSource are provided as standalone interfaces.
Four wrappers are implemented so that they can talk to XMLInputSource
and vice versa.

Pros:
- The Xerces-C internal components (e.g. XMLScanner, XMLEntityHandler)
no longer have a hard-coded dependency on a particular interface.
- A cleaner, more elegant design, since the framework inputsource
classes also become neutral and no longer have a hard-coded dependency
on a particular interface.
- Conforms to the specs

Cons:
- Significant changes to XMLScanner, XMLEntityHandler and internal
components
- Applications which use XMLEntityHandler externally will break and must
be modified to talk to XMLInputSource as well.
- Applications which expect a framework inputsource of type InputSource
will break and must be modified to use the wrapper.
- Performance could be impacted if we have to create/delete all those
additional wrappers.
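
A minimal sketch of solution 4, showing two of the four wrappers. The
classes are simplified stand-ins for this example (only a system id is
modelled) and scannerOpens is a hypothetical placeholder for the scanner;
the wrapper names follow the diagram above.

```cpp
#include <cassert>
#include <string>

// Neutral base used by the scanner and by the framework sources.
class XMLInputSource {
public:
    virtual ~XMLInputSource() {}
    virtual std::string getSystemId() const = 0;
};

// Standalone interface: does not derive from XMLInputSource.
// (The SAX InputSource would be handled the same way.)
class DOMInputSource {
public:
    virtual ~DOMInputSource() {}
    virtual std::string getSystemId() const = 0;
};

// Framework source, now derived from the neutral base.
class MemBufInputSource : public XMLInputSource {
public:
    explicit MemBufInputSource(const std::string& bufId) : fBufId(bufId) {}
    std::string getSystemId() const override { return fBufId; }
private:
    std::string fBufId;
};

// Inbound wrapper: lets the scanner consume a user's DOMInputSource.
class X_DOMInputSourceWrapper : public XMLInputSource {
public:
    explicit X_DOMInputSourceWrapper(const DOMInputSource* wrapped)
        : fWrapped(wrapped) {
        assert(wrapped != nullptr);
    }
    std::string getSystemId() const override { return fWrapped->getSystemId(); }
private:
    const DOMInputSource* fWrapped;   // not owned
};

// Outbound wrapper: lets DOM-side code hold a framework source.
class D_XMLInputSourceWrapper : public DOMInputSource {
public:
    explicit D_XMLInputSourceWrapper(const XMLInputSource* wrapped)
        : fWrapped(wrapped) {
        assert(wrapped != nullptr);
    }
    std::string getSystemId() const override { return fWrapped->getSystemId(); }
private:
    const XMLInputSource* fWrapped;   // not owned
};

// Hypothetical stand-in for the scanner, which knows only the neutral type.
std::string scannerOpens(const XMLInputSource& src) { return src.getSystemId(); }
```

Note that a framework source returned through the DOM side and then fed
to the scanner passes through both wrappers, so each call is forwarded
twice. That double indirection is one place where the performance
concern above could show up.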


We would appreciate your suggestions/feedback.

Khaled


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Design discussion - DOM L3

Posted by Khaled Noaman <kn...@ca.ibm.com>.
Hi Rhys,

Currently, the SAX InputSource is used as a generic interface for a
general input source. The LocalFileInputSource, MemBufInputSource, etc.
are implementations of a specific type of an input source. DOM level 3
has introduced a new interface DOMBuilder, which is basically a DOM
parser. One way of parsing an XML document is using an input source,
hence the introduction of DOMInputSource, which is very similar to the SAX
one (name prefixed with DOM). Also, an entity resolver interface
(DOMEntityResolver) was introduced so that users can plug in their own
resolver and return their own input source implementations (e.g. a DB
input source), or use an existing one.

If we were to add those two new interfaces (DOMInputSource and
DOMEntityResolver), then how do we make them communicate with the Xerces
internal components (e.g. XMLScanner, XMLReader) that expect a SAX
InputSource implementation (e.g. LocalFileInputSource)? Also, how do you
make use of the input source framework classes (URLInputSource, etc.)
that are already provided by the library? Those framework classes are SAX
InputSource implementations. So, if I have my own DOMEntityResolver and
I want to return a LocalFileInputSource object, I cannot do that, since it
is not a DOMInputSource.

The wrapper classes allow us to indirectly use the framework classes. The
wrapper will delegate all requests to the wrapped input source.
Otherwise, we have to clone all those framework classes to be an
implementation of DOMInputSource.
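
To make the delegation concrete, here is a minimal sketch of a resolver
that returns a wrapped LocalFileInputSource. Everything in it is a
simplified stand-in for this example: the interfaces model only a system
id, LocalDirResolver is a hypothetical user class, and the use of
std::unique_ptr for ownership is an assumption of the sketch, not
something the DOM L3 interfaces prescribe.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for the SAX InputSource.
class InputSource {
public:
    virtual ~InputSource() {}
    virtual std::string getSystemId() const = 0;
};

// Framework class: a SAX InputSource implementation.
class LocalFileInputSource : public InputSource {
public:
    explicit LocalFileInputSource(const std::string& path) : fPath(path) {}
    std::string getSystemId() const override { return fPath; }
private:
    std::string fPath;
};

// Simplified stand-in for the DOM L3 interface.
class DOMInputSource {
public:
    virtual ~DOMInputSource() {}
    virtual std::string getSystemId() const = 0;
};

// Wrapper that adopts a SAX InputSource and forwards every request to it.
class InputSourceWrapper : public DOMInputSource {
public:
    explicit InputSourceWrapper(std::unique_ptr<InputSource> adopted)
        : fWrapped(std::move(adopted)) {
        assert(fWrapped != nullptr);
    }
    std::string getSystemId() const override { return fWrapped->getSystemId(); }
private:
    std::unique_ptr<InputSource> fWrapped;   // owned by the wrapper
};

// Simplified stand-in for DOMEntityResolver.
class DOMEntityResolver {
public:
    virtual ~DOMEntityResolver() {}
    virtual std::unique_ptr<DOMInputSource> resolveEntity(
        const std::string& publicId, const std::string& systemId) = 0;
};

// Hypothetical user resolver that maps every system id into one directory
// and reuses the framework's LocalFileInputSource through the wrapper.
class LocalDirResolver : public DOMEntityResolver {
public:
    explicit LocalDirResolver(const std::string& dir) : fDir(dir) {}
    std::unique_ptr<DOMInputSource> resolveEntity(
        const std::string& /*publicId*/, const std::string& systemId) override {
        std::unique_ptr<InputSource> file(
            new LocalFileInputSource(fDir + "/" + systemId));
        return std::unique_ptr<DOMInputSource>(
            new InputSourceWrapper(std::move(file)));
    }
private:
    std::string fDir;
};
```

The resolver never needs a DOM-specific copy of LocalFileInputSource; it
only pays for one small wrapper object per resolved entity.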

I think that InputSource and DOMInputSource are similar interfaces, with
one introduced by SAX and the other introduced by DOM. They both have
similar functionality. One question we can ask is what happens if
we do not support DOMInputSource/DOMEntityResolver and instead use the
SAX ones. Would that be considered non-conformant?

Khaled

Rhys Black wrote:

> Hi Khaled,
>
> I am very definitely leaning towards recommendations
> three and four (it seems preferably four).  The pros
> are very much worth the cons on these.
>
> For one, it is blatantly unwise to have living code
> (code which one knows will be updated a lot in the
> future) impact multiple disparate implementations...
> if it happens that one must modify the base code, then
> one is then forced to consider the ramifications of
> such modifications on not one child, but two, or more.
> A separate implementation is very desirable, and one
> might conjecture that that's why it was called for in
> the DOM Level three implementation.  One would hope
> that the modularity of the code would warrant the
> additional coding time for such an enduring and
> much-used program such as this (I have been looking,
> but can find no other alternative to tying my server
> configuration to a microsoft one, other than this one,
> in using c++ as the codebase).
>
> In addition, as in writing library classes, which I
> have not previously done, myself, every concern for
> performance must be considered and weighted very
> heavily.  For this reason, and the sake of a more
> simple interface (no one likes having to remember more
> classes than they need to), it would be good to skip
> the wrapper classes and have the functionality
> directly implemented in the underlying class.  It is
> my thought that ofttimes wrapper classes are used as a
> sort of "patch" to make legacy code workable when the
> code should just be rewritten (this seems to happen a
> lot in c++, such as with char[]'s and the multitude of
> string wrapper classes, and downright entire new
> classes for an implementation which should work in the
> beginning).
>
> I know it's more work (probably much more work than I
> realize), but it seems that it is a question of...
> well... it doesn't work right now, so do you want to
> patch it, or do it right?  If this set of classes were
> only for a highly educated subset of users, then I
> would say hooray! just patch it.  In this case,
> however, seemingly being the only viable alternative
> to tying oneself irrevocably to microsoft products
> (and therefore losing the choice in the future and
> having to accept what they give us), it would seem to
> be the best choice to make this implementation as
> strong, reliable, and fast as possible - in any way
> possible - as well as making the code easier to
> maintain and modify, later.
>
> I know these are basic principles; I know that I may
> not fully understand the problem; I know that I don't
> have all that much experience in this field; but I
> also know that - if I were coding it, as I may someday
> apply to do - I would do my best to do it right (as
> has been done), then if I find later that I have to go
> back and redo something... well... I'd like to think
> that I would try to do it right the second time
> (because it's the right thing to do, and so that I
> will be less likely to have to modify it significantly
> a third time).
>
> These are just thoughts, take them for what you will.
>
> Have a nice day!
> ~Rhys Black




Re: Design discussion - DOM L3

Posted by Rhys Black <pd...@yahoo.com>.
Hi Khaled,

I am very definitely leaning towards recommendations
three and four (it seems preferably four).  The pros
are very much worth the cons on these.

For one, it is blatantly unwise to have living code
(code which one knows will be updated a lot in the
future) impact multiple disparate implementations...
if it happens that one must modify the base code, then
one is then forced to consider the ramifications of
such modifications on not one child, but two, or more.
A separate implementation is very desirable, and one
might conjecture that that's why it was called for in
the DOM Level three implementation.  One would hope
that the modularity of the code would warrant the
additional coding time for such an enduring and
much-used program such as this (I have been looking,
but can find no other alternative to tying my server
configuration to a microsoft one, other than this one,
in using c++ as the codebase).

In addition, as in writing library classes, which I
have not previously done, myself, every concern for
performance must be considered and weighted very
heavily.  For this reason, and the sake of a more
simple interface (no one likes having to remember more
classes than they need to), it would be good to skip
the wrapper classes and have the functionality
directly implemented in the underlying class.  It is
my thought that ofttimes wrapper classes are used as a
sort of "patch" to make legacy code workable when the
code should just be rewritten (this seems to happen a
lot in c++, such as with char[]'s and the multitude of
string wrapper classes, and downright entire new
classes for an implementation which should work in the
beginning).

I know it's more work (probably much more work than I
realize), but it seems that it is a question of...
well... it doesn't work right now, so do you want to
patch it, or do it right?  If this set of classes were
only for a highly educated subset of users, then I
would say hooray! just patch it.  In this case,
however, seemingly being the only viable alternative
to tying oneself irrevocably to microsoft products
(and therefore losing the choice in the future and
having to accept what they give us), it would seem to
be the best choice to make this implementation as
strong, reliable, and fast as possible - in any way
possible - as well as making the code easier to
maintain and modify, later.

I know these are basic principles; I know that I may
not fully understand the problem; I know that I don't
have all that much experience in this field; but I
also know that - if I were coding it, as I may someday
apply to do - I would do my best to do it right (as
has been done), then if I find later that I have to go
back and redo something... well... I'd like to think
that I would try to do it right the second time
(because it's the right thing to do, and so that I
will be less likely to have to modify it significantly
a third time).

These are just thoughts, take them for what you will.

Have a nice day!
~Rhys Black
