Posted to general@xerces.apache.org by Klaus Malorny <Kl...@knipp.de> on 2000/02/11 14:12:06 UTC

Xerces-J: namespaces/(no) validation/schemes/off-line usage



Hi everybody,

on my search for an XML parser/DOM with namespace support, I discovered
Xerces-J. I made some simple tests with it and realized that it is not quite
usable for my purposes because of some problems I encountered. Maybe you are
interested, and maybe you can give me some tips:

  - In my case, I don't need validation, since I have a fixed set of
    documents. So I tried to turn validation off, but the parser still
    attempts to load the DTD and/or the schemas specified via the DOCTYPE
    and the namespace URIs. This is a performance/memory overhead I would
    like to get rid of.

  - Since I want to use the original namespace URIs (e.g. for HTML), the
    parser tries to download files from the W3C server. I would like to
    avoid this download for every document parsed, especially as I want
    to work offline. So I got the idea to use my own implementation
    of EntityResolver, which maps the URIs to local files (or to
    empty input streams). This works for the DTD/DOCTYPE, but
    unfortunately not for the namespace URIs. I tracked the problem down:
    the schema loader creates a new XML parser instance and sets its
    EntityResolver to a default one instead of reusing the
    EntityResolver of the originating parser.

  - I know that DOM2 is still in the draft phase, so it makes sense
    that the org.w3c.dom packages still contain only DOM Level 1. On the
    other hand, it would be much easier to move to the final APIs from
    the DOM2 WD APIs than from the xerces.dom.* APIs. A compile test
    in which I "borrowed" the DOM2 WD interfaces from OpenXML showed that
    Xerces apparently already implements all the current interfaces.
    So why not supply a build that includes the DOM2 WD APIs? You
    could mark all classes as "deprecated" with corresponding comments,
    so that every user notices their draft status.

Thanks for reading up to here :-)

regards,

Klaus Malorny

Re: Xerces-J: namespaces/(no) validation/schemes/off-line usage

Posted by Ralf Pfeiffer <rp...@apache.org>.
See below...

Klaus Malorny wrote:

> Hi everybody,
>
> on my search for an XML parser/DOM with namespace support, I discovered
> Xerces-J. I made some simple tests with it and realized that it is not quite
> usable for my purposes because of some problems I encountered. Maybe you are
> interested, and maybe you can give me some tips:
>
>   - In my case, I don't need validation, since I have a fixed set of
>     documents. So I tried to turn validation off, but the parser still
>     attempts to load the DTD and/or the schemas specified via the DOCTYPE
>     and the namespace URIs. This is a performance/memory overhead I would
>     like to get rid of.

The problem is that the DTD serves two purposes: validation and entity declarations.
The XML spec leaves it up to a non-validating parser to decide whether to read the
external DTD.

However, this is a legitimate feature request that has come up and been discussed
multiple times. I think we will consider it for the future.
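The distinction Ralf draws (validation off versus not reading the DTD at all) can be illustrated with a minimal sketch. It assumes a parser that recognizes the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`, which later Xerces-J releases provide and which the Xerces-derived parser bundled with current JDKs also accepts. It uses the JAXP `DocumentBuilderFactory` API rather than Xerces' own `DOMParser`, and the class `NoDtdParse` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: a non-validating parse that also skips the external DTD.
final class NoDtdParse {
    // Xerces-specific feature name (assumed to be supported by the parser in use).
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parseWithoutExternalDtd(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);                 // no validation ...
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // ... and no DTD download either
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

The trade-off is the one Ralf describes: entities declared only in the skipped DTD (for example `&nbsp;` in an XHTML document) are no longer known to the parser.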

>
>
>   - Since I want to use the original namespace URIs (e.g. for HTML), the
>     parser tries to download files from the W3C server. I would like to
>     avoid this download for every document parsed, especially as I want
>     to work offline. So I got the idea to use my own implementation
>     of EntityResolver, which maps the URIs to local files (or to
>     empty input streams). This works for the DTD/DOCTYPE, but
>     unfortunately not for the namespace URIs. I tracked the problem down:
>     the schema loader creates a new XML parser instance and sets its
>     EntityResolver to a default one instead of reusing the
>     EntityResolver of the originating parser.

Another good point. I will look into this.
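Klaus's workaround, a resolver that maps known URIs to local files and answers everything else with an empty input stream, might look roughly like this. It is a sketch under assumptions: the class `OfflineResolver`, its `parse` helper and the URI map are hypothetical, and it combines the standard SAX `EntityResolver` with a JAXP `DocumentBuilder`. As Klaus found, such a resolver only helps for entities the originating parser resolves itself; a schema loader that creates its own parser with a default resolver bypasses it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: known remote system IDs go to local copies,
// every other external entity gets an empty stream, so nothing is downloaded.
final class OfflineResolver implements EntityResolver {
    private final Map<String, String> localCopies; // remote URI -> local URI (e.g. "file:...")

    OfflineResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = localCopies.get(systemId);
        if (local != null) {
            return new InputSource(local); // the parser opens the local URI instead
        }
        InputSource empty = new InputSource(new StringReader(""));
        empty.setSystemId(systemId); // error messages still name the original URI
        return empty;
    }

    static Document parse(String xml, Map<String, String> localCopies)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new OfflineResolver(localCopies));
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

A DTD mapped this way is still read (so its entity declarations stay available), while unmapped ones are silently replaced by an empty external subset.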

>
>
>   - I know that DOM2 is still in the draft phase, so it makes sense
>     that the org.w3c.dom packages still contain only DOM Level 1. On the
>     other hand, it would be much easier to move to the final APIs from
>     the DOM2 WD APIs than from the xerces.dom.* APIs. A compile test
>     in which I "borrowed" the DOM2 WD interfaces from OpenXML showed that
>     Xerces apparently already implements all the current interfaces.
>     So why not supply a build that includes the DOM2 WD APIs? You
>     could mark all classes as "deprecated" with corresponding comments,
>     so that every user notices their draft status.

The current codebase has all the DOM2 and SAX2 interfaces in their
proper packages. You may extract the code or wait for the next release.
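With namespace-aware parsing, the DOM Level 2 interfaces let code match elements by namespace URI and local name instead of by prefix, which is what Klaus needs for documents using the original namespace URIs. A minimal sketch, assuming the JAXP `DocumentBuilderFactory` API (a later standard, not part of this thread) and a hypothetical helper `xhtmlParagraphs`:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class Dom2Namespaces {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Describes every XHTML <p> element as "{namespaceURI}localName as tagName",
    // whatever prefix (or default namespace) the document uses for it.
    static List<String> xhtmlParagraphs(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, getNamespaceURI() returns null
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList ps = doc.getElementsByTagNameNS(XHTML, "p"); // DOM Level 2 method
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ps.getLength(); i++) {
            Element p = (Element) ps.item(i);
            result.add("{" + p.getNamespaceURI() + "}" + p.getLocalName()
                    + " as " + p.getTagName());
        }
        return result;
    }
}
```

For `<root xmlns:h="http://www.w3.org/1999/xhtml"><h:p/><p/></root>`, only the prefixed `h:p` matches; the unprefixed `<p>` has no namespace and is skipped.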



--
<person name="Ralf I. Pfeiffer" email="rpfeiffe@apache.org" />