Posted to dev@xalan.apache.org by Sc...@lotus.com on 2001/03/12 07:36:31 UTC

Re: Potential Xerces regression (bug# 933)

OK.  First of all, I believe the problem with processing XercesJ 1.3.0 and
greater DOMs (which I guess is what xml-stylebook uses, though this makes
my skin itch... it ought to be using stream processing... I hope no one is
complaining about performance) is easily flagged with "build smoketest".
We really need to get "build smoketest" working properly with gump.  I'll
try and put some focus on this next week.

There are at least two major problems.  I've put a hack in Xalan to work
around the first, but I'm not sure there is a workaround for the other,
which happens *after* Xerces 1.3.0.  Neither is reported in Bugzilla that
I can see, so I'll try to file them tomorrow if I get a chance.

Second of all, sorry for this long note.  I just want to make sure I have
all the information down in one place.

BTW, the testing I am doing in regard to this is all on the main branch,
with a latest source checkout tonight.

========
The first problem is that I believe Xerces at some point decided to use ""
instead of null for null namespaces.  There has been a discussion between
Gary Peskin and Joe Kesselman on xalan-dev about this, but I hadn't been
keeping up with the thread that well, and missed its relation to this
problem.  I include some of the discussion at the end of this note.  I made
Xalan able to compare a "" namespace to null for now, until we get this
resolved.
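
A minimal sketch of what such a tolerant comparison could look like.  This
is not the actual Xalan change; the class and method names are hypothetical
and only illustrate the idea of treating "" and null as the same "no
namespace" value:

```java
class NamespaceCompare {
    /**
     * Hypothetical helper (not Xalan code): compares two namespace URIs,
     * treating null and the empty string as the same "no namespace" value.
     */
    static boolean sameNamespace(String a, String b) {
        String left = (a == null || a.isEmpty()) ? null : a;
        String right = (b == null || b.isEmpty()) ? null : b;
        return (left == null) ? right == null : left.equals(right);
    }

    public static void main(String[] args) {
        System.out.println(sameNamespace(null, ""));          // true
        System.out.println(sameNamespace("", "urn:example")); // false
    }
}
```

The normalization costs two extra checks per comparison, which is the
overhead the thread below worries about for hot paths such as template
matching.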

The gist of the discussion is:

>>1.  Declare the Xerces-J support of schemas to have a bug and ask that
>>Xerces be corrected to always use a null namespace URI to indicate that
>>there is no default namespace.  Even if the Xerces people change this
>>behavior, is this correct?
>
> Yes. If your description of the problem is accurate (you should probably
> submit a more detailed case so it can be reproduced in the lab), this is
> a parser/DOM-builder bug.

========
The other problem is "DOM006 Hierarchy request error" when outputting to a
DOM.  For some very strange reason someone decided that
DocumentBuilder#newDocument() should add an element named "root" to the
Document it creates.  Then, when Xalan goes to add the first element out of
the transform to the Document element, you predictably get "DOM006
Hierarchy request error". In a unit test I do:

        DocumentBuilder docBuilder = dfactory.newDocumentBuilder();
        Node xmlDoc = docBuilder.parse(new InputSource("foo.xml"));
        org.w3c.dom.Document outNode = docBuilder.newDocument();
        transformer.transform(new DOMSource(xmlDoc, "foo.xml"),
                              new DOMResult(outNode));
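
For anyone who wants to try this outside the Xalan test harness, here is a
self-contained sketch of the same test.  It makes a few assumptions that are
not in the original: the input is an inline string rather than foo.xml, an
identity transformer stands in for a real stylesheet, and the class and
method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomResultSmokeTest {
    /** Identity-transforms the given XML into a Document from newDocument(). */
    static Document transformToNewDocument(String xml) {
        try {
            DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
            dfactory.setNamespaceAware(true);
            DocumentBuilder docBuilder = dfactory.newDocumentBuilder();
            Document xmlDoc = docBuilder.parse(new InputSource(new StringReader(xml)));
            // The test relies on newDocument() returning a Document with no children.
            Document outNode = docBuilder.newDocument();
            // Assumption: an identity transformer instead of a compiled stylesheet.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(xmlDoc), new DOMResult(outNode));
            return outNode;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document out = transformToNewDocument("<foo><bar/></foo>");
        System.out.println(out.getDocumentElement().getNodeName());
    }
}
```

If newDocument() hands back a Document that already has a root element, the
transform has nowhere legal to put its own root, which is the failure
described above.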

In Xerces 1.2.3 and Xerces 1.3.0, DocumentBuilderImpl#newDocument()
[version 1.2] was (properly, I think) implemented as:

    public Document newDocument() {
        return(new org.apache.xerces.dom.DocumentImpl());
    }

From version 1.3 of DocumentBuilderImpl on, it is implemented as:

    public Document newDocument() {
        DOMImplementation di = getDOMImplementation();
        // XXX What should the root element be named???
        String qName = "root";
        DocumentType docType = di.createDocumentType(qName, null, null);
        return di.createDocument(null, qName, docType);
    }
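
The effect can be reproduced with plain DOM calls.  The sketch below is an
illustration under assumptions, not the Xerces or Xalan code: it builds a
Document that already has a "root" element (with a null doctype, for
brevity) and then tries to append a second top-level element, roughly what a
DOMResult target would see.  The class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

class SecondRootDemo {
    /**
     * Appends a second top-level element to a Document that already has one.
     * Returns the DOMException code that was raised, or -1 if none was.
     */
    static short appendSecondRoot() {
        try {
            DOMImplementation di = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            // Same shape as the version 1.3 newDocument(): a pre-made "root".
            Document doc = di.createDocument(null, "root", null);
            try {
                doc.appendChild(doc.createElement("out"));
                return -1;
            } catch (DOMException e) {
                return e.code;
            }
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendSecondRoot() == DOMException.HIERARCHY_REQUEST_ERR);
    }
}
```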

Weird.  Version 1.3 of DocumentBuilderImpl has the CVS log:

----------------------------
Revision : 1.3
Date : 2001/2/3 0:28:59
Author : 'edwingo'
State : 'Exp'
Lines : +106 -78
Description :
Merged in from Xerces 2: implementation of parsing
component(javax.xml.parsers) of JAXP 1.1
----------------------------

So the same problem may exist in Xerces2, and I don't want to trace the
culprit back before the merge.  Anyway, someone is sorely mistaken if they
think that newDocument() should create a magic root node.  I hope this can
be fixed as soon as possible.

I include the discussion about "" as a null namespace from xalan-dev after
my signature.

-scott


----- Forwarded by Scott Boag/CAM/Lotus on 03/12/2001 01:23 AM -----
                                                                                                                   
From:     Gary L Peskin <garyp@firstech.com>
Date:     03/02/2001 12:49 AM
To:       xalan-dev@xml.apache.org
cc:       Xerces-J Development <xe...@xml.apache.org>, (bcc: Scott Boag/CAM/Lotus)
Subject:  Re: [Fwd: Xalan2 with Xerces1.3]
Please respond to xalan-dev

Joseph_Kesselman@lotus.com wrote:
>
> Speaking as DOM WG alternate representative:

Thanks, Joe.  Your answers cleared things up for me.  I'll work up a
small test case to demonstrate the problem and submit it to the Xerces-J
list and bugzilla.

Gary


----- Forwarded by Scott Boag/CAM/Lotus on 03/12/2001 01:19 AM -----
                                                                                                                      
From:     Joseph_Kesselman@lotus.com
Date:     03/01/2001 10:31 PM
To:       xalan-dev@xml.apache.org
cc:       Xalan Development <xa...@xml.apache.org>, Xerces-J Development <xe...@xml.apache.org>, (bcc: Scott Boag/CAM/Lotus)
Subject:  Re: [Fwd: Xalan2 with Xerces1.3]
Please respond to xalan-dev

Speaking as DOM WG alternate representative:

>the null namespace and the "" namespace.  The DOM Level 2 Core document
>states that these are two different namespaces

Yep. That wasn't an easy decision, but it really did seem to be the best
available answer since we didn't want to either force folks to test for
both or "automagically" convert one into the other. As you noted, either
would add overhead in order to suppress something which shouldn't be
allowed to arise in the first place.


>The XML Namespace recommendation indicates that the "" namespace URI is
>the same as the default namespace

Actually, it doesn't. The DOM WG checked that very carefully before we made
the above decision. The namespace spec is trying to say that an XML
namespace declaration with the empty-string value is special-cased as a
request to "undefine" the prefix and return to the default namespace, in
lieu of inventing another syntax or magic name for that case. It was _NOT_
intended to assert that the default namespace's name was the empty string.


>1.  Declare the Xerces-J support of schemas to have a bug and ask that
>Xerces be corrected to always use a null namespace URI to indicate that
>there is no default namespace.  Even if the Xerces people change this
>behavior, is this correct?

Yes. If your description of the problem is accurate (you should probably
submit a more detailed case so it can be reproduced in the lab), this is a
parser/DOM-builder bug.


>Will we have problems with other XML parsers?

Not unless they have similar bugs.

----- Forwarded by Scott Boag/CAM/Lotus on 03/12/2001 01:20 AM -----
                                                                                                                   
From:     Gary L Peskin <garyp@firstech.com>
Date:     03/01/2001 02:12 PM
To:       Xalan Development <xa...@xml.apache.org>, Xerces-J Development <xe...@xml.apache.org>
cc:       (bcc: Scott Boag/CAM/Lotus)
Subject:  [Fwd: Xalan2 with Xerces1.3]
Please respond to xalan-dev

Joe, Scott, Xerces people --

HELP!!  Somsak has submitted the problem below.  I have investigated and
found out the cause.  The short answer, I think, is a confusion between
the null namespace and the "" namespace.  The DOM Level 2 Core document
states that these are two different namespaces:

"Note that because the DOM does no lexical checking, the empty string
will be treated as a real namespace URI in DOM Level 2 methods.
Applications must use the value null as the namespaceURI parameter for
methods if they wish to have no namespace."

The source of Somsak's immediate problem is that, when a schema is
defined on the input XML document, Xerces creates a node with a ""
namespace URI.  When no schema is defined on the input XML document,
Xerces creates a document with a null namespace URI.

The XML Namespace recommendation indicates that the "" namespace URI is
the same as the default namespace
(http://www.w3.org/TR/1999/REC-xml-names-19990114/#defaulting):

"The default namespace can be set to the empty string. This has the same
effect, within the scope of the declaration, of there being no default
namespace. "

Xerces also uses a null namespace URI to indicate that there is no
default namespace.

As we parse and compile the XPath match pattern in

  <xsl:template match="Class">

we encode this as a NodeTest with a null namespaceURI.  Several sections of
Xalan code test for and recognize a null namespaceURI as being the
default namespace.

Now, when we go to match the <Class> node in the input XML, our match
fails when schemas are used because we're matching our null namespaceURI
with the input DOM's namespaceURI of "".  This ends up invoking the
built-in template for <Class>, which does an apply-templates on the
children, which adds a bunch of text strings into the result tree.  These
unparented text strings are what cause the DOM006 error.

If, on the other hand, schemas are not used, Xerces reports a null
namespaceURI in the input XML and our match works fine.
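
The fall-through to the built-in template is easy to see with any JAXP
transformer.  The sketch below is an illustration under assumptions: current
parsers do not report "" as a namespace URI, so a real namespace
("urn:example") stands in for the mismatched one, and the class name, method
name, and stylesheet are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MatchDemo {
    // match='Class' only matches a <Class> element that is in no namespace.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='Class'>matched</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML and returns the text output. */
    static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No namespace: the explicit template fires.
        System.out.println(run("<Class>text</Class>"));                     // matched
        // Different namespace: the built-in rules copy the text through.
        System.out.println(run("<Class xmlns='urn:example'>text</Class>")); // text
    }
}
```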

So, on the Xalan team, I guess we have a few options:
1.  Declare the Xerces-J support of schemas to have a bug and ask that
Xerces be corrected to always use a null namespace URI to indicate that
there is no default namespace.  Even if the Xerces people change this
behavior, is this correct?  Will we have problems with other XML
parsers?
2.  The reverse of the above: ask Xerces-J to always use an empty
string to indicate that there is no default namespace.  Same issues as
1.  Also will cause lots of code changes in Xalan, I suspect.
3.  Have a new compare method for namespaces based on
NodeTest.subPartMatch which tests for either namespace being the empty
string and allowing that to compare equal to null.  This is probably the
most flexible but we will take a performance hit while attempting to
match template match patterns.  Since this is such a heavily used
section of the code, I'm hesitant to add additional path length but
there may be no way around it.

Thoughts?

Gary


Re: Potential Xerces regression (bug# 933)

Posted by Edwin Goei <Ed...@eng.sun.com>.
Scott_Boag@lotus.com wrote:
> 
> ========
> The other problem is "DOM006 Hierarchy request error" when outputting to a
> DOM.  For some very strange reason someone decided that
> DocumentBuilder#newDocument() should add an element named "root" to the
> Document it creates.  Then, when Xalan goes to add the first element out of
> the transform to the Document element, you predictably get "DOM006
> Hierarchy request error". In a unit test I do:
> 
>         DocumentBuilder docBuilder = dfactory.newDocumentBuilder();
>         Node xmlDoc = docBuilder.parse(new InputSource("foo.xml"));
>         org.w3c.dom.Document outNode = docBuilder.newDocument();
>         transformer.transform(new DOMSource(xmlDoc, "foo.xml"),
>                               new DOMResult(outNode));
> 
> In Xerces 1.2.3 and Xerces 1.3.0, DocumentBuilderImpl#newDocument()
> [version 1.2] was (properly, I think) implemented as:
> 
>     public Document newDocument() {
>         return(new org.apache.xerces.dom.DocumentImpl());
>     }
> 
> From version 1.3 of DocumentBuilderImpl on, it is implemented as:
> 
>     public Document newDocument() {
>         DOMImplementation di = getDOMImplementation();
>         // XXX What should the root element be named???
>         String qName = "root";
>         DocumentType docType = di.createDocumentType(qName, null, null);
>         return di.createDocument(null, qName, docType);
>     }

I'll take responsibility for this bug.  The real problem, though, is
that DocumentBuilder.newDocument() was part of JAXP 1.0 which was based
on DOM Level 1 and so it had to be carried over to JAXP 1.1.  I was in a
hurry when implementing this method which should be deprecated since DOM
Level 2 came out.  Still, it should work for backwards compatibility, so
I'll check in a fix.

JAXP 1.1, which is based on DOM Level 2, exposes a new method to create
a DOM Document object to conform to DOM Level 2.  This is the preferred
method to create a DOM Level 2 tree:

// Needs javax.xml.parsers.* and org.w3c.dom.*; pubID and sysId are the
// DOCTYPE public and system identifiers (either may be null).
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);  // needed b/c default value is false
DocumentBuilder db = dbf.newDocumentBuilder();
DOMImplementation di = db.getDOMImplementation();  // W3C DOM Impl
DocumentType dt = di.createDocumentType("pref:root", pubID, sysId);
Document doc = di.createDocument("http://someuri", "pref:root", dt);

Substitute appropriate values of the root node qname "pref:root" and
uri.
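
As a rough, runnable check of that recipe, the sketch below wraps it in a
helper and inspects the result.  The class and method names are
hypothetical, and the public and system identifiers are simply passed as
null; the point it illustrates is that a Document created this way already
has its root element and doctype, unlike an empty newDocument() result.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

class CreateDocDemo {
    /** Creates a Document with a doctype and a namespaced root via DOMImplementation. */
    static Document create(String uri, String qname) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DOMImplementation di = dbf.newDocumentBuilder().getDOMImplementation();
            // Assumption: no public or system identifier for the doctype.
            DocumentType dt = di.createDocumentType(qname, null, null);
            return di.createDocument(uri, qname, dt);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = create("http://someuri", "pref:root");
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getDocumentElement().getNamespaceURI());
    }
}
```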

-Edwin

---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org


Re: Potential Xerces regression (bug# 933)

Posted by Edwin Goei <Ed...@eng.sun.com>.
Scott_Boag@lotus.com wrote:
> 
> There are at least two major problems.  The first of which I've put a hack
> in Xalan to work around, but I'm not sure if there is a work around for the
> other problem, which happens *after* Xerces 1.3.0.  Neither of these are
> reported in Bugzilla that I can see, so I'll try to do this tomorrow if I
> get a chance.

I checked in a fix for both Xerces 1 and 2 for the second problem having
to do with JAXP.  Alternatively, the preferred fix would be to have the
application create a DOM Document via a DOMImplementation object as
outlined in my other posting.  Or at least, that seems to be what the
DOM WG intended for apps to create Document objects.

-Edwin


