Posted to general@xerces.apache.org by Norman Walsh <nd...@nwalsh.com> on 2000/02/27 15:32:00 UTC

Xerces bug: base URI and external parsed entities

[I've just joined this list because it appears to be the place
where Xerces bug reports should be sent. If I'm mistaken, feel
free to correct me :-) ]

This is slightly convoluted, but it happens in real life, I swear:

If you have a DTD x.dtd that includes a PE x.mod, and a DTD y.dtd
that redeclares x.mod as x-prime.mod and then includes x.dtd by
PE ref, Xerces mistakenly attempts to load the redeclared
x-prime.mod from the directory where x.dtd occurs instead of the
directory where y.dtd occurs. This is an error.
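
To make the difference concrete, here is a minimal sketch in modern Java
(java.net.URI did not exist in Java 1.2.2). The directory layout
(file:/work/x/ and file:/work/y/) and the BaseUriDemo class are
assumptions made up for illustration; they are not taken from
parsetest.zip or from Xerces. The sketch only shows what the same
relative system identifier resolves to against each of the two bases.

```java
import java.net.URI;

final class BaseUriDemo {
    // Resolve a (possibly relative) system literal against the URI of the
    // entity in which the declaration was read.
    static URI resolve(String declaringEntity, String systemLiteral) {
        return URI.create(declaringEntity).resolve(systemLiteral);
    }

    public static void main(String[] args) {
        // Hypothetical locations; the real test files may be laid out differently.
        String yDtd = "file:/work/y/y.dtd"; // contains the redeclaration
        String xDtd = "file:/work/x/x.dtd"; // contains the PE reference

        // Expected: relative to the declaring document, y.dtd.
        System.out.println(resolve(yDtd, "x-prime.mod"));
        // Reported behaviour: relative to the point of reference, x.dtd.
        System.out.println(resolve(xDtd, "x-prime.mod"));
    }
}
```

The first line printed is the location the declaration in y.dtd is
meant to name; the second is where the parser goes looking if it
resolves at the point of reference.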

I've put a small zip file that demonstrates this problem online
at http://nwalsh.com/xerces/parsetest.zip. Parse x.xml and y.xml
with a validating Xerces parser and you'll see that y.xml fails
to parse.

I'm using SAXCount from XML4J_3_0_0EA3 and Java 1.2.2.

I haven't (yet) tried this against the latest CVS of Xerces but
I will asap. (Though perhaps not before returning from X-Tech).

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <nd...@nwalsh.com>      | People often say that this or that
http://nwalsh.com/                 | person has not yet found himself.
                                   | But the self is not something one
                                   | finds, it is something one
                                   | creates.--Thomas Szasz


Re: Xerces bug: base URI and external parsed entities

Posted by Andy Clark <an...@apache.org>.
Norman Walsh wrote:
> Is it possible to access the URI of the document currently being
> parsed from XMLDTDScanner.scanEntityDecl()?

I think you can query the XMLEntityHandler for this information.
XMLDTDScanner has a reference to it.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: PATCH: Re: Xerces bug: base URI and external parsed entities

Posted by Norman Walsh <nd...@nwalsh.com>.
/ Norman Walsh <nd...@nwalsh.com> was heard to say:
| The following patch seems to fix the relative URI bug. If (one of)
| the Xerces maintainers deems it worthy, please check it in :-)

I hate to be a pest, but has anyone considered this patch? 

| Index: XMLDTDScanner.java
| ===================================================================
| RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xerces/framework/XMLDTDScanner.java,v
| retrieving revision 1.4
| diff -r1.4 XMLDTDScanner.java
| 1200a1201,1219
| >
| >           // ndw@nwalsh.com
| >           //
| >           // An fSystemLiteral value from an entity declaration may be
| >           // a relative URI. If so, it's important that we make it
| >           // absolute with respect to the context of the document that
| >           // we are currently reading. If we don't, the XMLParser will
| >           // make it absolute with respect to the point of *reference*,
| >           // before attempting to read it. That's definitely wrong.
| >           //
| >           String litSystemId = fStringPool.toString(fSystemLiteral);
| >           String absSystemId = fEntityHandler.expandSystemId(litSystemId);
| >           if (!absSystemId.equals(litSystemId)) {
| >               // REVISIT - Is it kosher to touch fStringPool directly?
| >               // Is there a better way? fEntityReader doesn't seem to
| >               // have an addString method that takes a literal string.
| >               fSystemLiteral = fStringPool.addString(absSystemId);
| >           }
| >
| 2376a2396
| >

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <nd...@nwalsh.com>      | Man's sensitivity to little things
http://nwalsh.com/                 | and insensitivity to the greatest
                                   | are the signs of a strange
                                   | disorder.--Pascal


PATCH: Re: Xerces bug: base URI and external parsed entities

Posted by Norman Walsh <nd...@nwalsh.com>.
The following patch seems to fix the relative URI bug. If (one of)
the Xerces maintainers deems it worthy, please check it in :-)

Index: XMLDTDScanner.java
===================================================================
RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xerces/framework/XMLDTDScanner.java,v
retrieving revision 1.4
diff -r1.4 XMLDTDScanner.java
1200a1201,1219
>
>           // ndw@nwalsh.com
>           //
>           // An fSystemLiteral value from an entity declaration may be
>           // a relative URI. If so, it's important that we make it
>           // absolute with respect to the context of the document that
>           // we are currently reading. If we don't, the XMLParser will
>           // make it absolute with respect to the point of *reference*,
>           // before attempting to read it. That's definitely wrong.
>           //
>           String litSystemId = fStringPool.toString(fSystemLiteral);
>           String absSystemId = fEntityHandler.expandSystemId(litSystemId);
>           if (!absSystemId.equals(litSystemId)) {
>               // REVISIT - Is it kosher to touch fStringPool directly?
>               // Is there a better way? fEntityReader doesn't seem to
>               // have an addString method that takes a literal string.
>               fSystemLiteral = fStringPool.addString(absSystemId);
>           }
>
2376a2396
>

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <nd...@nwalsh.com>      | Nothing ever gets anywhere. The
http://nwalsh.com/                 | earth keeps turning round and gets
                                   | nowhere. The moment is the only
                                   | thing that counts.--Jean Cocteau


Re: Xerces bug: base URI and external parsed entities

Posted by Norman Walsh <nd...@nwalsh.com>.
/ Norman Walsh <nd...@nwalsh.com> was heard to say:
| If you have a DTD x.dtd that includes a PE x.mod, and a DTD y.dtd
| that redeclares x.mod as x-prime.mod and then includes x.dtd by
| PE ref, Xerces mistakenly attempts to load the redeclared
| x-prime.mod from the directory where x.dtd occurs instead of the
| directory where y.dtd occurs. This is an error.
[...]
| I haven't (yet) tried this against the latest CVS of Xerces but
| I will asap. (Though perhaps not before returning from X-Tech).

Alas, it also happens in the Xerces that I checked out of CVS this
morning. I really want to fix this, so I went digging. The
right answer, I think, is to make the URI absolute much earlier
in the process. It absolutely has to be done before the
URI of the file that contained the declaration is lost.
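
As a rough sketch of what "absolute much earlier" means, the following
modern-Java fragment resolves the system literal at declaration time,
while the URI of the declaring entity is still known, and stores only
the absolute result. EntityTable and its methods are hypothetical
names invented for this illustration; they are not part of Xerces. It
relies on the XML 1.0 rule that the first declaration of an entity is
the binding one.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser's entity table, not a Xerces class.
final class EntityTable {
    private final Map<String, URI> systemIds = new HashMap<>();

    // Called when an entity declaration is scanned. The literal is made
    // absolute here, against the entity that contains the declaration.
    void declare(String name, String systemLiteral, URI declaringEntity) {
        // putIfAbsent keeps the first declaration, as XML 1.0 requires.
        systemIds.putIfAbsent(name, declaringEntity.resolve(systemLiteral));
    }

    // Called when the entity is referenced; no base URI is needed any more.
    URI lookup(String name) {
        return systemIds.get(name);
    }
}
```

With this shape, a declaration read from y.dtd keeps y.dtd's directory
as its base no matter which file later contains the reference.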

One possible place to do this is in
XMLDTDScanner.scanEntityDecl.  At an even deeper level, it could
be done in XMLDTDScanner.scanSystemLiteral. (I don't know what
the benefits would be of doing the expansion before the string
goes in the StringPool. I guess that would be ideal, but I get
the feeling that may be too deep.)

Unfortunately, I'm a little lost in the architectural maze of
Xerces, so I'm going to ask what I hope is an easy question for
someone to answer:

Is it possible to access the URI of the document currently being
parsed from XMLDTDScanner.scanEntityDecl()?

If not, does anyone see another way to solve this?

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <nd...@nwalsh.com>      | The stone fell on the pitcher? Woe
http://nwalsh.com/                 | to the pitcher. The pitcher fell
                                   | on the stone? Woe to the
                                   | pitcher.--Rabbinic Saying