Posted to general@xerces.apache.org by Norman Walsh <nd...@nwalsh.com> on 2000/03/22 15:42:28 UTC

Re: Xerces bug: base URI and external parsed entities

/ Norman Walsh <nd...@nwalsh.com> was heard to say:
| If you have a DTD x.dtd that includes a PE x.mod and a DTD y.dtd
| that redeclares x.mod as x-prime.mod and then includes x.dtd by
| PE ref, Xerces mistakenly attempts to load the redeclared
| x-prime.mod from the directory where x.dtd occurs instead of the
| directory where y.dtd occurs. This is an error.
[...]
| I haven't (yet) tried this against the latest CVS of Xerces but
| I will asap. (Though perhaps not before returning from X-Tech).

Alas, it does happen in the Xerces that I got from CVS this
morning.  I really want to fix this, so I went digging. The
right answer, I think, is to make the URI absolute much earlier
in the process.  It absolutely has to be done before the
URI of the file that contained the declaration is lost.
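
To make the difference concrete, here is a minimal sketch using
java.net.URI rather than any Xerces class. The example.org URLs and
the class name BaseUriDemo are assumptions made up for illustration;
only x.dtd, y.dtd, and x-prime.mod come from the report above.

```java
import java.net.URI;

class BaseUriDemo {
    // Resolve a (possibly relative) system identifier against the URI
    // of the entity that is used as the base.
    static String resolve(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // Hypothetical locations: y.dtd declares the PE, x.dtd references it.
        String declaringDtd = "http://example.org/custom/y.dtd";
        String referencingDtd = "http://example.org/base/x.dtd";
        String systemId = "x-prime.mod";

        // Base taken from the declaration (the behaviour argued for here):
        // http://example.org/custom/x-prime.mod
        System.out.println(resolve(declaringDtd, systemId));

        // Base taken from the point of reference (the reported behaviour):
        // http://example.org/base/x-prime.mod
        System.out.println(resolve(referencingDtd, systemId));
    }
}
```

The same relative system identifier yields two different locations,
which is why the base has to be captured while the declaring file is
still known.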

One possible place to do this is in
XMLDTDScanner.scanEntityDecl.  At an even deeper level, it could
be done in XMLDTDScanner.scanSystemLiteral. (I don't know what
the benefits would be of doing the expansion before the string
goes in the StringPool. I guess that would be ideal, but I get
the feeling that may be too deep.)

Unfortunately, I'm a little lost in the architectural maze of
Xerces, so I'm going to ask what I hope is an easy question for
someone to answer:

Is it possible to access the URI of the document currently being
parsed from XMLDTDScanner.scanEntityDecl()?

If not, does anyone see another way to solve this?

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <nd...@nwalsh.com>      | The stone fell on the pitcher? Woe
http://nwalsh.com/                 | to the pitcher. The pitcher fell
                                   | on the stone? Woe to the
                                   | pitcher.--Rabbinic Saying


Re: Xerces bug: base URI and external parsed entities

Posted by Andy Clark <an...@apache.org>.
Norman Walsh wrote:
> Is it possible to access the URI of the document currently being
> parsed from XMLDTDScanner.scanEntityDecl()?

I think you can query the XMLEntityHandler for this information.
XMLDTDScanner has a reference to it.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: PATCH: Re: Xerces bug: base URI and external parsed entities

Posted by Norman Walsh <nd...@nwalsh.com>.
/ Norman Walsh <nd...@nwalsh.com> was heard to say:
| The following patch seems to fix the relative URI bug. If (one of)
| the Xerces maintainers deems it worthy, please check it in :-)

I hate to be a pest, but has anyone considered this patch? 

| Index: XMLDTDScanner.java
| ===================================================================
| RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xerces/framework/XMLDTDScanner.java,v
| retrieving revision 1.4
| diff -r1.4 XMLDTDScanner.java
| 1200a1201,1219
| >
| >           // ndw@nwalsh.com
| >           //
| >           // An fSystemLiteral value from an entity declaration may be
| >           // a relative URI. If so, it's important that we make it
| >           // absolute with respect to the context of the document that
| >           // we are currently reading. If we don't, the XMLParser will
| >           // make it absolute with respect to the point of *reference*,
| >           // before attempting to read it. That's definitely wrong.
| >           //
| >           String litSystemId = fStringPool.toString(fSystemLiteral);
| >           String absSystemId = fEntityHandler.expandSystemId(litSystemId);
| >           if (!absSystemId.equals(litSystemId)) {
| >               // REVISIT - Is it kosher to touch fStringPool directly?
| >               // Is there a better way? fEntityReader doesn't seem to
| >               // have an addString method that takes a literal string.
| >               fSystemLiteral = fStringPool.addString(absSystemId);
| >           }
| >
| 2376a2396
| >

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <nd...@nwalsh.com>      | Man's sensitivity to little things
http://nwalsh.com/                 | and insensitivity to the greatest
                                   | are the signs of a strange
                                   | disorder.--Pascal


PATCH: Re: Xerces bug: base URI and external parsed entities

Posted by Norman Walsh <nd...@nwalsh.com>.
The following patch seems to fix the relative URI bug. If (one of)
the Xerces maintainers deems it worthy, please check it in :-)

Index: XMLDTDScanner.java
===================================================================
RCS file: /home/cvspublic/xml-xerces/java/src/org/apache/xerces/framework/XMLDTDScanner.java,v
retrieving revision 1.4
diff -r1.4 XMLDTDScanner.java
1200a1201,1219
>
>           // ndw@nwalsh.com
>           //
>           // An fSystemLiteral value from an entity declaration may be
>           // a relative URI. If so, it's important that we make it
>           // absolute with respect to the context of the document that
>           // we are currently reading. If we don't, the XMLParser will
>           // make it absolute with respect to the point of *reference*,
>           // before attempting to read it. That's definitely wrong.
>           //
>           String litSystemId = fStringPool.toString(fSystemLiteral);
>           String absSystemId = fEntityHandler.expandSystemId(litSystemId);
>           if (!absSystemId.equals(litSystemId)) {
>               // REVISIT - Is it kosher to touch fStringPool directly?
>               // Is there a better way? fEntityReader doesn't seem to
>               // have an addString method that takes a literal string.
>               fSystemLiteral = fStringPool.addString(absSystemId);
>           }
>
2376a2396
>
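
For readers without the Xerces source at hand, the idea behind the
patch can be sketched in isolation. This is not Xerces code: the
class DeclTimeResolution, its methods, and the example.org URLs are
hypothetical, and java.net.URI stands in for whatever
fEntityHandler.expandSystemId does internally. The sketch stores the
system identifier in absolute form when the declaration is scanned,
so a later expansion against the referencing file leaves it alone.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class DeclTimeResolution {
    // Entity name -> absolute system identifier, fixed at declaration time.
    private final Map<String, String> systemIds = new HashMap<>();

    // Called when an entity declaration is scanned. The base is the URI
    // of the file that contains the declaration.
    void declare(String name, String systemId, String declaringEntityUri) {
        String abs = URI.create(declaringEntityUri).resolve(systemId).toString();
        // In XML the first declaration of an entity is the binding one.
        systemIds.putIfAbsent(name, abs);
    }

    // Called when the entity is referenced. Resolving an already
    // absolute URI against another base returns it unchanged.
    String locate(String name, String referencingEntityUri) {
        return URI.create(referencingEntityUri).resolve(systemIds.get(name)).toString();
    }

    public static void main(String[] args) {
        DeclTimeResolution table = new DeclTimeResolution();
        // y.dtd redeclares the PE before including x.dtd.
        table.declare("x.mod", "x-prime.mod", "http://example.org/custom/y.dtd");
        // x.dtd's own declaration comes later and is ignored.
        table.declare("x.mod", "x.mod", "http://example.org/base/x.dtd");
        // The reference occurs inside x.dtd, yet the location still
        // follows y.dtd: http://example.org/custom/x-prime.mod
        System.out.println(table.locate("x.mod", "http://example.org/base/x.dtd"));
    }
}
```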

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <nd...@nwalsh.com>      | Nothing ever gets anywhere. The
http://nwalsh.com/                 | earth keeps turning round and gets
                                   | nowhere. The moment is the only
                                   | thing that counts.--Jean Cocteau