Posted to j-users@xerces.apache.org by Christian Roth <ro...@visualclick.de> on 2011/07/19 12:28:00 UTC
Relative URLs with protocol specifier in external entity resolution
Hello,
I am having an issue with relative URLs that specify their protocol in external entity declarations.
In short,
<!ENTITY ent SYSTEM "entity.xml">
resolves correctly, whereas the semantically identical
<!ENTITY ent SYSTEM "file:entity.xml">
does not.
In the first case, Xerces correctly calculates the absolute path to entity.xml as being relative to the instance document's base path.
In the second case, Xerces does not. It looks like it assumes "file:entity.xml" is an absolute path and hands it verbatim to the system's entity resolver. This looks like a bug to me.
Here's a sample file set to reproduce the issue (put all four files in the same directory):
-- "frame-good.xml" : the document which works (no protocol specified) --
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc SYSTEM "doc.dtd"
[
<!ENTITY ent SYSTEM "entity.xml">
]>
<doc>&ent;</doc>
-- eof --
-- "frame-bad.xml" : the document which does NOT work (protocol specified) --
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc SYSTEM "doc.dtd"
[
<!ENTITY ent SYSTEM "file:entity.xml">
]>
<doc>&ent;</doc>
-- eof --
-- "entity.xml" : the file included via entity ref --
<?xml version="1.0" encoding="UTF-8"?>
<dummy/>
-- eof --
-- "doc.dtd" : the DTD file to validate against --
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT doc (dummy) >
<!ELEMENT dummy EMPTY >
-- eof --
I am testing with Xerces-J 2.11.0, using the sample programs in its xercesSamples.jar, as follows:
java -classpath xercesImpl.jar:xercesSamples.jar:xml-apis.jar sax.Counter -v frame-good.xml
works,
java -classpath xercesImpl.jar:xercesSamples.jar:xml-apis.jar sax.Counter -v frame-bad.xml
does not, but instead gives the following error:
error: Parse error occurred - entity.xml (No such file or directory)
java.io.FileNotFoundException: entity.xml (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at java.io.FileInputStream.<init>(FileInputStream.java:79)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at sax.Counter.main(Unknown Source)
Am I wrong or is Xerces wrong?
Kind regards
Christian
Re: Relative URLs with protocol specifier in external entity resolution
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Christian,
This is not a bug. "file:entity.xml" is already an absolute URI. Resolving
it against a base URI will always result in "file:entity.xml".
See the definition of an absolute URI [1] and the algorithm for relative
resolution [2] described in RFC 3986.
Thanks.
[1] http://tools.ietf.org/html/rfc3986#section-4.3
[2] http://tools.ietf.org/html/rfc3986#section-5.2
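The same rule can be observed with the JDK's own java.net.URI class. The snippet below is only a minimal sketch: it does not show Xerces' internal code path, and the base location "file:/home/user/docs/frame-bad.xml" and the class name UriResolutionDemo are made up for illustration. A reference that carries a scheme is returned unchanged, while a plain relative reference is merged with the base.

```java
import java.net.URI;

public class UriResolutionDemo {

    // Resolve a system identifier against a base URI using java.net.URI.
    // This illustrates the general resolution rule, not Xerces internals.
    static String resolve(String base, String systemId) {
        return URI.create(base).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // Hypothetical location of the instance document.
        String base = "file:/home/user/docs/frame-bad.xml";

        // No scheme: a relative reference, merged with the base path.
        System.out.println(resolve(base, "entity.xml"));

        // Scheme present: already absolute, so the base is ignored.
        System.out.println(resolve(base, "file:entity.xml"));

        // A URI with a scheme reports itself as absolute.
        System.out.println(URI.create("file:entity.xml").isAbsolute());
    }
}
```

With the assumed base, the first call yields "file:/home/user/docs/entity.xml" and the second yields "file:entity.xml". The path "entity.xml" in the second result is then opened relative to the JVM's working directory, which would explain the FileNotFoundException when the parser is not started from the directory containing the files.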
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org