Posted to j-users@xerces.apache.org by Christian Roth <ro...@visualclick.de> on 2011/07/19 12:28:00 UTC

Relative URLs with protocol specifier in external entity resolution

Hello,

I am having an issue with relative URLs that specify their protocol in external entity declarations.

In short, 

  <!ENTITY ent SYSTEM "entity.xml">

resolves correctly, while the semantically identical

  <!ENTITY ent SYSTEM "file:entity.xml">

does not.

In the first case, Xerces correctly calculates the absolute path to entity.xml as being relative to the instance document's base path.

In the second case, Xerces does not: it seems to treat "file:entity.xml" as an absolute path and hands it verbatim to the system's entity resolver. This looks like a bug to me.

Here's a sample file set to reproduce the issue (put all four files in the same directory):


-- "frame-good.xml" : the document which works (no protocol specified) --
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc SYSTEM "doc.dtd"
[
<!ENTITY ent SYSTEM "entity.xml">
]>
<doc>&ent;</doc>
-- eof --


-- "frame-bad.xml" : the document which does NOT work (protocol specified) --
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc SYSTEM "doc.dtd"
[
<!ENTITY ent SYSTEM "file:entity.xml">
]>
<doc>&ent;</doc>
-- eof --


-- "entity.xml" : the file included via entity ref --
<?xml version="1.0" encoding="UTF-8"?>
<dummy/>
-- eof --


-- "doc.dtd" : the DTD file to validate against --
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT doc (dummy) >
<!ELEMENT dummy EMPTY >
-- eof --


I am testing with Xerces-J 2.11.0, using its sample programs (xercesSamples.jar) as follows:

java -classpath xercesImpl.jar:xercesSamples.jar:xml-apis.jar sax.Counter -v frame-good.xml 

works,

java -classpath xercesImpl.jar:xercesSamples.jar:xml-apis.jar sax.Counter -v frame-bad.xml 

does not, and instead gives the following error:

error: Parse error occurred - entity.xml (No such file or directory)
java.io.FileNotFoundException: entity.xml (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:120)
	at java.io.FileInputStream.<init>(FileInputStream.java:79)
	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
	at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
	at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
	at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at sax.Counter.main(Unknown Source)
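The error message names a bare "entity.xml" with no directory. A minimal sketch of why, assuming the JDK's file: URL handler (FileURLConnection in the trace above) opens the URL's percent-decoded path as a java.io.File: the path of "file:entity.xml" is just "entity.xml", and a relative File is resolved against the JVM's working directory, not against the directory of frame-bad.xml. The class and method names below are hypothetical.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: shows which local file a "file:" URL with a relative path points at.
class FileUrlPathDemo {

    // Returns the path component of a URL, e.g. "entity.xml" for "file:entity.xml".
    @SuppressWarnings("deprecation") // URL(String) is deprecated since Java 20, still functional
    static String urlPath(String spec) {
        try {
            return new URL(spec).getPath();
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption: mirrors the file: handler opening new File(path); a relative File
    // is made absolute against the JVM's working directory (system property user.dir).
    static File localFile(String spec) {
        return new File(urlPath(spec)).getAbsoluteFile();
    }

    public static void main(String[] args) {
        System.out.println(urlPath("file:entity.xml"));   // no directory part at all
        System.out.println(localFile("file:entity.xml")); // depends on where java was launched
    }
}
```

Launched from different directories, localFile(...) yields different absolute paths, so the entity would presumably be found only when the working directory happens to contain entity.xml.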



Am I wrong or is Xerces wrong?

Kind regards
Christian

Re: Relative URLs with protocol specifier in external entity resolution

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Christian,

This is not a bug. "file:entity.xml" is already an absolute URI. Resolving
it against a base URI will always result in "file:entity.xml".

See the definition of an absolute URI [1] and the algorithm for relative
resolution [2] described in RFC 3986.
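To see this rule in action, here is a minimal sketch using the JDK's java.net.URI rather than Xerces' own resolution code. java.net.URI follows RFC 2396, not RFC 3986, but for these two references both specifications agree: a reference without a scheme is merged with the base URI's path, while a reference that carries a scheme is returned unchanged. The base URI file:/home/user/docs/frame-bad.xml and the class name ResolveDemo are hypothetical.

```java
import java.net.URI;

// Hypothetical demo: resolves the two system identifiers from this thread
// against an assumed base URI for the instance document.
class ResolveDemo {

    // Resolves a system identifier against the URI of the document that declares it.
    static String resolve(String base, String systemId) {
        return URI.create(base).resolve(URI.create(systemId)).toString();
    }

    public static void main(String[] args) {
        String base = "file:/home/user/docs/frame-bad.xml"; // assumed document location
        for (String systemId : new String[] {"entity.xml", "file:entity.xml"}) {
            URI ref = URI.create(systemId);
            // isAbsolute() is true exactly when the reference has a scheme component.
            System.out.println(systemId
                    + " absolute=" + ref.isAbsolute()
                    + " -> " + resolve(base, systemId));
        }
    }
}
```

Running main prints:

  entity.xml absolute=false -> file:/home/user/docs/entity.xml
  file:entity.xml absolute=true -> file:entity.xml

So only the scheme-less form used in frame-good.xml is resolved relative to the declaring document.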

Thanks.

[1] http://tools.ietf.org/html/rfc3986#section-4.3
[2] http://tools.ietf.org/html/rfc3986#section-5.2

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
