Posted to j-users@xerces.apache.org by Eric Sirianni <es...@stanford.edu> on 2003/11/18 22:56:56 UTC

Resolving Entities

I am trying to parse the following XML document using Xerces-J:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
	<!ENTITY chap1 SYSTEM "chap1.xml">
	<!ENTITY chap2 SYSTEM "chap2.xml">
]>
<book>
&chap1;
&chap2;
</book>

The parser seems to be having an issue resolving the locations of chap1.xml
and chap2.xml.  It appears to be looking for them in the directory from
which I ran java, instead of relative to the original XML doc.  Here is the
error I receive:

java.io.FileNotFoundException: C:\wherever-i-run-java\chap1.xml (The system
cannot find the file specified)

Clearly, I don't want to add full path names to these entity declarations.
The XML document above and chap1.xml and chap2.xml are all in the same
directory, so I would expect the parser to attempt to search there first...
no?

This must be a common issue, but I can't find any information on how to
resolve this...

Thanks,
Eric


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Resolving Entities

Posted by Michael Glavassevich <mr...@apache.org>.
Hi Maksym,

They're not really 'wild guesses', but I bet there are still some URI
bugs (that I don't know of) out there, so you never know. This problem
seems to come up often enough that it should probably be in the FAQs. I'll
put it on my TODO list. :-)

On Tue, 18 Nov 2003, Maksym Kovalenko wrote:

> Michael, exactly how many of these wild guesses have you written in the
> past, let's say, 3 months? I bet a dozen at least.
> I wanted to reply but thought that you probably have an e-mail template
> for this question ;-)

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org



Re: Resolving Entities

Posted by Maksym Kovalenko <mk...@marketswitch.com>.
Michael, exactly how many of these wild guesses have you written in the
past, let's say, 3 months? I bet a dozen at least.
I wanted to reply but thought that you probably have an e-mail template
for this question ;-)

Michael Glavassevich wrote:

>Hi Eric,
>
>I'm going to take a wild guess and assume you're parsing your document
>from a java.io.InputStream or org.xml.sax.InputSource (depending on the
>API you're using). Relative URIs require a context for resolution. If
>you're parsing with an InputSource you need to set the system ID on this
>object. If you're parsing directly from an InputStream like with JAXP, you
>need to call the parse method which also accepts a system ID. If you don't
>do this the parser will just use the current working directory (the value
>of the system property user.dir) as the base URI for resolution, which
>in general won't be the desired behaviour.

-- 
------------------------------------------------------------------------

Maksym Kovalenko
Software Engineer
Marketswitch Corporation
http://www.marketswitch.com
108 Powers Court, Suite 225
Dulles, VA 20166
Phone: +1 (703) 444-6750 ext. 302
Fax: +1 (703) 444-6812


Re: Resolving Entities

Posted by Bob Foster <bo...@objfac.com>.
Eric Sirianni wrote:

> Yes.  Here is the code I am using
> 
>    public static org.apache.lucene.document.Document preParseXMLFile(File xmlFile) {
>         try {
>             DOMParser parser = new DOMParser();
>             FileReader reader = new FileReader(xmlFile);
>             parser.parse(new InputSource(reader));
>           }...
> 
> I'm using org.xml.sax.InputSource.  So if I set the system ID on the
> InputSource object to the fully resolved filename from the xmlFile object
> this should fix the problem?

Yup. The bugs I was referring to had to do with Xerces forgetting the 
base URI when an EntityResolver was used, etc. Michael gave a much more 
insightful answer.

Bob

> I am using xerces 2.5.0 by the way.
> 
> Thanks,
> Eric


RE: Resolving Entities

Posted by Michael Glavassevich <mr...@apache.org>.
On Tue, 18 Nov 2003, Eric Sirianni wrote:

> I'm using org.xml.sax.InputSource.  So if I set the system ID on the
> InputSource object to the fully resolved filename from the xmlFile object
> this should fix the problem?

Yes, this should fix the problem.
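
For the Reader-based code in the question, the change could look roughly
like the sketch below. It is an illustration under assumptions, not the
poster's final code: it uses the JDK's JAXP DocumentBuilder in place of
Xerces' DOMParser so that it runs without extra jars, and the names
PreParseSketch, preParse and writeSampleBook are made up for the example.
The system ID is passed as a file: URI (xmlFile.toURI().toString()) rather
than a bare path, since a system ID is meant to be a URI.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class PreParseSketch {

    // Same shape as the Reader-based code in the question, plus the system ID.
    static Document preParse(File xmlFile) {
        try (Reader reader = new FileReader(xmlFile)) {
            InputSource source = new InputSource(reader);
            // Gives the parser a base URI, so "chap1.xml" is looked up next to
            // xmlFile instead of in the current working directory.
            source.setSystemId(xmlFile.toURI().toString());
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not parse " + xmlFile, e);
        }
    }

    // Hypothetical fixture: a small book.xml with two chapter files beside it.
    // The DocBook DOCTYPE from the question is left out to avoid a network fetch.
    static File writeSampleBook() {
        try {
            Path dir = Files.createTempDirectory("book");
            Files.write(dir.resolve("chap1.xml"), bytes("<chapter>one</chapter>"));
            Files.write(dir.resolve("chap2.xml"), bytes("<chapter>two</chapter>"));
            String book = "<!DOCTYPE book [\n"
                    + "  <!ENTITY chap1 SYSTEM \"chap1.xml\">\n"
                    + "  <!ENTITY chap2 SYSTEM \"chap2.xml\">\n"
                    + "]>\n<book>&chap1;&chap2;</book>";
            return Files.write(dir.resolve("book.xml"), bytes(book)).toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Document doc = preParse(writeSampleBook());
        System.out.println(doc.getElementsByTagName("chapter").getLength() + " chapters found");
    }
}
```

With Xerces' own DOMParser, the same InputSource can be handed to
parser.parse(...) as in the original code; only the setSystemId call is new.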

> I am using xerces 2.5.0 by the way.
>
> Thanks,
> Eric



RE: Resolving Entities

Posted by Eric Sirianni <es...@stanford.edu>.
Yes.  Here is the code I am using

   public static org.apache.lucene.document.Document preParseXMLFile(File xmlFile) {
        try {
            DOMParser parser = new DOMParser();
            FileReader reader = new FileReader(xmlFile);
            parser.parse(new InputSource(reader));
          }...

I'm using org.xml.sax.InputSource.  So if I set the system ID on the
InputSource object to the fully resolved filename from the xmlFile object
this should fix the problem?

I am using xerces 2.5.0 by the way.

Thanks,
Eric



Re: Resolving Entities

Posted by Michael Glavassevich <mr...@apache.org>.
Hi Eric,

I'm going to take a wild guess and assume you're parsing your document
from a java.io.InputStream or org.xml.sax.InputSource (depending on the
API you're using). Relative URIs require a context for resolution. If
you're parsing with an InputSource, you need to set the system ID on this
object. If you're parsing directly from an InputStream, as with JAXP, you
need to call the parse method which also accepts a system ID. If you don't
do this, the parser will just use the current working directory (the value
of the system property user.dir) as the base URI for resolution, which
in general won't be the desired behaviour.
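
As a rough illustration of the two routes described above, here is a small
sketch using the JAXP classes that ship with the JDK. The class and method
names (BaseUriDemo, viaInputSource, viaStreamAndSystemId, writeSampleBook)
and the temp-directory fixture are assumptions made up for the example, and
the DocBook DOCTYPE is replaced by a bare internal subset so that nothing
is fetched over the network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class BaseUriDemo {

    // Route 1: wrap the stream in an InputSource and set its system ID.
    static Document viaInputSource(Path xml) {
        try (InputStream in = Files.newInputStream(xml)) {
            InputSource source = new InputSource(in);
            source.setSystemId(xml.toUri().toString()); // base URI for "chap1.xml"
            return newBuilder().parse(source);
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("Could not parse " + xml, e);
        }
    }

    // Route 2: the JAXP overload that takes the stream and a system ID together.
    static Document viaStreamAndSystemId(Path xml) {
        try (InputStream in = Files.newInputStream(xml)) {
            return newBuilder().parse(in, xml.toUri().toString());
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("Could not parse " + xml, e);
        }
    }

    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical fixture: book.xml and chap1.xml side by side in a temp directory.
    static Path writeSampleBook() {
        try {
            Path dir = Files.createTempDirectory("entities");
            Files.write(dir.resolve("chap1.xml"),
                    "<chapter>one</chapter>".getBytes(StandardCharsets.UTF_8));
            String book = "<!DOCTYPE book [<!ENTITY chap1 SYSTEM \"chap1.xml\">]>"
                    + "<book>&chap1;</book>";
            return Files.write(dir.resolve("book.xml"), book.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path book = writeSampleBook();
        System.out.println(viaInputSource(book).getDocumentElement().getTextContent());
        System.out.println(viaStreamAndSystemId(book).getDocumentElement().getTextContent());
    }
}
```

Either route gives the parser the document's own location as the base, so
the entity declarations can stay relative.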



Re: Resolving Entities

Posted by Bob Foster <bo...@objfac.com>.
What version are you running? This bug has popped up in one guise or 
another all over Xerces in the past, but I believe they have fixed it in 
the 2.5.0 version.

Bob Foster
