Posted to user@commons.apache.org by "Iantosca, Jonathan" <ji...@tiaa-cref.org> on 2003/12/28 08:04:27 UTC

FW: Digester & doctype declaration

Hello,

I'm trying to digest an xml file with the following doctype declaration.

<!DOCTYPE adaptor SYSTEM "woadaptor.dtd">

I keep getting a java.net.UnknownHostException when this declaration is in
the xml document. As soon as I remove it, the digester has no problems.
Also, before parsing, I'm calling the Digester's setValidating method,
passing in false.

Any Thoughts?

-Jon




---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: Digester & doctype declaration

Posted by Simon Kitching <si...@ecnetwork.co.nz>.
On Mon, 2003-12-29 at 08:10, Paul Libbrecht wrote:
> On 28-Dec-03, at 19:21 Uhr, Craig R. McClanahan wrote:
> 
> > Quoting "Iantosca, Jonathan" <ji...@tiaa-cref.org>:
> >
> >> Hello,
> >>
> >> I'm trying to digest an xml file with the following doctype declaration.
> >>
> >> <!DOCTYPE adaptor SYSTEM "woadaptor.dtd">
> >>
> >> I keep getting a java.net.UnknownHostException when this declaration is in
> >> the xml document. As soon as I remove it, the digester has no problems.
> >> Also, before parsing, I'm calling the Digester's setValidating method,
> >> passing in false.
> >>
> >> Any Thoughts?
> >>
> >> -Jon
> >>
> >
> > In order for the XML parser to be able to resolve this relative URL
> > ("woadaptor.dtd"), it has to know the URL of the document (that contains
> > this line) that you are actually parsing.  In turn, that means you need to
> > use one of the Digester.parse() methods that provides this information --
> > either a File, an InputSource, or a String.  Don't use the one that takes
> > an InputStream.
> >
> > Craig
> 
> Dare I add to this that turning validation off does not mean the DTD
> will not be loaded. DTDs provide, among other things, default values
> for attributes, and hence need to be read on every parse.
> 

Hey, might as well add my $0.02 worth too :-)

If the XML declaration specifies standalone="yes" then that tells
the XML parser that there *aren't* any default values or other
declarations in the external DTD that would affect parsing, so in that
case the parser is allowed to skip loading the DTD. I'm not sure whether
Xerces skips DTDs when standalone is set...

Alternatively, some parsers (Xerces at least) have a parser-specific
feature to prevent loading of any external files (see the documentation
on custom parser features). The Digester#setFeature method passes its
parameters down to the underlying parser, or you can create & configure
the parser instance yourself rather than let Digester create one. Of
course you'd need to know which concrete parser is going to be used
in order to take advantage of this.
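
As a rough sketch of the parser-feature approach, the snippet below uses
plain JAXP rather than Digester so that it stands alone. It assumes the
underlying parser is Xerces or the Xerces-derived parser bundled with the
JDK; the feature URI is Xerces-specific and other parsers may reject it.
The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Digester.
class NoExternalDtdSketch {
    // Xerces-specific feature: when false, a non-validating parser
    // does not fetch the external DTD named in the DOCTYPE.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parses the XML text and returns the name of the root element.
    static String parseIgnoringExternalDtd(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        // Throws if the concrete parser does not recognise the feature.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        SAXParser parser = factory.newSAXParser();

        final String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        };
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE adaptor SYSTEM \"woadaptor.dtd\">\n"
            + "<adaptor/>";
        System.out.println(parseIgnoringExternalDtd(xml));
    }
}
```

With Digester, the same feature URI and value would presumably be handed
to Digester#setFeature before parsing, as described above.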

And finally you can write your own EntityResolver to customise how DTDs
(and other external entities referenced from an XML document) are
located during parsing. See Digester#setEntityResolver, or create &
configure the parser instance yourself. This is probably the most
portable/flexible way to handle doctype declarations in your input
files. A quick and ugly hack is to always return an empty stream when
asked to locate a DTD. Of course if the DTD *does* declare default
values for attributes, etc., then the result of parsing won't be
correct.
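
A minimal sketch of the "empty stream" hack, again written against plain
JAXP/SAX so it is self-contained. The class name and the ".dtd" suffix
check are assumptions made for illustration; the same EntityResolver
instance could be passed to Digester#setEntityResolver instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Digester.
class EmptyDtdResolverSketch {
    // Answers any request for a *.dtd system ID with an empty document.
    // Returning null tells the parser to fall back to default resolution.
    static final EntityResolver EMPTY_DTD = (publicId, systemId) -> {
        if (systemId != null && systemId.endsWith(".dtd")) {
            return new InputSource(new StringReader(""));
        }
        return null;
    };

    // Parses the XML text and returns the name of the root element.
    static String parseWithEmptyDtd(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(EMPTY_DTD);

        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE adaptor SYSTEM \"woadaptor.dtd\">\n"
            + "<adaptor/>";
        System.out.println(parseWithEmptyDtd(xml));
    }
}
```

Because the DTD is replaced with nothing, any attribute defaults or
entities it would have declared are silently lost, which is the caveat
mentioned above.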

Cheers,

Simon





Re: Digester & doctype declaration

Posted by Paul Libbrecht <pa...@activemath.org>.
On 28-Dec-03, at 19:21 Uhr, Craig R. McClanahan wrote:

> Quoting "Iantosca, Jonathan" <ji...@tiaa-cref.org>:
>
>> Hello,
>>
>> I'm trying to digest an xml file with the following doctype declaration.
>>
>> <!DOCTYPE adaptor SYSTEM "woadaptor.dtd">
>>
>> I keep getting a java.net.UnknownHostException when this declaration is in
>> the xml document. As soon as I remove it, the digester has no problems.
>> Also, before parsing, I'm calling the Digester's setValidating method,
>> passing in false.
>>
>> Any Thoughts?
>>
>> -Jon
>>
>
> In order for the XML parser to be able to resolve this relative URL
> ("woadaptor.dtd"), it has to know the URL of the document (that contains
> this line) that you are actually parsing.  In turn, that means you need to
> use one of the Digester.parse() methods that provides this information --
> either a File, an InputSource, or a String.  Don't use the one that takes
> an InputStream.
>
> Craig

Dare I add to this that turning validation off does not mean the DTD
will not be loaded. DTDs provide, among other things, default values
for attributes, and hence need to be read on every parse.
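
To illustrate the point, here is a small sketch showing a non-validating
parse picking up an attribute default from an external DTD. The "mode"
attribute and its default value are invented for the example, and the
DTD text is served from memory by an EntityResolver so that nothing has
to exist on disk. It assumes the JDK's bundled parser, which loads the
external DTD by default even when validation is off.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; the DTD content is made up.
class DtdDefaultsSketch {
    // Stand-in for woadaptor.dtd: declares a default for attribute "mode".
    static final String DTD =
        "<!ELEMENT adaptor EMPTY>\n"
      + "<!ATTLIST adaptor mode CDATA \"default-mode\">\n";

    // Note that the document itself never mentions "mode".
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE adaptor SYSTEM \"woadaptor.dtd\">\n"
      + "<adaptor/>";

    // Returns the value the parser reports for adaptor/@mode.
    static String readDefaultedMode() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // validation off, DTD still read
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Serve the DTD from memory whatever system ID is requested.
        reader.setEntityResolver((publicId, systemId) ->
            new InputSource(new StringReader(DTD)));

        final String[] mode = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("adaptor".equals(qName)) {
                    mode[0] = atts.getValue("mode");
                }
            }
        });
        reader.parse(new InputSource(new StringReader(XML)));
        return mode[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readDefaultedMode());
    }
}
```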

Paul




Re: FW: Digester & doctype declaration

Posted by "Craig R. McClanahan" <cr...@apache.org>.
Quoting "Iantosca, Jonathan" <ji...@tiaa-cref.org>:

> Hello,
> 
> I'm trying to digest an xml file with the following doctype declaration.
> 
> <!DOCTYPE adaptor SYSTEM "woadaptor.dtd">
> 
> I keep getting a java.net.UnknownHostException when this declaration is in
> the xml document. As soon as I remove it, the digester has no problems.
> Also, before parsing, I'm calling the Digester's setValidating method,
> passing in false.
> 
> Any Thoughts?
> 
> -Jon
> 

In order for the XML parser to be able to resolve this relative URL
("woadaptor.dtd"), it has to know the URL of the document (the one that
contains this line) that you are actually parsing.  In turn, that means you
need to use one of the Digester.parse() methods that provides this
information -- either a File, an InputSource (with its system ID set), or a
String URI.  Don't use the one that takes an InputStream.
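
One way to sketch this with plain JAXP (which is roughly what Digester
drives underneath) is to wrap the stream in an InputSource and set its
system ID to the document's location, so the relative DTD reference has
a base to resolve against. The class name, the "config.xml" file name,
and the directory argument are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Digester.
class SystemIdSketch {
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE adaptor SYSTEM \"woadaptor.dtd\">\n"
      + "<adaptor/>";

    // Parses XML from a stream while telling the parser that the document
    // lives in 'dir', so "woadaptor.dtd" is looked up in that directory.
    static String parseWithSystemId(Path dir) throws Exception {
        InputStream in =
            new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
        InputSource source = new InputSource(in);
        // Base URL for resolving relative references in the document.
        source.setSystemId(dir.resolve("config.xml").toUri().toString());

        final String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // Create a throwaway directory holding a minimal DTD.
        Path dir = Files.createTempDirectory("dtd-sketch");
        Files.write(dir.resolve("woadaptor.dtd"),
            "<!ELEMENT adaptor EMPTY>".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseWithSystemId(dir));
    }
}
```

Passing a File to the parser achieves the same thing implicitly, since
the file's own location then serves as the base URL.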

Craig

