Posted to user@commons.apache.org by Paolo Valladolid <pv...@dfi-intl.com> on 2004/07/14 22:06:46 UTC

[Digester] How do I get Digester to ignore the <!DOCTYPE> tag

I need to use Digester to parse XML that has been retrieved from a
database.  The XML I'm working with was received from elsewhere (i.e., not
created by our team).  How do I get Digester to ignore the <!DOCTYPE>
tag?  I've tried setValidating(false) and it did not work.

Thanks,

Paolo Valladolid
Software Developer
DFI International Government Services


Re: [Digester] How do I get Digester to ignore the <!DOCTYPE> tag

Posted by Simon Kitching <si...@ecnetwork.co.nz>.
On Thu, 2004-07-15 at 09:26, Craig McClanahan wrote:
> Paolo Valladolid wrote:
> 
> >I need to use Digester to parse XML that has been retrieved from a
> >database.  The XML I'm working with was received from elsewhere (ie. Not
> >created by our team).  How do I get Digester to ignore the <!DOCTYPE>
> >tag?  I've tried setValidating( false ) and it did not work.
> >
> The setValidating(false) call does indeed tell Digester to not validate 
> the XML data.  However, it does *not* tell the underlying XML parser to 
> skip the DOCTYPE, and there is no API in JAXP to say that sort of thing.
> 
> If your problem is unresolved entities, one thing you can do is to 
> provide your own EntityResolver whose resolveEntity() method 
> always returns null.  That way, the parser won't go traipsing around the 
> network trying to find things that it can't.

Hi Paolo,

I'm presuming the problem is that you have a DOCTYPE like this:
 <!DOCTYPE mydoc SYSTEM "http://www.acme.com/mydtd.dtd">
(where "mydoc" stands for your root element name) and want to suppress
loading of the referenced document, or have a DTD which declares
<!ENTITY ....> and want to suppress loading of the entity.

In other words, you don't want to ignore the DOCTYPE, you want to
suppress loading of external entities.

Craig's suggestion of writing an EntityResolver will work, but he has
made a minor mistake: if resolveEntity() returns *null*, the parser
applies its normal resolution rules, which include retrieving the
entity (e.g. the DTD) from the specified URL.

This is explicitly stated in the javadoc for the
org.xml.sax.EntityResolver interface.

In order to ignore remote entities, you can instead get your
EntityResolver to return an InputSource that wraps an empty InputStream.
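
A minimal sketch of such a resolver follows, using plain JAXP/SAX so that
it stands alone. The class name EmptyEntityResolver and the helper
rootElementName() are made up for illustration. As far as I know Digester
exposes a setEntityResolver() method that accepts the same kind of object,
but check the javadoc for your Digester version.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; not part of Digester or JAXP.
class EmptyEntityResolver implements EntityResolver {

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Returning null would let the parser fetch systemId itself,
        // so hand back an InputSource wrapping an empty stream instead.
        InputSource empty = new InputSource(new ByteArrayInputStream(new byte[0]));
        empty.setSystemId(systemId);
        return empty;
    }

    // Illustrative helper: parses the document with the resolver installed
    // and returns the root element name, to show that parsing completes.
    static String rootElementName(String xml) throws Exception {
        final String[] root = new String[1];
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(new EmptyEntityResolver());
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc SYSTEM \"http://www.acme.com/mydtd.dtd\"><doc/>";
        System.out.println(rootElementName(xml));
    }
}
```

The resolver is consulted for the external DTD subset as well as for
external entities declared in the DTD, so every external reference ends up
with empty content.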

Note, however, that this can change the *meaning* of your xml document.
For example, if the DTD declares a default value for an attribute, then
ignoring the DTD will result in the attribute not getting its expected
value.

In general, it is better to ensure you have a local copy of the DTD,
then use an EntityResolver to return the local DTD rather than returning
an empty stream. Still, if you *know* that the DTD doesn't have this
sort of stuff in it, returning an InputSource which wraps an empty
stream will work ok.
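
To illustrate the difference, here is a rough sketch of a resolver that
serves a local copy of the DTD. The DTD content, the "lang" attribute and
the names LocalDtdResolver and rootLang() are all invented for the example;
a real setup would more likely read the DTD from a file or a classpath
resource than from a string constant.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; maps one known system id to a local DTD copy.
class LocalDtdResolver implements EntityResolver {

    // System id used in the DOCTYPE of the incoming documents.
    static final String REMOTE_DTD = "http://www.acme.com/mydtd.dtd";

    // Invented DTD content standing in for the local copy.
    static final String LOCAL_DTD =
            "<!ELEMENT doc EMPTY>\n"
          + "<!ATTLIST doc lang CDATA \"en\">\n";

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (REMOTE_DTD.equals(systemId)) {
            InputSource local = new InputSource(new StringReader(LOCAL_DTD));
            local.setSystemId(systemId);
            return local;
        }
        // Anything else falls back to the parser's default resolution.
        return null;
    }

    // Illustrative helper: returns the root element's "lang" attribute
    // as reported by the parser, or null if it is absent.
    static String rootLang(String xml, EntityResolver resolver) throws Exception {
        final String[] lang = new String[1];
        final boolean[] seenRoot = new boolean[1];
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (!seenRoot[0]) {
                    seenRoot[0] = true;
                    lang[0] = atts.getValue("lang");
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return lang[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc SYSTEM \"" + REMOTE_DTD + "\"><doc/>";
        // With the local DTD, the declared attribute default is applied.
        System.out.println(rootLang(xml, new LocalDtdResolver()));
        // With an empty DTD, the attribute is simply missing.
        System.out.println(rootLang(xml,
                (publicId, systemId) -> new InputSource(new StringReader(""))));
    }
}
```

The same document yields a "lang" value when the local DTD is supplied and
none when the DTD is replaced by empty content, which is the change in
meaning described above.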

If you happen to know that the underlying xml parser is Xerces then you
can use the setFeature method to disable loading of DTDs. However this
is parser-specific. See the Xerces documentation on "features" for more
info.
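
As a sketch only: the Xerces feature I have in mind is
"http://apache.org/xml/features/nonvalidating/load-external-dtd". The code
below sets it on a JAXP SAXParserFactory and reports whether the parser
accepted it. The class and method names are made up, and a non-Xerces
parser may well reject the feature, so the failure case is handled rather
than assumed away.

```java
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the feature URI is Xerces-specific.
class XercesDtdFeature {

    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Tries to switch off external DTD loading for non-validating parses.
    // Returns false if the underlying parser does not accept the feature.
    static boolean disableExternalDtdLoading(SAXParserFactory factory) {
        try {
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            return true;
        } catch (ParserConfigurationException | SAXException e) {
            return false;
        }
    }

    // Parses the document with a parser created from the given factory.
    static void parse(SAXParserFactory factory, String xml) throws Exception {
        factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), new DefaultHandler());
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        if (disableExternalDtdLoading(factory)) {
            parse(factory,
                  "<!DOCTYPE doc SYSTEM \"http://www.acme.com/mydtd.dtd\"><doc/>");
            System.out.println("parsed without loading the external DTD");
        } else {
            System.out.println("feature not supported by this parser");
        }
    }
}
```

The same caveat applies as with the empty stream: attribute defaults and
entities declared in the external DTD are not seen by the parser.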

By the way, this has nothing to do with Digester itself; it is related
to JAXP parsing in general. So you may be better off asking this on a
list for xml parsing & JAXP.

Regards,

Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: [Digester] How do I get Digester to ignore the <!DOCTYPE> tag

Posted by Craig McClanahan <cr...@apache.org>.
Paolo Valladolid wrote:

>I need to use Digester to parse XML that has been retrieved from a
>database.  The XML I'm working with was received from elsewhere (ie. Not
>created by our team).  How do I get Digester to ignore the <!DOCTYPE>
>tag?  I've tried setValidating( false ) and it did not work.
>
The setValidating(false) call does indeed tell Digester to not validate 
the XML data.  However, it does *not* tell the underlying XML parser to 
skip the DOCTYPE, and there is no API in JAXP to say that sort of thing.

If your problem is unresolved entities, one thing you can do is to 
provide your own EntityResolver whose resolveEntity() method 
always returns null.  That way, the parser won't go traipsing around the 
network trying to find things that it can't.

Craig

