Posted to general@xml.apache.org by Henrik Melander <d9...@sm.luth.se> on 2001/03/28 22:51:28 UTC

forcing validation

I have a server that receives an XML file over HTTP and responds with
another. I do not have control over the client, and they may not send
correct XML (usually not ;). Therefore we want to validate the XML file
against the DTD.

Is it possible to force the DOM parser to validate against a DTD? I have
not found anything in the API. If not, the best way seems to be to run a
regexp over the file and insert the "dtd-link" (a DOCTYPE declaration).

Regards,
Henrik

---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org


Re: forcing validation

Posted by Jeff Turner <je...@socialchange.net.au>.
I asked this on XML-DEV once.

It seems the best, correct solution is simply to demand valid XML ;) The XML
spec [1] defines "valid XML" as having an associated document type declaration
and complying with the constraints it expresses. That's the contract between an
XML producer and consumer. Contracts are very important things. The whole *point*
of XML is that it gives you a predefined, rigorous contract to which both sides
can agree. If your XML source breaks that contract, then the correct response is
to yell at them until they fix it. If you "work around" the problem, you're
throwing away the contract and losing the main benefit of XML.
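In JAXP terms, "demanding valid XML" can be sketched as a DOM parse with DTD validation switched on plus an ErrorHandler that turns validity errors into exceptions. Without such a handler, validity errors are only reported and the parse carries on. This is a minimal sketch, not code from this thread, and the class name StrictDomParser is hypothetical. With Xerces-based parsers (including the one bundled with the JDK), a document that has no DOCTYPE at all is rejected with a "no grammar found" validity error, which is exactly the contract described above.

```java
import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: DOM parsing that refuses anything that is not valid XML. */
public final class StrictDomParser {

    public static Document parseValid(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Validate against the DTD named in the document's DOCTYPE declaration.
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // Warnings are ignored in this sketch.
            }

            @Override
            public void error(SAXParseException e) throws SAXException {
                // Validity errors arrive here; rethrow so the parse is aborted.
                throw e;
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Well-formedness errors.
                throw e;
            }
        });
        return builder.parse(in);
    }
}
```

A server could call parseValid() on the request body and answer with an error response whenever it throws SAXException, instead of trying to process whatever the client sent.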

An illustration: once upon a time, HTML was the "contract" that governed
browsers. That contract was broken during the browser wars, as vendors rushed
to add new tags and competed to see whose browser accepted crappier
pseudo-HTML. End result: buggy, bloated browsers making web developers' lives
miserable.

That said, there are instances where you have no control over the XML source,
and can apply no pressure to get it fixed.

In that case, I suggest you look at Simon St Laurent's "DOCTYPEChangerStream"
class:

http://www.simonstl.com/projects/doctypes/

It is a filter that lets you replace an existing doctype declaration, or add
one if one doesn't exist.
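Henrik's regexp idea and a DOCTYPE-changing filter come down to the same string operation. The following is a hypothetical, string-based sketch (it is not the DOCTYPEChangerStream API) that replaces the first DOCTYPE declaration or, if there is none, inserts one after the XML declaration. It assumes the whole document fits in a String, and it does not handle "<!DOCTYPE" appearing inside a comment or "]>" appearing inside an internal subset; a streaming filter avoids buffering the whole document.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: force a given DOCTYPE onto a document held in a String. */
public final class DoctypeRewriter {

    // An XML declaration, which is only allowed at the very start of the document.
    private static final Pattern XML_DECL = Pattern.compile("\\A<\\?xml\\s[^?]*\\?>");

    // A DOCTYPE declaration with an optional internal subset in [...].
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^\\[>]*(\\[.*?\\])?\\s*>", Pattern.DOTALL);

    public static String setDoctype(String xml, String rootName, String systemId) {
        String decl = "<!DOCTYPE " + rootName + " SYSTEM \"" + systemId + "\">";

        Matcher existing = DOCTYPE.matcher(xml);
        if (existing.find()) {
            // Replace the first DOCTYPE; quoteReplacement guards '$' and '\' in decl.
            return existing.replaceFirst(Matcher.quoteReplacement(decl));
        }

        Matcher xmlDecl = XML_DECL.matcher(xml);
        if (xmlDecl.find()) {
            // The DOCTYPE must come after the XML declaration, not before it.
            int end = xmlDecl.end();
            return xml.substring(0, end) + "\n" + decl + xml.substring(end);
        }

        // No XML declaration: the DOCTYPE can simply go first.
        return decl + "\n" + xml;
    }
}
```

The rewritten string can then be handed to a validating parser. Note that this only changes which DTD is named; the DTD itself still has to be found, which is what the next paragraph is about.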


In addition (and regardless of whether you use SimonStL's hack), you'll
probably also need some code to validate against your "local" DTD,
instead of whatever is specified in the doctype declaration's system id. In
servlet environments, your webapp may be deployed straight from a .war that is
never unpacked, so there may be no DTD file on disk for the parser to resolve
to. Here, you can use a custom EntityResolver which loads the DTD via
getResourceAsStream() and returns it to the parser. I've attached a class which
does this.
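Below is a sketch of the kind of EntityResolver described above. The attached class is not preserved in this archive, so this is not it. Whenever the parser asks for an external DTD, the sketch hands back a local copy loaded with ClassLoader.getResourceAsStream(), whatever system id the DOCTYPE declared. The class name, the "any system id ending in .dtd" rule and the resource path are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical resolver: ignores where the DOCTYPE says the DTD lives and
 * serves a local copy from the classpath instead (works inside a .war too).
 */
public class ClasspathDtdResolver implements EntityResolver {
    private final ClassLoader loader;
    private final String dtdResource; // e.g. "dtd/order.dtd" (hypothetical path)

    public ClasspathDtdResolver(ClassLoader loader, String dtdResource) {
        this.loader = loader;
        this.dtdResource = dtdResource;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Only intercept DTD requests; return null so the parser handles anything else.
        if (systemId == null || !systemId.endsWith(".dtd")) {
            return null;
        }
        InputStream in = loader.getResourceAsStream(dtdResource);
        if (in == null) {
            // Fail loudly rather than silently fetching the DTD the client named.
            throw new SAXException("Local DTD not found on classpath: " + dtdResource);
        }
        InputSource source = new InputSource(in);
        source.setPublicId(publicId);
        // Reported in error locations and used as the base for relative references.
        source.setSystemId(systemId);
        return source;
    }
}
```

In a servlet you might construct it with getClass().getClassLoader() and a DTD stored under WEB-INF/classes, then pass it to DocumentBuilder.setEntityResolver() on a builder created with setValidating(true). Throwing when the local copy is missing, rather than returning null, keeps the parser from quietly falling back to fetching whatever DTD the client's DOCTYPE points at.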

You might also want to look at XML Catalogs, which let you use a local DTD
instead of the one specified in the doctype. Here's a good article about it:

http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html

HTH,

--Jeff

[1]
  "[Definition:] An XML document is valid if it has an associated document type
  declaration and if the document complies with the constraints expressed in
  it."

 -- http://www.xml.com/axml/target.html#sec-prolog-dtd
