Posted to j-users@xerces.apache.org by Dariush Behboudi <da...@glamm.com> on 2002/11/05 10:30:52 UTC

Using loose.dtd and strict.dtd from xerces

Hi everyone,
I'm new to Xerces and I'm trying to validate an HTML file using the W3C's DTDs
strict.dtd and loose.dtd.

My very simple Java code is the following:

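    // Assumes org.apache.xerces.parsers.SAXParser (the Xerces class, not the JAXP one)
    SAXParser parser = new SAXParser();
    try {
        // Turn on DTD validation before parsing
        parser.setFeature("http://xml.org/sax/features/validation", true);
        parser.parse("new.xml");
    } catch (Exception e) {
        // Covers both feature-setup and parse failures, so report the actual cause
        System.out.println("error while configuring or running the parser: " + e);
    }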

And the XML file is:
<?xml version="1.0"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
   "http://www.w3.org/TR/REC-html40/strict.dtd">

When I run my example, the following error occurs:
org.xml.sax.SAXParseException: The declaration for the entity "ContentType"
must end with '>'.

How can I solve this problem?

Best regards,
Dariush.

Re: Using loose.dtd and strict.dtd from xerces

Posted by Andy Clark <an...@apache.org>.
Dariush Behboudi wrote:

> Hi everyone,
> I'm new to Xerces and I'm trying to validate an HTML file using the
> W3C's DTDs strict.dtd and loose.dtd.

HTML DTDs are written in SGML, which is a superset
of what is allowed in an XML DTD, so an XML parser
such as Xerces cannot read them. If you want to
use HTML but also perform validation, then I would
suggest using XHTML, which is the XML version of
the HTML specification.
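
As a rough illustration of the SGML/XML mismatch, the sketch below
parses two tiny documents with the JDK's built-in JAXP parser. The
first declares an entity with an SGML-style "-- comment --" inside the
declaration, which is how the HTML 4 DTDs annotate entities such as
ContentType; the second uses the XML form with a separate comment. The
class name DtdSyntaxDemo, the helper tryParse, and the element r are
hypothetical names chosen for this example, and the exact error text
may differ between parser versions.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Xerces or of the original post.
class DtdSyntaxDemo {

    // SGML-style: the comment sits inside the entity declaration.
    static final String SGML_STYLE =
        "<!DOCTYPE r [\n"
        + "  <!ENTITY % ContentType \"CDATA\" -- media type -->\n"
        + "  <!ELEMENT r EMPTY>\n"
        + "]>\n"
        + "<r/>";

    // XML-style: the declaration ends first, then a normal comment follows.
    static final String XML_STYLE =
        "<!DOCTYPE r [\n"
        + "  <!ENTITY % ContentType \"CDATA\"> <!-- media type -->\n"
        + "  <!ELEMENT r EMPTY>\n"
        + "]>\n"
        + "<r/>";

    // Returns null if the document parses and validates,
    // otherwise the message of the first reported problem.
    static String tryParse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // DTD validation on
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // treat validity errors as failures too
                }
            });
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("SGML-style DTD: " + tryParse(SGML_STYLE));
        System.out.println("XML-style DTD:  " + tryParse(XML_STYLE));
    }
}
```

The first document should be rejected while the second is accepted,
which is why switching to the XHTML DTDs (written in XML syntax)
avoids the problem.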

If validation is not important and you just want
to parse HTML documents in your application, check
out JTidy[1] and NekoHTML[2]. JTidy does a very
good job of cleaning up HTML files but is best
used for automatic conversion and accessing the
document using DOM. NekoHTML is a bit smaller and
offers you the ability to use the SAX API as well
as DOM. If appropriate to your needs, try both
and see which works best for you.

[1] http://lempinen.net/sami/jtidy/
[2] http://www.apache.org/~andyc/neko/doc/html/

-- 
Andy Clark * andyc@apache.org
