Posted to j-dev@xerces.apache.org by Andy Clark <an...@apache.org> on 2003/10/27 22:33:42 UTC

Re: parsing 'html' documents using DOMParser

Mushfiqur Rahman wrote:
> I want to parse an HTML document (which may not be an XHTML document) using 
> org.apache.xerces.parsers.DOMParser and get an org.w3c.dom.Document after 
> parsing. Can anyone tell me how I can do it?

If you just need a DOM document, there are a
few options: check out JTidy[1] and NekoHTML[2].
JTidy has been around longer, but NekoHTML has
the advantage of using less memory, and it is
built on top of Xerces.

But, as with everything, check out all of your
options and pick the one that works for you.
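
For context on why an HTML-aware parser is worth
looking at: a strict XML parser stops at the first
well-formedness error, and ordinary HTML is full of
those. The sketch below is only an illustration. It
assumes the JDK's JAXP DocumentBuilder in place of
org.apache.xerces.parsers.DOMParser so that it runs
without extra jars, and the names StrictParseDemo and
tryParse are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows a strict XML parser accepting
// well-formed XHTML and rejecting ordinary "tag soup" HTML.
public class StrictParseDemo {

    // Returns the parsed DOM, or null if the markup is not well-formed XML.
    public static Document tryParse(String markup) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            // Well-formedness errors surface as SAXException subclasses.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Every element is closed, so this is well-formed XML.
        String xhtml = "<html><body><p>Hello<br/>world</p></body></html>";
        // Unclosed <br> and <p>: legal in HTML, a fatal error in XML.
        String html = "<html><body><p>Hello<br>world</body></html>";

        System.out.println("XHTML parsed: " + (tryParse(xhtml) != null));
        System.out.println("HTML parsed:  " + (tryParse(html) != null));
    }
}
```

The first input should yield a Document and the second
should not. Tools such as JTidy and NekoHTML exist to
turn the second kind of input into a DOM tree.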

[1] http://sourceforge.net/projects/jtidy/
[2] http://www.apache.org/~andyc/neko/doc/html/

-- 
Andy Clark * andyc@apache.org
