Posted to j-dev@xerces.apache.org by Fr...@eFunds.com on 2001/08/01 09:42:52 UTC

Re: HTML Parsing

Gee thanks, that sounds exactly like what I need.

But still, satisfy my curiosity, what is the purpose of the HTML packages
in xerces?

org.apache.html.dom
org.w3c.dom.html

Thanks again.

Franck




Edwin Goei <ed...@sun.com> on 31/07/2001 18:20:55

Please respond to xerces-j-dev@xml.apache.org

To:   xerces-j-dev@xml.apache.org
cc:
Subject:  Re: HTML Parsing


Franck_Schmidlin@eFunds.com wrote:
>
> Hiya,
>
> I am trying to parse some very ugly HTML, complete with missing closing
> tags and unquoted attributes.
>
> I have noticed a few HTML packages in xerces, but I can't make sense of
> them.
>
> Is there a class that I could use to parse HTML and build a proper DOM
> out of it?

Sounds like you want an HTML parser which converts to XHTML.  Try Tidy
at http://www.w3.org/People/Raggett/tidy/
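
To illustrate why a clean-up step such as Tidy is suggested here, the sketch
below feeds a fragment with an unquoted attribute and a missing closing tag to
a strict XML parser through the standard JAXP API. The class name
StrictParseDemo, the helper parsesAsXml, and the sample markup are
hypothetical and chosen only for this example; it does not use Tidy or the
Xerces HTML packages. It only shows that a conforming XML parser rejects this
kind of input, while the same fragment rewritten as well-formed XHTML parses
into a DOM.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictParseDemo {

    // Hypothetical helper: returns true if the markup is well-formed XML,
    // false if the parser reports a parse error.
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores the rest,
            // so the parser does not print its own diagnostics.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Not well-formed: unquoted attribute, missing end tag, etc.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample inputs are assumptions made up for this sketch.
        String uglyHtml = "<p class=x>hi";
        String cleanXhtml = "<p class=\"x\">hi</p>";

        System.out.println("ugly HTML parses:   " + parsesAsXml(uglyHtml));
        System.out.println("clean XHTML parses: " + parsesAsXml(cleanXhtml));
    }
}
```

Once a tool has converted the input to well-formed XHTML, the same
DocumentBuilder call can be used to build the DOM.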

-Edwin

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org










Re: HTML Parsing

Posted by Andy Clark <an...@apache.org>.

Franck_Schmidlin@eFunds.com wrote:
> But still, satisfy my curiosity, what is the purpose of the HTML 
> packages in xerces?
> 
> org.apache.html.dom
> org.w3c.dom.html

The org.w3c.dom.html interfaces come from the HTML module of the W3C DOM
specification. The org.apache.html.dom code is an implementation of that
HTML DOM, and it was donated to the project.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org
