Posted to j-users@xerces.apache.org by Gude Reshma <re...@yahoo.com> on 2002/02/08 04:45:24 UTC

parsing an HTML document using Xerces DOM parser

Hi,

How can I use the Xerces DOM parser to parse an HTML document? An HTML document is not well-formed XML, so I am unable to parse it with the Xerces parser. I tried to make the document well-formed using the HTML Tidy tool from the W3C, but that did not help. Any suggestions as to how I should parse an HTML document using the DOM parser?

Thanks.
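
To make the problem in the question concrete, here is a minimal sketch (not from the original post) that feeds typical HTML and a well-formed XHTML variant to a standard DOM parser. It assumes the JDK's JAXP DocumentBuilder, whose default implementation in current JDKs is derived from Xerces, rather than a standalone Xerces build. The class name WellFormedCheck, the helper tryParse, and the sample markup are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class WellFormedCheck {

    /** Returns the parsed DOM, or null if the markup is not well-formed XML. */
    static Document tryParse(String markup) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default handler, which prints fatal errors to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        try {
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXParseException e) {
            // Well-formedness violations surface here as fatal errors.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Typical HTML: <br> and <p> are left unclosed.
        String html = "<html><body><p>one<br><p>two</body></html>";
        // The same content with every element explicitly closed.
        String xhtml = "<html><body><p>one<br/></p><p>two</p></body></html>";

        System.out.println("typical HTML parsed: " + (tryParse(html) != null));
        System.out.println("well-formed XHTML parsed: " + (tryParse(xhtml) != null));
    }
}
```

The first input is rejected because an XML parser requires every start tag to have a matching end tag, which ordinary HTML does not guarantee. This is why either a cleanup step or an HTML-aware parser is needed before a DOM can be built.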

 




Re: parsing an HTML document using Xerces DOM parser

Posted by Andy Clark <an...@apache.org>.
Gude Reshma wrote:
> How can I use the Xerces DOM parser to parse an HTML document? An
> HTML document is not well-formed XML, so I am unable to parse it
> with the Xerces parser. I tried to make the document well-formed
> using the HTML Tidy tool from the W3C, but that did not help. Any
> suggestions as to how I should parse an HTML document using the
> DOM parser?

See the mail I sent to the xerces-j-user mailing list on Monday
titled "Can Xerces Parse HTML? Yes, It Can!". I attached a simple
HTML parser for Xerces2. It's very basic, but I wanted to get some
feedback from people before I put more work into it.

-- 
Andy Clark * andyc@apache.org
