Posted to user@nutch.apache.org by Rajesh Munavalli <fi...@gmail.com> on 2006/03/30 23:14:18 UTC
html parser
Does anyone know where I can get the source code for the HTML parser that is in
the plugins directory?
Re: html parser
Posted by Rajesh Munavalli <fi...@gmail.com>.
Oops... actually I meant to ask about the XHTML parser. Is it safe to use the
HTML parser to parse XHTML?
On 3/30/06, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> Rajesh Munavalli wrote:
> > Does anyone know where I can get the source code for html parser which is in
> > the plugins directory?
> >
>
> Which one? parse-html uses two parsers: one is called CyberNeko, the
> other is called TagSoup. You can find their home pages and their sources
> easily through Google.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
Re: html parser
Posted by Andrzej Bialecki <ab...@getopt.org>.
Rajesh Munavalli wrote:
> Does anyone know where I can get the source code for html parser which is in
> the plugins directory?
>
Which one? parse-html uses two parsers: one is called CyberNeko, the
other is called TagSoup. You can find their home pages and their sources
easily through Google.
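For reference, here is a minimal sketch of how the choice between the two parsers can be made in configuration. It assumes a Nutch release whose parse-html plugin reads the parser.html.impl property, where "neko" selects NekoHTML (the CyberNeko parser) and "tagsoup" selects TagSoup. Check nutch-default.xml in your release for the property and its default before relying on it. The override goes in conf/nutch-site.xml. The root element shown is the one used by Hadoop-based releases (0.8 and later); older releases use nutch-conf instead.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides of nutch-default.xml -->
<configuration>
  <!-- Assumption: this property is honored by parse-html in your release. -->
  <!-- "neko" = NekoHTML (CyberNeko), "tagsoup" = TagSoup -->
  <property>
    <name>parser.html.impl</name>
    <value>tagsoup</value>
  </property>
</configuration>
```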