Posted to java-user@lucene.apache.org by James liu <li...@gmail.com> on 2006/08/29 14:10:19 UTC
how to write this regexps?
I want to index HTML pages, but they contain images, Flash, and JavaScript,
and I want indexing to be fast.
I don't know how to extract just the plain-text content, though.
Can anyone help?
Re: how to write this regexps?
Posted by d rj <dr...@gmail.com>.
I would recommend using the open source project HTMLParser
(http://htmlparser.sourceforge.net/). It provides an excellent API for
parsing HTML files and extracting the relevant text.
-drj
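
Since the subject line asks about regexps, here is a rough regex-only sketch using nothing but the JDK. To be clear, this is not HTMLParser's API; the class name HtmlText and the method extract are made up for illustration. It assumes reasonably well-formed markup: it drops <script> and <style> blocks along with their contents, then comments, then any remaining tags (so <img>, <object>, <embed> and the like go away too), and finally collapses whitespace. Regexes break on malformed HTML, which is the main reason a real parser is the safer choice.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or HTMLParser.
public class HtmlText {
    // Whole <script>...</script> and <style>...</style> blocks, contents included.
    private static final Pattern BLOCKS = Pattern.compile(
            "<(script|style)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // HTML comments, which may span several lines.
    private static final Pattern COMMENTS =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Any remaining tag: <p>, </b>, <img ...>, <object ...>, and so on.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    // Runs of whitespace, including newlines.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Returns the visible text of an HTML string, as a best effort. */
    public static String extract(String html) {
        String s = BLOCKS.matcher(html).replaceAll(" ");
        s = COMMENTS.matcher(s).replaceAll(" ");
        s = TAGS.matcher(s).replaceAll(" ");
        // Decode only a handful of common entities; "&amp;" goes last so
        // that "&amp;lt;" is not decoded twice.
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&amp;", "&");
        return SPACES.matcher(s).replaceAll(" ").trim();
    }
}
```

The string returned by HtmlText.extract(html) is what you would hand to the indexer as the field value, instead of the raw page source.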