Posted to java-user@lucene.apache.org by James liu <li...@gmail.com> on 2006/08/29 14:10:19 UTC

how to write this regexps?

I want to index HTML pages, but they contain images, Flash, and JavaScript,
and I want indexing to be fast.

I don't know how to extract just the plain-text content.

Can anyone help me?

Re: how to write this regexps?

Posted by d rj <dr...@gmail.com>.
I would recommend using the open source project HTMLParser (
http://htmlparser.sourceforge.net/). It provides an excellent API for
parsing HTML files and extracting the relevant text.
-drj
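
For reference, since the subject line asks about regexps: below is a minimal
sketch of the regex route using only java.util.regex from the JDK. It is not
the HTMLParser API recommended above, and the class and method names
(HtmlTextSketch, extractText) are made up for illustration. It assumes
reasonably well-formed markup. It drops script and style blocks with their
bodies, then comments, then any remaining tags (img, object, embed, and so
on), and collapses whitespace. It does not decode entities such as &amp;, and
it can be confused by a ">" inside an attribute value, which is one reason a
real parser is usually the safer choice.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or HTMLParser.
class HtmlTextSketch {
    // Whole <script>...</script> and <style>...</style> blocks, bodies included.
    private static final Pattern BLOCKS =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // HTML comments, possibly spanning several lines.
    private static final Pattern COMMENTS = Pattern.compile("(?s)<!--.*?-->");
    // Any remaining tag, e.g. <img ...>, <object ...>, <p>.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");
    // Runs of whitespace left behind by the replacements.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String extractText(String html) {
        String s = BLOCKS.matcher(html).replaceAll(" ");
        s = COMMENTS.matcher(s).replaceAll(" ");
        s = TAGS.matcher(s).replaceAll(" ");
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><script>var x = 1;</script></head>"
            + "<body><h1>Title</h1><img src=\"a.png\"><p>Hello world</p></body></html>";
        System.out.println(extractText(html)); // prints "Title Hello world"
    }
}
```

The returned string could then be passed to whatever text field you add to
the Lucene document; that step is left out here because it depends on your
Lucene version.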

On 8/29/06, James liu <li...@gmail.com> wrote:
>
> i wanna index html,,,but it have image,flash,javascript, and i wanna make
> index quick,,
>
> but i don't know how to get textmode content,,,
>
> anyone can help me?
>
>