Posted to user@tika.apache.org by "Periya.Data" <pe...@gmail.com> on 2011/12/22 03:32:16 UTC
suggestions for removing stop words
Hi all,
What is the suggested way to remove stop words from a given text
document using Tika? I am able to extract the contents of a PDF document
and have it in a nice text format. Now I am looking for ways to remove stop
words and stem words.
Suggestions are very much appreciated.
Thanks,
PD.
Re: suggestions for removing stop words
Posted by Alex Ott <al...@gmail.com>.
This is not a task for Tika; you should use Lucene for this.
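Below is a minimal sketch of the stop-word step that runs on Tika's extracted text. It uses only the Java standard library and does not use Lucene's API. The class name `StopWordFilter` and the stop list are hypothetical: the list is a deliberately tiny sample for illustration. In practice a Lucene analyzer does this work. For example, `EnglishAnalyzer` passes its token stream through a `StopFilter` and a `PorterStemFilter`, which covers both stop-word removal and stemming. This sketch does not stem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordFilter {
    // Hypothetical, deliberately tiny stop list for illustration only;
    // Lucene's analyzers come with a fuller English stop set.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
            "in", "is", "it", "of", "on", "or", "that", "the", "to", "with"));

    /** Lowercases the text, splits on runs of non-letter/non-digit characters, and drops stop words. */
    public static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() can yield a leading empty string when the text starts with punctuation
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for text that Tika has already extracted from a PDF
        String extracted = "The contents of a PDF document, extracted by Tika.";
        System.out.println(removeStopWords(extracted)); // [contents, pdf, document, extracted, tika]
    }
}
```

A hand-rolled filter is only useful for small experiments. A Lucene analyzer also handles tokenization edge cases and stemming consistently with how the text would later be indexed and searched.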
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott