Posted to user@tika.apache.org by "Periya.Data" <pe...@gmail.com> on 2011/12/22 03:32:16 UTC

suggestions for removing stop words

Hi all,
    What is the suggested way to remove stop words from a given text
document using Tika? I am able to extract the contents of a PDF document
as clean plain text. Now I am looking for ways to remove stop words and
stem the remaining words.

Suggestions are very much appreciated.

Thanks,
PD.

Re: suggestions for removing stop words

Posted by Alex Ott <al...@gmail.com>.
This is not a task for Tika (which handles text and metadata extraction);
you should use Lucene's analyzers to do this.
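A minimal sketch of that advice follows. It assumes Lucene 5.0 or later (for the no-argument EnglishAnalyzer constructor) plus Tika on the classpath; the class name StopWordsAndStems and the field name "content" are hypothetical. Tika only extracts the text, and Lucene's EnglishAnalyzer then tokenizes, lowercases, removes English stop words, and applies the Porter stemmer.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical example class: Tika for extraction, Lucene for analysis.
public class StopWordsAndStems {

    // Runs text through Lucene's EnglishAnalyzer: tokenize, lowercase,
    // drop English stop words, then stem with the Porter stemmer.
    static List<String> analyze(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // "content" is an arbitrary (hypothetical) field name.
        try (Analyzer analyzer = new EnglishAnalyzer();
             TokenStream stream = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // required before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException, TikaException {
        // Tika's facade detects the format (e.g. PDF) and returns plain text.
        String text = new Tika().parseToString(new File(args[0]));
        System.out.println(analyze(text));
    }
}
```

Run it with a document path as the only argument. For the string "The quick foxes are running", analyze should return [quick, fox, run]. If the built-in stop list does not suit your documents, EnglishAnalyzer also has a constructor that takes a CharArraySet of stop words.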

On Thu, Dec 22, 2011 at 3:32 AM, Periya.Data <pe...@gmail.com> wrote:
>     What is the suggested way to remove stop words from a given text
> document using Tika?



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott