Posted to java-user@lucene.apache.org by Bill Taylor <wa...@as-st.com> on 2006/09/25 04:41:05 UTC

Does anyone know of software for handling English plurals, "ing," etc?

I am developing a program to search indexed documents.  To make the 
search more effective, I am trying to automatically search for plurals, 
words ending in "ing," verbs ending in "ed," etc.

For example, if someone types "run" into the search box, AND the index 
also has runs and running, I would like to offer the user the option of 
including those terms as well.  If there is no "running" in the index, 
the software would not include it.

Similarly, there is walk and walked, battery and batteries, wake and 
waking, and so on.
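One way to get the behaviour described above, sketched under assumptions: over-generate the regular inflections of the typed word with a few hard-coded suffix rules, then keep only the ones that actually occur in the index vocabulary. The class name, the rule set and the in-memory indexTerms set are hypothetical; in a real Lucene application the vocabulary would come from the index's term list, and irregular forms (ran, mice, went) are not handled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class VariantExpander {

    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // HYPOTHETICAL rule set for illustration, not a real morphology library:
    // over-generates regular English inflections of a base word.
    static List<String> candidates(String word) {
        List<String> out = new ArrayList<>();
        out.add(word + "s");    // walk -> walks
        out.add(word + "es");   // box -> boxes
        out.add(word + "ed");   // walk -> walked
        out.add(word + "ing");  // walk -> walking
        int n = word.length();
        if (n < 2) {
            return out;
        }
        char last = word.charAt(n - 1);
        char prev = word.charAt(n - 2);
        String stem = word.substring(0, n - 1);
        // Consonant-vowel-consonant endings double the final consonant: run -> running.
        if (n >= 3 && !isVowel(word.charAt(n - 3)) && isVowel(prev)
                && !isVowel(last) && "wxy".indexOf(last) < 0) {
            out.add(word + last + "ing");
            out.add(word + last + "ed");
        }
        // Consonant + y becomes -ies / -ied: battery -> batteries.
        if (last == 'y' && !isVowel(prev)) {
            out.add(stem + "ies");
            out.add(stem + "ied");
        }
        // A silent final e is dropped before -ing: wake -> waking.
        if (last == 'e') {
            out.add(stem + "ing");
            out.add(word + "d");
        }
        return out;
    }

    // Keep only candidates present in the index vocabulary, so forms the
    // documents never use (e.g. "runned") are never offered to the user.
    static Set<String> variantsInIndex(String word, Set<String> indexTerms) {
        Set<String> found = new LinkedHashSet<>();
        for (String c : candidates(word.toLowerCase())) {
            if (indexTerms.contains(c)) {
                found.add(c);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>(Arrays.asList(
            "run", "runs", "running", "walk", "walked",
            "battery", "batteries", "wake", "waking"));
        for (String q : Arrays.asList("run", "walk", "battery", "wake")) {
            System.out.println(q + " -> " + variantsInIndex(q, index));
        }
    }
}
```

With the sample vocabulary in main, "run" yields runs and running; remove "running" from the set and only runs is offered, which is the filtering asked for above.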

I can't POSSIBLY be the first person to have wanted to do this.  Does 
anyone know of software for detecting such combinations in English?

Rumor hath it that Google does this sort of thing without telling you; 
that's one way they can find millions of hits.

Thanks in advance,

Bill Taylor


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Does anyone know of software for handling English plurals, "ing," etc?

Posted by Greg Colvin <gr...@colvin.org>.
Snowball, a generalized stemming facility:

    http://snowball.tartarus.org/

Where Snowball hooks into Lucene:

    http://lucene.apache.org/java/docs/api/net/sf/snowball/SnowballProgram.html

Where to get Snowball for Lucene:

    http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/snowball/


I'm guessing that this is the kind of thing you are looking for.
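Snowball takes the opposite route from generating variants: it reduces each word to a stem, and words sharing a stem are treated as related. Used in the analyzer at index and query time, that makes "run" silently match "running"; to offer the variants to the user instead, one could group the index vocabulary by stem, roughly as below. This is a sketch under assumptions: the stemmer is passed in as a Function<String, String> so a Snowball English stemmer could be wrapped and plugged in, but toyStem here is a deliberately crude, hypothetical suffix stripper standing in for it (not the Porter/Snowball algorithm), and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

public class StemGrouper {

    // HYPOTHETICAL toy stemmer for illustration only; it is NOT Porter or
    // Snowball. In practice, wrap a real Snowball English stemmer instead.
    static String toyStem(String word) {
        String s = word.toLowerCase();
        if (s.endsWith("ies") && s.length() > 4) {
            s = s.substring(0, s.length() - 3) + "y";   // batteries -> battery
        } else if (s.endsWith("ing") && s.length() > 5) {
            s = s.substring(0, s.length() - 3);         // running -> runn
        } else if (s.endsWith("ed") && s.length() > 4) {
            s = s.substring(0, s.length() - 2);         // walked -> walk
        } else if (s.endsWith("s") && !s.endsWith("ss") && s.length() > 3) {
            s = s.substring(0, s.length() - 1);         // runs -> run
        }
        int n = s.length();
        // Undo consonant doubling: runn -> run (leave "ll" and "ss" alone).
        if (n >= 2 && s.charAt(n - 1) == s.charAt(n - 2)
                && "aeiouls".indexOf(s.charAt(n - 1)) < 0) {
            s = s.substring(0, n - 1);
        }
        // Drop a silent final e so wake and waking share the stem "wak".
        if (s.endsWith("e") && s.length() > 3) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // Map each stem to the index terms that reduce to it.
    static Map<String, Set<String>> groupByStem(Collection<String> indexTerms,
                                                Function<String, String> stemmer) {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String term : indexTerms) {
            groups.computeIfAbsent(stemmer.apply(term), k -> new TreeSet<>()).add(term);
        }
        return groups;
    }

    // Index terms sharing the query's stem, minus the query itself.
    static Set<String> relatedTerms(String query, Map<String, Set<String>> groups,
                                    Function<String, String> stemmer) {
        Set<String> related = new TreeSet<>(
            groups.getOrDefault(stemmer.apply(query), Collections.emptySet()));
        related.remove(query.toLowerCase());
        return related;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("run", "runs", "running", "walk", "walked",
                                           "battery", "batteries", "wake", "waking");
        Map<String, Set<String>> groups = groupByStem(index, StemGrouper::toyStem);
        for (String q : Arrays.asList("run", "walk", "battery", "wake")) {
            System.out.println(q + " -> " + relatedTerms(q, groups, StemGrouper::toyStem));
        }
    }
}
```

Grouping once when the index is built and doing a lookup per query keeps the per-search cost to one stem computation and a map access; swapping toyStem for a real stemmer changes only the Function passed in.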



