Posted to java-user@lucene.apache.org by Tomcat Programmer <tc...@yahoo.com> on 2004/03/11 06:15:10 UTC

incomplete word match

I have a situation where I need to be able to find
incomplete word matches, for example a search for the
string 'ape' would return matches for 'grapes'
'naples' 'staples' etc.  I have been searching the
archives of this user list and can't seem to find any
example of someone doing this. 

At one point I recall finding someone's site (on
Google) who indicated that their search engine was
Lucene, and they offered the capability of doing this
type of matching. However I can't seem to find that
site again to save my life!  

Has anyone been successful in implementing this type
of matching with Lucene? If so, would you be able to
share some insight as to how you did it? 

Thanks in advance! 

-TP


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: incomplete word match

Posted by Tomcat Programmer <tc...@yahoo.com>.
Thank you, David, and Paul as well, for your
suggestions!


--- David Spencer <da...@tropo.com> wrote:
> SubstringQuery, my humble contribution.
>
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg06388.html




Re: incomplete word match

Posted by David Spencer <da...@tropo.com>.
SubstringQuery, my humble contribution.

http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg06388.html

Tomcat Programmer wrote:

>I have a situation where I need to be able to find
>incomplete word matches, for example a search for the
>string 'ape' would return matches for 'grapes'
>'naples' 'staples' etc.  I have been searching the
>archives of this user list and can't seem to find any
>example of someone doing this. 
>
>At one point I recall finding someone's site (on
>Google) who indicated that their search engine was
>Lucene, and they offered the capability of doing this
>type of matching. However I can't seem to find that
>site again to save my life!  
>
>Has anyone been successful in implementing this type
>of matching with Lucene? If so, would you be able to
>share some insight as to how you did it? 
>
>Thanks in advance! 
>
>-TP





Re: incomplete word match

Posted by Paul Elschot <pa...@xs4all.nl>.
On Thursday 11 March 2004 06:15, Tomcat Programmer wrote:
> I have a situation where I need to be able to find
> incomplete word matches, for example a search for the
> string 'ape' would return matches for 'grapes'
> 'naples' 'staples' etc.  I have been searching the
> archives of this user list and can't seem to find any
> example of someone doing this.
>
> At one point I recall finding someone's site (on
> Google) who indicated that their search engine was
> Lucene, and they offered the capability of doing this
> type of matching. However I can't seem to find that
> site again to save my life!
>
> Has anyone been successful in implementing this type
> of matching with Lucene? If so, would you be able to
> share some insight as to how you did it?

I haven't actually done this, but as a first attempt I would
index all the suffixes in a separate field and use a PrefixQuery
to search.  You would index e.g. google as:
google oogle ogle gle le e
all on the same position. To search for substring ogl you
would query ogl* on the field.
To save space you might impose a minimum substring length.
The minimum query length should preferably be the same.
Your index will grow quite a bit, but it's difficult to say how much. 
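
As a rough illustration of the suffix idea above, here is a minimal sketch in
plain Java. It does not use Lucene at all: a sorted map stands in for the
separate suffix field, and a range scan over that map stands in for the
PrefixQuery. The class name SuffixIndexSketch and its methods are made up for
this example, and the minimum length of 2 is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a suffix field; not part of Lucene.
public class SuffixIndexSketch {
    // Sorted map from each indexed suffix to the full words it came from.
    private final TreeMap<String, Set<String>> suffixToWords = new TreeMap<>();
    private final int minLength;

    public SuffixIndexSketch(int minLength) {
        if (minLength < 1) throw new IllegalArgumentException("minLength must be >= 1");
        this.minLength = minLength;
    }

    // All suffixes of word that are at least minLength long, longest first.
    static List<String> suffixes(String word, int minLength) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + minLength <= word.length(); start++) {
            out.add(word.substring(start));
        }
        return out;
    }

    // "Index" a word by recording every sufficiently long suffix.
    public void add(String word) {
        for (String suffix : suffixes(word, minLength)) {
            suffixToWords.computeIfAbsent(suffix, k -> new TreeSet<>()).add(word);
        }
    }

    // Substring search expressed as a prefix scan over the sorted suffixes.
    public Set<String> search(String substring) {
        Set<String> hits = new TreeSet<>();
        if (substring.length() < minLength) return hits; // below the minimum query length
        for (Map.Entry<String, Set<String>> e : suffixToWords.tailMap(substring, true).entrySet()) {
            if (!e.getKey().startsWith(substring)) break; // left the prefix range
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        SuffixIndexSketch index = new SuffixIndexSketch(2);
        index.add("google");
        index.add("grapes");
        index.add("naples");
        System.out.println(index.search("ogl")); // [google]
        System.out.println(index.search("ap"));  // [grapes, naples]
    }
}
```

Every word contributes roughly one entry per character, which is where the
index growth comes from. Raising the minimum length trims the shortest
suffixes, at the cost of rejecting queries shorter than that length.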

You can do this by providing your own TokenStream on the field
that returns each suffix as a Token with a getPositionIncrement()
of zero, just after the normal full Token (google) with an
increment of 1. See also:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/Token.html
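
The token sequence described above might look like the following sketch. It is
plain Java rather than a real Lucene TokenStream: SketchToken is a made-up
stand-in for Lucene's Token that carries only a term and a position increment,
and the whitespace splitting and the minimum length of 3 are assumptions for
the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the token sequence; not Lucene's Token or TokenStream.
public class SuffixTokenSketch {
    static final class SketchToken {
        final String term;
        final int positionIncrement;

        SketchToken(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Emit each full word with increment 1, then its shorter suffixes with increment 0.
    static List<SketchToken> tokenize(String text, int minLength) {
        List<SketchToken> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            out.add(new SketchToken(word, 1));
            for (int start = 1; start + minLength <= word.length(); start++) {
                out.add(new SketchToken(word.substring(start), 0));
            }
        }
        return out;
    }

    // Absolute positions implied by the increments; the first token lands on 0.
    static int[] positions(List<SketchToken> tokens) {
        int[] pos = new int[tokens.size()];
        int current = -1;
        for (int i = 0; i < tokens.size(); i++) {
            current += tokens.get(i).positionIncrement;
            pos[i] = current;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<SketchToken> tokens = tokenize("google search", 3);
        System.out.println(tokens);
        System.out.println(java.util.Arrays.toString(positions(tokens)));
    }
}
```

Because the suffixes carry an increment of zero, they all share the position
of the full word they were cut from, so the word positions in the field stay
the same as they would be without the extra tokens.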

Paul

