Posted to java-user@lucene.apache.org by "Schmidt, Dennis" <de...@sap.com> on 2011/02/14 00:51:06 UTC

Search requires too long search term

Hi there,

I am using Lucene for what is actually a fairly simple search. I am not indexing long texts; instead, each document only has a couple of fields whose text ranges from one word to a very short sentence (usually no more than 6 words). Now I need to find documents even when the user has typed in only two characters.

So for instance, I have a document with only the fields 'hasName' and 'hasTechName', both with the value 'create'. What I need is to find this document when the user types in 'cr'. But at the moment I have to type in at least 'creat'. The query I send to the QueryParser looks like this: "hasName:*cr* OR hasTechName:*cr*", and I pass true to setAllowLeadingWildcard. For collecting the results I use the TopScoreDocCollector.

Any suggestions on where I would have to change something to make my search be more "generous"? At the moment I have really no idea what to do...
Thanks, Dennis

Dennis Schmidt
Research Intern
SAP Research Sydney | Level 3, 168 Walker Street | 2065 North Sydney, NSW | Australia
dennis.schmidt@sap.com
www.sap.com

Re: Search requires too long search term

Posted by findbestopensource <fi...@gmail.com>.

You may need to use n-grams. See EdgeNGramTokenFilter:
http://lucene.apache.org/java/3_0_3/api/all/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html
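
To make the n-gram idea concrete, here is a minimal plain-Java sketch of what "edge n-grams" of a term are. It does not use Lucene and is not the EdgeNGramTokenFilter implementation; the class name, the method name and the minGram/maxGram parameters are made up for illustration. The point is only that if the leading substrings of 'create' are indexed as terms themselves, the input 'cr' matches one of them exactly and no wildcard is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
public class EdgeNGramSketch {

    // Returns the leading substrings of term with lengths from minGram
    // up to maxGram (or the term length, whichever is smaller).
    public static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [cr, cre, crea, creat, create]
        System.out.println(edgeNGrams("create", 2, 6));
    }
}
```

The trade-off is a larger index, since every term is stored several times at different lengths, in exchange for cheap exact-term lookups at search time.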

Another option would be to run a wildcard query without enabling
leading wildcards. Search for cr* and not *cr*, since an auto-suggest
feature should suggest words that start with "cr", not words that merely contain it.
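
As a rough illustration of why cr* is the cheaper query, the sketch below models the term dictionary as a sorted set in plain Java. This is a simplified model, not Lucene code, and all names in it are hypothetical. A prefix lookup can jump to the first candidate and stop at the first non-match, whereas a pattern with a leading wildcard has to inspect every term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; a simplified model of a sorted term dictionary.
public class PrefixVsInfixSketch {

    // Prefix match (like cr*): seek to the first term >= prefix,
    // then stop as soon as a term no longer starts with the prefix.
    public static List<String> prefixMatches(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    // Infix match (like *cr*): there is no place to seek to,
    // so every term in the dictionary is checked.
    public static List<String> infixMatches(TreeSet<String> terms, String infix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.contains(infix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("create", "credit", "delete", "decrypt", "update"));
        System.out.println(prefixMatches(terms, "cr")); // prints [create, credit]
        System.out.println(infixMatches(terms, "cr"));  // prints [create, credit, decrypt]
    }
}
```

Note that the two queries also differ in what they return: the infix version additionally finds 'decrypt', which is usually not wanted for suggestions.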
Regards
Aditya
www.findbestopensource.com

