Posted to java-user@lucene.apache.org by Clemens Wyss <cl...@mysign.ch> on 2011/04/07 10:30:50 UTC

Analyzer which creates terms of one to n words

Is there an analyzer which takes a text and creates search terms based on the following rules:
- all single words
- "two words in a row"
- "three words in a row"
- ...
- "n words in a row"

The reason is the following: 
I have an index that is currently analyzed with WhitespaceAnalyzer. In addition, I have a so-called "term index", which is populated with all (search) terms of the "real index" and is used to suggest search terms. Typing a single search term works perfectly. The problem is that when I type two words to narrow the suggestions, no more possible search terms are found, because the "term index" contains only single words/terms.

Which analyzer should I use? Ngram? Or is there even an analyzer which does the above?

Thanks for your advice!
Clemens

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Analyzer which creates terms of one to n words

Posted by Israel Tsadok <it...@gmail.com>.
Take a look at the shingle package in contrib-analyzers:
http://lucene.apache.org/java/3_0_3/api/contrib-analyzers/org/apache/lucene/analysis/shingle/package-summary.html
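For illustration only, here is a minimal plain-Java sketch of what "shingling" means: it emits every run of one to n consecutive words, which matches the rules listed in the question. It deliberately does not use the Lucene API. The class name WordShingles and the method shingles are hypothetical names made up for this sketch, and splitting on whitespace is an assumption that mirrors WhitespaceAnalyzer. In Lucene itself, the linked package provides ShingleFilter and ShingleAnalyzerWrapper for this purpose; check the javadoc for the exact constructor arguments and defaults.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows which terms word shingling yields.
final class WordShingles {

    // Returns every run of 1..maxWords consecutive whitespace-separated words,
    // in order of starting position and then increasing length.
    static List<String> shingles(String text, int maxWords) {
        List<String> terms = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return terms;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder term = new StringBuilder();
            for (int len = 1; len <= maxWords && start + len <= words.length; len++) {
                if (len > 1) {
                    term.append(' ');
                }
                term.append(words[start + len - 1]);
                terms.add(term.toString());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints one term per line for a sample sentence, with n = 3.
        for (String term : shingles("please divide this sentence", 3)) {
            System.out.println(term);
        }
    }
}
```

If the "term index" were populated with terms like these, a two-word prefix typed by the user could match a two-word term. The price is a larger term index, which grows roughly in proportion to n.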
