Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2011/12/13 18:24:33 UTC

Combination of edgengram and ngram

I am interested in a new filter type, one that would combine edgengram 
and ngram.  The idea is that it would create all ngrams specified by the 
min/max size, but the ngrams that happen to be edgengrams (specifically 
the left side) would get an index-time boost.  Optionally the boost 
would be higher if it came from the first token.
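
To make the idea concrete, here is a rough Python sketch of the token stream such a filter might emit. It is not Solr or Lucene code. The function name, the whitespace tokenization, and the boost values (2.0 for a left-edge gram, 3.0 for a left-edge gram of the first token) are all assumptions chosen for illustration.

```python
def boosted_ngrams(phrase, min_size=2, max_size=4,
                   edge_boost=2.0, first_token_boost=3.0):
    """Return (gram, boost) pairs for every ngram of every token.

    Hypothetical illustration only: the boost values are arbitrary
    and tokens are split on whitespace.
    """
    out = []
    for token_index, token in enumerate(phrase.lower().split()):
        for size in range(min_size, max_size + 1):
            # Slide a window of the given size across the token.
            for start in range(len(token) - size + 1):
                gram = token[start:start + size]
                if start == 0:
                    # Left-edge gram: boosted, more so for the first token.
                    boost = first_token_boost if token_index == 0 else edge_boost
                else:
                    boost = 1.0
                out.append((gram, boost))
    return out


for gram, boost in boosted_ngrams("solr search", min_size=2, max_size=3):
    print(gram, boost)
```

With these assumed values, "so" and "sol" would carry the first-token boost, "se" and "sea" the plain edge boost, and interior grams such as "ol" or "arc" no boost at all.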

The use case:  An autosuggest dropdown that populates as a 
user types into a search box.  The index would have one field and it 
would be built from a manually produced list of suggested search 
phrases.  The boosts mentioned would make it so that matches from the 
beginning of a word, and especially from the beginning of the entire 
suggested phrase, would be returned first.
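
The ordering I am after can be sketched in plain Python, independent of Solr. This is only a model of the desired result order, not of how Lucene scoring works. The function name and the three score tiers (3, 2, 1) are assumptions, and it treats the typed text as a single word fragment.

```python
def rank_suggestions(typed, phrases):
    """Order phrases by where the typed fragment matches.

    Hypothetical tiers: 3 = start of the phrase, 2 = start of any
    later word, 1 = inside a word. Non-matching phrases are dropped.
    """
    typed = typed.lower()
    scored = []
    for phrase in phrases:
        words = phrase.lower().split()
        if not words:
            continue
        if words[0].startswith(typed):
            score = 3
        elif any(w.startswith(typed) for w in words):
            score = 2
        elif any(typed in w for w in words):
            score = 1
        else:
            continue
        scored.append((score, phrase))
    # Python's sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: -pair[0])
    return [phrase for _, phrase in scored]


suggestions = ["laptop power cord", "report covers", "power supply", "usb cable"]
print(rank_suggestions("po", suggestions))
```

Typing "po" against that sample list would put "power supply" first, then "laptop power cord", then "report covers", and leave out "usb cable".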

I could get a similar effect by using a copyfield, analyzing one field 
with ngrams and the other with edgengrams, then using edismax to put a 
boost on the edge version.  I will start with this method, but using 
copyfield makes the index bigger, and using edismax makes the ultimate 
parsed queries more complicated.

If I can avoid the copyfield, the index will be smaller and the queries 
very simple, which should make them very fast.

I will take a look at the source code, but I'm a bit of a Java novice.  
Does anyone have the knowledge, desire, and time to crank this one out 
quickly?  Is it possible someone has already written such a filter?

Thanks,
Shawn