Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2011/12/13 18:24:33 UTC
Combination of edgengram and ngram
I am interested in a new filter type, one that would combine edgengram
and ngram. The idea is that it would create all ngrams specified by the
min/max size, but the ngrams that happen to be edgengrams (specifically
the left side) would get an index-time boost. Optionally, the boost
would be higher still if the edgengram came from the first token.
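
To make the idea concrete, here is a rough sketch in plain Python of
the tokens and weights such a filter might emit. It is not Lucene code
and not an existing filter. The function name, the whitespace
tokenization, and the boost values (4.0 for an edge gram of the first
token, 2.0 for an edge gram of a later token, 1.0 otherwise) are all
assumptions made up for illustration. How the weights would actually be
stored in the index is left open.

```python
def boosted_ngrams(phrase, min_size=2, max_size=3,
                   edge_boost=2.0, first_token_boost=4.0):
    """Return (gram, boost) pairs for every n-gram of every token.

    Hypothetical illustration only: tokens are split on whitespace,
    and the boost values are arbitrary placeholders.
    """
    results = []
    for token_index, token in enumerate(phrase.lower().split()):
        for size in range(min_size, max_size + 1):
            # Slide a window of `size` characters across the token.
            for start in range(len(token) - size + 1):
                gram = token[start:start + size]
                if start != 0:
                    boost = 1.0                 # interior n-gram
                elif token_index == 0:
                    boost = first_token_boost   # left edge of first token
                else:
                    boost = edge_boost          # left edge of a later token
                results.append((gram, boost))
    return results


for gram, boost in boosted_ngrams("solr search"):
    print(gram, boost)
```

With these placeholder settings, grams such as "so" and "sol" would
carry the highest weight, "se" and "sea" a middle weight, and interior
grams such as "ol" or "arc" the default.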
The use case: an autosuggest dropdown that populates as a
user types into a search box. The index would have one field and it
would be built from a manually produced list of suggested search
phrases. The boosts mentioned would make it so that matches from the
beginning of a word, and especially from the beginning of the entire
suggested phrase, would be returned first.
I could get a similar effect by using a copyfield, analyzing one field
with ngrams and the other with edgengrams, then using edismax to put a
boost on the edge version. I will start with this method, but using
copyfield makes the index bigger, and using edismax makes the final
parsed queries more complicated.
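
As a rough sketch of the query side of that workaround, the request
parameters could be built like this in Python. The field names
suggest_edge and suggest_ngram and the boost of 10 are hypothetical
placeholders; defType, q, and qf are the standard edismax request
parameters.

```python
from urllib.parse import urlencode


def suggest_query_params(user_input, edge_field="suggest_edge",
                         ngram_field="suggest_ngram", edge_boost=10):
    """Build edismax parameters that favor the edgengram field.

    The field names and boost value are assumptions for illustration;
    they must match whatever the schema actually defines.
    """
    return {
        "defType": "edismax",
        "q": user_input,
        # Search both fields, weighting the edge field more heavily.
        "qf": f"{edge_field}^{edge_boost} {ngram_field}",
    }


# Query string to append to a select handler URL.
print(urlencode(suggest_query_params("sol")))
```

Both fields would be populated from the same source text, which is why
this approach grows the index.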
If I can avoid the copyfield, the index will be smaller and the queries
very simple, which should make for very high speed.
I will take a look at the source code, but I'm a bit of a Java novice.
Does anyone have the knowledge, desire, and time to crank this one out
quickly? Is it possible someone has already written such a filter?
Thanks,
Shawn