Posted to solr-user@lucene.apache.org by Frank A <fs...@gmail.com> on 2010/07/23 21:59:16 UTC

Scoring Search for autocomplete

Hi, I have an autocomplete that is currently working with an
NGramTokenizer so if I search for "Yo" both "New York" and "Toyota"
are valid results.  However I'm trying to figure out how to best
implement the search so that from a score perspective if the string
matches the beginning of an entire field it ranks first, followed by
the beginning of a term and then in the middle of a term.  For example
if I was searching with "vi" I would want Virginia ahead of West
Virginia ahead of Five.

I think I can do this with three separate fields, one using a
whitespace tokenizer and an ngram filter, another using whitespace +
edge-ngram, and another using keyword + edge-ngram, then doing an OR on
the 3 fields, so that Virginia would match all 3 and get a higher
score... but this doesn't feel right to me, so I wanted to check for
better options.
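
As a rough illustration of the three-field idea, here is a minimal Python
sketch. It is not Solr code: the function and field names (edge_ngrams,
ngrams, field_prefix, term_prefix, term_infix) and the sample value "York"
are made up for the example, and it simply counts how many of the three
fields match as a stand-in for Solr's actual relevance scoring, which it
does not try to reproduce.

```python
def edge_ngrams(token, min_n=1, max_n=20):
    """Prefixes of token, mimicking a front edge n-gram filter."""
    return {token[:n] for n in range(min_n, min(len(token), max_n) + 1)}


def ngrams(token, min_n=1, max_n=20):
    """All substrings of token whose length is between min_n and max_n."""
    return {token[i:i + n]
            for n in range(min_n, min(len(token), max_n) + 1)
            for i in range(len(token) - n + 1)}


def index_fields(value):
    """Build the three hypothetical fields for one suggestion value."""
    text = value.lower()
    words = text.split()
    return {
        # keyword tokenizer + edge n-gram: prefixes of the whole field
        "field_prefix": edge_ngrams(text),
        # whitespace tokenizer + edge n-gram: prefixes of each term
        "term_prefix": set().union(*(edge_ngrams(w) for w in words)),
        # whitespace tokenizer + n-gram: any substring of each term
        "term_infix": set().union(*(ngrams(w) for w in words)),
    }


def score(query, value):
    """Number of fields the query matches (0 to 3); an OR over the fields."""
    q = query.lower()
    return sum(q in grams for grams in index_fields(value).values())


def suggest(query, values):
    """Matching values, best first; non-matching values are dropped."""
    scored = [(score(query, v), v) for v in values]
    return [v for s, v in sorted(scored, key=lambda pair: -pair[0]) if s > 0]


print(suggest("yo", ["Toyota", "New York", "York", "Ohio"]))
```

Here "York" matches all three fields, "New York" matches the term-prefix and
infix fields, and "Toyota" matches only the infix field, so they come back
in that order.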

Thanks.

Re: Scoring Search for autocomplete

Posted by Chris Hostetter <ho...@fucit.org>.

You weren't really clear on how you are generating your autocomplete
results -- i.e., via TermsComponent on your "main" index, or via a
search on a custom index where each document is a "word" to suggest?

Assuming the latter, then the approach you describe below sounds good to
me, but it doesn't seem like it would really make sense for the former.


: Hi, I have an autocomplete that is currently working with an
: NGramTokenizer so if I search for "Yo" both "New York" and "Toyota"
: are valid results.  However I'm trying to figure out how to best
: implement the search so that from a score perspective if the string
: matches the beginning of an entire field it ranks first, followed by
: the beginning of a term and then in the middle of a term.  For example
: if I was searching with "vi" I would want Virginia ahead of West
: Virginia ahead of Five.
: 
: I think I can do this with three separate fields, one using a white
: space tokenizer and a ngram filter, another using the edge-ngram +
: whitespace and another using keyword+edge-ngram, then doing an or on
: the 3 fields, so that Virginia would match all 3 and get a higher
: score... but this doesn't feel right to me, so I wanted to check for
: better options.
: 
: Thanks.
: 



-Hoss