Posted to solr-user@lucene.apache.org by Ryan McKinley <ry...@gmail.com> on 2007/09/16 08:48:46 UTC
'suggest' query sorting
Hello-
I'm building an interface where I need to display matching options as a
user types into a search box. Something like google suggest, but it
needs to be a little more flexible in its matches.
At first glance, I thought I just needed to write a filter that chunks
each token into a set of prefixes. Check SOLR-357 -- As Hoss points
out, I may just be able to use the EdgeNGramFilterFactory.
I have the basics working, but need some help getting the details to
behave properly.
Consider the strings:
Canon PowerShot
iPod Cable
Canon EX PIXMA
Video Card
If I query for 'ca' I expect to get all these back. This works fine,
but what I need help with is ordering.
How can I boost words where the whole value (not just the token) is
closer to the front of the value? That is, I want 'ca' to return:
1. Canon PowerShot
2. Canon EX PIXMA
3. iPod Cable
4. Video Card
(actually 1&2 could be swapped)
After that works, how do I boost tokens that are closer together? If I
search for 'canon p', how can I make sure the results are returned as:
1. Canon PowerShot
2. Canon EX PIXMA
thanks
ryan
Re: 'suggest' query sorting
Posted by Ryan McKinley <ry...@gmail.com>.
The prefix query works fine with EdgeNGramFilterFactory, but I'm still
not sure how to get the sorting to work.
I'm using:
<fieldType name="prefixing" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
If you have any ideas on the sorting, let me know!
Re: 'suggest' query sorting
Posted by Matthew Runo <mr...@zappos.com>.
Hello! Were you able to find out anything? I'd be interested to know
what you found out.
+--------------------------------------------------------+
| Matthew Runo
| Zappos Development
| mruno@zappos.com
| 702-943-7833
+--------------------------------------------------------+
Re: 'suggest' query sorting
Posted by Ryan McKinley <ry...@gmail.com>.
>
> if you really want #3 and #4 to show up, then have two fields: one using
> whitespace tokenizer, one using keyword tokenizer; both using
> EdgeNGramFilter ... boost the query on the keyword field higher than the
> whitespace field (or just rely on the coordFactor and the fact that "ca" will
> match on both fields for "Canon PowerShot" but only on the whitespace field for
> "iPod Cable").
>
I'm working with person names that are sometimes reversed, so the last
name (which may appear first) needs to be treated with the same weight.
Yes, this scheme works great. Thanks.
I added the config I'm using to SOLR-357 and closed the issue.
Hopefully the next person searching for how to do this will know to look
at the "EdgeNGramFilter".
ryan
Re: 'suggest' query sorting
Posted by Chris Hostetter <ho...@fucit.org>.
: How can I boost words where the whole value (not just the token) is closer to
: the front of the value? That is, I want 'ca' to return:
: 1. Canon PowerShot
: 2. Canon EX PIXMA
: 3. iPod Cable
: 4. Video Card
: (actually 1&2 could be swapped)
i would argue that you don't want #3 and #4 at all if you are doing query
suggestion; instead, make the field you query use a KeywordTokenizer with
the EdgeNGramFilter so "ca" only matches #1 and #2.
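A minimal sketch of what such a field might look like in schema.xml. The type name prefix_full, the field name name_prefix_full, and the source field name are hypothetical and do not come from the thread; the analyzer classes are the ones already mentioned above.

```xml
<!-- Sketch only; the type and field names are hypothetical. -->
<!-- Index side: the whole value stays one token, is lowercased, and is then
     expanded into leading prefixes ("Canon PowerShot" yields "c", "ca", "can", ...). -->
<fieldType name="prefix_full" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <!-- Query side: lowercase only, so the typed text is compared to the indexed prefixes. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumes an existing stored field called "name" that holds the display value. -->
<field name="name_prefix_full" type="prefix_full" indexed="true" stored="false"/>
<copyField source="name" dest="name_prefix_full"/>
```

With maxGramSize="20", only the first 20 characters of a value are indexed as prefixes, so longer typed input would not match; the limit may need raising. The query parser may also split multi-word input such as 'canon p' on whitespace before the query analyzer sees it, so it is worth checking the parsed query with debugQuery=true.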
if you really want #3 and #4 to show up, then have two fields: one using
whitespace tokenizer, one using keyword tokenizer; both using
EdgeNGramFilter ... boost the query on the keyword field higher than the
whitespace field (or just rely on the coordFactor and the fact that "ca" will
match on both fields for "Canon PowerShot" but only on the whitespace field for
"iPod Cable").
: After that works, how do I boost tokens that are closer together? If I search
: for 'canon p', how can I make sure the results are returned as:
: 1. Canon PowerShot
: 2. Canon EX PIXMA
i think the two fields i described above will solve that problem as well.
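One way the query side of the two-field idea might be wired up, as a sketch. It assumes two hypothetical fields fed from the same source value: name_prefix_token (whitespace-tokenized, for example the "prefixing" type shown earlier in the thread) and name_prefix_full (the same analysis with KeywordTokenizerFactory in place of WhitespaceTokenizerFactory). It also assumes a Solr release that provides solr.SearchHandler and the dismax query parser. The handler name and the boost value are illustrative only.

```xml
<!-- solrconfig.xml fragment; sketch only, field names, handler name and boost are hypothetical. -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- A match on a prefix of the whole value counts for more than a
         match on a prefix of any single word. -->
    <str name="qf">name_prefix_full^2.0 name_prefix_token</str>
    <str name="fl">name,score</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>
```

With this in place, a query for 'ca' should tend to put "Canon PowerShot" and "Canon EX PIXMA" ahead of "iPod Cable" and "Video Card", since only the first two match the boosted field. Actual scores depend on the index contents, so the ordering is best confirmed with debugQuery=true.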
-Hoss