
Posted to solr-user@lucene.apache.org by Ryan McKinley <ry...@gmail.com> on 2007/09/16 08:48:46 UTC

'suggest' query sorting

Hello-

I'm building an interface where I need to display matching options as a 
user types into a search box.  Something like google suggest, but it 
needs to be a little more flexible in its matches.

At first glance, I thought I just needed to write a filter that chunks 
each token into a set of prefixes.  Check SOLR-357 -- As Hoss points 
out, I may just be able to use the EdgeNGramFilterFactory.

I have the basics working, but need some help getting the details to 
behave properly.

Consider the strings:
  Canon PowerShot
  iPod Cable
  Canon EX PIXMA
  Video Card

If I query for 'ca' I expect to get all these back.  This works fine, 
but what I need help with is ordering.

How can I boost words where the whole value (not just the token) is 
closer to the front of the value?  That is, I want 'ca' to return:
  1. Canon PowerShot
  2. Canon EX PIXMA
  3. iPod Cable
  4. Video Card
(actually 1&2 could be swapped)

After that works, how do I boost tokens that are closer together?  If I 
search for 'canon p', how can I make sure the results are returned as:
  1. Canon PowerShot
  2. Canon EX PIXMA


thanks
ryan

Re: 'suggest' query sorting

Posted by Ryan McKinley <ry...@gmail.com>.

The prefix query works fine with EdgeNGramFilterFactory, but I'm still 
not sure how to get the sorting to work.

I'm using:

<fieldType name="prefixing" class="solr.TextField" positionIncrementGap="1">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
</fieldType>

If you have any ideas on the sorting, let me know!




Re: 'suggest' query sorting

Posted by Matthew Runo <mr...@zappos.com>.

Hello! Were you able to find out anything? I'd be interested to know  
what you found out.

+--------------------------------------------------------+
  | Matthew Runo
  | Zappos Development
  | mruno@zappos.com
  | 702-943-7833
+--------------------------------------------------------+



Re: 'suggest' query sorting

Posted by Ryan McKinley <ry...@gmail.com>.

> 
> if you really want #3 and #4 to show up, then have two fields: one using 
> whitespace tokenizer, one using keyword tokenizer; both using 
> EdgeNGramFilter ... boost the query to the first field higher than the 
> second field (or just rely on the coordFactor and the fact that "ca" will 
> match on both fields for "Canon PowerShot" but only on the second field for 
> "iPod Cable")
> 

I'm working with person names that are sometimes reversed... it needs to 
treat the last name (that may be the first name) with the same weight.

Yes, this scheme works great.  Thanks.

I added the config I'm using to SOLR-357 and closed the issue. 
Hopefully the next person searching for how to do this will know to look 
at the "EdgeNGramFilter".

ryan

Re: 'suggest' query sorting

Posted by Chris Hostetter <ho...@fucit.org>.

: How can I boost words where the whole value (not just the token) is closer to
: the front of the value?  That is, I want 'ca' to return:
:  1. Canon PowerShot
:  2. Canon EX PIXMA
:  3. iPod Cable
:  4. Video Card
: (actually 1&2 could be swapped)

i would argue that you don't want #3 and #4 at all if you are doing query 
suggestion; instead, make the field you query use a KeywordTokenizer with 
the EdgeNGramFilter so "ca" only matches #1 and #2.

if you really want #3 and #4 to show up, then have two fields: one using 
whitespace tokenizer, one using keyword tokenizer; both using 
EdgeNGramFilter ... boost the query to the first field higher than the 
second field (or just rely on the coordFactor and the fact that "ca" will 
match on both fields for "Canon PowerShot" but only on the second field for 
"iPod Cable")
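
a rough sketch of what that two-field setup might look like as a schema.xml 
fragment. the type names (prefix_token, prefix_full) and field names (name, 
name_prefix_token, name_prefix_full) are made up for illustration, and it 
assumes a "string" field type is already defined in the schema; the gram 
sizes are simply copied from the config posted earlier in the thread.

```xml
<!-- Sketch only: every type and field name below is hypothetical. -->

<!-- Prefixes of each whitespace-separated word, so "ca" also matches "iPod Cable". -->
<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Prefixes of the whole value, so "ca" matches only values that start with "ca". -->
<fieldType name="prefix_full" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- One stored source field, copied into both prefix fields at index time. -->
<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_prefix_token" type="prefix_token" indexed="true" stored="false"/>
<field name="name_prefix_full" type="prefix_full" indexed="true" stored="false"/>

<copyField source="name" dest="name_prefix_token"/>
<copyField source="name" dest="name_prefix_full"/>
```

with something like this, a query such as 
name_prefix_token:ca OR name_prefix_full:ca would match "Canon PowerShot" in 
both fields but "iPod Cable" only in name_prefix_token, so the former should 
tend to score higher. any explicit boost values would be a tuning choice.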

: After that works, how do I boost tokens that are closer together?  If I search
: for 'canon p', how can I make sure the results are returned as:
:  1. Canon PowerShot
:  2. Canon EX PIXMA

i think the two fields i described above will solve that problem as well.




-Hoss