
Posted to solr-user@lucene.apache.org by Ken Krugler <kk...@transpac.com> on 2010/08/04 02:11:07 UTC

Best solution to avoiding multiple query requests

Hi all,

I've got a situation where the key result from an initial search  
request (let's say for "dog") is the list of values from a faceted  
field, sorted by hit count.

For the top 10 of these faceted field values, I need to get the top  
hit for the target request ("dog") restricted to that value for the  
faceted field.

Currently this is 11 total requests, of which the 10 requests  
following the initial query can be made in parallel. But that's still  
a lot of requests.
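
A minimal sketch of the request pattern described above, assuming a
hypothetical facet field named "category" and that the initial request
only needs the facet counts (rows=0). The parameter names are standard
Solr request parameters; the helper names are made up for illustration.

```python
from urllib.parse import urlencode

FACET_FIELD = "category"  # hypothetical field name, not from the original post


def facet_request(query, top_n=10):
    """Parameters for the initial request: facet values sorted by hit count."""
    return {
        "q": query,
        "rows": 0,  # assumption: only the facet counts are wanted here
        "facet": "on",
        "facet.field": FACET_FIELD,
        "facet.limit": top_n,
        "facet.sort": "count",
    }


def top_hit_request(query, value):
    """Parameters for one follow-up request: best hit restricted to one value."""
    return {
        "q": query,
        "rows": 1,
        "fq": '%s:"%s"' % (FACET_FIELD, value),
    }


# One initial request, then one request per facet value it returned.
initial = urlencode(facet_request("dog"))
values = list("ABCDEFGHIJ")  # stand-in for the ten values from the response
followups = [urlencode(top_hit_request("dog", v)) for v in values]
```

The ten follow-up query strings are independent of each other, which is
why they can be sent in parallel once the facet values are known.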

So my questions are:

1. Is there any magic query to handle this with Solr as-is?

2. If not, is the best solution to create my own request handler?

3. And in that case, any input/tips on developing this type of custom  
request handler?

Thanks,

-- Ken


--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g

Re: Best solution to avoiding multiple query requests

Posted by Ken Krugler <kk...@transpac.com>.

Hi Geert-jan,

On Aug 4, 2010, at 12:04pm, Geert-Jan Brits wrote:

> If I understand correctly: you want to sort your collapsed results  
> by 'nr of
> collapsed results'/ hits.
>
> It seems this can't be done out-of-the-box using this patch (I'm not
> entirely sure, at least it doesn't follow from the wiki-page.  
> Perhaps best
> is to check the jira-issues to make sure this isn't already  
> available now,
> but just not updated on the wiki)
>
> Also I found a blogpost (from the patch creator afaik) with in the  
> comments
> someone with the same issue + some pointers.
> http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

Yup, that's the one - http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/comment-page-1/#comment-1249

So with some modifications to that patch, it could work...thanks for  
the info!

-- Ken


Re: Best solution to avoiding multiple query requests

Posted by Geert-Jan Brits <gb...@gmail.com>.

If I understand correctly: you want to sort your collapsed results by 'nr of
collapsed results'/ hits.

It seems this can't be done out-of-the-box using this patch (I'm not
entirely sure, at least it doesn't follow from the wiki-page. Perhaps best
is to check the jira-issues to make sure this isn't already available now,
but just not updated on the wiki)

Also I found a blogpost (from the patch creator afaik) with in the comments
someone with the same issue + some pointers.
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

hope that helps,
Geert-jan


Re: Best solution to avoiding multiple query requests

Posted by Ken Krugler <kk...@transpac.com>.

Hi Geert-Jan,

On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

> Field Collapsing (currently as patch) is exactly what you're looking  
> for
> imo.
>
> http://wiki.apache.org/solr/FieldCollapsing

Thanks for the ref, good stuff.

I think it's close, but if I understand this correctly, then I could  
get (using just top two, versus top 10 for simplicity) results that  
looked like

"dog training" (faceted field value A)
"super dog" (faceted field value B)

but if the actual faceted field value/hit counts were:

C (10)
D (8)
A (2)
B (1)

Then what I'd want is the top hit for "dog AND facet field:C",  
followed by "dog AND facet field:D".

Using field collapsing would improve the probability that if I asked  
for the top 100 hits, I'd find entries for each of my top N faceted  
field values.
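
To make the mismatch concrete, here is a small simulation under stated
assumptions: the ranked hits are invented so that the counts match the
example above, and collapse() only mimics the idea of keeping the best
hit per field value. It is not the behavior of the actual patch.

```python
from collections import Counter

# Hypothetical relevance-ranked hits for "dog": (title, field value).
# Invented so that the counts match the example: C=10, D=8, A=2, B=1.
ranked_hits = (
    [("dog training", "A"), ("super dog", "B")]
    + [("c hit %d" % i, "C") for i in range(10)]
    + [("d hit %d" % i, "D") for i in range(8)]
    + [("dog training 2", "A")]
)


def collapse(hits):
    """Keep the first (best-ranked) hit per field value, in relevance order."""
    seen = {}
    for title, value in hits:
        seen.setdefault(value, title)
    return list(seen.items())  # dicts preserve insertion order (Python 3.7+)


def by_facet_count(hits):
    """Same top hits, but ordered by how many hits each value has."""
    counts = Counter(value for _, value in hits)
    best = dict(collapse(hits))
    return [(value, best[value]) for value, _ in counts.most_common()]


print(collapse(ranked_hits)[:2])        # groups led by relevance
print(by_facet_count(ranked_hits)[:2])  # groups led by hit count
```

With this data the collapsed list starts with A and B, while the order
I'm after starts with C and D.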

Thanks again,

-- Ken


Re: Best solution to avoiding multiple query requests

Posted by Geert-Jan Brits <gb...@gmail.com>.

Field Collapsing (currently as patch) is exactly what you're looking for
imo.

http://wiki.apache.org/solr/FieldCollapsing

Geert-Jan



Re: Best solution to avoiding multiple query requests

Posted by kenf_nc <ke...@realestate.com>.

Not sure the processing would be any faster than just querying again, but in
your original result set, the first doc whose field value matches a top 10
facet will be the number 1 item if you fq on that facet value. So you don't
need to query for it again. You would only need to query for those facet
values that aren't in your result set.
i.e.:
   q=dog&facet=on&facet.field=foo
returns 10 docs:
   id=1, foo=A
   id=2, foo=A
   id=3, foo=B
   id=4, foo=C
   id=5, foo=B
   id=6, foo=A
   id=7, foo=Z
   id=8, foo=T
   id=9, foo=B
   id=10, foo=J

If your facet results top 10 were (A, B, T, J, D, X, Q, O, P, I)
you already have the number 1 for A (id 1), B (id 3), T (id 8) and J (id 10)
in your very first query. You only need to query D, X, Q, O, P, I. 
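
A rough sketch of that bookkeeping, using the example data above. The
function name and the dict layout of the docs are assumptions for
illustration. It relies on the docs being in relevance order and on the
follow-up query using the same q and sort, so that a filter only removes
docs without reordering them.

```python
def split_top_hits(docs, top_facet_values, field="foo"):
    """Return (found, missing) for the given facet values.

    found maps each facet value to the id of the first doc carrying it;
    missing lists the values that still need their own query.
    """
    wanted = set(top_facet_values)
    found = {}
    for doc in docs:  # assumed to be in relevance order
        value = doc[field]
        if value in wanted and value not in found:
            found[value] = doc["id"]
    missing = [v for v in top_facet_values if v not in found]
    return found, missing


docs = [
    {"id": 1, "foo": "A"}, {"id": 2, "foo": "A"}, {"id": 3, "foo": "B"},
    {"id": 4, "foo": "C"}, {"id": 5, "foo": "B"}, {"id": 6, "foo": "A"},
    {"id": 7, "foo": "Z"}, {"id": 8, "foo": "T"}, {"id": 9, "foo": "B"},
    {"id": 10, "foo": "J"},
]
top10 = ["A", "B", "T", "J", "D", "X", "Q", "O", "P", "I"]

found, missing = split_top_hits(docs, top10)
print(found)    # top hits already present in the first result set
print(missing)  # facet values that still need a query
```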

If your first query returned 100 instead of 10 you may even have more of the
top 10 represented. Again, the processing steps you would need to do may not
be any faster than re-querying, it depends on the speed of your index and
network etc.

I would think that if your second query was
q=dog&fq=(foo:A OR foo:B OR foo:T...etc) then you have an even greater chance
of having the number 1 result for each of the top 10 in just your second
query.
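
One way such a combined filter could be assembled, as a sketch only: the
helper name and the rows default are assumptions, and the values are
assumed not to contain quotes or other characters that need escaping in
Solr query syntax.

```python
from urllib.parse import urlencode


def followup_params(query, field, values, rows=100):
    """Build one request restricted to any of the given field values."""
    clause = " OR ".join('"%s"' % v for v in values)
    return {"q": query, "rows": rows, "fq": "%s:(%s)" % (field, clause)}


params = followup_params("dog", "foo", ["A", "B", "T"])
print(params["fq"])       # the filter query before URL encoding
print(urlencode(params))  # query string ready to append to the select URL
```

Passing only the facet values not already covered by the first result set
keeps the filter short.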
