Posted to solr-user@lucene.apache.org by Aaron Gibbons <ag...@synergydatasystems.com> on 2014/07/11 19:46:16 UTC

Group only top 50 results not All results.

I'm trying to figure out how I can query Solr for the top X results (50, in
my case), THEN group and count only those top X by their owner.

I can run a query to get the top 50 results that I want.
solr/select?q=(current_position_title%3a(TEST))&rows=50

I've tried faceting, but that facets all matching results, not just the top 50:
solr/select?q=(current_position_title%3a(TEST))&start=0&rows=50&facet=true&facet.field=recruiterkeyid&facet.limit=-1&facet.mincount=1&facet.sort=true

I've tried grouping, and again all matching results get grouped, not just the top 50:
solr/select?q=(current_position_title%3a(TEST))&rows=50&group=true&group.field=recruiterkeyid&group.limit=1&group.format=grouped&version=2.2

I could also run one search to get the top X record IDs, then run a second
grouped query on those, but I was hoping there was a less expensive way to run
the search.

So what I need to get back are the distinct recruiterkeyid values from the top X
query and a count of how often each occurs within the top X results only.  I'll
ultimately want to query the results for each individual recruiterkeyid
as well.  I'm using SolrNet to build the query.

Thank you for your help,
Aaron

Re: Group only top 50 results not All results.

Posted by Umesh Prasad <um...@gmail.com>.
Another way is to extend the existing facet component.  FacetComponent
uses SimpleFacets to compute facets, and passes it the full matching DocSet
(rb.getResults().docSet) as a constructor argument. Instead, you can restrict
it to the ranked documents in rb.getResults().docList (as far as I know the
constructor expects a DocSet, so the DocList would have to be converted first).
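
To make the docSet/docList distinction concrete, here is a minimal Python sketch. It is not Solr code: the dictionaries standing in for documents and the helper names facet_counts and ranked_page are assumptions made up for illustration. It only shows why counting over every match gives different numbers than counting over the ranked page.

```python
from collections import Counter

def facet_counts(docs, field="recruiterkeyid"):
    """Count how many of the given docs carry each value of `field`."""
    return Counter(doc[field] for doc in docs)

def ranked_page(matches, rows=50):
    """Return the `rows` highest-scoring matches (a stand-in for the docList)."""
    return sorted(matches, key=lambda doc: doc["score"], reverse=True)[:rows]

# Hypothetical matches for one query: every matching doc, in no particular order.
matches = [
    {"score": 3.0, "recruiterkeyid": "A"},
    {"score": 2.0, "recruiterkeyid": "B"},
    {"score": 1.0, "recruiterkeyid": "A"},
    {"score": 0.5, "recruiterkeyid": "C"},
]

all_counts = facet_counts(matches)                       # what stock faceting reports
top_counts = facet_counts(ranked_page(matches, rows=2))  # what the question asks for
```

Here all_counts covers all four matches, while top_counts only sees the two best-scoring ones.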

Basically 3 steps:
1. Develop your custom facet component.
For reference you can look at the source code of FacetComponent.
https://github.com/apache/lucene-solr/blob/d49f297a4c7ab2c518717fa5a6ceeeda222349c3/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
(line 79 - 82)

2. Register the extended FacetComponent as a custom component in
solrconfig.xml.
It will look something like

  <searchComponent name="myfacet" class="com.flipkart.solr.handler.component.MyFacetComponent" />

3. Call it as part of your custom request handler pipeline.
    <arr name="last-components">
        <str>myfacet</str>
    </arr>

You can check
http://sujitpal.blogspot.in/2011/04/custom-solr-search-components-2-dev.html
for a sample.





On 13 July 2014 00:02, Joel Bernstein <jo...@gmail.com> wrote:

> I agree with Alex a PostFilter would work. But it would be a somewhat
> tricky PostFilter to write. You would need to collect the top 50 documents
> using a priority queue in the DelegatingCollector.collect() method. Then in
> the DelegatingCollector.finish() method you would send the top documents to
> the lower collectors. Grouping supports PostFilters so this should work
> with Grouping or you could use the CollapsingQParserPlugin.
>
> Joel Bernstein
> Search Engineer at Heliosearch



-- 
---
Thanks & Regards
Umesh Prasad

Re: Group only top 50 results not All results.

Posted by Joel Bernstein <jo...@gmail.com>.
I agree with Alex: a PostFilter would work. But it would be a somewhat
tricky PostFilter to write. You would need to collect the top 50 documents
using a priority queue in the DelegatingCollector.collect() method. Then in
the DelegatingCollector.finish() method you would send the top documents to
the lower collectors. Grouping supports PostFilters, so this should work
with Grouping, or you could use the CollapsingQParserPlugin.
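
The collect/finish pattern described above can be sketched outside Solr as follows. This is a plain Python simulation, not the DelegatingCollector API: the class name TopNForwardingCollector, the collect(doc_id, score) signature and the callable delegate are all assumptions for illustration. A real PostFilter would be written in Java and would also have to deal with per-segment readers and scorers.

```python
import heapq

class TopNForwardingCollector:
    """Buffers the best `top_n` docs, then forwards only those downstream."""

    def __init__(self, delegate, top_n=50):
        self.delegate = delegate  # callable(doc_id, score): the "lower collector"
        self.top_n = top_n
        self._heap = []           # min-heap of (score, doc_id); root is the weakest kept doc

    def collect(self, doc_id, score):
        entry = (score, doc_id)
        if len(self._heap) < self.top_n:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            # Better than the weakest buffered doc: evict that one and keep this one.
            # Ties on score fall back to doc id here, which is an arbitrary choice.
            heapq.heapreplace(self._heap, entry)

    def finish(self):
        # Replay the survivors in ascending doc id order, on the assumption
        # that downstream collectors expect documents in index order.
        for score, doc_id in sorted(self._heap, key=lambda e: e[1]):
            self.delegate(doc_id, score)

forwarded = []
collector = TopNForwardingCollector(lambda d, s: forwarded.append((d, s)), top_n=3)
for doc_id, score in [(1, 0.5), (2, 2.0), (3, 1.5), (4, 0.1), (5, 3.0), (6, 1.0)]:
    collector.collect(doc_id, score)
collector.finish()
```

Nothing reaches the delegate until finish() runs, which is what would let a later stage such as grouping see only the top documents.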

Joel Bernstein
Search Engineer at Heliosearch


On Sat, Jul 12, 2014 at 1:31 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> I don't think either grouping or faceting work as postfilter.
> Otherwise, that would be one way. Have a custom post-filter that only
> allows top 50 documents and have grouping run as an even-higher-cost
> postfilter after that.
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

Re: Group only top 50 results not All results.

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I don't think either grouping or faceting works as a post-filter.
Otherwise, that would be one way: have a custom post-filter that only
allows the top 50 documents through, and have grouping run as an
even-higher-cost post-filter after that.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Sat, Jul 12, 2014 at 11:49 PM, Erick Erickson
<er...@gmail.com> wrote:
> You could also return the top 50 groups. That will certainly contain the top
> 50 responses. The app layer could then do some local sorting to figure
> out what was correct. Maybe you'd be returning 3 docs in each or something...
>
> I'd probably only go there if Michael's approach didn't work out though.

Re: Group only top 50 results not All results.

Posted by Erick Erickson <er...@gmail.com>.
You could also return the top 50 groups. That will certainly contain the top
50 responses. The app layer could then do some local sorting to figure
out what was correct. Maybe you'd be returning 3 docs in each or something...

I'd probably only go there if Michael's approach didn't work out though.
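
A rough sketch of that app-layer step, in Python. It assumes the request asked for the score field (for example fl=id,score) and that the response was parsed from Solr's JSON grouped format into a dict; the function name count_owners_in_top_n is made up for illustration. Note the counts are only exact if each group returned enough documents (group.limit) to cover all of its members in the overall top N.

```python
from collections import Counter

def count_owners_in_top_n(grouped_response, field="recruiterkeyid", top_n=50):
    """Flatten the returned groups, re-rank by score, count owners in the top_n."""
    groups = grouped_response["grouped"][field]["groups"]
    scored = [
        (doc["score"], group["groupValue"])
        for group in groups
        for doc in group["doclist"]["docs"]
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return Counter(owner for _, owner in scored[:top_n])

# Hypothetical, trimmed-down grouped response with three owners.
sample = {"grouped": {"recruiterkeyid": {"matches": 6, "groups": [
    {"groupValue": "A", "doclist": {"docs": [{"score": 9.0}, {"score": 5.0}, {"score": 1.0}]}},
    {"groupValue": "B", "doclist": {"docs": [{"score": 8.0}, {"score": 2.0}]}},
    {"groupValue": "C", "doclist": {"docs": [{"score": 7.0}]}},
]}}}

owner_counts = count_owners_in_top_n(sample, top_n=4)
```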

On Fri, Jul 11, 2014 at 10:52 AM, Michael Ryan <mr...@moreover.com> wrote:
> I suggest doing this in two queries. In the first query, retrieve the unique ids of the top 50 documents. In the second query, just query for those ids (e.g., q=ids:(2 13 55 62 81)), and add the facet parameters on that query.
>
> -Michael

RE: Group only top 50 results not All results.

Posted by Michael Ryan <mr...@moreover.com>.
I suggest doing this in two queries. In the first query, retrieve the unique ids of the top 50 documents. In the second query, just query for those ids (e.g., q=ids:(2 13 55 62 81)), and add the facet parameters on that query.
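
The two requests could be assembled roughly like this. It is a Python sketch that only builds the URLs and does not talk to Solr; the helper names are made up, and the unique key field is assumed to be called "id" (the example above writes it as "ids"), so adjust id_field to match the schema.

```python
from urllib.parse import urlencode

def top_ids_query(user_query, top_n=50, id_field="id"):
    """First request: fetch only the unique ids of the top_n ranked documents."""
    params = {"q": user_query, "rows": top_n, "fl": id_field}
    return "solr/select?" + urlencode(params)

def facet_on_ids_query(ids, facet_field="recruiterkeyid", id_field="id"):
    """Second request: match just those ids and facet on the owner field."""
    id_clause = "{}:({})".format(id_field, " ".join(str(i) for i in ids))
    params = [
        ("q", id_clause),
        ("rows", 0),                 # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.limit", -1),
        ("facet.mincount", 1),
    ]
    return "solr/select?" + urlencode(params)

first_url = top_ids_query("current_position_title:(TEST)")
second_url = facet_on_ids_query([2, 13, 55, 62, 81])
```

With 50 ids the second query stays well below Solr's default limit of 1024 boolean clauses.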

-Michael
