Posted to solr-user@lucene.apache.org by Brent Ryan <br...@gmail.com> on 2013/09/24 19:11:25 UTC

SOLR grouped query sorting on numFound

We ran into one snag during development with SOLR, and I thought I'd run it
by the list to see if anyone has a slick way to solve it.

Basically, we're performing a SOLR query with grouping and want to be able
to sort by the number of documents found within each group.

Our query response from SOLR looks something like this:

{
  "responseHeader":{
    "status":0,
    "QTime":17,
    "params":{
      "indent":"true",
      "q":"*:*",
      "group.limit":"0",
      "group.field":"rfp_stub",
      "group":"true",
      "wt":"json",
      "rows":"10000000"}},
  "grouped":{
    "rfp_stub":{
      "matches":18470,
      "groups":[{
          "groupValue":"java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e",
          "doclist":{"numFound":3,"start":0,"docs":[]
          }},
        {
          "groupValue":"java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce",
          "doclist":{"numFound":5,"start":0,"docs":[]
          }},
        {
          "groupValue":"java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131",
          "doclist":{"numFound":6,"start":0,"docs":[]
          }},
…

The *numFound* value shows the number of documents within that group.  Is
there any way to sort on *numFound* in SOLR?  I don't believe this is
supported, but I wondered whether anyone here has come across this and
whether there are any suggested workarounds, given that the dataset is really
too large to hold in memory on our app servers.

Re: SOLR grouped query sorting on numFound

Posted by Erick Erickson <er...@gmail.com>.
But if it's too large on the client, wouldn't it also be too large on
the server? After all, you have to hold the entire set of groups in
memory, or at least the counts of them all, since you can't know ahead
of time which will be the largest. I suppose you could do some
two-pass process where you return 1 doc/group with absolutely
minimal data (like score and ID) and then issue a second query that
gets the data to display, if (and only if) that suits your use case.
Otherwise I'm afraid you're into custom Solr code....
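
For what it's worth, one possible reading of the two-pass idea, as a rough
Python sketch rather than a tested recipe. It assumes the first pass is the
count-only grouped query from the original post (group.limit=0) and that the
app only needs the k largest groups, so a bounded heap keeps memory at k
entries no matter how many groups go by. The function names, the value of k
and the way the groups are streamed from the response are all assumptions
made for illustration; only the field name and the response shape come from
the thread.

```python
import heapq


def top_groups(groups, k):
    """Pass one: keep only the k largest groups while iterating over counts.

    `groups` can be any iterable of group entries shaped like the ones in
    the grouped response, so it does not have to be a fully loaded list.
    """
    heap = []  # min-heap of (numFound, groupValue); smallest kept group on top
    for group in groups:
        item = (group["doclist"]["numFound"], group["groupValue"])
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # Evict the smallest group seen so far in favour of a larger one.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


def second_pass_params(field, group_value, rows=10):
    """Pass two: query parameters to fetch the documents of one chosen group."""
    escaped = group_value.replace("\\", "\\\\").replace('"', '\\"')
    return {"q": f'{field}:"{escaped}"', "rows": rows, "wt": "json"}


# Placeholder group values stand in for the real UUID strings.
counts = [
    {"groupValue": "a", "doclist": {"numFound": 3}},
    {"groupValue": "b", "doclist": {"numFound": 5}},
    {"groupValue": "c", "doclist": {"numFound": 6}},
]
for num_found, value in top_groups(counts, k=2):
    print(num_found, second_pass_params("rfp_stub", value))
```

The heap only helps if a top-k view is enough; a full ordering of every group
still needs every count held somewhere.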

Best,
Erick

On Wed, Sep 25, 2013 at 6:40 AM, Brent Ryan <br...@gmail.com> wrote:
> ya, that's the problem... you can't sort by "numFound" and it's not
> feasible to do the sort on the client because the grouped result set is too
> large.
>
> Brent

Re: SOLR grouped query sorting on numFound

Posted by Brent Ryan <br...@gmail.com>.
ya, that's the problem... you can't sort by "numFound" and it's not
feasible to do the sort on the client because the grouped result set is too
large.

Brent


On Wed, Sep 25, 2013 at 6:09 AM, Erick Erickson <er...@gmail.com> wrote:

> Hmmm, just specifying &sort= is _almost_ what you want,
> except it sorts by the value of fields in the doc not numFound.
>
> this shouldn't be hard to do on the client though, but you'd
> have to return all the groups...
>
> FWIW,
> Erick

Re: SOLR grouped query sorting on numFound

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, just specifying &sort= is _almost_ what you want,
except it sorts by the value of fields in the doc, not numFound.

This shouldn't be hard to do on the client, though you'd
have to return all the groups...
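
A minimal sketch of that client-side sort in Python, assuming the whole
grouped response has already been parsed into memory (which is exactly the
part that may not scale). The function name and the cut-down sample are made
up for illustration; the response shape and the rfp_stub field are the ones
from the original post.

```python
import json


def sort_groups_by_num_found(response, field, descending=True):
    """Return (groupValue, numFound) pairs ordered by group size."""
    groups = response["grouped"][field]["groups"]
    pairs = [(g["groupValue"], g["doclist"]["numFound"]) for g in groups]
    # sorted() is stable, so groups with equal counts keep their response order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=descending)


# Cut-down response with the same shape as the one in the original post;
# the group values "a", "b", "c" are placeholders for the real UUID strings.
sample = json.loads("""
{"grouped": {"rfp_stub": {"matches": 14, "groups": [
  {"groupValue": "a", "doclist": {"numFound": 3, "start": 0, "docs": []}},
  {"groupValue": "b", "doclist": {"numFound": 5, "start": 0, "docs": []}},
  {"groupValue": "c", "doclist": {"numFound": 6, "start": 0, "docs": []}}
]}}}
""")

for value, count in sort_groups_by_num_found(sample, "rfp_stub"):
    print(value, count)
```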

FWIW,
Erick
