Posted to solr-user@lucene.apache.org by "Fischer, Stephen" <sf...@pennmedicine.upenn.edu> on 2020/02/11 23:15:42 UTC

per-field count of documents matched?

Hi wise Solr experts,

For our scientific use case, we want to show users a per-field count of the documents that match in each field.

We'd like to do this efficiently because we might return up to a million documents.

For example, if we had documents describing People and a query of, say, "Stone", we might want to show:

Fields matched:
  Last name:           145
  Street:              431
  Favorite rock band:   13
  Home exterior:      2340

Is there an efficient way to do this?

So far we've been trying to leverage highlighting, but it seems very slow.

Thanks

Re: [External] Re: per-field count of documents matched?

Posted by Susmit <sh...@gmail.com>.
I used the JSON Facet API for a similar requirement. It can ignore filters from the main query if needed and roll up the hit counts to any field.
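The sketch below shows one possible reading of Susmit's suggestion. It is written in Python, and it assumes Solr's JSON Request API with one `query`-type facet per field. The main-query filters are tagged `top` so that each facet can drop them through `domain.excludeTags`. The field names, the `top` tag, and the `Status:active` filter are all hypothetical. The sketch only builds the request body and reads counts back from a parsed response; it does not send anything.

```python
import json

def escape_phrase(text):
    # Backslash and double quote are special inside a quoted Lucene phrase.
    return text.replace("\\", "\\\\").replace('"', '\\"')

def build_json_facet_request(user_query, fields, filters=()):
    """Build a JSON Request API body with one query facet per field.

    Field names, the "top" tag and the filter strings are hypothetical.
    """
    facets = {}
    for field in fields:
        facets[field] = {
            "type": "query",
            # Count docs in the facet domain whose `field` contains the term.
            "q": f'{field}:"{escape_phrase(user_query)}"',
            # Recompute the domain without filters tagged "top",
            # i.e. ignore those filters for this count.
            "domain": {"excludeTags": "top"},
        }
    return {
        "query": user_query,
        "filter": [f"{{!tag=top}}{fq}" for fq in filters],
        "limit": 0,  # counts only, no documents returned
        "params": {"defType": "edismax", "qf": " ".join(fields)},
        "facet": facets,
    }

def per_field_counts(response, fields):
    # Each query facet comes back as {"count": n}; default to 0 if absent.
    facets = response.get("facets", {})
    return {field: facets.get(field, {}).get("count", 0) for field in fields}

if __name__ == "__main__":
    body = build_json_facet_request("stone", ["LastName", "Street"], ["Status:active"])
    print(json.dumps(body, indent=2))
```

Setting `limit` to 0 keeps the response small even when a query matches a million documents, because only the facet counts come back.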


> On Feb 11, 2020, at 6:19 PM, Fischer, Stephen <sf...@pennmedicine.upenn.edu> wrote:
> 
> Thanks very much! By the way, we are using eDisMax, and the queries our UI supports don't include fancy Booleans, so your ideas just might work.
> 
> Thanks again,
> Steve

RE: [External] Re: per-field count of documents matched?

Posted by "Fischer, Stephen" <sf...@pennmedicine.upenn.edu>.
Thanks very much! By the way, we are using eDisMax, and the queries our UI supports don't include fancy Booleans, so your ideas just might work.

Thanks again,
Steve

-----Original Message-----
From: Erick Erickson <er...@gmail.com> 
Sent: Tuesday, February 11, 2020 7:16 PM
To: solr-user@lucene.apache.org
Subject: [External] Re: per-field count of documents matched?



Re: per-field count of documents matched?

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, you could use a facet query (or a series of them), e.g. facet.query=LastName:stone&facet.query=Street:stone etc. That would automatically tally only the docs that match the main query.
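Here is a minimal Python sketch of the facet.query approach. It assumes eDisMax as the main parser, which matches Steve's setup, and it uses hypothetical field names. The code builds the query string and pulls the counts out of an already-parsed JSON response; it sends nothing itself. The `{!key=...}` local parameter renames each bucket to its field name. Quoting the term as a phrase is one assumed way of escaping user input.

```python
from urllib.parse import urlencode, parse_qs

def build_facet_query_params(term, fields):
    """Return a URL query string with one facet.query per field (hypothetical names)."""
    phrase = term.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", term),
        ("defType", "edismax"),
        ("qf", " ".join(fields)),
        ("rows", "0"),  # counts only
        ("facet", "true"),
        ("wt", "json"),
    ]
    for field in fields:
        # key= names the bucket after the field instead of the raw query string.
        params.append(("facet.query", f'{{!key={field}}}{field}:"{phrase}"'))
    return urlencode(params)

def counts_from_response(response):
    # facet_queries maps each facet.query key to its hit count.
    return dict(response["facet_counts"]["facet_queries"])
```

Since all of the facet.query counts come back in a single request, this is likely much cheaper than highlighting every matched document. Performance has not been measured here, though.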

You could also consider a custom search component. For the exact case you describe, it’s actually fairly simple. The postings list has, for each term, the list of docs that contain it (by internal Lucene doc ID). So I might have for field LastName:
stone -> 1,73,100…

for field Street:
stone -> 264,933…

So it’s simply a matter of going down each term’s postings list in each field and counting the docs that the overall query also matches.
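The counting step described above amounts to intersecting each field's postings list with the set of docs that matched the overall query. The sketch below is plain Python rather than Lucene API code. It assumes doc IDs are sorted ints and uses a two-pointer merge. A real search component would instead walk Lucene's per-segment postings (`PostingsEnum`).

```python
from typing import Dict, List

def count_common(postings: List[int], matched: List[int]) -> int:
    """Count doc IDs present in both sorted lists (two-pointer merge)."""
    i = j = count = 0
    while i < len(postings) and j < len(matched):
        if postings[i] == matched[j]:
            count += 1
            i += 1
            j += 1
        elif postings[i] < matched[j]:
            i += 1  # this doc has the term but did not match the query
        else:
            j += 1  # this matched doc does not have the term in this field
    return count

def per_field_counts_from_postings(
    postings_by_field: Dict[str, List[int]], matched: List[int]
) -> Dict[str, int]:
    # One tally per field; a doc can be counted in several fields.
    return {field: count_common(ids, matched) for field, ids in postings_by_field.items()}
```

Take made-up postings `{"LastName": [1, 73, 100], "Street": [264, 933]}` and matched docs `[1, 73, 264, 500]`. For these inputs the function returns `{"LastName": 2, "Street": 1}`.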

However… I’m not sure you’d get what you want in either case. Consider the query (A AND B) OR (C AND D), and say doc1 contains A in LastName and both C and D in Street. doc1 matches via (C AND D), but should it also be counted in the LastName tally because it contains A there?
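The toy Python model below makes this ambiguity concrete. It rests on two simplifications, neither of which reflects how Solr represents anything. First, the query is assumed to be flattened already into OR-of-AND clauses. Second, each doc is a hypothetical map from field name to a set of terms. The "naive" tally credits every field that contains any query term. The "clause-aware" tally credits only the fields that took part in a clause the doc fully satisfies.

```python
from typing import Dict, List, Set

# Hypothetical doc model: field name -> set of terms in that field.
Doc = Dict[str, Set[str]]
# Query as an OR of AND-clauses, e.g. (A AND B) OR (C AND D).
Query = List[List[str]]

def fields_with_any_term(doc: Doc, query: Query) -> Set[str]:
    """Naive: credit every field containing any query term."""
    terms = {t for clause in query for t in clause}
    return {f for f, ts in doc.items() if ts & terms}

def fields_in_satisfied_clauses(doc: Doc, query: Query) -> Set[str]:
    """Clause-aware: credit fields only for AND-clauses the doc fully satisfies."""
    all_terms = set().union(*doc.values())
    credited: Set[str] = set()
    for clause in query:
        if all(t in all_terms for t in clause):
            credited |= {f for f, ts in doc.items() if ts & set(clause)}
    return credited
```

Take doc1 = `{"LastName": {"A"}, "Street": {"C", "D"}}` with the query `[["A", "B"], ["C", "D"]]`. The naive version credits both `LastName` and `Street`, while the clause-aware version credits only `Street`. Which of these counts as "matched" is the application-level decision that Erick raises.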

I suppose you could put the full query in the facet.query above. I’m still not sure it’s what you need, since I’m not sure what “per-field count of documents that match” means in your application…
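With eDisMax, one way to read "put the full query in the facet.query" is to re-run the user's query once per field, using local params to restrict `qf` to that single field. The sketch below assumes a hypothetical request parameter named `uq` that holds the raw user text, dereferenced with `v=$uq`. The field names are hypothetical as well. One consequence: depending on `mm`, a multi-term query may match no single field even though the doc matched across several fields, which is the case Erick raises.

```python
from urllib.parse import urlencode, parse_qs

def build_per_field_edismax_facets(user_query, fields):
    """One facet.query per field, each re-running the user's eDisMax query on that field only."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        ("qf", " ".join(fields)),
        ("uq", user_query),  # hypothetical param referenced by v=$uq below
        ("rows", "0"),
        ("facet", "true"),
    ]
    for field in fields:
        # Local-param qf overrides the request's qf for this facet only;
        # key= labels the resulting count with the field name.
        params.append(("facet.query", f"{{!edismax qf={field} key={field} v=$uq}}"))
    return urlencode(params)
```

Facet counts are already limited to documents that match the main query. As a result, each per-field count covers docs that matched overall and would also have matched if only that field were searched.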

Best,
Erick
