Posted to solr-user@lucene.apache.org by "Olson, Ron" <RO...@lbpc.com> on 2011/06/27 17:44:25 UTC

Unique document count from index?

Hi all-

I have a problem that I'm not sure can be solved in Solr. I am using Solr 3.2 with patch 2524 installed to provide grouping. I need to return the count of unique records that match a particular query.

For an example of what I'm talking about, imagine I have an index of music CD orders, created from a SQL database using the DataImportHandler. It's possible that a person ordered multiple records by the same artist (e.g. order #1234 contains Pink Floyd "Wish You Were Here", Pink Floyd "Meddle", Pink Floyd "Obscured by Clouds"). One of the indexed and stored fields in the document is "Artist". If I search for Pink Floyd, using the order above, I'd get three documents, all with the same order number, one for each of the Pink Floyd records. What I'd like to find out is how many unique orders contain Pink Floyd across the entire index. The index has millions of documents.
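
To make the difference between the two counts concrete, here is a minimal Python sketch of the example above. The field names (order_no, artist, title) and the second order #5678 are assumptions made up for illustration; they are not the actual schema or data.

```python
# Hypothetical documents, one per ordered CD (field names are assumed).
docs = [
    {"order_no": 1234, "artist": "Pink Floyd", "title": "Wish You Were Here"},
    {"order_no": 1234, "artist": "Pink Floyd", "title": "Meddle"},
    {"order_no": 1234, "artist": "Pink Floyd", "title": "Obscured by Clouds"},
    # Made-up second order, added only so the two counts differ clearly.
    {"order_no": 5678, "artist": "Pink Floyd", "title": "Meddle"},
]


def count_matches(docs, artist):
    """Number of matching documents (what a plain search reports)."""
    return sum(1 for d in docs if d["artist"] == artist)


def count_unique_orders(docs, artist):
    """Number of distinct order numbers among the matching documents."""
    return len({d["order_no"] for d in docs if d["artist"] == artist})


print(count_matches(docs, "Pink Floyd"))        # documents matched
print(count_unique_orders(docs, "Pink Floyd"))  # distinct orders
```

The first number is the document count a search reports; the second is the figure being asked for.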

I have been trying to see if the result grouping functionality provided by patch 2524 will help, but while it does collapse the query above into one document, the matches field is still the same as without the grouping (which I guess makes sense insofar as it is still reporting the number of documents it found for the query). I have also thought a subquery in my DataImportHandler might work, though I'm not sure how I'd structure it.

Thanks for any guidance on how to solve this problem; I know Solr isn't meant to be a data-mining tool and I'm guessing I'm skating perilously close to using it for that purpose, but anything I can do to take load from the actual database is considered a Good Thing by all concerned.

Ron

DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is  unauthorized and strictly prohibited.  If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company.
Thank you.

Re: Unique document count from index?

Posted by Dmitry Kan <dm...@gmail.com>.
Can you use facet search?

facet=true&facet.field=order_no&fq=order_no:(1234 OR 5678 OR ...)&fq=artist:"Pink Floyd"
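
One way to turn the facet output into a unique-order count is to ask for every facet value with a non-zero count and then count the values returned. The Python sketch below assumes the field names order_no and artist, the JSON response writer, and the default flat facet layout (value, count, value, count, ...); the helper names are hypothetical and nothing here has been tested against Solr 3.2.

```python
from urllib.parse import urlencode


def build_facet_query(artist, field="order_no"):
    """Build a query string that facets on `field` for one artist.

    Assumes `artist` contains no double quotes (no escaping is done).
    """
    params = [
        ("q", "*:*"),
        ("fq", 'artist:"%s"' % artist),  # phrase query on the assumed artist field
        ("rows", "0"),                   # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),         # skip orders with no matching document
        ("facet.limit", "-1"),           # return all facet values, not just the top ones
        ("wt", "json"),
    ]
    return urlencode(params)


def count_distinct(response, field="order_no"):
    """Count facet values in a parsed JSON response with the flat layout."""
    flat = response["facet_counts"]["facet_fields"][field]
    return len(flat) // 2  # the list alternates value, count


# Hand-written sample response for illustration, not real Solr output.
sample = {"facet_counts": {"facet_fields": {"order_no": ["1234", 3, "5678", 1]}}}
print(build_facet_query("Pink Floyd"))
print(count_distinct(sample))
```

With millions of documents, facet.limit=-1 can return a very large list of order numbers, so it is worth checking response size and memory use before relying on this.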




-- 
Regards,

Dmitry Kan