Posted to general@lucene.apache.org by BorisCallens <bo...@gmail.com> on 2009/03/17 13:47:46 UTC

get distinct values of one field from query

In my project I have a query that can return several million
documents.
From these documents I always want the unique values of a certain field.
For the sake of clarity, take the "id" field as an example.

Currently I'm pulling out all the values for the id field, de-duplicating them
in my application (C# in my case, but it could be any language of course) and
then returning these values.
When the query returns only a few hundred rows, this
is fast enough. But pulling out several million values and de-duplicating
them can take quite some time.

Is there a more performant way to do this?

--Example code (C#)
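    // Run the query; Hits loads matching documents lazily.
    Hits hits = searcher.Search(query);
    List<string> idStrings = new List<string>();
    int count = hits.Length();
    for (int i = 0; i < count; i++)
    {
        // Loads the full stored document just to read one field.
        idStrings.Add(hits.Doc(i).Get("id"));
    }
    // De-duplicate in memory (requires System.Linq).
    idStrings = idStrings.Distinct<string>().ToList();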
-- 
View this message in context: http://www.nabble.com/get-distinct-values-of-one-field-from-query-tp22558451p22558451.html
Sent from the Lucene - General mailing list archive at Nabble.com.


Re: get distinct values of one field from query

Posted by BorisCallens <bo...@gmail.com>.
Thanks, I will have a look.


hossman wrote:
> 
> 
> : In my project I have a query that can return several million
> : documents.
> : From these documents I always want the unique values of a certain field.
> : For the sake of clarity, take the "id" field as an example.
> 
> what you are describing sounds similar to the general concept of "faceted
> searching", also frequently discussed on the mailing lists under the
> question of "category counts" ... in your specific case however it seems
> you don't care about the "counts", just the "categories" (ie: the distinct
> set of values for a field across all matched documents)
>
> given those search terms, i'm guessing you'll find more than enough
> information on the basic approaches that can be taken to tackle a faceting
> problem with any variant of Lucene.
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/get-distinct-values-of-one-field-from-query-tp22558451p22738302.html


Re: get distinct values of one field from query

Posted by Chris Hostetter <ho...@fucit.org>.
: In my project I have a query that can return several million
: documents.
: From these documents I always want the unique values of a certain field.
: For the sake of clarity, take the "id" field as an example.

what you are describing sounds similar to the general concept of "faceted
searching", also frequently discussed on the mailing lists under the
question of "category counts" ... in your specific case however it seems
you don't care about the "counts", just the "categories" (ie: the distinct
set of values for a field across all matched documents)

given those search terms, i'm guessing you'll find more than enough
information on the basic approaches that can be taken to tackle a faceting
problem with any variant of Lucene.



-Hoss
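

For readers landing on this thread, here is a minimal sketch of one common
faceting-style approach to the "categories without counts" case described
above. It is not code from either poster and it does not call any Lucene API.
It assumes you already have two things: a per-document array of term ordinals
for the field (the kind of structure a field cache can provide) and the doc ids
that matched the query. The names distinctValues, lookup, ordByDoc and
matchingDocIds are hypothetical, and the sketch is written in plain Java, so it
would need porting for C#. The point is that each hit costs one array read and
one bit set, instead of loading a stored document.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class DistinctFieldValues {

    /**
     * Hypothetical helper, not a Lucene API.
     * lookup         : the unique values of the field, one entry per ordinal
     * ordByDoc       : ordByDoc[docId] is the ordinal of that document's value
     * matchingDocIds : doc ids that matched the query (assumed to be collected
     *                  without loading stored documents)
     */
    static List<String> distinctValues(String[] lookup, int[] ordByDoc, int[] matchingDocIds) {
        // One bit per unique value; setting a bit twice is harmless,
        // so duplicates collapse without any hashing or string comparison.
        BitSet seen = new BitSet(lookup.length);
        for (int docId : matchingDocIds) {
            seen.set(ordByDoc[docId]);
        }

        // Translate the set ordinals back into field values.
        List<String> result = new ArrayList<>(seen.cardinality());
        for (int ord = seen.nextSetBit(0); ord >= 0; ord = seen.nextSetBit(ord + 1)) {
            result.add(lookup[ord]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data: six documents sharing three distinct "id" values.
        String[] lookup = {"a17", "b42", "c03"};
        int[] ordByDoc = {0, 1, 1, 2, 0, 1};
        int[] matchingDocIds = {1, 2, 4, 5}; // pretend these matched the query

        System.out.println(distinctValues(lookup, ordByDoc, matchingDocIds));
        // prints [a17, b42]
    }
}
```

Memory use is one int per document in the index plus one bit per unique value,
which is usually far cheaper than materializing millions of strings. The
trade-off is that the ordinal array has to be built once per index reader, and
this sketch only handles fields with a single value per document.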