Posted to java-user@lucene.apache.org by Ganesh <em...@yahoo.co.in> on 2009/09/02 12:36:35 UTC

First result in the group

Hello all,

I want to retrieve the first result in the group. How can I achieve this? Currently I am parsing all the results and using a hash to avoid duplicate entries.

Is there any better way?

Regards
Ganesh
Send instant messages to your online friends http://in.messenger.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: First result in the group

Posted by Mark Harwood <ma...@yahoo.co.uk>.
 >>It removes the duplicates at query time and not in the results.


Not sure I understand that statement. Do you mean you want index-time  
rejection of potentially duplicate inserts?



On 4 Sep 2009, at 07:01, Ganesh wrote:

> It removes the duplicates at query time and not in the results.




Re: First result in the group

Posted by Ganesh <em...@yahoo.co.in>.
Thanks, Shai and Mark, for your suggestions.

I initially tried DuplicateFilter, but it is not giving me the expected results. It removes the duplicates at query time and not in the results.

Regards
Ganesh



Re: First result in the group

Posted by mark harwood <ma...@yahoo.co.uk>.
See "DuplicateFilter" in contrib.

http://markmail.org/message/lsvnpu7mwhht3a4p

Cheers
Mark




Re: First result in the group

Posted by Shai Erera <se...@gmail.com>.
I see ... the solution I have in mind is not simple, but it follows the
Collector approach. Index the categories as payloads, so that a single term
(cats:all, for example) has a posting list covering all documents, and each
posting carries its document's categories in its payload:
cats:all --> 0 [cat1, cat2] 1 [cat1, cat3, cat4] 4 [cat2, cat3, cat4] ...

Then when Collector.collect() is called, skip to that doc ID and read its
categories and store the ID in category maps (i.e., you'll have maps for
cat1, cat2, ... catN, each will include all doc IDs, sorted by score, which
were collected by this query).

Then you can fetch all categories whose maps/sets are not empty and display
M docs from each.

If you know in advance which categories are requested to be grouped by, for
example I want a group by on categories 1, 3, 4 and 7, you can optimize the
solution further, but I'm not sure if that's what you requested.

Also, if you can translate category Strings to integers, you can store more
efficient payloads ...

Shai
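
The bookkeeping Shai describes (file each collected document under its categories, then show the best M per non-empty category) can be sketched in plain Java. This is only an illustration under assumptions: it does not use the Lucene Collector or payload APIs, the names CategoryGrouper, ScoredDoc and topPerCategory are hypothetical, and the categories are passed in directly rather than read from a payload. The scores are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoryGrouper {
    /** Hypothetical stand-in for a collected hit: document ID plus score. */
    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // category -> hits collected for that category, in collection order
    private final Map<String, List<ScoredDoc>> byCategory = new LinkedHashMap<>();

    /** Plays the role of collect(): file the hit under every category of the document. */
    void collect(int docId, float score, List<String> categories) {
        for (String category : categories) {
            byCategory.computeIfAbsent(category, k -> new ArrayList<>())
                      .add(new ScoredDoc(docId, score));
        }
    }

    /** Returns up to m document IDs per non-empty category, highest score first. */
    Map<String, List<Integer>> topPerCategory(int m) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<ScoredDoc>> entry : byCategory.entrySet()) {
            List<ScoredDoc> docs = new ArrayList<>(entry.getValue());
            docs.sort((a, b) -> Float.compare(b.score, a.score)); // descending score
            List<Integer> ids = new ArrayList<>();
            for (ScoredDoc doc : docs.subList(0, Math.min(m, docs.size()))) {
                ids.add(doc.docId);
            }
            result.put(entry.getKey(), ids);
        }
        return result;
    }

    public static void main(String[] args) {
        CategoryGrouper grouper = new CategoryGrouper();
        // Same documents as the cats:all example above; scores are invented.
        grouper.collect(0, 0.5f, Arrays.asList("cat1", "cat2"));
        grouper.collect(1, 0.9f, Arrays.asList("cat1", "cat3", "cat4"));
        grouper.collect(4, 0.7f, Arrays.asList("cat2", "cat3", "cat4"));
        System.out.println(grouper.topPerCategory(1));
    }
}
```

With M = 1 this gives one document per category, which is the "first result in the group" case. Sorting each list at the end keeps collect() cheap; a bounded priority queue per category would be an alternative if only a small M is ever needed.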


Re: First result in the group

Posted by Ganesh <em...@yahoo.co.in>.
I have a field called category, and every document belongs to some category (say, some belong to X and some to Y). The field values may change dynamically. I want the search results to be filtered to retrieve one document per category.

This is similar to the 'group by' feature in a database.

Regards
Ganesh
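
For reference, the "one document per category" behaviour described here, which is also the hash-based de-duplication mentioned in the original question, can be sketched in plain Java. This is a minimal sketch and not Lucene code: the Hit class and the firstPerCategory method are hypothetical, and the hits are assumed to arrive already in rank order with their category value loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstPerCategory {
    /** Hypothetical stand-in for a search hit: document ID, category value and score. */
    static final class Hit {
        final int docId;
        final String category;
        final float score;

        Hit(int docId, String category, float score) {
            this.docId = docId;
            this.category = category;
            this.score = score;
        }
    }

    /** Keeps the first hit seen for each category; the input must already be in rank order. */
    static List<Hit> firstPerCategory(List<Hit> rankedHits) {
        // LinkedHashMap preserves the order in which categories first appear.
        Map<String, Hit> firstByCategory = new LinkedHashMap<>();
        for (Hit hit : rankedHits) {
            firstByCategory.putIfAbsent(hit.category, hit);
        }
        return new ArrayList<>(firstByCategory.values());
    }

    public static void main(String[] args) {
        List<Hit> ranked = Arrays.asList(
            new Hit(7, "X", 0.9f),
            new Hit(3, "Y", 0.8f),
            new Hit(5, "X", 0.7f),
            new Hit(1, "Y", 0.4f));
        for (Hit hit : firstPerCategory(ranked)) {
            System.out.println(hit.category + " -> doc " + hit.docId);
        }
    }
}
```

The cost of this approach is that every hit has to be visited and its category looked up, which is the overhead the question is trying to avoid.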
 



Re: First result in the group

Posted by Shai Erera <se...@gmail.com>.
What do you mean by "first result in the group"? What is a group?
