Posted to java-user@lucene.apache.org by Lebiram <le...@ymail.com> on 2009/02/05 13:17:42 UTC

TermQuery search returns the same Document several times

Hi All, 

Is it possible to somehow ensure that a document will be returned only once when collecting from HitCollector?


      

Re: TermQuery search returns the same Document several times

Posted by Erick Erickson <er...@gmail.com>.
Your coworker *might* have been talking about a Hits object when
iterating over it for documents past the 100th or so. See the
discussion list of the wiki for the messy details.

Well, you can always sort by a field rather than by score; see
SortField and the associated classes. You can also specify secondary,
tertiary, and further sorts.
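
To illustrate the ordering this produces, here is a minimal plain-Java sketch. It is not the Lucene SortField API; SortableHit and FieldThenScoreOrder are hypothetical names, and the hits are assumed to have been collected already together with their application ID. Sorting by the field first and by score second puts every document of the same application ID next to each other, best score first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a hit: the application ID field value and the score.
final class SortableHit {
    final String appId;
    final float score;

    SortableHit(String appId, float score) {
        this.appId = appId;
        this.score = score;
    }
}

final class FieldThenScoreOrder {
    // Returns a sorted copy: primary sort on appId (ascending),
    // secondary sort on score (descending).
    static List<SortableHit> sorted(List<SortableHit> hits) {
        Comparator<SortableHit> byAppId = Comparator.comparing((SortableHit h) -> h.appId);
        Comparator<SortableHit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);

        List<SortableHit> copy = new ArrayList<>(hits);
        copy.sort(byAppId.thenComparing(byScoreDesc));
        return copy;
    }
}
```

With the hits ordered this way, a consumer can walk the list and treat each run of equal application IDs as one group.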

I'll leave it to others for other suggestions since I'm in a rush.

Best
Erick

On Thu, Feb 5, 2009 at 8:44 AM, Lebiram <le...@ymail.com> wrote:

>
> Sorry, I might have misunderstood what my coworker told me.
>
> If HitCollector only returns a document once then he might be referring to
> an application ID that is assigned to a field that has been indexed twice or
> more with different document IDs.
>
> I'll clarify this with him.
>
> However is there a way to somehow do a group by field on the results? That
> field being the application ID?
>
> Thanks.
>

Re: TermQuery search returns the same Document several times

Posted by Karl Wettin <ka...@gmail.com>.
On 5 Feb 2009, at 14:44, Lebiram wrote:

> If HitCollector only returns a document once then he might be  
> referring to an application ID that is assigned to a field that has  
> been indexed twice or more with different document IDs.
>
> I'll clarify this with him.
>
> However is there a way to somehow do a group by field on the  
> results? That field being the application ID?


There is no built-in feature for this, I think; it needs to be
handled by post-processing the collected documents. I recently
implemented that for an application.

(Perhaps it is possible to implement this in a better way using a
function query.)

It collects lots of documents and exposes them to the consumer via a
facade that lazily loads documents from the IndexReader as they are
requested. A Set<MyPrimaryKey> keeps track of whether the entity is
already a member of the results with a greater score.
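
A minimal sketch of that post-processing step, under stated assumptions: CollectedHit and EntityDeduplicator are hypothetical names, the entity key is a plain String rather than MyPrimaryKey, and the lazy loading from the IndexReader is left out. It is not the implementation described above, only one way the Set-based check could look.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a collected hit: document number,
// the entity key stored in a field, and the score.
final class CollectedHit {
    final int docId;
    final String entityKey;
    final float score;

    CollectedHit(int docId, String entityKey, float score) {
        this.docId = docId;
        this.entityKey = entityKey;
        this.score = score;
    }
}

final class EntityDeduplicator {
    // Returns at most maxEntities hits, one per entity key, best score first.
    static List<CollectedHit> bestHitPerEntity(List<CollectedHit> collected, int maxEntities) {
        List<CollectedHit> byScore = new ArrayList<>(collected);
        // Highest score first, so the first hit seen for a key is its best one.
        byScore.sort((a, b) -> Float.compare(b.score, a.score));

        Set<String> seen = new HashSet<>();
        List<CollectedHit> result = new ArrayList<>();
        for (CollectedHit hit : byScore) {
            if (result.size() >= maxEntities) {
                break;
            }
            // Set.add returns false when the key is already present,
            // i.e. the entity is already in the results with a greater or equal score.
            if (seen.add(hit.entityKey)) {
                result.add(hit);
            }
        }
        return result;
    }
}
```

Each entity appears once in the result, represented by its highest-scoring document.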

This means I must estimate the total number of hits (and how many
documents to collect in order to return as many entities as the
client requested) using the mean number of documents collected per
entity in an average query.
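
The arithmetic behind that estimate might look like the sketch below. The class and method names are hypothetical, and the simple multiply/divide by the mean is an assumption about how such an estimate could be made, not a description of the actual application.

```java
final class CollectSizeEstimator {
    // How many documents to collect so that, on average, they contain
    // the requested number of distinct entities.
    static int docsToCollect(int entitiesRequested, double meanDocsPerEntity) {
        if (entitiesRequested < 0 || meanDocsPerEntity < 1.0) {
            throw new IllegalArgumentException("need entitiesRequested >= 0 and mean >= 1");
        }
        return (int) Math.ceil(entitiesRequested * meanDocsPerEntity);
    }

    // Rough number of distinct entities behind a raw document hit count.
    static long estimatedEntityHits(long totalDocHits, double meanDocsPerEntity) {
        if (meanDocsPerEntity < 1.0) {
            throw new IllegalArgumentException("need mean >= 1");
        }
        return Math.round(totalDocHits / meanDocsPerEntity);
    }
}
```

For example, with a mean of 2.5 documents per entity, a request for 20 entities would collect 50 documents, and 1000 raw document hits would be reported as roughly 400 entities. Both figures are estimates; a query whose entities have more documents than average can still come up short.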


       karl



Re: TermQuery search returns the same Document several times

Posted by Lebiram <le...@ymail.com>.
Sorry, I might have misunderstood what my coworker told me.

If HitCollector only returns a document once then he might be referring to an application ID that is assigned to a field that has been indexed twice or more with different document IDs.

I'll clarify this with him.

However, is there a way to do a group-by on a field of the results, that field being the application ID?

Thanks.






      

Re: TermQuery search returns the same Document several times

Posted by Erick Erickson <er...@gmail.com>.
I don't understand your question. From the API docs for
HitCollector.collect:

<<<Called once for every non-zero scoring document, with
the document number and its score.>>>

Can you ask your question another way? Because the
only answer I can come up with is
"HitCollector.collect only sees each document once by definition".

Best
Erick

On Thu, Feb 5, 2009 at 7:17 AM, Lebiram <le...@ymail.com> wrote:

> Hi All,
>
> Is it possible to somehow ensure that a document will be returned only once
> when collecting from HitCollector?
>
>
>