Posted to user@lucenenet.apache.org by nima dilmaghani <ni...@gmail.com> on 2010/04/21 19:40:06 UTC

shuffle or randomize method?

Hello,

Let's say that a particular search returns 100 records with exactly the
same score, and that we are taking the top 10.  The top 10 will be the
first 10 documents that were indexed.  If source A was indexed first and
source B second, the user will always see only the 10 results from source A.


Is there a shuffle method that reorders the docs and that can be called
before a call to Optimize?  I assume it would be an expensive operation.  Is
there a better way to randomize the display order only when the scores are
equal?

Many thanks,


Nima

Re: shuffle or randomize method?

Posted by nima dilmaghani <ni...@gmail.com>.
I thought about that, and it is not that easy.  Let me try to explain in a
different way.

Source A has 500,000 records
Source B has 500,000 records

Source A is indexed before source B.  (Yes, I can round-robin the indexing
between sources A and B, but when I index a new source C at a later date,
I am in trouble again.)

Now a search returns 10 results.  I want the results to be ordered by
relevancy score, which is the default in Lucene.  If the top ten results have
different scores, then we are all good, but far too often we end up with a
large number of results that have the same score.  In that scenario, all the
documents from Source A are returned at the top of the list.  Things that
are not known to me at design time are:

1.  Whether the top docs for a particular search have the same score or
different scores.
2.  How many of the top docs have the same score.

As a concrete example, we don't know if the top 50 results have the same
score or the top 100, or none.  If I knew that, I could retrieve those docs
and then shuffle them. But that number is unknown.
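
One possible workaround for the unknown tie count, sketched in Python rather
than against the Lucene.NET API: keep widening the request until the group of
equal scores that straddles the cutoff has been retrieved in full, then
shuffle only within groups of equal score.  The `search` callable, the
(doc_id, score) tuples, and the doubling policy are all assumptions made for
illustration.  Scores are compared with exact equality, which may need
loosening for real floating-point scores.

```python
import random
from itertools import groupby


def top_k_with_shuffled_ties(search, k, rng=None):
    """Return the top k hits, randomizing order only among equal scores.

    `search(n)` is a hypothetical callable that returns up to n
    (doc_id, score) pairs sorted by score, highest first.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng if rng is not None else random.Random()

    n = k
    while True:
        hits = search(n)
        exhausted = len(hits) < n
        # The tie group containing the k-th hit is complete once some
        # later hit scores strictly lower than the k-th hit.
        tie_closed = len(hits) > k and hits[-1][1] < hits[k - 1][1]
        if exhausted or tie_closed:
            break
        n *= 2  # assumed growth policy: double the request size

    result = []
    # groupby works here because hits are already sorted by score.
    for _, group in groupby(hits, key=lambda hit: hit[1]):
        tied = list(group)
        rng.shuffle(tied)  # relevance order between groups is untouched
        result.extend(tied)
    return result[:k]
```

In the worst case this retrieves the entire tie group, so the cost grows with
the number of equally scored documents rather than with k.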

The only way that I can think of to solve this is to recreate the index in a
round-robin fashion any time a new source is added to the system.

But there must be a better way.  I am surely not the first person who has
encountered this.  Or am I missing something terribly obvious?

Thanks!

On Wed, Apr 21, 2010 at 11:19 AM, Digy <di...@gmail.com> wrote:

> I am not sure that I understand your need well. But why don't you just
> shuffle the search results?
> DIGY


-- 
Nima

RE: shuffle or randomize method?

Posted by Digy <di...@gmail.com>.
I am not sure that I understand your need well. But why don't you just
shuffle the search results?
DIGY
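
A minimal Python sketch of shuffling the results without disturbing the
relevance order: sort on the score first and use a random number only as a
tie-breaker.  The (doc_id, score) tuple layout and the function name are
assumptions for illustration, not part of the Lucene.NET API.

```python
import random


def shuffle_ties(hits, rng=None):
    """Reorder hits so that only documents with equal scores are shuffled.

    `hits` is assumed to be a list of (doc_id, score) pairs.
    """
    rng = rng if rng is not None else random.Random()
    # Primary key: negated score, so higher scores come first.
    # Secondary key: a random number, which only matters when scores tie.
    return sorted(hits, key=lambda hit: (-hit[1], rng.random()))
```

This only mixes the sources fairly if `hits` contains every document that
shares a tied score; a list already cut off at the top 10 would still hold
only the documents that were indexed first.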
