Posted to users@solr.apache.org by rgamarra <rg...@gmail.com> on 2021/08/30 15:33:54 UTC

Random Field - # digits

Hi there! I'm using random fields (e.g. sort=random_1234 DESC) as a tie
breaker.

I'm wondering how many digits the underlying random sequence uses for each
generated number.

My result sets may contain (in principle) millions of results, so I would
like to have an estimate of possible clashes (i.e. two results ending up with
the same random number, and then being tied in the result set).

Best regards.

--
Rodolfo Federico Gamarra

Re: Random Field - # digits

Posted by rgamarra <rg...@gmail.com>.
Thanks. It's not what I need, but I'll keep it in mind.

Sorry, my statement was not clear. I have already replied to Thomas with
further details.

Thank you all.

rodolfo

On Tue, Aug 31, 2021, 9:08 AM Andrew Hankinson
<an...@rism.digital> wrote:

> You could use the UUIDUpdateProcessorFactory to automatically add a UUID
> to each document and use that as the tie-breaker field.
>
>
> https://solr.apache.org/guide/8_1/update-request-processors.html#uuidupdateprocessorfactory
>
> The probability of a UUID collision is well known, and it is extremely low.
>
> https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions

Re: Random Field - # digits

Posted by Andrew Hankinson <an...@rism.digital>.
You could use the UUIDUpdateProcessorFactory to automatically add a UUID to each document and use that as the tie-breaker field.

https://solr.apache.org/guide/8_1/update-request-processors.html#uuidupdateprocessorfactory

The probability of a UUID collision is well known, and it is extremely low.

https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
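
For a rough sense of scale, the collision probability can be approximated with the birthday bound. The sketch below assumes version-4 (random) UUIDs with 122 random bits; the function name is made up for illustration and nothing here calls Solr.

```python
import math

def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision) among
    n_items values drawn uniformly from a space of 2**bits values."""
    space = 2.0 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))

# Assumption: version-4 UUIDs carry 122 random bits.
print(collision_probability(10_000_000, 122))
```

For ten million documents the result is vanishingly small, which is why a UUID field behaves like a unique tie-breaker in practice.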



> On 31 Aug 2021, at 14:04, rgamarra <rg...@gmail.com> wrote:
> 
> hi,
> 
>> Random ≠ unique.
> 
> Agreed. They are not the same. I don't want a tie breaker; I want to know
> how many ties I would face.
>
> The implementation where it's being used has some other (subsequent) sorting
> criteria. So the question can be rephrased as whether the subsequent sort
> clauses have any effect or not.
>
> For example, given
>
> sort= random_1234 DESC, price DESC
>
> At the end of the day, does the "price DESC" have any effect (which
> translates to how often ties in the random value happen)?
>
> I took a glimpse at
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/schema/RandomSortField.java
> and I conclude that
> - an int is being used.
> - it's a hash of the doc number + seed, rather than a random number
> generator with a certain distribution.
> 
> Best. Thanks.
> 
> 
> --
> Rodolfo Federico Gamarra


Re: Random Field - # digits

Posted by rgamarra <rg...@gmail.com>.
hi,

> Random ≠ unique.

Agreed. They are not the same. I don't want a tie breaker; I want to know
how many ties I would face.

The implementation where it's being used has some other (subsequent) sorting
criteria. So the question can be rephrased as whether the subsequent sort
clauses have any effect or not.

For example, given

sort= random_1234 DESC, price DESC

At the end of the day, does the "price DESC" have any effect (which
translates to how often ties in the random value happen)?
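
To make the question concrete: in a multi-key sort, a later clause is only consulted for documents that tie on every earlier clause. A minimal Python sketch of that behavior, with made-up documents and random values (this is an illustration, not Solr code):

```python
# Hypothetical documents: (name, random_value, price). Values are made up.
docs = [
    ("a", 7, 10.0),
    ("b", 3, 99.0),
    ("c", 7, 25.0),  # ties with "a" on random_value
    ("d", 5, 1.0),
]

# Equivalent of "random DESC, price DESC": compare random_value first,
# and fall back to price only when random_value is equal.
ranked = sorted(docs, key=lambda d: (d[1], d[2]), reverse=True)
print([d[0] for d in ranked])  # → ['c', 'a', 'd', 'b']
```

Here price only decides the order of "a" and "c"; "d" and "b" are ordered by the random value alone, even though "b" is the most expensive.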

I took a glimpse at
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/schema/RandomSortField.java
and I conclude that
- an int is being used.
- it's a hash of the doc number + seed, rather than a random number
generator with a certain distribution.
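
If the hash behaved like an independent, uniform draw from the 2^32 possible int values (an assumption, not something I verified against the Solr source), the number of tied pairs could be estimated with the birthday approximation. A minimal sketch, with a hypothetical function name:

```python
def expected_tied_pairs(n_docs: int, bits: int = 32) -> float:
    """Expected number of document pairs sharing the same sort value,
    assuming values are independent and uniform over 2**bits values."""
    pairs = n_docs * (n_docs - 1) / 2
    return pairs / 2.0 ** bits

for n in (10_000, 1_000_000, 10_000_000):
    print(n, round(expected_tied_pairs(n), 2))
```

Under that assumption, a result set of one million documents would contain on the order of a hundred tied pairs, so a second sort clause would matter only for a tiny fraction of the results.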

Best. Thanks.


--
Rodolfo Federico Gamarra


On Tue, Aug 31, 2021 at 3:00 AM Thomas Corthals <th...@klascement.net>
wrote:

> Hi Rodolfo
>
> Random ≠ unique. If you really need a tie breaker, you'll have to sort on
> the uniqueKey field.
>
> What is your use case here? When using a cursor, sorting on a random field
> will yield confusing results.
>
> Thomas

Re: Random Field - # digits

Posted by Thomas Corthals <th...@klascement.net>.
Hi Rodolfo

Random ≠ unique. If you really need a tie breaker, you'll have to sort on
the uniqueKey field.

What is your use case here? When using a cursor, sorting on a random field
will yield confusing results.
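
A minimal sketch of such a sort clause in Python, assuming the uniqueKey field is called id (a guess; use whatever your schema defines). The helper name is hypothetical, and only the standard library is used to build the query string:

```python
from urllib.parse import urlencode

def build_sort_params(seed: int, unique_key: str = "id") -> str:
    """Build Solr query parameters that sort on a random field and
    break any remaining ties on the uniqueKey field."""
    params = {
        "q": "*:*",
        # The uniqueKey clause goes last so it only decides exact ties.
        "sort": f"random_{seed} desc,{unique_key} asc",
    }
    return urlencode(params)

print(build_sort_params(1234))
```

Because uniqueKey values are distinct, the resulting order is total for a given seed, whatever the random field returns.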

Thomas

On Mon, 30 Aug 2021 at 17:33, rgamarra <rg...@gmail.com> wrote:

> Hi there! I'm using random fields (e.g. sort=random_1234 DESC) as a tie
> breaker.
>
> I'm wondering how many digits the underlying random sequence uses for each
> generated number.
>
> My result sets may contain (in principle) millions of results, so I would
> like to have an estimate of possible clashes (i.e. two results ending up with
> the same random number, and then being tied in the result set).
>
> Best regards.
>
> --
> Rodolfo Federico Gamarra
>