Posted to solr-user@lucene.apache.org by Adam Nelson <sp...@recursivemethod.com> on 2014/11/23 22:58:07 UTC

Random sorting and result consistency across successive calls based on seed

Hi,

We are using the random dynamic type (solr.RandomSortField) with Solr
Cloud, 1 shard and 1 replica.

Our use case is searching and displaying items for sale in two major types
of ad: premium and standard. We don't want all recently-updated or
recently-created ads sorted to the top by default. Instead, we use the
random type to distribute results randomly within premium and standard ads,
so that no one is given preference (other explicit sorting options like
date, price, etc. also exist).

The random dynamic type allows a seed to be specified as part of the field
name. In the default configuration you can sort by random_{seed}, and we
set the {seed} to the current date (yymmdd) to enforce consistency across
pagination of results for the day. For example, sort=random_141124+desc.
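
As an illustration of the date-based seed described above, a minimal Java
sketch could look like the following. The class and method names
(DailyRandomSort, sortParam) are hypothetical, and it assumes the default
random_* dynamic field is in place.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DailyRandomSort {
    // yyMMdd gives a two-digit year, month and day, e.g. 141124 for 2014-11-24.
    private static final DateTimeFormatter SEED_FORMAT =
            DateTimeFormatter.ofPattern("yyMMdd");

    // Builds the value of the sort parameter for the given day.
    // The same day always yields the same field name, hence the same seed.
    static String sortParam(LocalDate day) {
        return "random_" + day.format(SEED_FORMAT) + " desc";
    }

    public static void main(String[] args) {
        // Hypothetical usage: the seed rolls over at midnight local time.
        System.out.println(sortParam(LocalDate.now()));
    }
}
```

The space before desc has to be URL-encoded (as + or %20) when the value
is placed in a query string, which gives the sort=random_141124+desc form
shown above.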

This works well enough and allows pagination using the same seed, with the
following two caveats:

1. The shard and the replica give different ordering of the results.
2. Any changes to the underlying index alter the ordering of the results.

We solved issue #1 by ensuring stickiness to whichever Solr instance the
web application first talked to (they're sitting behind a load balancer so
we use cookie stickiness provided by the load balancer).

For issue #2, we looked at the source for RandomSortField (
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/schema/RandomSortField.java)
and noticed that the effective seed is actually the hash code of the field
name (which contains the seed we specify), plus the index version (plus
whatever context.docBase is):

  private static int getSeed(String fieldName, LeafReaderContext context) {
    final DirectoryReader top =
        (DirectoryReader) ReaderUtil.getTopLevelContext(context).reader();
    return fieldName.hashCode() + context.docBase + (int)top.getVersion();
  }

Clearly that is why the random ordering changes whenever docs are added to
the index: the index version changes, and the seed changes with it.

As a test we created our own random field type based on RandomSortField,
but implemented getSeed differently:

  private static int getSeed(String fieldName, LeafReaderContext context) {
    return fieldName.hashCode();
  }

After compiling it, referencing the JAR in solrconfig.xml so that our class
is used in place of the existing one for the random field type, and
testing, it seems to work. We now get a consistent random ordering across
pagination when using the same seed, even when documents are added to the
index.
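
To make the difference between the two getSeed variants quoted above
concrete, here is a small standalone sketch. It does not use Solr or Lucene
at all: the mix function is a stand-in integer mixer chosen for
illustration (not Solr's actual hash), the class name and the index version
numbers are hypothetical, and docBase is fixed at 0 as if there were a
single segment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SeedStabilityDemo {
    // Stand-in integer mixer for illustration only; NOT Solr's actual hash.
    static int mix(int key) {
        key ^= key >>> 16;
        key *= 0x45d9f3b;
        key ^= key >>> 16;
        return key & 0x7fffffff; // keep the sort key non-negative
    }

    // Same arithmetic as the stock getSeed quoted above.
    static int seedWithVersion(String fieldName, int docBase, long indexVersion) {
        return fieldName.hashCode() + docBase + (int) indexVersion;
    }

    // Same arithmetic as the modified getSeed: field name only.
    static int seedFieldOnly(String fieldName) {
        return fieldName.hashCode();
    }

    // Orders doc ids 0..numDocs-1 by a pseudo-random key derived from id and seed.
    static List<Integer> order(int seed, int numDocs) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < numDocs; doc++) {
            docs.add(doc);
        }
        docs.sort(Comparator.comparingInt((Integer doc) -> mix(doc + seed)));
        return docs;
    }

    public static void main(String[] args) {
        String field = "random_141124";
        // Hypothetical index versions before and after a commit.
        long versionBefore = 41L;
        long versionAfter = 42L;
        System.out.println(order(seedWithVersion(field, 0, versionBefore), 10));
        System.out.println(order(seedWithVersion(field, 0, versionAfter), 10));
        System.out.println(order(seedFieldOnly(field), 10));
    }
}
```

With the version term included, the seed differs between the two
hypothetical versions, so the two orderings are computed from different
keys. With the field-name-only seed, the ordering depends only on the field
name and the doc ids.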

Incidentally, this does not fix the different results across the shard
versus the replica. We haven't worked that out yet.

Does anyone know why context.docBase and the index version are part of the
seed in the first place? I wonder what we're missing out on by removing
them from our random class, and whether there are any other side effects.

Thanks, Adam.

Re: Random sorting and result consistency across successive calls based on seed

Posted by marcel <ma...@interactivestrategies.com.INVALID>.
Great post. I am having the same issue.

Would you be able to share the .jar file you produced?


