Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2019/05/28 00:28:00 UTC
[jira] [Created] (SOLR-13494) Improve the performance of random sampling
Joel Bernstein created SOLR-13494:
-------------------------------------
Summary: Improve the performance of random sampling
Key: SOLR-13494
URL: https://issues.apache.org/jira/browse/SOLR-13494
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: streaming expressions
Reporter: Joel Bernstein
Currently the *random* Streaming Expression performs a conventional distributed search: each shard returns its top N docs, and the aggregator node then selects the top N from all of the shards' results. This approach slows down as the number of shards and/or N grows, because the aggregator must merge numShards * N documents to keep only N.
Selecting a distributed random sample does not actually require this behavior. Instead, each shard can return N/numShards random docs, and the aggregator simply returns all of them. This technique actually gets faster as shards are added, because each shard returns fewer documents and no global top-N merge is needed.
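The two strategies can be compared with a small simulation. This is a minimal sketch in Python, not Solr's implementation: shards are assumed to be plain in-memory lists, a random float stands in for a random sort key, and the function names `conventional_sample` and `per_shard_sample` are hypothetical. Spreading the remainder of N/numShards over the first shards is also an assumption, since the ticket does not say how that case is handled.

```python
import heapq
import random


def conventional_sample(shards, n, seed=None):
    """Conventional distributed search (simulated): each shard returns its
    top n docs by a random score, then the aggregator merges all
    numShards * n candidates and keeps the global top n."""
    rng = random.Random(seed)
    candidates = []
    for shard in shards:
        scored = [(rng.random(), doc) for doc in shard]
        # Each shard ships up to n (score, doc) pairs to the aggregator.
        candidates.extend(heapq.nlargest(n, scored))
    # The aggregator sorts through numShards * n candidates to keep n.
    return [doc for _, doc in heapq.nlargest(n, candidates)]


def per_shard_sample(shards, n, seed=None):
    """Proposed strategy (simulated): each shard returns about
    n / numShards random docs and the aggregator returns the union."""
    rng = random.Random(seed)
    per_shard, remainder = divmod(n, len(shards))
    result = []
    for i, shard in enumerate(shards):
        # Hypothetical remainder handling: the first `remainder` shards
        # each contribute one extra doc so the total is exactly n.
        k = per_shard + (1 if i < remainder else 0)
        # A shard smaller than k simply contributes everything it has.
        result.extend(rng.sample(shard, min(k, len(shard))))
    return result
```

In this sketch the conventional path makes the aggregator examine numShards * n candidates, while the per-shard path only concatenates n docs, so its aggregator cost does not grow with the shard count.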
This ticket will allow the random Streaming Expression to switch to the strategy above when N reaches a certain threshold (i.e., 10000).
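The threshold switch can be sketched as a simple dispatch. This is a hypothetical illustration only: `SAMPLING_THRESHOLD`, `choose_strategy`, and `aggregator_docs` are made-up names, the value 10000 is taken from the ticket, and treating "reaches" as `n >= threshold` is an assumption.

```python
# Hypothetical constant; the ticket gives 10000 as the threshold.
SAMPLING_THRESHOLD = 10_000


def choose_strategy(n, threshold=SAMPLING_THRESHOLD):
    """Pick the sampling strategy for a requested sample size n."""
    return "per_shard" if n >= threshold else "conventional"


def aggregator_docs(n, num_shards, threshold=SAMPLING_THRESHOLD):
    """Approximate number of docs the aggregator node receives."""
    if choose_strategy(n, threshold) == "per_shard":
        # Each shard returns about n / num_shards docs; the union is about n.
        return n
    # Each shard returns its top n docs, all of which reach the aggregator.
    return n * num_shards
```

For example, with 8 shards a sample of 5000 stays on the conventional path and the aggregator receives 40000 docs, while a sample of 20000 switches to the per-shard path and the aggregator receives only 20000.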