Posted to dev@lucene.apache.org by "Manuel Lenormand (JIRA)" <ji...@apache.org> on 2014/11/18 14:32:33 UTC
[jira] [Updated] (SOLR-5611) When documents are uniformly
distributed over shards, enable returning approximated results in
distributed query
[ https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manuel Lenormand updated SOLR-5611:
-----------------------------------
Attachment: lec5-distributedIndexing.pdf
The equation is on the 10th slide.
We need either to write an approximation of this equation, or to calculate it offline for the main values and build a 3D lookup map (#shards, rows, confidence level) that outputs shards.rows for each request.
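As a rough sketch of what such an offline table could look like: the code below assumes that each of the top `rows` documents lands on any given shard independently with probability 1/numShards, so the per-shard count is binomial, and it splits the allowed failure probability evenly across the shards (a union bound). This is an assumed model and not necessarily the equation from the attached slides; the names `shard_rows` and `build_table` are hypothetical.

```python
import math


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def shard_rows(num_shards, rows, confidence):
    """Smallest per-shard row count k such that, under the assumed uniform
    model, no shard holds more than k of the top `rows` documents with
    probability of at least `confidence` (union bound over shards)."""
    if num_shards <= 1:
        return rows  # a single shard must return everything
    p = 1.0 / num_shards
    # Failure budget per shard: (1 - confidence) split evenly across shards.
    allowed = (1.0 - confidence) / num_shards
    cdf = 0.0
    for k in range(rows + 1):
        cdf += binom_pmf(rows, k, p)
        if 1.0 - cdf <= allowed:  # P(X > k) fits in the budget
            return k
    return rows  # never ask a shard for more than the full request


def build_table(shard_counts, row_counts, confidences):
    """Precompute the (#shards, rows, confidence) -> shards.rows map."""
    return {(s, r, c): shard_rows(s, r, c)
            for s in shard_counts
            for r in row_counts
            for c in confidences}


if __name__ == "__main__":
    table = build_table([10, 100], [100, 1000], [0.95, 0.99])
    for key in sorted(table):
        print(key, "->", table[key])
```

The union bound is conservative, so the table errs on the side of asking each shard for slightly more rows than strictly needed.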
> When documents are uniformly distributed over shards, enable returning approximated results in distributed query
> ----------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-5611
> URL: https://issues.apache.org/jira/browse/SOLR-5611
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Isaac Hebsh
> Priority: Minor
> Labels: distributed_search, shard, solrcloud
> Fix For: 4.9, Trunk
>
> Attachments: lec5-distributedIndexing.pdf
>
>
> A query with rows=1000 that is sent to a collection of 100 shards (with the default shard key behaviour, based on a hash of the unique key) generates 100 shard requests, each with rows=1000.
> This results in a total of rows*numShards unique keys being retrieved. This behaviour gets worse as numShards grows.
> If the documents are uniformly distributed over the shards, the expected number of top documents coming from each shard is ~ rows/numShards. Obviously, there might be extreme cases in which all of the top X documents are in one specific shard.
> I suggest adding an optional parameter, say approxResults=true, which decides whether or not we should limit the rows in the shard requests to rows/numShards. Moreover, we can add a numeric parameter which increases the limit, for more accurate results.
> For example, the query {{approxResults=true&approxResults.factor=1.5}} will retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and rows=1000, each shard will return 15 documents.
> Furthermore, this can reduce the problem of deep paging, because the same idea can be applied there. When start=100000 is requested, Solr creates shard requests with start=0 and rows=START+ROWS. In the approximated approach, the start parameter (in the shard requests) can be set to 100000/numShards. The approxResults.factor idea creates some difficulties here, though.
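To make the arithmetic in the description concrete, here is a minimal sketch of how the per-shard start and rows values might be derived. The function names are hypothetical, and approxResults / approxResults.factor are only the parameters proposed in this issue, not existing Solr parameters. Since the issue leaves open how the factor should interact with start, the sketch applies it to rows only; that choice is an assumption.

```python
import math


def exact_shard_request(start, rows):
    """Current behaviour as described in the issue: each shard is asked
    for start=0 and rows=start+rows."""
    return {"start": 0, "rows": start + rows}


def approx_shard_request(start, rows, num_shards, factor=1.0):
    """Proposed approximated behaviour: each shard is asked for roughly its
    expected share of the results, scaled by `factor` for extra safety.
    Assumption: the factor is applied to rows only, not to start."""
    return {
        "start": start // num_shards,
        "rows": math.ceil(factor * rows / num_shards),
    }


if __name__ == "__main__":
    num_shards = 100
    exact = exact_shard_request(0, 1000)
    approx = approx_shard_request(0, 1000, num_shards, factor=1.5)
    print("exact per shard: ", exact, "total:", num_shards * exact["rows"])
    print("approx per shard:", approx, "total:", num_shards * approx["rows"])
    # Deep paging: start=100000 becomes start=1000 on each shard.
    print("deep paging:", approx_shard_request(100000, 10, num_shards))
```

With 100 shards and rows=1000, the exact approach retrieves 100 * 1000 = 100000 keys in total, while the approximated one with a factor of 1.5 retrieves 100 * 15 = 1500.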
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)