Posted to dev@lucene.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/11/11 18:04:36 UTC

[jira] [Commented] (SOLR-6730) select?replicaAffinity=(node|host) and replicaAffinity.hostPriorities support

    [ https://issues.apache.org/jira/browse/SOLR-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206629#comment-14206629 ] 

ASF GitHub Bot commented on SOLR-6730:
--------------------------------------

GitHub user cpoerschke opened a pull request:

    https://github.com/apache/lucene-solr/pull/104

    select?replicaAffinity=(node|host) and replicaAffinity.hostPriorities support

    https://issues.apache.org/jira/i#browse/SOLR-6730

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bloomberg/lucene-solr trunk-replica-affinity-feature

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #104
    
----
commit 66b56265bdefec7eb814bfb533c0ff19bb1dcdff
Author: Christine Poerschke <cp...@bloomberg.net>
Date:   2014-08-12T10:32:57Z

    solr: select?replicaAffinity=(node|host) and replicaAffinity.hostPriorities support
    
    This commit also includes changes to reduce SearchHandler's overall use of ShardHandler objects.
    
    ---------
    
    solr: select?replicaAffinity=(node|host) support, select?replicaAffinity=host&replicaAffinity.hostPriorities=hostA,hostB=1,hostC=2,hostD=2,hostE=3 prioritisation support
    
    illustration: `4-hosts-x-2-ports=8-instances 8-shards 2-replica system`
    
      http://host1:port1/solr/collection1_shard1_replicaA/
      http://host1:port1/solr/collection1_shard3_replicaA/
    
      http://host1:port2/solr/collection1_shard5_replicaA/
      http://host1:port2/solr/collection1_shard7_replicaA/
    
      http://host2:port1/solr/collection1_shard2_replicaA/
      http://host2:port1/solr/collection1_shard4_replicaA/
    
      http://host2:port2/solr/collection1_shard6_replicaA/
      http://host2:port2/solr/collection1_shard8_replicaA/
    
      http://host3:port1/solr/collection1_shard1_replicaB/
      http://host3:port1/solr/collection1_shard3_replicaB/
    
      http://host3:port2/solr/collection1_shard5_replicaB/
      http://host3:port2/solr/collection1_shard7_replicaB/
    
      http://host4:port1/solr/collection1_shard2_replicaB/
      http://host4:port1/solr/collection1_shard4_replicaB/
    
      http://host4:port2/solr/collection1_shard6_replicaB/
      http://host4:port2/solr/collection1_shard8_replicaB/
    
    A plain `.../select` will route sub-requests to a random selection of solr cores and so could potentially use all 8 JVM instances, e.g.
    
      http://host1:port1/solr/collection1_shard1_replicaA/
      http://host4:port1/solr/collection1_shard2_replicaB/
      http://host3:port1/solr/collection1_shard3_replicaB/
      http://host2:port1/solr/collection1_shard4_replicaA/
      http://host1:port2/solr/collection1_shard5_replicaA/
      http://host4:port2/solr/collection1_shard6_replicaB/
      http://host3:port2/solr/collection1_shard7_replicaB/
      http://host2:port2/solr/collection1_shard8_replicaA/
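
    For reference, a plain distributed request like the one above could be issued from SolrJ roughly as follows (a minimal sketch, assuming the 4.x-era CloudSolrServer API; the ZooKeeper address is a placeholder and only `collection1` is taken from the illustration):

      import org.apache.solr.client.solrj.SolrQuery;
      import org.apache.solr.client.solrj.impl.CloudSolrServer;
      import org.apache.solr.client.solrj.response.QueryResponse;

      public class PlainSelectSketch {
        public static void main(String[] args) throws Exception {
          // placeholder ZooKeeper ensemble address; collection1 as in the illustration above
          CloudSolrServer server = new CloudSolrServer("zkHost1:2181");
          server.setDefaultCollection("collection1");

          // a plain /select with no shards parameter: sub-requests go to a random
          // selection of replicas and so may touch all 8 JVM instances
          SolrQuery query = new SolrQuery("*:*");
          QueryResponse response = server.query(query);
          System.out.println("numFound=" + response.getResults().getNumFound());

          server.shutdown();
        }
      }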
    
    `.../select?replicaAffinity=node` will route sub-requests to a random selection of solr cores whilst maintaining node affinity, i.e. sub-requests that can go to the same solr instance will go to the same solr instance, e.g.
    
      http://host1:port1/solr/collection1_shard1_replicaA/
      http://host4:port1/solr/collection1_shard2_replicaB/
      http://host1:port1/solr/collection1_shard3_replicaA/
      http://host4:port1/solr/collection1_shard4_replicaB/
      http://host3:port2/solr/collection1_shard5_replicaB/
      http://host2:port2/solr/collection1_shard6_replicaA/
      http://host3:port2/solr/collection1_shard7_replicaB/
      http://host2:port2/solr/collection1_shard8_replicaA/
    
    `.../select?replicaAffinity=host` will route sub-requests to a random selection of solr cores whilst maintaining host affinity, i.e. sub-requests that can go to the same host machine will go to the same host machine, e.g.
    
      http://host1:port1/solr/collection1_shard1_replicaA/
      http://host2:port1/solr/collection1_shard2_replicaA/
      http://host1:port1/solr/collection1_shard3_replicaA/
      http://host2:port1/solr/collection1_shard4_replicaA/
      http://host1:port2/solr/collection1_shard5_replicaA/
      http://host2:port2/solr/collection1_shard6_replicaA/
      http://host1:port2/solr/collection1_shard7_replicaA/
      http://host2:port2/solr/collection1_shard8_replicaA/
    
    `.../select?replicaAffinity=host&replicaAffinity=node` will route sub-requests to a random selection of solr cores whilst maintaining host affinity first and node affinity second (the latter only applies if multiple JVMs on a given machine contain the same shard).
    
    If `replicaAffinity=host` is requested then optional `replicaAffinity.hostPriorities` are supported:
    
    `.../select?replicaAffinity=host&replicaAffinity.hostPriorities=hostX=2,hostY=2,hostZ=1` will route sub-requests to hostZ (priority 1) for shards that are available on that host, and randomly to either hostX or hostY (both priority 2) for shards that are available on those two hosts but not on a priority 1 host.
    
    `replicaAffinity.hostPriorities=hostZ` and `replicaAffinity.hostPriorities=hostZ=1` are equivalent.
    
    If host priorities are supplied they can cover just a subset of all hosts: preference will be given to live nodes on the prioritised hosts, and random selections will be made for the remaining sub-requests.
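
    From SolrJ the proposed parameters could be set roughly as follows (a sketch only, reusing the CloudSolrServer `server` from the earlier snippet; the parameter names are those proposed in this pull request and are only honoured once the patch is applied):

      // reuse the CloudSolrServer 'server' and imports from the earlier sketch
      SolrQuery query = new SolrQuery("*:*");
      // prefer host affinity first, then node affinity within a host
      query.set("replicaAffinity", "host", "node");
      // hostZ first (priority 1), then hostX/hostY (priority 2); remaining hosts chosen at random
      query.set("replicaAffinity.hostPriorities", "hostX=2,hostY=2,hostZ=1");
      QueryResponse response = server.query(query);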
    
    ---------
    
    solr: reduce SearchHandler's overall use of ShardHandler objects (from N+1+x to just 1)
    
    before:
     * A search request to an N-shard system constructs N+1+x ShardHandler objects in total:
       * 1 object in the receiving solr instance
       * 1 object in each of the N shards that receive an initial sub-request (for top ids or top group ids)
       * 1 object in each of x shards that receive a subsequent sub-request (for top ids within group or to get fields)
    
    after:
     * A search request to an N-shard system constructs 1 ShardHandler object in the receiving solr instance only.
    
    summary of change (a rough sketch follows after this list):
     * move non-distrib related code fragments from HttpShardHandler.checkDistrib to SearchHandler
     * rename ShardHandler.checkDistrib to ShardHandler.prepDistrib (to be called for distributed requests only)
     * SearchHandler constructs ShardHandler object only for distributed requests
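
    A rough Java sketch of the intended control flow after this change (illustrative only, not the actual SearchHandler code; it assumes a ResponseBuilder `rb` with an isDistrib flag and a shardHandlerFactory field, and uses the prepDistrib name proposed above):

      ShardHandler shardHandler = null;
      if (rb.isDistrib) {
        // one ShardHandler per distributed request, created only in the
        // solr instance that received the request
        shardHandler = shardHandlerFactory.getShardHandler();
        shardHandler.prepDistrib(rb);  // formerly checkDistrib; now called for distributed requests only
      }
      // non-distributed requests proceed without ever constructing a ShardHandler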

----


> select?replicaAffinity=(node|host) and replicaAffinity.hostPriorities support
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-6730
>                 URL: https://issues.apache.org/jira/browse/SOLR-6730
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Christine Poerschke
>
> If no shards parameter is supplied with a select request then sub-requests will go to a random selection of live solr nodes hosting shards for the collection of interest. All sub-requests must complete before results can be collated, i.e. the slowest sub-request determines how fast the search completes.
> Use of optional replicaAffinity can reduce the number of JVMs hit by a given search (the more JVMs are hit, the higher the chance of hitting a garbage collection pause in one of them). Preferentially directing requests to certain areas of the cloud can also be useful for debugging or when some replicas reside on 'faster' machines.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org