Posted to solr-user@lucene.apache.org by Monica Skidmore <Mo...@careerbuilder.com> on 2018/05/01 13:41:50 UTC

Re: Load Balancing between Two Cloud Clusters

Thank you, Erick.  This is exactly the information I needed but hadn't correctly parsed as a new Solr cloud user.  You've just made setting up our new configuration much easier!!

Monica Skidmore
Senior Software Engineer
 

 
On 4/30/18, 7:29 PM, "Erick Erickson" <er...@gmail.com> wrote:

    "We need a way to determine that a node is still 'alive' and should be
    in the load balancer, and we need a way to know that a new node is now
    available and fully ready with its replicas to add to the load
    balancer."
    
    Why? If a Solr node is running but the replicas aren't up yet, it'll
    pass the request along to a node that _does_ have live replicas; you
    don't have to do anything. As far as knowing that a node is alive,
    there are lots of ways: any API endpoint has to have a running Solr
    instance to field it, so perhaps just use the Collections API LIST
    command?
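    
    For instance, a minimal liveness probe with SolrJ might look like the
    sketch below; it just asks one node for the collection list, and any
    node that answers is up, whether or not its replicas are loaded yet.
    The host name is a placeholder, not anything from your setup:
    
        // Rough sketch, not a drop-in: probe one node with the Collections
        // API LIST command via SolrJ.
        import org.apache.solr.client.solrj.impl.HttpSolrClient;
        import org.apache.solr.client.solrj.request.CollectionAdminRequest;
        
        public class LivenessProbe {
            public static void main(String[] args) throws Exception {
                String nodeUrl = "http://solr-node1:8983/solr"; // placeholder host
                try (HttpSolrClient node = new HttpSolrClient.Builder(nodeUrl).build()) {
                    // Any running Solr instance can answer this, so a response
                    // is a reasonable "alive" signal for an LB health check.
                    System.out.println(CollectionAdminRequest.listCollections(node));
                }
            }
        }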
    
    "How does ZooKeeper make this determination?  Does it do something
    different if multiple collections are on a single cluster?  And, even
    with just one cluster, what is best practice for keeping a current
    list of active nodes in the cluster, especially for extremely high
    query rates?"
    
    This is a common misconception. ZooKeeper isn't interested in Solr at
    all. ZooKeeper will ping the nodes it knows about and, perhaps, remove
    a node from the live_nodes list, but that's all. It isn't involved in
    Solr's operation in terms of routing queries, updates or anything like
    that.
    
    _Solr_ keeps track of all this by _watching_ various znodes. Say Solr
    hosts some replica in a collection. When it comes up, it sets a "watch"
    on the /collections/my_collection/state.json znode. It also publishes
    its own state. So say it hosts three replicas for the collection. As
    each one is loaded and ready for action, Solr posts an update to the
    relevant state.json file.
    
    ZooKeeper is then responsible for telling any other node that has set
    a watch that the znode has changed. ZK doesn't know or care whether
    those are Solr nodes or not.
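    
    If you want to see that mechanism outside of Solr, here's a rough
    sketch using the plain ZooKeeper client API; the ZK address and
    collection name are made-up placeholders:
    
        // Rough sketch: read state.json and register a one-shot watch on it,
        // which is essentially what each Solr node does for the collections
        // it hosts.
        import org.apache.zookeeper.WatchedEvent;
        import org.apache.zookeeper.Watcher;
        import org.apache.zookeeper.ZooKeeper;
        
        public class StateWatch {
            public static void main(String[] args) throws Exception {
                ZooKeeper zk = new ZooKeeper("zk1:2181", 15000, event -> {});
                Watcher watcher = new Watcher() {
                    public void process(WatchedEvent event) {
                        // ZK fires this once when the znode changes; a real
                        // client re-reads the data and re-registers the watch.
                        System.out.println("changed: " + event.getPath());
                    }
                };
                byte[] data = zk.getData("/collections/my_collection/state.json", watcher, null);
                System.out.println(new String(data, "UTF-8"));
                Thread.sleep(60000); // keep the session alive to see a change fire
            }
        }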
    
    So when a request comes in to a Solr node, it knows what other Solr
    nodes host what particular replicas and does all the sub-requests
    itself, ZK isn't involved at all at that level.
    
    So imagine node1 hosts S1R1 and S2R1, and node2 hosts S1R2 and S2R2
    (for collection A). When node1 comes up, it updates the state in ZK to
    say S1R1 and S2R1 are "active". Now say node2 is coming up but hasn't
    loaded its cores yet. If it receives a request, it can forward it on
    to node1.
    
    Now node2 loads both its cores. It updates the ZK node for the
    collection, and since node1 is watching, it fetches the updated
    state.json. From this point forward, both nodes have complete
    information about all the replicas in the collection and don't need to
    reference ZK any more at all.
    
    In fact, ZK can completely go away and _queries_ can continue to work
    off their cached state.json. Updates will fail since ZK quorums are
    required for updates to indexes to prevent "split brain" problems.
    
    Best,
    Erick
    
    On Mon, Apr 30, 2018 at 11:03 AM, Monica Skidmore
    <Mo...@careerbuilder.com> wrote:
    > Thank you, Erick.  That confirms our understanding for a single cluster, or once we select a node from one of the two clusters to query.
    >
    > As we try to set up an external load balancer to go between two clusters, though, we still have some questions.  We need a way to determine that a node is still 'alive' and should be in the load balancer, and we need a way to know that a new node is now available and fully ready with its replicas to add to the load balancer.
    >
    > How does ZooKeeper make this determination?  Does it do something different if multiple collections are on a single cluster?  And, even with just one cluster, what is best practice for keeping a current list of active nodes in the cluster, especially for extremely high query rates?
    >
    > Again, if there's some good documentation on this, I'd love a pointer...
    >
    > Monica Skidmore
    > Senior Software Engineer
    >
    >
    >
    > On 4/30/18, 1:09 PM, "Erick Erickson" <er...@gmail.com> wrote:
    >
    >     Multiple clusters with the same dataset aren't load-balanced by Solr,
    >     you'll have to accomplish that from "outside", e.g. something that sends
    >     queries to each cluster.
    >
    >     _Within_ a cluster (collection), as long as a request gets to any Solr
    >     node, sub-requests are distributed with an internal software LB. As far as
    >     a single collection, you're fine just sending any query to any node. Even
    >     if you send a query to a node that hosts no replicas for a collection, Solr
    >     will "do the right thing" and forward it appropiately.
    >
    >     HTH,
    >     Erick
    >
    >     On Mon, Apr 30, 2018 at 9:46 AM, Monica Skidmore <
    >     Monica.Skidmore@careerbuilder.com> wrote:
    >
    >     > We are migrating from a master-slave configuration to Solr cloud (7.3) and
    >     > have questions about the preferred way to load balance between the two
    >     > clusters.  It looks like we want to use a load balancer that directs
    >     > queries to any of the server nodes in either cluster, trusting that node to
    >     > handle the query correctly – true?  If we auto-scale nodes into the
    >     > cluster, are there considerations about when a node becomes ‘ready’ to
    >     > query from a Solr perspective and when it is added to the load balancer?
    >     > Also, what is the preferred method of doing a health-check for the load
    >     > balancer – would it be “bin/solr healthcheck -c myCollection”?
    >     >
    >     >
    >     >
    >     > Pointers in the right direction – especially to any documentation on
    >     > running multiple clusters with the same dataset – would be appreciated.
    >     >
    >     >
    >     >
    >     > *Monica Skidmore*
    >     > *Senior Software Engineer*
    >     >
    >     >
    >     >
    >     >
    >     >
    >     >
    >
    >
    


Re: Load Balancing between Two Cloud Clusters

Posted by Erick Erickson <er...@gmail.com>.
Glad to help. Yeah, I thought you might have been making it harder
than it needed to be ;).

In SolrCloud you're constantly running up against "it's just magic
until it's not"; knowing when the magic applies and when it doesn't
can be tricky, very tricky.....

Basically, when using LBs, people just throw nodes at the LB when they
come up. If the Solr endpoints aren't available, then they're skipped,
etc.....

I'll also add that SolrJ, the CloudSolrClient specifically, does all
this on the client side: it's ZK-aware, so it knows the topology of the
active Solr nodes and "does the right thing" via internal LBs.
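
With ZK at a placeholder address, a minimal CloudSolrClient query might
look like this rough sketch (the collection name is a placeholder too):

    // Rough sketch: CloudSolrClient reads the cluster state from ZK and
    // routes each request to a live replica itself, so no external LB is
    // needed in front of a single cluster.
    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class CloudQuery {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
                client.setDefaultCollection("myCollection");
                // The client skips nodes that have dropped out of live_nodes.
                System.out.println(
                    client.query(new SolrQuery("*:*")).getResults().getNumFound());
            }
        }
    }

Keep in mind this only covers Java clients and a single cluster;
balancing across your two clusters still has to happen outside Solr.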

Best,
Erick

On Tue, May 1, 2018 at 6:41 AM, Monica Skidmore
<Mo...@careerbuilder.com> wrote:
> Thank you, Erick.  This is exactly the information I needed but hadn't correctly parsed as a new Solr cloud user.  You've just made setting up our new configuration much easier!!
>
> [snip: remainder of quoted thread appears in full above]