Posted to solr-user@lucene.apache.org by Julian Perry <ju...@limitless.co.uk> on 2015/03/01 12:51:22 UTC

Correct connection methodology for Zookeeper/SolrCloud?

Hi

I'm really after best practice guidelines for making queries to
an index on a Solr cluster.  I'm not calling from Java.

I have Solr 4.10.2 up and running, seems stable.

I have about 6 indexes/collections - am running SolrCloud with
two Solr instances (both currently running on the same dev. box -
just one shard each) and standalone Zookeeper with 3 instances.
All seems fine.  I can do queries against either instance, and
perform index updates and replication works fine.

I'm not using Java to talk to Solr - the web pages are built with
PHP (or something similar - happy to call zk/Solr from C).  So I
need to call Solr from the web page code.  Clearly I need
resilience and so don't want to specifically call one of the Solr
instances directly.

I could just set up a load balancer on the two Solr instances and
let client query requests use the load balancer to find a working
instance.

From what I have read, though, I am supposed to ask ZooKeeper
which Solr instances are running up-to-date, working replicas
of the collection that I need.  Is that right?  Should I do
that every time I need to make a query?

There seems to be a zookeeper client library in the zk dist - in
zookeeper-3.4.6/src/c/ - can I use that?  It looks like I can
pass in a list of potential zk host:port pairs and it will find
a working zk for me - is that right?

Then I need to ask the working zk which solr instance I should
connect to for the given index/collection - how do I do that -
is that held in clusterstate.json?

So the steps to make a Solr query against my cluster would be:

a) call zk client library with list of zk host/ports

b) ask zk for clusterstate.json

c) pick an active server (at random) for the relevant collection
    (is there some load balancing option in there)

d) call the Solr server returned by (c)

Is that best practice - or am I missing something?
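
For illustration only, steps (b) and (c) could be sketched in Python
roughly as below. This assumes the Solr 4.x layout of /clusterstate.json
(collection -> shards -> replicas, with each replica carrying "state",
"base_url", "core" and "node_name"). The collection name "products", the
host addresses and the function names are all made up for the example.
Fetching the JSON (and the children of /live_nodes) from ZooKeeper is
left out; any ZooKeeper client library could supply them.

```python
import json
import random


def active_replica_urls(clusterstate_text, collection, live_nodes=None):
    """Return core URLs of the replicas marked active for `collection`."""
    state = json.loads(clusterstate_text)
    urls = []
    for shard in state[collection]["shards"].values():
        for replica in shard["replicas"].values():
            if replica.get("state") != "active":
                continue
            # A replica can still be recorded as "active" after its node
            # has died, so cross-check against /live_nodes when supplied.
            if live_nodes is not None and replica["node_name"] not in live_nodes:
                continue
            urls.append(replica["base_url"].rstrip("/") + "/" + replica["core"])
    return urls


def pick_replica(clusterstate_text, collection, live_nodes=None, rng=random):
    """Step (c): choose one usable replica at random."""
    urls = active_replica_urls(clusterstate_text, collection, live_nodes)
    if not urls:
        raise LookupError("no active replica for %r" % collection)
    return rng.choice(urls)


def _replica(host, core, state):
    # Hypothetical sample entry, shaped like a Solr 4.x replica record.
    return {"state": state, "core": core,
            "base_url": "http://%s:8983/solr" % host,
            "node_name": "%s:8983_solr" % host}


SAMPLE = json.dumps({"products": {"shards": {"shard1": {"replicas": {
    "core_node1": _replica("10.0.0.1", "products_shard1_replica1", "active"),
    "core_node2": _replica("10.0.0.2", "products_shard1_replica2", "down"),
    "core_node3": _replica("10.0.0.3", "products_shard1_replica3", "active"),
}}}}})
```

Calling pick_replica(SAMPLE, "products") returns the URL of one of the
two active cores; step (d) would then send the query to that URL.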

-- 
Cheers
Jules.

Re: Correct connection methodology for Zookeeper/SolrCloud?

Posted by Erick Erickson <er...@gmail.com>.

bq: I could just set up a load balancer on the two Solr instances and
let client query requests use the load balancer to find a working
instance.

That's all you need to do. There's no need to query ZK and route your
requests yourself. The _Solr_ instances watch ZK, so they "know"
about each other's state and are notified of any problems, i.e. nodes
going up/down etc. Once a request hits any running Solr node, it'll be
routed around any problems. In the setup you describe, i.e. not using
SolrJ, your client shouldn't even need to be aware that ZK exists.

Your load balancer should know what nodes are up and route your
requests around any hosed machines.
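
If a real load balancer isn't available yet, the same idea can be
approximated in the client. The Python sketch below simply tries each
configured Solr node in turn and returns the first answer it gets. The
function names and the `fetch` parameter are hypothetical; it assumes
each base URL ends in /solr and that the collection's standard /select
handler is enabled.

```python
import json
import urllib.parse
import urllib.request


def http_get(url, timeout=2.0):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def query_with_failover(base_urls, collection, params, fetch=http_get):
    """Send the query to the first Solr node that answers.

    `fetch` is injectable so the routing logic can be exercised
    without a running cluster.
    """
    query = urllib.parse.urlencode(dict(params, wt="json"))
    last_error = None
    for base in base_urls:
        url = "%s/%s/select?%s" % (base.rstrip("/"), collection, query)
        try:
            return json.loads(fetch(url))
        except (OSError, ValueError) as exc:
            # OSError covers refused connections, timeouts and HTTP
            # errors from urllib; ValueError covers a non-JSON reply.
            last_error = exc
    raise RuntimeError("no Solr node answered") from last_error
```

Whichever node answers will forward the request to the right replicas
itself, so the client only needs the list of node addresses, not ZK.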

If you _do_ decide to use SolrJ sometime, CloudSolrServer (renamed
CloudSolrClient in 5.x) _does_ take the ZK ensemble and do some smart
routing on the client side, including simple load balancing, and
responds to any Solr nodes going up/down for you.

If Java isn't an option, though, putting a load balancer (or some
similar front end) in front of the Solr nodes will accomplish much
the same thing. The SolrJ client is just more sophisticated about it.

Best,
Erick
