Posted to solr-user@lucene.apache.org by Arcadius Ahouansou <ar...@menelic.com> on 2015/09/04 05:47:55 UTC

Order of hosts in zkHost

Hello.

Let's say we have 10 SolrJ clients all configured with
zkhost=zk1:port,zk2:port,zk3:port

For each of the 10 SolrJ clients, would it make a difference in terms of
load on zk1 (the first server in the list) if we shuffled the order of the
ZK servers in zkHost, or is it all the same?

I would have thought that shuffling would lower load on zk1.

Thanks.


-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---

Re: Order of hosts in zkHost

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
I believe Arcadius has a point, but I still think the answer is no.
ZooKeeper clients (Solr/SolrJ) connect to a single ZooKeeper server
instance at a time and keep that session open to the same server for as
long as they can or need to. During this time, all interactions between the
client and the ZK ensemble go through that one server instance. Some
operations require that server to talk to the leader, but not all: reads,
for example, are served locally. Clients only switch to a different
ZooKeeper server instance if the connection is lost for some reason. So if
all the clients were connected to the same ZK server, the load wouldn't be
evenly distributed.

However, according to the ZooKeeper documentation [1] (and I haven't tested
this), ZK clients don't choose servers from the connection string in
order:
"To create a client session the application code must provide a connection
string containing a comma separated list of host:port pairs, each
corresponding to a ZooKeeper server (e.g. "127.0.0.1:4545" or
"127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"). The ZooKeeper client
library will pick an arbitrary server and try to connect to it."
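
A rough way to picture the difference is the toy simulation below. It is
only a sketch under stated assumptions, not ZooKeeper's actual client code:
the class name ZkHostOrderSketch and the method simulate are made up for
illustration, port 2181 stands in for "port", and each simulated client
either takes the first host in the string as-is or shuffles the list first
and then takes the first entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical toy model; this is NOT how the ZooKeeper client is implemented.
public class ZkHostOrderSketch {

    /**
     * Counts how many simulated clients end up on each host.
     * If shuffle is false, every client tries the hosts in the listed order.
     * If shuffle is true, every client shuffles its own copy of the list first.
     */
    public static Map<String, Integer> simulate(String zkHost, int clients,
                                                boolean shuffle, long seed) {
        List<String> hosts = Arrays.asList(zkHost.split(","));
        Map<String, Integer> sessions = new LinkedHashMap<>();
        for (String host : hosts) {
            sessions.put(host, 0);
        }
        Random rnd = new Random(seed);
        for (int i = 0; i < clients; i++) {
            List<String> candidates = new ArrayList<>(hosts);
            if (shuffle) {
                Collections.shuffle(candidates, rnd);
            }
            // Assumption: the first candidate is reachable, so the session lands there.
            sessions.merge(candidates.get(0), 1, Integer::sum);
        }
        return sessions;
    }

    public static void main(String[] args) {
        String zkHost = "zk1:2181,zk2:2181,zk3:2181";
        System.out.println("in order: " + simulate(zkHost, 10, false, 42L));
        System.out.println("shuffled: " + simulate(zkHost, 10, true, 42L));
    }
}
```

With shuffling disabled, all 10 sessions land on zk1. With it enabled they
spread across the three hosts, although with only 10 clients the split
cannot be perfectly even.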


Tomás

[1] http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html



Re: Order of hosts in zkHost

Posted by Erick Erickson <er...@gmail.com>.
Arcadius:

Note that one of the more recent changes is "per collection states" in
ZK. So rather than have one huge clusterstate.json that gets passed out
to all collections on any change, the listeners can now listen only to
specific collections.

Reduces the "thundering herd" problem.
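
As a back-of-the-envelope illustration, the sketch below counts how many
listeners get notified by a single change in each layout. It is a toy model
with made-up names (StateWatchSketch and its two methods) and made-up
numbers, and it assumes each listener cares about exactly one collection;
it does not call any Solr or ZooKeeper API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of watcher notifications; not Solr or ZooKeeper code.
public class StateWatchSketch {

    /** One shared clusterstate.json: a change to any collection notifies every listener. */
    public static int notificationsSharedState(List<Integer> listenersPerCollection) {
        int total = 0;
        for (int listeners : listenersPerCollection) {
            total += listeners;
        }
        return total;
    }

    /** One state.json per collection: only the changed collection's listeners are notified. */
    public static int notificationsPerCollectionState(List<Integer> listenersPerCollection,
                                                      int changedCollection) {
        return listenersPerCollection.get(changedCollection);
    }

    public static void main(String[] args) {
        // Assumed example: three collections with 10, 4 and 6 listeners.
        List<Integer> listeners = Arrays.asList(10, 4, 6);
        System.out.println(notificationsSharedState(listeners));           // prints 20
        System.out.println(notificationsPerCollectionState(listeners, 1)); // prints 4
    }
}
```

In this example a change to the second collection wakes up 4 listeners
instead of all 20.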

Best,
Erick


Re: Order of hosts in zkHost

Posted by Arcadius Ahouansou <ar...@menelic.com>.
Hello Shawn.
This question was raised because, IMHO, apart from leader election there
are other load-generating activities, such as all 10 SolrJ clients plus
the SolrCloud nodes listening for changes on clusterstate.json/state.json
and downloading the whole file whenever there is a change. Without
shuffling, all of this would happen on zk1 only. That's the theory.
I could test this and see.

Re: Order of hosts in zkHost

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/3/2015 9:47 PM, Arcadius Ahouansou wrote:
> Let's say we have 10 SolrJ clients all configured with
> zkhost=zk1:port,zk2:port,zk3:port
> 
> For each of the 10 SolrJ clients, would it make a difference in terms of
> load on zk1 (the first server in the list) if we shuffled the order of the
> ZK servers in zkHost, or is it all the same?
> 
> I would have thought that shuffling would lower load on zk1.

I don't think this is going to make much difference.  Here's why,
assuming that my understanding of how it all works is correct:

One of the things ZooKeeper does is manage elections. It helps figure
out which member of a cluster is the leader. I think ZooKeeper uses
this concept internally, too. One of the hosts in the ensemble will be
elected leader, which accepts all input and replicates it to the other
members of the cluster. All of the clients will be talking to the leader
first, no matter what order the hosts are listed in.

If my understanding of how this works is flawed, then what I just said
is probably wrong.

Thanks,
Shawn