Posted to user@zookeeper.apache.org by Jim Keeney <ji...@fitterweb.com> on 2018/03/02 01:43:36 UTC

Ensemble fails when one node loses connectivity

I'm using ZooKeeper with Solr to create a cluster, and I have come across
what seems like unexpected behavior. The cluster is set up on AWS using
OpsWorks. I am using a three-node ZooKeeper ensemble. The ZooKeeper config on
all three nodes is:

clientPort=2181

dataDir=/var/opt/zookeeper/data

tickTime=2000

autopurge.purgeInterval=24

initLimit=100

syncLimit=5

server.1=172.31.86.130:2888:3888

server.2=172.31.16.234:2888:3888

server.3=172.31.73.122:2888:3888


Here is the issue:

If one node in the ensemble fails or is shut down, the ensemble carries on.
However, when the node is restarted, its attempts to connect to the other
members of the cluster are rejected. The only way that I have found to
restore the ensemble is to restart all of the nodes within a short time
span of each other.

If I do that, they are able to discover each other, carry out a proper leader
election, and restore order.

Once they are restored, everything is fine, but if one of the nodes goes down
we are faced with the same problem.
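
(For what it's worth, this is roughly how I check what each node thinks its
own role is while this is happening -- assuming nc is installed and the
four-letter-word commands haven't been disabled:

  for h in 172.31.86.130 172.31.16.234 172.31.73.122; do
    echo "--- $h"
    echo srvr | nc $h 2181   # prints "Mode: leader" or "Mode: follower"
                             # once a node has joined the quorum
  done

Running bin/zkServer.sh status on each box reports much the same thing.)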

How do I ensure that if a node goes down, it can restart and rejoin the
ensemble without having to manually restart all the other nodes?

Any help appreciated.

Thanks.

Jim K.




-- 
Jim Keeney
President, FitterWeb
E: jim@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *

Re: Ensemble fails when one node loses connectivity

Posted by Steph van Schalkwyk <sv...@gmail.com>.
Hi Jim

You set it in the java.env file in /opt/zookeeper/conf.

JVMFLAGS=" -Xmx4g -Djute.maxbuffer=2147483648"

The jute.maxbuffer value above is 2GB, so please change the size to fit your
data :) The heap setting (-Xmx4g) was for a ZK node running on an 8GB VM.
And yes, make sure that you do the same on all the servers.

Here is one reference to it:
https://community.cloudera.com/t5/Storage-Random-Access-HDFS/zookeeper-error-Unexpected-exception-causing-shutdown-while-sock/td-p/30914

If you need more debug information, you can set the logging level as well:
-Dzookeeper.log.threshold=INFO

for example: JVMFLAGS=" -Xmx4g  -Djute.maxbuffer=2147483648
-Dzookeeper.log.threshold=DEBUG"
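
After restarting ZooKeeper you can confirm the flags were actually picked up
(a rough check, assuming you start it with the stock zkServer.sh so that
conf/java.env gets sourced):

  ps -ef | grep [z]ookeeper   # the java command line should now show the
                              # -Xmx... and -Djute.maxbuffer=... flags

One caveat: jute.maxbuffer is read as a Java int, so as far as I can tell a
value larger than 2147483647 will just be ignored and fall back to the
default.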

Good luck! I hope this works.
Steph



On Thu, Mar 1, 2018 at 8:59 PM, Jim Keeney <ji...@fitterweb.com> wrote:

> Steph -
>
> Read about the maxbuffer and am pretty sure that this might explain the
> behavior we are seeing since it occurs when there has been a significant
> reboot of all the servers. We have over 2 mb of config files for all of our
> indexes and if all the Solr nodes are sync ing their configs at once it
> seems like that might overflow the buffer.
>
> Newbie question, where would i set the -Djute.maxbuffer ? Should I update
> the zkServer.sh file so this is applied every time zookeeper is started or
> restarted.
>
> Also, I noted the caution and will make sure that all of the nodes are set
> to the same value. Saw some discussion about having to change the zkCli
> settings to be larger than that of the server. Is that true?
>
> Thanks in advance.
>
> Jim K.
>
> On Thu, Mar 1, 2018 at 9:13 PM, Jim Keeney <ji...@fitterweb.com> wrote:
>
> > Thanks, Yes, I have about 2MB stored in the configurations folders. I
> will
> > increase the jute.maxbuffer and see if that helps.
> >
> > Jim K.
> >
> > On Thu, Mar 1, 2018 at 8:58 PM, Steph van Schalkwyk <
> > svanschalkwyk@gmail.com> wrote:
> >
> >> Does the log say anything about timing out on init?
> >> Your initLimit is already pretty big, but then we don't know anything
> >> about
> >> your setup.
> >> Are you storing more than 1MB in a znode? Then increase jute.maxbuffer
> (in
> >> java.env as a -Djute.maxbuffer=xxxxxx).
> >> I've recently run into that with Fusion 3.1.
> >> Post more details, if you would.
> >> Good luck.
> >> Steph
> >>
> >>
> >> On Thu, Mar 1, 2018 at 7:43 PM, Jim Keeney <ji...@fitterweb.com> wrote:
> >>
> >> > I'm using Zookeeper with solr to create a cluster and I have come
> across
> >> > what seems like an unexpected behavior. The cluster is setup on AWS
> >> using
> >> > opsworks.  I am using a 3 node zookeeper ensemble. The zookeeper
> config
> >> > on all three nodes is:
> >> >
> >> > clientPort=2181
> >> >
> >> > dataDir=/var/opt/zookeeper/data
> >> >
> >> > tickTime=2000
> >> >
> >> > autopurge.purgeInterval=24
> >> >
> >> > initLimit=100
> >> >
> >> > syncLimit=5
> >> >
> >> > server.1=172.31.86.130:2888:3888
> >> >
> >> > server.2=172.31.16.234:2888:3888
> >> >
> >> > server.3=172.31.73.122:2888:3888
> >> >
> >> >
> >> > Here is the issue:
> >> >
> >> > If one node in the ensemble fails or is shut down the ensemble carries
> >> on.
> >> > However, when the node is restarted it's attempt to connect to the
> other
> >> > members of the cluster are rejected. The only way that I have found to
> >> > restore the ensemble is to restart all of the nodes within a short
> time
> >> > span of each other.
> >> >
> >> > If I do that they are able to discover each other  carry on a proper
> >> > leader election and restore order.
> >> >
> >> > Once they are restored everything is fine but if one of the nodes goes
> >> > down we are faced wit the same problem.
> >> >
> >> > How do I ensure that if a node goes down, it can restart and rejoin
> the
> >> > ensemble with out having to manually restart all the other nodes?
> >> >
> >> > Any help appreciated.
> >> >
> >> > Thanks.
> >> >
> >> > Jim K.
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Jim Keeney
> >> > President, FitterWeb
> >> > E: jim@fitterweb.com
> >> > M: 703-568-5887
> >> >
> >> > *FitterWeb Consulting*
> >> > *Are you lean and agile enough? *
> >> >
> >>
> >
> >
> >
> > --
> > Jim Keeney
> > President, FitterWeb
> > E: jim@fitterweb.com
> > M: 703-568-5887
> >
> > *FitterWeb Consulting*
> > *Are you lean and agile enough? *
> >
>
>
>
> --
> Jim Keeney
> President, FitterWeb
> E: jim@fitterweb.com
> M: 703-568-5887
>
> *FitterWeb Consulting*
> *Are you lean and agile enough? *
>

Re: Ensemble fails when one node loses connectivity

Posted by Jim Keeney <ji...@fitterweb.com>.
Thanks again Shawn and Steph. So that would tend to rule out the maxbuffer
or heap size requirements.

I'll double-check Java and explicitly set the Xmx and Xms settings.

I think the next step is to try to get more information on what is
happening.

I'll play with log settings and see if I can get more information.
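
(My rough plan, assuming the stock log4j setup that ships with ZooKeeper 3.4,
and with the log directory below just as an example path: put these in
conf/java.env so the election and connection messages are logged at DEBUG to
a rolling file that survives restarts:

  ZOO_LOG_DIR=/var/log/zookeeper        # must exist and be writable by the
                                        # user running ZooKeeper
  ZOO_LOG4J_PROP="DEBUG,ROLLINGFILE"    # ROLLINGFILE is defined in the stock
                                        # conf/log4j.properties

zkEnv.sh sources java.env before applying its defaults, so both should take
effect on the next restart.)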

The good thing is I'm pretty sure I can reproduce the behavior.

Thanks.

Jim K.





On Fri, Mar 2, 2018 at 10:47 AM, Shawn Heisey <el...@elyograg.org> wrote:

> On 3/2/2018 6:54 AM, Jim Keeney wrote:
>
>> Thanks for jumping in on the ZK side as well.
>>
>> I will take a hard look at my config files but I checked and I do not have
>> any one file over 1MB. The combined files (10 indexes) is 2.2MB.
>>
>> I am using micros for the nodes which are very limited in memory.
>>
>> I'm not currently using a java.env file so I guess I'm using the default
>> values for the JVM which is typically xmx512M if I remember correctly.
>>
>> Could it be just a memory issue?
>>
>
> Usually Java on Linux has a default heap size of about 4GB.  But it would
> be highly dependent on the amount of memory actually present on the
> machine.  Just yesterday, I saw Java report a 6GB default heap size, on a
> machine with 24GB of memory. Information I can find about AWS instance
> types says that a micro instance has 1GB of memory.  So the default heap
> size is probably quite small.
>
> Even in small server situations, I would strongly recommend that anytime
> you have a java commandline, you define -Xmx for the max heap, and -Xms
> should probably be set as well, to the same value as -Xmx.  That way you're
> not relying on defaults, you're absolutely sure what the heap size is.
>
> For ZK servers handling 2 megabytes of config data plus the rest of a
> small SolrCloud install, something like 256MB or 512MB of heap would
> probably be plenty.  ZK holds a copy of its entire database in memory.
> Small SolrCloud installs won't put much of a load on ZK.  A micro instance
> should be plenty for ZK when the software using it is Solr, as long as
> that's the only thing it's running.
>
> Thanks,
> Shawn
>
>


-- 
Jim Keeney
President, FitterWeb
E: jim@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *

Re: Ensemble fails when one node loses connectivity

Posted by Steph van Schalkwyk <sv...@gmail.com>.
If this is a t2.micro on AWS, then it has 1GB of RAM.




On Fri, Mar 2, 2018 at 9:47 AM, Shawn Heisey <el...@elyograg.org> wrote:

> On 3/2/2018 6:54 AM, Jim Keeney wrote:
>
>> Thanks for jumping in on the ZK side as well.
>>
>> I will take a hard look at my config files but I checked and I do not have
>> any one file over 1MB. The combined files (10 indexes) is 2.2MB.
>>
>> I am using micros for the nodes which are very limited in memory.
>>
>> I'm not currently using a java.env file so I guess I'm using the default
>> values for the JVM which is typically xmx512M if I remember correctly.
>>
>> Could it be just a memory issue?
>>
>
> Usually Java on Linux has a default heap size of about 4GB.  But it would
> be highly dependent on the amount of memory actually present on the
> machine.  Just yesterday, I saw Java report a 6GB default heap size, on a
> machine with 24GB of memory. Information I can find about AWS instance
> types says that a micro instance has 1GB of memory.  So the default heap
> size is probably quite small.
>
> Even in small server situations, I would strongly recommend that anytime
> you have a java commandline, you define -Xmx for the max heap, and -Xms
> should probably be set as well, to the same value as -Xmx.  That way you're
> not relying on defaults, you're absolutely sure what the heap size is.
>
> For ZK servers handling 2 megabytes of config data plus the rest of a
> small SolrCloud install, something like 256MB or 512MB of heap would
> probably be plenty.  ZK holds a copy of its entire database in memory.
> Small SolrCloud installs won't put much of a load on ZK.  A micro instance
> should be plenty for ZK when the software using it is Solr, as long as
> that's the only thing it's running.
>
> Thanks,
> Shawn
>
>

Re: Ensemble fails when one node loses connectivity

Posted by Shawn Heisey <el...@elyograg.org>.
On 3/2/2018 6:54 AM, Jim Keeney wrote:
> Thanks for jumping in on the ZK side as well.
>
> I will take a hard look at my config files but I checked and I do not have
> any one file over 1MB. The combined files (10 indexes) is 2.2MB.
>
> I am using micros for the nodes which are very limited in memory.
>
> I'm not currently using a java.env file so I guess I'm using the default
> values for the JVM which is typically xmx512M if I remember correctly.
>
> Could it be just a memory issue?

On Linux, recent Java versions usually default the max heap to roughly one
quarter of the machine's physical memory, so it is highly dependent on the
amount of memory actually present on the machine. Just yesterday, I saw Java
report a 6GB default heap size on a machine with 24GB of memory. Information
I can find about AWS instance types says that a micro instance has 1GB of
memory, so the default heap size there is probably quite small.

Even in small server situations, I would strongly recommend that any time
you have a Java command line, you define -Xmx for the max heap, and -Xms
should probably be set as well, to the same value as -Xmx. That way you're
not relying on defaults; you're absolutely sure what the heap size is.

For ZK servers handling 2 megabytes of config data plus the rest of a 
small SolrCloud install, something like 256MB or 512MB of heap would 
probably be plenty.  ZK holds a copy of its entire database in memory.  
Small SolrCloud installs won't put much of a load on ZK.  A micro 
instance should be plenty for ZK when the software using it is Solr, as 
long as that's the only thing it's running.
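
As a concrete sketch of that for a 1GB micro instance (the path and values
are only an example, adjust to your install): a conf/java.env next to
zoo.cfg, which zkServer.sh should pick up automatically if it exists, along
the lines of

  # conf/java.env -- pin the heap so we're not at the mercy of JVM defaults
  JVMFLAGS="-Xms256m -Xmx256m"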

Thanks,
Shawn


Re: Ensemble fails when one node loses connectivity

Posted by Jim Keeney <ji...@fitterweb.com>.
Shawn -

Thanks for jumping in on the ZK side as well.

I will take a hard look at my config files, but I checked and I do not have
any single file over 1MB. The combined files (10 indexes) come to 2.2MB.

I am using micro instances for the nodes, which are very limited in memory.

I'm not currently using a java.env file, so I guess I'm using the default
JVM values, which would typically be -Xmx512M if I remember correctly.

Could it be just a memory issue?

Jim K.

On Thu, Mar 1, 2018 at 11:13 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 3/1/2018 7:59 PM, Jim Keeney wrote:
>
>> Read about the maxbuffer and am pretty sure that this might explain the
>> behavior we are seeing since it occurs when there has been a significant
>> reboot of all the servers. We have over 2 mb of config files for all of
>> our
>> indexes and if all the Solr nodes are sync ing their configs at once it
>> seems like that might overflow the buffer.
>>
>
> You probably recognize me from the Solr side.  Hello again.  I do know
> enough to handle this part, so I'm answering. I didn't consider the
> maxbuffer setting, because I didn't see anything about large packets in the
> logs you shared on the Solr mailing list, and it's very rare for Solr users
> to need to increase it.
>
> You only need to worry about the maxbuffer if any single part of the
> config in ZK (what is called a "znode") is over 1MB. Each file in the
> configs that you upload will go into its own znode.  So if none of the
> individual files in your configs is really large, you probably won't need
> to set jute.maxbuffer.
>
> As for the other things that Solr puts in ZK:  Unless you have a REALLY
> huge cluster (tons of collections, shards, replicas, servers, etc) then
> that information should be quite small.
>
> Newbie question, where would i set the -Djute.maxbuffer ? Should I update
>> the zkServer.sh file so this is applied every time zookeeper is started or
>> restarted.
>>
>
> If jute.maxbuffer is needed, it must be set on the startup options for
> every ZK server and every client that will access large znodes.  Which
> means all your ZK servers, all your Solr servers, and any invocations of
> things like the scripts Solr includes for uploading configs.
>
> Thanks,
> Shawn
>
>


-- 
Jim Keeney
President, FitterWeb
E: jim@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *

Re: Ensemble fails when one node loses connectivity

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/1/2018 7:59 PM, Jim Keeney wrote:
> Read about the maxbuffer and am pretty sure that this might explain the
> behavior we are seeing since it occurs when there has been a significant
> reboot of all the servers. We have over 2 mb of config files for all of our
> indexes and if all the Solr nodes are sync ing their configs at once it
> seems like that might overflow the buffer.

You probably recognize me from the Solr side.  Hello again.  I do know 
enough to handle this part, so I'm answering. I didn't consider the 
maxbuffer setting, because I didn't see anything about large packets in 
the logs you shared on the Solr mailing list, and it's very rare for 
Solr users to need to increase it.

You only need to worry about the maxbuffer if any single part of the 
config in ZK (what is called a "znode") is over 1MB. Each file in the 
configs that you upload will go into its own znode.  So if none of the 
individual files in your configs is really large, you probably won't 
need to set jute.maxbuffer.

As for the other things that Solr puts in ZK:  Unless you have a REALLY 
huge cluster (tons of collections, shards, replicas, servers, etc) then 
that information should be quite small.

> Newbie question, where would i set the -Djute.maxbuffer ? Should I update
> the zkServer.sh file so this is applied every time zookeeper is started or
> restarted.

If jute.maxbuffer is needed, it must be set on the startup options for 
every ZK server and every client that will access large znodes.  Which 
means all your ZK servers, all your Solr servers, and any invocations of 
things like the scripts Solr includes for uploading configs.
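
If it does turn out to be needed, a rough illustration of where it goes (the
4MB value and file locations here are just examples, adjust for your
install):

  # ZK servers: append to JVMFLAGS in conf/java.env
  JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=4194304"

  # ZK command-line client: the stock zkCli.sh honors CLIENT_JVMFLAGS
  CLIENT_JVMFLAGS="-Djute.maxbuffer=4194304" bin/zkCli.sh -server localhost:2181

  # Solr nodes: add it to SOLR_OPTS (e.g. in solr.in.sh, or however your
  # install passes extra JVM options) so Solr's ZK client gets it too
  SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=4194304"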

Thanks,
Shawn


Re: Ensemble fails when one node loses connectivity

Posted by Jim Keeney <ji...@fitterweb.com>.
Steph -

I read about the maxbuffer setting and am pretty sure that it might explain
the behavior we are seeing, since it occurs when there has been a significant
reboot of all the servers. We have over 2 MB of config files for all of our
indexes, and if all the Solr nodes are syncing their configs at once it
seems like that might overflow the buffer.

Newbie question: where would I set -Djute.maxbuffer? Should I update
the zkServer.sh file so it is applied every time ZooKeeper is started or
restarted?

Also, I noted the caution and will make sure that all of the nodes are set
to the same value. I saw some discussion about having to set the zkCli value
larger than that of the server. Is that true?

Thanks in advance.

Jim K.

On Thu, Mar 1, 2018 at 9:13 PM, Jim Keeney <ji...@fitterweb.com> wrote:

> Thanks, Yes, I have about 2MB stored in the configurations folders. I will
> increase the jute.maxbuffer and see if that helps.
>
> Jim K.
>
> On Thu, Mar 1, 2018 at 8:58 PM, Steph van Schalkwyk <
> svanschalkwyk@gmail.com> wrote:
>
>> Does the log say anything about timing out on init?
>> Your initLimit is already pretty big, but then we don't know anything
>> about
>> your setup.
>> Are you storing more than 1MB in a znode? Then increase jute.maxbuffer (in
>> java.env as a -Djute.maxbuffer=xxxxxx).
>> I've recently run into that with Fusion 3.1.
>> Post more details, if you would.
>> Good luck.
>> Steph
>>
>>
>> On Thu, Mar 1, 2018 at 7:43 PM, Jim Keeney <ji...@fitterweb.com> wrote:
>>
>> > I'm using Zookeeper with solr to create a cluster and I have come across
>> > what seems like an unexpected behavior. The cluster is setup on AWS
>> using
>> > opsworks.  I am using a 3 node zookeeper ensemble. The zookeeper config
>> > on all three nodes is:
>> >
>> > clientPort=2181
>> >
>> > dataDir=/var/opt/zookeeper/data
>> >
>> > tickTime=2000
>> >
>> > autopurge.purgeInterval=24
>> >
>> > initLimit=100
>> >
>> > syncLimit=5
>> >
>> > server.1=172.31.86.130:2888:3888
>> >
>> > server.2=172.31.16.234:2888:3888
>> >
>> > server.3=172.31.73.122:2888:3888
>> >
>> >
>> > Here is the issue:
>> >
>> > If one node in the ensemble fails or is shut down the ensemble carries
>> on.
>> > However, when the node is restarted it's attempt to connect to the other
>> > members of the cluster are rejected. The only way that I have found to
>> > restore the ensemble is to restart all of the nodes within a short time
>> > span of each other.
>> >
>> > If I do that they are able to discover each other  carry on a proper
>> > leader election and restore order.
>> >
>> > Once they are restored everything is fine but if one of the nodes goes
>> > down we are faced wit the same problem.
>> >
>> > How do I ensure that if a node goes down, it can restart and rejoin the
>> > ensemble with out having to manually restart all the other nodes?
>> >
>> > Any help appreciated.
>> >
>> > Thanks.
>> >
>> > Jim K.
>> >
>> >
>> >
>> >
>> > --
>> > Jim Keeney
>> > President, FitterWeb
>> > E: jim@fitterweb.com
>> > M: 703-568-5887
>> >
>> > *FitterWeb Consulting*
>> > *Are you lean and agile enough? *
>> >
>>
>
>
>
> --
> Jim Keeney
> President, FitterWeb
> E: jim@fitterweb.com
> M: 703-568-5887
>
> *FitterWeb Consulting*
> *Are you lean and agile enough? *
>



-- 
Jim Keeney
President, FitterWeb
E: jim@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *

Re: Ensemble fails when one node loses connectivity

Posted by Jim Keeney <ji...@fitterweb.com>.
Thanks. Yes, I have about 2MB stored in the configuration folders. I will
increase jute.maxbuffer and see if that helps.

Jim K.

On Thu, Mar 1, 2018 at 8:58 PM, Steph van Schalkwyk <svanschalkwyk@gmail.com
> wrote:

> Does the log say anything about timing out on init?
> Your initLimit is already pretty big, but then we don't know anything about
> your setup.
> Are you storing more than 1MB in a znode? Then increase jute.maxbuffer (in
> java.env as a -Djute.maxbuffer=xxxxxx).
> I've recently run into that with Fusion 3.1.
> Post more details, if you would.
> Good luck.
> Steph
>
>
> On Thu, Mar 1, 2018 at 7:43 PM, Jim Keeney <ji...@fitterweb.com> wrote:
>
> > I'm using Zookeeper with solr to create a cluster and I have come across
> > what seems like an unexpected behavior. The cluster is setup on AWS using
> > opsworks.  I am using a 3 node zookeeper ensemble. The zookeeper config
> > on all three nodes is:
> >
> > clientPort=2181
> >
> > dataDir=/var/opt/zookeeper/data
> >
> > tickTime=2000
> >
> > autopurge.purgeInterval=24
> >
> > initLimit=100
> >
> > syncLimit=5
> >
> > server.1=172.31.86.130:2888:3888
> >
> > server.2=172.31.16.234:2888:3888
> >
> > server.3=172.31.73.122:2888:3888
> >
> >
> > Here is the issue:
> >
> > If one node in the ensemble fails or is shut down the ensemble carries
> on.
> > However, when the node is restarted it's attempt to connect to the other
> > members of the cluster are rejected. The only way that I have found to
> > restore the ensemble is to restart all of the nodes within a short time
> > span of each other.
> >
> > If I do that they are able to discover each other  carry on a proper
> > leader election and restore order.
> >
> > Once they are restored everything is fine but if one of the nodes goes
> > down we are faced wit the same problem.
> >
> > How do I ensure that if a node goes down, it can restart and rejoin the
> > ensemble with out having to manually restart all the other nodes?
> >
> > Any help appreciated.
> >
> > Thanks.
> >
> > Jim K.
> >
> >
> >
> >
> > --
> > Jim Keeney
> > President, FitterWeb
> > E: jim@fitterweb.com
> > M: 703-568-5887
> >
> > *FitterWeb Consulting*
> > *Are you lean and agile enough? *
> >
>



-- 
Jim Keeney
President, FitterWeb
E: jim@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *

Re: Ensemble fails when one node loses connectivity

Posted by Steph van Schalkwyk <sv...@gmail.com>.
Does the log say anything about timing out on init?
Your initLimit is already pretty big, but then we don't know anything about
your setup.
Are you storing more than 1MB in a znode? Then increase jute.maxbuffer (in
java.env as a -Djute.maxbuffer=xxxxxx).
I've recently run into that with Fusion 3.1.
Post more details, if you would.
Good luck.
Steph


On Thu, Mar 1, 2018 at 7:43 PM, Jim Keeney <ji...@fitterweb.com> wrote:

> I'm using Zookeeper with solr to create a cluster and I have come across
> what seems like an unexpected behavior. The cluster is setup on AWS using
> opsworks.  I am using a 3 node zookeeper ensemble. The zookeeper config
> on all three nodes is:
>
> clientPort=2181
>
> dataDir=/var/opt/zookeeper/data
>
> tickTime=2000
>
> autopurge.purgeInterval=24
>
> initLimit=100
>
> syncLimit=5
>
> server.1=172.31.86.130:2888:3888
>
> server.2=172.31.16.234:2888:3888
>
> server.3=172.31.73.122:2888:3888
>
>
> Here is the issue:
>
> If one node in the ensemble fails or is shut down the ensemble carries on.
> However, when the node is restarted it's attempt to connect to the other
> members of the cluster are rejected. The only way that I have found to
> restore the ensemble is to restart all of the nodes within a short time
> span of each other.
>
> If I do that they are able to discover each other  carry on a proper
> leader election and restore order.
>
> Once they are restored everything is fine but if one of the nodes goes
> down we are faced wit the same problem.
>
> How do I ensure that if a node goes down, it can restart and rejoin the
> ensemble with out having to manually restart all the other nodes?
>
> Any help appreciated.
>
> Thanks.
>
> Jim K.
>
>
>
>
> --
> Jim Keeney
> President, FitterWeb
> E: jim@fitterweb.com
> M: 703-568-5887
>
> *FitterWeb Consulting*
> *Are you lean and agile enough? *
>