Posted to user@zookeeper.apache.org by rammohan ganapavarapu <ra...@gmail.com> on 2018/07/02 19:12:38 UTC

Observer went down with Read timed out exception

All,

I have a multi-data-center ldap cluster setup in which the other data
center contains only observers. All of a sudden, all the observer threads
went down with the following message. Any idea why they went down? We don't
see any network-related issues between the data centers.


2018-06-29 05:32:59,036 [myid:222] - WARN [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when observing the leader
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
    at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
2018-06-29 05:32:59,244 [myid:222] - INFO [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown called
java.lang.Exception: shutdown Observer
    at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:731)


Thanks,
Ram

Re: Observer went down with Read timed out exception

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Andor,

Thanks for your time. I am waiting for a stable 3.5 release before
upgrading. The log says the read timed out, right? What kind of packet or
data is the observer reading from the leader?

Ram
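
For readers following the stack trace: the exception is raised inside DataInputStream.readInt while QuorumPacket.deserialize is running, so the observer was blocked waiting for the start of the next packet from the leader and nothing arrived before the socket read timeout expired. As far as I can tell from the 3.4 source (treat this as an assumption, not an answer confirmed in this thread), the observer loops reading QuorumPacket records such as PING and INFORM, and the read timeout after the initial sync is tickTime * syncLimit. The sketch below is not ZooKeeper code. It only reproduces the same "Read timed out" condition with a peer that accepts a connection and stays silent. The class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    // Assumption: the learner's read timeout is tickTime (ms) * syncLimit (ticks).
    static int readTimeoutMillis(int tickTimeMillis, int syncLimitTicks) {
        return tickTimeMillis * syncLimitTicks;
    }

    // Connects to a local listener that never writes anything, then tries to
    // read one int. Returns true if the read fails with SocketTimeoutException.
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket silentPeer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket learner = new Socket(silentPeer.getInetAddress(),
                                         silentPeer.getLocalPort())) {
            learner.setSoTimeout(timeoutMillis);
            DataInputStream in = new DataInputStream(learner.getInputStream());
            try {
                in.readInt(); // blocks: the silent peer never sends a byte
                return false;
            } catch (SocketTimeoutException e) {
                return true;  // same exception type as in the observer log
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Example values only; use the tickTime and syncLimit from your zoo.cfg.
        System.out.println("timeout ms = " + readTimeoutMillis(2000, 5));
        System.out.println("read timed out = " + readTimesOut(300));
    }
}
```

If the assumption about the timeout holds, a cross-region stall longer than tickTime * syncLimit would be enough to trigger this on every observer at once while followers local to the leader stay connected.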

On Wed, Jul 4, 2018, 12:24 AM Andor Molnar <an...@cloudera.com.invalid>
wrote:

> Unfortunately, I cannot think of anything other than what Norbert already
> mentioned. If the followers were stable, a problem in the DC-DC link could
> explain why all the observers went down at the same moment. If the problem
> had been an overloaded leader, the followers would have gone down along
> with the observers.
>
> If none of these cases happened, I'm afraid I cannot help more. I'm not
> aware of a similar, existing issue. Maybe more senior devs can comment.
>
> However, your version is quite old. Most production clusters are running
> 3.4.6 or 3.4.9 as far as I know. You might want to upgrade to the
> latest stable version, which is 3.4.12 at the moment. 3.4.13 will be
> out soon as well.
>
> Regards,
> Andor

Re: Observer went down with Read timed out exception

Posted by Andor Molnar <an...@cloudera.com.INVALID>.
Unfortunately, I cannot think of anything other than what Norbert already
mentioned. If the followers were stable, a problem in the DC-DC link could
explain why all the observers went down at the same moment. If the problem
had been an overloaded leader, the followers would have gone down along
with the observers.

If none of these cases happened, I'm afraid I cannot help more. I'm not
aware of a similar, existing issue. Maybe more senior devs can comment.

However, your version is quite old. Most production clusters are running
3.4.6 or 3.4.9 as far as I know. You might want to upgrade to the
latest stable version, which is 3.4.12 at the moment. 3.4.13 will be
out soon as well.

Regards,
Andor




On Tue, Jul 3, 2018 at 8:13 PM, rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Andor,
>
> The ZooKeeper version I use is zk_version 3.4.5-1392090, built on
> 09/30/2012 17:52 GMT.
> There is no auth or encryption config.
> None of my network graphs shows any dip or unusual pattern, which is why I
> think there may not be a network issue. The nodes are in the cloud, so I am
> checking with the provider to see whether there was any network issue
> between regions.
>
> Thanks,
> Ram

Re: Observer went down with Read timed out exception

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Andor,

The ZooKeeper version I use is zk_version 3.4.5-1392090, built on
09/30/2012 17:52 GMT.
There is no auth or encryption config.
None of my network graphs shows any dip or unusual pattern, which is why I
think there may not be a network issue. The nodes are in the cloud, so I am
checking with the provider to see whether there was any network issue
between regions.

Thanks,
Ram


On Tue, Jul 3, 2018 at 6:29 AM Andor Molnar <an...@cloudera.com.invalid>
wrote:

> Hi Rammohan,
>
> Would you please elaborate on the details of your cluster setup?
> Which ZooKeeper version do you use?
> Do you use authentication / encryption?
> Would you please attach config files and log files of other nodes like
> leader and followers?
>
> How did you make sure that there was no network problem at the time the
> issue happened?
> Would you please attach graphs / diagrams on the network traffic including
> latency and bandwidth usage between the affected data centers?
>
> Regards,
> Andor

Re: Observer went down with Read timed out exception

Posted by Andor Molnar <an...@cloudera.com.INVALID>.
Hi Rammohan,

Would you please elaborate on the details of your cluster setup?
Which ZooKeeper version do you use?
Do you use authentication / encryption?
Would you please attach config files and log files of other nodes like
leader and followers?

How did you make sure that there was no network problem at the time the
issue happened?
Would you please attach graphs / diagrams on the network traffic including
latency and bandwidth usage between the affected data centers?

Regards,
Andor
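
If latency graphs between the data centers are not readily available, a rough substitute is to time TCP connects from an observer host to the leader. The sketch below is only an illustration: the class name, method names, and sample count are hypothetical, and the host and port must be supplied by you (the leader's quorum port is whatever your server.N lines configure). Connect time is a coarse proxy for round-trip latency and says nothing about bandwidth.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectLatencyProbe {

    // Returns the time in milliseconds taken to open a TCP connection,
    // or -1 if the connect failed or exceeded timeoutMillis.
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            return -1L;
        }
    }

    // Sanity check against a throwaway listener on the loopback interface.
    static long probeLoopbackMillis() {
        try (ServerSocket listener =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return connectMillis(listener.getInetAddress().getHostAddress(),
                                 listener.getLocalPort(), 1000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 2) {
            System.out.println("loopback connect ms = " + probeLoopbackMillis());
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        for (int i = 0; i < 5; i++) {   // five samples, one second apart
            System.out.println(host + ":" + port + " connect ms = "
                    + connectMillis(host, port, 5000));
            Thread.sleep(1000);
        }
    }
}
```

Run it with the leader's host and port as arguments from a machine in the observer data center. Repeated -1 results or widely varying timings around the time of the incident would point back toward the inter-region link.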




On Tue, Jul 3, 2018 at 2:56 PM, rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Yes, I am sure there were no network issues. If the leader had been busy
> in GC, the followers in the same DC would have been shut down as well,
> right? But that wasn't the case.

Re: Observer went down with Read timed out exception

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Yes, I am sure there were no network issues. If the leader had been busy
in GC, the followers in the same DC would have been shut down as well,
right? But that wasn't the case.

On Tue, Jul 3, 2018, 1:56 AM Norbert Kalmar <nk...@cloudera.com.invalid>
wrote:

> Hi Ram,
>
> Are you sure there was no network error? To me, this looks like it could
> be due to failed heartbeats (as shutdown was called after the timeout).
>
> It is also possible the leader was busy (maybe a garbage collection
> pause?), especially if you store big(ish) chunks of data in ZooKeeper.
> (There is a plan to integrate JVMPauseMonitor into ZooKeeper for this
> reason, actually.)
>
> Regards,
> Norbert

Re: Observer went down with Read timed out exception

Posted by Norbert Kalmar <nk...@cloudera.com.INVALID>.
Hi Ram,

Are you sure there was no network error? To me, this looks like it could
be due to failed heartbeats (as shutdown was called after the timeout).

It is also possible the leader was busy (maybe a garbage collection
pause?), especially if you store big(ish) chunks of data in ZooKeeper.
(There is a plan to integrate JVMPauseMonitor into ZooKeeper for this
reason, actually.)

Regards,
Norbert
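
The idea behind a JVM pause monitor can be sketched in a few lines: a thread sleeps for a fixed interval and checks how much longer than that interval it was actually away. A large overshoot suggests the whole JVM was stalled, for example by a long garbage collection. This is a minimal illustration of the technique and not the JVMPauseMonitor class mentioned above. The class name, method names, and thresholds are hypothetical.

```java
public class PauseMonitorSketch {

    // How much longer than the requested sleep the thread was actually away.
    static long overshootMillis(long startNanos, long endNanos, long sleepMillis) {
        long elapsedMillis = (endNanos - startNanos) / 1_000_000L;
        return Math.max(0L, elapsedMillis - sleepMillis);
    }

    // An overshoot above the threshold is treated as a suspected JVM pause.
    static boolean isSuspiciousPause(long overshootMillis, long warnThresholdMillis) {
        return overshootMillis > warnThresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMillis = 500;           // illustrative polling interval
        final long warnThresholdMillis = 1000;  // illustrative warning threshold
        for (int i = 0; i < 10; i++) {
            long start = System.nanoTime();
            Thread.sleep(sleepMillis);
            long overshoot = overshootMillis(start, System.nanoTime(), sleepMillis);
            if (isSuspiciousPause(overshoot, warnThresholdMillis)) {
                System.out.println("Suspected JVM or host pause of about "
                        + overshoot + " ms");
            }
        }
    }
}
```

Until something like this is built in, enabling GC logging on the leader and checking for pauses around the time of the observer shutdown gives similar information.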

On Mon, Jul 2, 2018 at 9:13 PM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> All,
>
> I have a multi-data-center ldap cluster setup in which the other data
> center contains only observers. All of a sudden, all the observer threads
> went down with the following message. Any idea why they went down? We don't
> see any network-related issues between the data centers.
>
>
> 2018-06-29 05:32:59,036 [myid:222] - WARN [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when observing the leader
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:170)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
>     at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
> 2018-06-29 05:32:59,244 [myid:222] - INFO [QuorumPeer[myid=222]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown called
> java.lang.Exception: shutdown Observer
>     at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:731)
>
>
> Thanks,
> Ram
>