Posted to user@zookeeper.apache.org by rammohan ganapavarapu <ra...@gmail.com> on 2017/03/09 17:00:47 UTC

shutdown Observer

Hi,

We have a multi-data-center ZooKeeper cluster with all the followers in one
data center and the observers in the other data centers. For some reason the
observers are going down with the following exception, and I am not sure
what the cause could be or how to avoid it. Any thoughts?

Ram



2017-03-09 09:00:18,305 - WARN [QuorumPeer[myid=41]/0:0:0:0:0:0:0:0:2181:Observer@79] - Exception when observing the leader
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
        at org.apache.zookeeper.server.quorum.Observer.observeLeader(Observer.java:75)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:727)
2017-03-09 09:00:18,306 - INFO [QuorumPeer[myid=41]/0:0:0:0:0:0:0:0:2181:Observer@137] - shutdown called
java.lang.Exception: shutdown Observer
        at org.apache.zookeeper.server.quorum.Observer.shutdown(Observer.java:137)

Re: shutdown Observer

Posted by Michael Han <ha...@cloudera.com>.
It helps. The extreme case is a network partition, where packet loss is
100%. ZK relies on TCP for communication between quorum peers, so lost
packets are retransmitted by TCP; unless your network is partitioned
forever, the system will move forward once the partition heals. Because of
the TCP guarantee, there is no need to worry about a packet being lost for
good. In this case the timeout could even be set to infinite (pass 0 to
setSoTimeout), so that socket I/O blocks until the partition heals.

The socket timeout really just gives the ZK server an opportunity to take
action: when we think the network condition is bad, we bail out rather than
block indefinitely, because ZK needs to satisfy some basic liveness
guarantees.
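
The effect of the read timeout can be seen in isolation with a small Java
sketch. This is not ZooKeeper code: the class ReadTimeoutDemo, the method
readTimesOut, and the 200 ms value are made up for illustration. A local
listener that never sends anything stands in for a leader whose packets are
not arriving.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {
    // Connects to a local peer that never sends data and reports whether
    // the blocking read gave up with a SocketTimeoutException.
    static boolean readTimesOut(int timeoutMs) {
        try (ServerSocket silentPeer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sock = new Socket(silentPeer.getInetAddress(),
                                      silentPeer.getLocalPort())) {
            // A positive value bounds how long a read may block;
            // 0 would mean "block forever".
            sock.setSoTimeout(timeoutMs);
            DataInputStream in = new DataInputStream(sock.getInputStream());
            try {
                in.readInt(); // nothing ever arrives, so this blocks
                return false;
            } catch (SocketTimeoutException e) {
                return true;  // the caller now gets a chance to bail out
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

With a positive timeout the read returns control to the caller, which is the
point at which a server can decide to shut down and reconnect. With a timeout
of 0 the same read would simply stay blocked.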

On Thu, Mar 9, 2017 at 3:12 PM, Jai Bheemsen Rao Dhanwada <
jaibheemsen@gmail.com> wrote:

> If there is packet loss, does increasing the initLimit value help?
>
> ref: http://efod.se/blog/archive/2013/02/09/zookeeper-initlimit
>
> Any thoughts?

-- 
Cheers
Michael.

Re: shutdown Observer

Posted by Jai Bheemsen Rao Dhanwada <ja...@gmail.com>.
If there is packet loss, does increasing the initLimit value help?

ref: http://efod.se/blog/archive/2013/02/09/zookeeper-initlimit

Any thoughts?

On Thu, Mar 9, 2017 at 10:12 AM, Dan Benediktson <
dbenediktson@twitter.com.invalid> wrote:

> It's also likely you have a fair bit of packet loss between your
> datacenters, unless you know you have a solid network between them. If your
> observers are falling offline "randomly", packet loss is a pretty likely
> culprit.

Re: shutdown Observer

Posted by Dan Benediktson <db...@twitter.com.INVALID>.
It's also likely you have a fair bit of packet loss between your
datacenters, unless you know you have a solid network between them. If your
observers are falling offline "randomly", packet loss is a pretty likely
culprit.

On Thu, Mar 9, 2017 at 9:54 AM, Michael Han <ha...@cloudera.com> wrote:

> The log indicates that the socket on your observer timed out after
> syncing with the leader. It could simply be that the latency between your
> DCs exceeds the socket timeout ZK uses. The timeout is calculated as
> tickTime * syncLimit, so you might want to tweak these values to fit the
> latency between your DCs.

Re: shutdown Observer

Posted by Michael Han <ha...@cloudera.com>.
The log indicates that the socket on your observer timed out after
syncing with the leader. It could simply be that the latency between your
DCs exceeds the socket timeout ZK uses. The timeout is calculated as
tickTime * syncLimit, so you might want to tweak these values to fit the
latency between your DCs.
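
To make the arithmetic concrete, here is a minimal Java sketch of that
calculation. The class and method names are hypothetical, and the sample
values (tickTime of 2000 ms, syncLimit of 5, a 15000 ms target) are
assumptions for illustration, not settings taken from this cluster. In a
real deployment tickTime and syncLimit come from zoo.cfg.

```java
class QuorumTimeouts {
    // Read timeout in milliseconds, computed as tickTime * syncLimit.
    static int syncTimeoutMs(int tickTimeMs, int syncLimit) {
        return tickTimeMs * syncLimit;
    }

    // Smallest syncLimit whose timeout is at least requiredMs
    // (integer ceiling division).
    static int minSyncLimit(int tickTimeMs, int requiredMs) {
        return (requiredMs + tickTimeMs - 1) / tickTimeMs;
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000; // assumed value
        int syncLimit = 5;     // assumed value
        System.out.println("read timeout = "
                + syncTimeoutMs(tickTimeMs, syncLimit) + " ms");
        System.out.println("syncLimit needed to tolerate 15000 ms = "
                + minSyncLimit(tickTimeMs, 15000));
    }
}
```

With these assumed values the timeout is 10000 ms, and tolerating a 15000 ms
gap would need a syncLimit of 8. How much headroom is appropriate depends on
the latency actually measured between the data centers.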

-- 
Cheers
Michael.