Posted to users@activemq.apache.org by "James A. Robinson" <ji...@highwire.org> on 2015/07/23 03:12:54 UTC

zookeeper reconnects

Is anyone using replicated LevelDB seeing a fairly regular set of errors
about lost connections to ZooKeeper?

2015-07-20 09:26:33,568 [hWire.org:2181)] INFO  ClientCnxn - Opening socket connection to server zk2.mydomain.org/xxx.xxx.xxx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-20 09:26:33,569 [hWire.org:2181)] INFO  ClientCnxn - Socket connection established to zk2.mydomain.org/xxx.xxx.xxx.xxx:2181, initiating session
2015-07-20 09:26:33,571 [hWire.org:2181)] INFO  ClientCnxn - Unable to read additional data from server sessionid 0x44e19713c020003, likely server has closed socket, closing socket connection and attempting reconnect
2015-07-20 09:26:34,614 [hWire.org:2181)] INFO  ClientCnxn - Opening socket connection to server zk5.mydomain.org/xxx.xxx.xxx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-20 09:26:34,615 [hWire.org:2181)] INFO  ClientCnxn - Socket connection established to zk5.mydomain.org/xxx.xxx.xxx.xxx:2181, initiating session
2015-07-20 09:26:34,617 [hWire.org:2181)] INFO  ClientCnxn - Unable to read additional data from server sessionid 0x44e19713c020003, likely server has closed socket, closing socket connection and attempting reconnect
2015-07-20 09:26:35,957 [hWire.org:2181)] INFO  ClientCnxn - Opening socket connection to server zk4.mydomain.org/xxx.xxx.xxx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-20 09:26:35,958 [hWire.org:2181)] INFO  ClientCnxn - Socket connection established to zk4.mydomain.org/xxx.xxx.xxx.xxx:2181, initiating session
2015-07-20 09:26:35,960 [hWire.org:2181)] INFO  ClientCnxn - Unable to read additional data from server sessionid 0x44e19713c020003, likely server has closed socket, closing socket connection and attempting reconnect
2015-07-20 09:26:36,384 [hWire.org:2181)] INFO  ClientCnxn - Opening socket connection to server zk6.mydomain.org/xxx.xxx.xxx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-20 09:26:36,385 [hWire.org:2181)] INFO  ClientCnxn - Socket connection established to zk6.mydomain.org/xxx.xxx.xxx.xxx:2181, initiating session
2015-07-20 09:26:36,386 [hWire.org:2181)] INFO  ClientCnxn - Unable to read additional data from server sessionid 0x44e19713c020003, likely server has closed socket, closing socket connection and attempting reconnect
2015-07-20 09:26:37,472 [hWire.org:2181)] INFO  ClientCnxn - Opening socket connection to server zk1.mydomain.org/xxx.xxx.xxx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-20 09:26:37,473 [hWire.org:2181)] INFO  ClientCnxn - Socket connection established to zk1.mydomain.org/xxx.xxx.xxx.xxx:2181, initiating session
2015-07-20 09:26:37,476 [hWire.org:2181)] INFO  ClientCnxn - Session establishment complete on server zk1.mydomain.org/xxx.xxx.xxx.xxx:2181, sessionid = 0x44e19713c020003, negotiated timeout = 5000
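
For anyone comparing their own logs against the excerpt above, here is a rough
sketch of a script that tallies which ZooKeeper servers dropped the session and
which one finally accepted it. The function name summarize is made up for
illustration, and the patterns assume the ClientCnxn message wording shown in
this log; other ZooKeeper client versions may word these messages differently.

```python
import re
from collections import Counter

# Splits on the whitespace that precedes each timestamp such as
# "2015-07-20 09:26:33,568", so every list item is one log entry.
ENTRY_BOUNDARY = re.compile(r"\s(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s)")

# Captures the host name from "Opening socket connection to server host/ip:port".
OPENING = re.compile(r"Opening socket connection to server (\S+?)/")


def summarize(log_text):
    """Return (dropped, established) Counters keyed by ZooKeeper host name."""
    # Collapse hard-wrapped lines so that each entry becomes one string.
    flat = " ".join(log_text.split())
    dropped, established = Counter(), Counter()
    current_server = None  # host named by the most recent "Opening socket" entry
    for entry in ENTRY_BOUNDARY.split(flat):
        match = OPENING.search(entry)
        if match:
            current_server = match.group(1)
        elif current_server is None:
            continue  # nothing to attribute this entry to yet
        elif "Unable to read additional data" in entry:
            dropped[current_server] += 1
        elif "Session establishment complete" in entry:
            established[current_server] += 1
    return dropped, established
```

Run against the excerpt above, this should report one dropped attempt each for
zk2, zk5, zk4 and zk6, and one completed session on zk1.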

Re: zookeeper reconnects

Posted by Tim Bain <tb...@alumni.duke.edu>.
Ah, I see what you're saying, and I think you should submit a bug in JIRA
for it.  Ideally, provide a minimal configuration that'll reproduce the
problem when you do, if you can create one...

Tim

On Sat, Jul 25, 2015 at 10:06 AM, James A. Robinson <ji...@highwire.org>
wrote:

> Well, I'm pretty sure zookeeper won't let the client specify a value
> outside its own minSessionTimeout and maxSessionTimeout, so I think the
> real question is how close to minSessionTimeout I can get without seeing
> problems. It's really just masking the underlying problem though, right?
> The client ought to reinitialize itself in the event of failures and
> sometimes it does not.

Re: zookeeper reconnects

Posted by "James A. Robinson" <ji...@highwire.org>.
Well, I'm pretty sure zookeeper won't let the client specify a value
outside its own minSessionTimeout and maxSessionTimeout, so I think the
real question is how close to minSessionTimeout I can get without seeing
problems. It's really just masking the underlying problem though, right?
The client ought to reinitialize itself in the event of failures and
sometimes it does not.
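
To make the min/max point concrete, here is a small sketch of how the server
settles on a session timeout. It relies on the documented ZooKeeper behaviour
that minSessionTimeout and maxSessionTimeout default to 2x and 20x tickTime
when unset. The function name negotiated_timeout is hypothetical, and the
tickTime of 2000 ms is an assumption taken from the sample zoo.cfg, not from
the servers in this thread.

```python
def negotiated_timeout(requested_ms, tick_time_ms=2000,
                       min_session_ms=None, max_session_ms=None):
    """Return the session timeout (ms) the server would grant a client.

    requested_ms is what the client asks for (for ActiveMQ, the replicated
    LevelDB store's ZooKeeper session timeout setting). The server clamps
    it into the range [minSessionTimeout, maxSessionTimeout].
    """
    # Unset bounds fall back to the ZooKeeper defaults: 2x and 20x tickTime.
    low = 2 * tick_time_ms if min_session_ms is None else min_session_ms
    high = 20 * tick_time_ms if max_session_ms is None else max_session_ms
    return max(low, min(requested_ms, high))


# With the assumed tickTime of 2000 ms the allowed range is 4000..40000 ms.
print(negotiated_timeout(2000))    # below the minimum, raised to 4000
print(negotiated_timeout(5000))    # inside the range, granted as 5000
print(negotiated_timeout(60000))   # above the maximum, capped at 40000
```

Under that assumed tickTime the upper bound works out to 40 seconds, and any
request below 4 seconds would be raised to the minimum rather than refused.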
On Sat, Jul 25, 2015 at 08:25 Tim Bain <tb...@alumni.duke.edu> wrote:

> James,
>
> You've tested two of the three cases; would it be possible to test the
> third one?
>
>    - ActiveMQ timeout < ZooKeeper timeout: fails
>    - ActiveMQ timeout = ZooKeeper timeout: succeeds
>    - ActiveMQ timeout > ZooKeeper timeout: ???
>
> If we can zero in on exactly what the recommendation is, I can update
> http://activemq.apache.org/replicated-leveldb-store.html (or another page
> if you think there's a more appropriate one) to include the recommendation
> we come to.  But I don't want to say on the official documentation that it
> has to be = if >= would work fine.
>
> Tim

Re: zookeeper reconnects

Posted by Tim Bain <tb...@alumni.duke.edu>.
James,

You've tested two of the three cases; would it be possible to test the
third one?

   - ActiveMQ timeout < ZooKeeper timeout: fails
   - ActiveMQ timeout = ZooKeeper timeout: succeeds
   - ActiveMQ timeout > ZooKeeper timeout: ???

If we can zero in on exactly what the recommendation is, I can update
http://activemq.apache.org/replicated-leveldb-store.html (or another page
if you think there's a more appropriate one) to include the recommendation
we come to.  But I don't want to say on the official documentation that it
has to be = if >= would work fine.

Tim

On Fri, Jul 24, 2015 at 10:39 AM, James A. Robinson <ji...@highwire.org>
wrote:

> So about 40 hours after fixing the settings so that both activemq and
> zookeeper agreed on what the session timeout is, I haven't seen any new
> instances of the errors I was seeing before.  Previously it'd typically be
> no more than 4 hours between complaints about needing to re-establish
> zookeeper connections.
>
> If I'm right, then this indicates the instability I saw can be masked over
> by making sure the two agree on the session timeout, but that the
> fundamental fragility of the activemq zookeeper client code is still a
> potential risk.
>
> Jim

Re: zookeeper reconnects

Posted by "James A. Robinson" <ji...@highwire.org>.
So about 40 hours after fixing the settings so that both activemq and
zookeeper agreed on what the session timeout is, I haven't seen any new
instances of the errors I was seeing before.  Previously it'd typically be
no more than 4 hours between complaints about needing to re-establish
zookeeper connections.

If I'm right, then this indicates the instability I saw can be masked over
by making sure the two agree on the session timeout, but that the
fundamental fragility of the activemq zookeeper client code is still a
potential risk.

Jim

On Wed, Jul 22, 2015 at 6:27 PM James A. Robinson <ji...@highwire.org> wrote:

> Hrm...  I'm wondering if this is due to the zookeeper server having a
> default session timeout of 40 seconds vs. a lower one I set for the
> activemq node...
>

Re: zookeeper reconnects

Posted by "James A. Robinson" <ji...@highwire.org>.
Hrm...  I'm wondering if this is due to the zookeeper server having a
default session timeout of 40 seconds vs. a lower one I set for the
activemq node...