Posted to user@zookeeper.apache.org by Jerry Hebert <je...@gmail.com> on 2019/10/24 22:52:01 UTC

"Connections" incorrectly reported as 1 (3.5.5)

Hey all,

I've upgraded one of my ensembles to 3.5.5 now
(3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, specifically). All of my
metrics appeared to be healthy but after the migration, I noticed that the
new ensemble has *all 5 nodes reporting a connection count of 1* (via the
"stat" 4ltr command as well as zk_num_alive_connections from the "mntr"
output).
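
For anyone wanting to reproduce this check outside of a monitoring agent, here is a minimal sketch of reading zk_num_alive_connections straight from "mntr". It assumes the command is enabled on the server (3.5.x gates four-letter commands behind the 4lw.commands.whitelist setting) and that the reply is the usual tab-separated key/value text. The function names and the host in the usage note are illustrative, not part of ZooKeeper.

```python
import socket


def parse_mntr(text):
    """Parse tab-separated 'mntr' output into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines without a tab, e.g. a "not whitelisted" message
            stats[key.strip()] = value.strip()
    return stats


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def alive_connections(host, port=2181):
    """Return zk_num_alive_connections as reported by one server."""
    stats = parse_mntr(four_letter_word(host, port, "mntr"))
    return int(stats["zk_num_alive_connections"])
```

Calling alive_connections("zk1.example.com") (a hypothetical host) against each member gives the same number the agent would scrape. As far as I can tell, the probing connection is itself counted, so a reading of 1 may simply mean that no other client is connected to that server.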

The servers are clearly receiving traffic: I can see node counts going up
and down and I can see clients making changes to various keys. I can also
monitor netstat for 2181 connections and again see connections fluctuating
per usual but I still see "Connections: 1" in stats. This translates into
our Datadog agent reporting connections as 1 too. I've been reading through
the code to try to understand how this may be possible but it's a bit of a
slog as I'm unfamiliar with it and I've found myself digging into Netty now.

I pasted a couple of possibly relevant log lines below. In particular, note
that "secure" is disabled here, and I noticed that the connection count is
split in the code depending on whether or not you're in secure mode. I also
find it odd that I'm seeing 0:0:0:0:0:0:0:0 in the logs, which looks like
IPv6 to me, and I'm using IPv4 (or at least, I partially am...). I also
don't understand the zxid expectation mismatch.
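
On the address question, 0:0:0:0:0:0:0:0 is the long form of the IPv6 unspecified (wildcard) address "::". On a dual-stack host, a listener bound to it will generally accept IPv4 clients as well, so on its own it probably does not indicate a problem. A quick way to confirm what the address is, using only Python's standard library:

```python
import ipaddress

# The listen address as it appears in the QuorumPeer thread name.
addr = ipaddress.ip_address("0:0:0:0:0:0:0:0")

print(addr.version)         # 6
print(addr.compressed)      # ::
print(addr.is_unspecified)  # True
```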

2019-10-24 22:12:08,411 [myid:9] - WARN  [QuorumPeer[myid=9](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@125] - Got zxid 0x2f00000001 expected 0x1

2019-10-24 22:11:57,349 [myid:9] - INFO  [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory

Any advice would be greatly appreciated. I don't feel comfortable leaving
this server as-is given that it's misreporting connections. Something is
definitely wrong.

Thanks in advance!

Jerry

Re: "Connections" incorrectly reported as 1 (3.5.5)

Posted by Jerry Hebert <je...@gmail.com>.
Blah, user error (of course?). I don't exactly understand how this was
working, but all of my new servers were receiving requests because I had one
of the old ensemble's servers still running. Even though it had been removed
(via `reconfig`), it was still accepting connections. It seems like it was
either forwarding those requests to the new ensemble, or all of those 2181
connections were just due to propagation. Not sure.

Anyway, I'm seeing connection counts as I would expect now that I've killed
the old server.

Thanks!

Jerry

