You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@zookeeper.apache.org by Martin Cigorraga <ma...@gmail.com> on 2018/07/24 16:30:53 UTC

Trying to figure out what these errors mean

Hi all,

I'm not a DS guys but an Ops guy instead fairly new to Zookeeper and
Kafka; already spent a ton of time trying to understand what's going
on here but so far I just scratched the surface of the issue without
having a complete understanding of it.

The issue:
We have a cluster of Kafka with a Zookeeper coordinating them all; now
the Zookeeper box log file (/var/log/zookeeper/zookeeper.out) is being
fill with this error:

2018-07-24 08:23:43,178 [myid:3] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@383] -
Exception causing close of session 0x3000017d522019e: Len error
1119502
2018-07-24 08:23:43,178 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] -
Closed socket connection for client /10.20.3.16:40618 which had
sessionid 0x3000017d522019e
2018-07-24 08:23:43,492 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] -
Accepted socket connection from /10.20.2.24:49410
2018-07-24 08:23:43,492 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@931] -
Client attempting to renew session 0x3000017d52200b1 at
/10.20.2.24:49410
2018-07-24 08:23:43,493 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@683] -
Established session 0x3000017d52200b1 with negotiated timeout 30000
for client /10.20.2.24:49410

//

The IP 10.20.2.24 refers to the box where we run Kafka Manager and
Kafka Offset Monitor. Looking into the Kafka Offset Monitor logs I can
see:

09:06:56 INFO  ClientCnxn:975 - Opening socket connection to server
ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
authenticate using SASL (unknown error)
09:06:58 INFO  ClientCnxn:975 - Opening socket connection to server
ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
authenticate using SASL (unknown error)
09:06:59 INFO  ClientCnxn:975 - Opening socket connection to server
ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
authenticate using SASL (unknown error)
09:07:03 INFO  ClientCnxn:975 - Opening socket connection to server
ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
authenticate using SASL (unknown error)
09:07:07 INFO  ClientCnxn:975 - Opening socket connection to server
ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
authenticate using SASL (unknown error)
09:07:07 INFO  ClientCnxn:852 - Socket connection established to
ip-10-20-1-6.ec2.internal/10.20.1.6:2181, initiating session
09:07:17 INFO  ClientCnxn:975 - Opening socket connection to server
ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
authenticate using SASL (unknown error)
09:07:17 INFO  ClientCnxn:852 - Socket connection established to
ip-10-20-1-6.ec2.internal/10.20.1.6:2181, initiating session
09:07:17 INFO  ClientCnxn:1235 - Session establishment complete on
server ip-10-20-1-6.ec2.internal/10.20.1.6:2181, sessionid =
0x20000178e5d0083, negotiated timeout = 30000

With 10.20.1.6 being the Zookeeper box. There's definitely something
fishy going on here but I can't untanggle what is it.

I'd thankfuly appreciate if you guys can point me into the right
direction to where can I learn more about these errors.
Thanks!

Re: Trying to figure out what these errors mean

Posted by Patrick Hunt <ph...@apache.org>.
On Tue, Jul 24, 2018 at 10:17 AM Martin Cigorraga <
mailing.lists.forwarder@gmail.com> wrote:

> Hi all,
>
> I'm not a DS guys but an Ops guy instead fairly new to Zookeeper and
> Kafka; already spent a ton of time trying to understand what's going
> on here but so far I just scratched the surface of the issue without
> having a complete understanding of it.
>
> The issue:
> We have a cluster of Kafka with a Zookeeper coordinating them all; now
> the Zookeeper box log file (/var/log/zookeeper/zookeeper.out) is being
> fill with this error:
>
> 2018-07-24 08:23:43,178 [myid:3] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@383] -
> Exception causing close of session 0x3000017d522019e: Len error
> 1119502
>

Hi Martin.

ZK has a configured limit for message lengths - this is being exceeded for
some reason on the session. Likely Kafka is sending too large a dataset, or
perhaps you have too many child znodes - unfortunately both of these
questions are Kafka specific (acting as a zk client) issues that I don't
have insight on.

See the ZK admin guide for more insight on the config - jute.maxbuffer is
the setting here:
http://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html#Unsafe+Options
Notice you are exceeding that limit "1119502".



> 2018-07-24 08:23:43,178 [myid:3] - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] -
> Closed socket connection for client /10.20.3.16:40618 which had
> sessionid 0x3000017d522019e
> 2018-07-24 08:23:43,492 [myid:3] - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] -
> Accepted socket connection from /10.20.2.24:49410
> 2018-07-24 08:23:43,492 [myid:3] - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@931] -
> Client attempting to renew session 0x3000017d52200b1 at
> /10.20.2.24:49410
> 2018-07-24 08:23:43,493 [myid:3] - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@683] -
> Established session 0x3000017d52200b1 with negotiated timeout 30000
> for client /10.20.2.24:49410
>
> //
>
> The IP 10.20.2.24 refers to the box where we run Kafka Manager and
> Kafka Offset Monitor. Looking into the Kafka Offset Monitor logs I can
> see:
>
> 09:06:56 INFO  ClientCnxn:975 - Opening socket connection to server
> ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 09:06:58 INFO  ClientCnxn:975 - Opening socket connection to server
> ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 09:06:59 INFO  ClientCnxn:975 - Opening socket connection to server
> ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 09:07:03 INFO  ClientCnxn:975 - Opening socket connection to server
> ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 09:07:07 INFO  ClientCnxn:975 - Opening socket connection to server
> ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 09:07:07 INFO  ClientCnxn:852 - Socket connection established to
> ip-10-20-1-6.ec2.internal/10.20.1.6:2181, initiating session
> 09:07:17 INFO  ClientCnxn:975 - Opening socket connection to server
> ip-10-20-1-6.ec2.internal/10.20.1.6:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 09:07:17 INFO  ClientCnxn:852 - Socket connection established to
> ip-10-20-1-6.ec2.internal/10.20.1.6:2181, initiating session
> 09:07:17 INFO  ClientCnxn:1235 - Session establishment complete on
> server ip-10-20-1-6.ec2.internal/10.20.1.6:2181, sessionid =
> 0x20000178e5d0083, negotiated timeout = 30000
>
>
Are you using auth? e.g. Kerberos? Basically we attempt to connect using
SASL, if the attempt fails we retry without it. (and as a result no auth)


> With 10.20.1.6 being the Zookeeper box. There's definitely something
> fishy going on here but I can't untanggle what is it.
>
> I'd thankfuly appreciate if you guys can point me into the right
> direction to where can I learn more about these errors.
>

You might check the Kafka docs to see if they talk about the jute.maxbuffer
issue. I'm assuming they see this on larger deployments and must discuss it
somewhere.

Regards,

Patrick



> Thanks!
>