You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Steven McDonald (JIRA)" <ji...@apache.org> on 2019/03/15 11:53:00 UTC

[jira] [Commented] (KAFKA-7898) ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)

    [ https://issues.apache.org/jira/browse/KAFKA-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793567#comment-16793567 ] 

Steven McDonald commented on KAFKA-7898:
----------------------------------------

Hi!

We encountered this as well and I did some investigation into the problem. It is a bug in Kafka 2.1.x that is fixed in 2.2.x (though not explicitly; the fix is the result of [a refactor|https://github.com/apache/kafka/commit/2155c6d54b087206b6aa1d58747f141761394eaf#diff-8bcd2c427556f434e33cf22abec548c2R217]).

The underlying problem is with the Zookeeper client library's MultiCallback interface. The [documentation|https://zookeeper.apache.org/doc/r3.4.13/api/org/apache/zookeeper/AsyncCallback.MultiCallback.html] for this says that "all opResults are OpResult.ErrorResult", but [some error conditions|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L689] will pass the callback a null pointer in place of a list. Kafka 2.1.x is implemented according to the documentation, so the null pointer case is not handled, leading to this bug.

I also consider it a bug that this NullPointerException leaves the Kafka cluster in a state that it does not recover from automatically. In our case, this bug was hit during a controller election, resulting in a node that was designated as controller but unable to function as such. It would be sufficient for this exception to simply kill the Kafka node so that the remaining nodes can recover, but I think that is a separate bug (which I will raise with Zookeeper first, as the exception is currently caught there).

I can provide additional information on our experience if it's of any interest, but since this is already fixed in Kafka 2.2.x I don't see much point expanding here.

> ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
> -------------------------------------------------------------------
>
>                 Key: KAFKA-7898
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7898
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: Gabriel Lukacs
>            Priority: Major
>
> We observed a NullPointerException on one of our broker in 3 broker cluster environment. If I list the processes and open ports it seems that the faulty broker is running, but the kafka-connect (we used it also) periodically restarts due to fact that it can not connect to the kafka cluster (configured ssl & plaintext mode too). Is it a bug in kafka/zookeeper?
>  
> [2019-02-05 14:28:11,359] WARN Client session timed out, have not heard from server in 4141ms for sessionid 0x3000010166e0000 (org.apache.zookeeper.ClientCnxn)
> [2019-02-05 14:28:12,525] ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
> java.lang.NullPointerException
>  at kafka.zookeeper.ZooKeeperClient$$anon$8.processResult(ZooKeeperClient.scala:217)
>  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
> [2019-02-05 14:28:12,526] ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
> [2019-02-05 14:28:22,701] WARN Client session timed out, have not heard from server in 4004ms for sessionid 0x3000010166e0000 (org.apache.zookeeper.ClientCnxn)
> [2019-02-05 14:28:28,670] WARN Client session timed out, have not heard from server in 4049ms for sessionid 0x3000010166e0000 (org.apache.zookeeper.ClientCnxn)
> [2019-02-05 15:05:20,601] WARN [GroupCoordinator 1]: Failed to write empty metadata for group encodable-emvTokenAccess-delta-encoder-group-emvIssuerAccess-v2-2-0: The group is rebalancing, so a rejoin is needed. (kafka.coordinator.group.GroupCoordinator)
> kafka 7381 1 0 14:22 ? 00:00:19 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Xloggc:/opt/kafka/bin/../logs/zookeeper-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/opt/kafka/bin/../logs -Dlog4j.configuration=file:/opt/kafka/config/zoo-log4j.properties -cp /opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.5.jar:/opt/kafka/bin/../libs/compileScala.mapping:/opt/kafka/bin/../libs/compileScala.mapping.asc:/opt/kafka/bin/../libs/connect-api-2.1.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.1.0.jar:/opt/kafka/bin/../libs/connect-file-2.1.0.jar:/opt/kafka/bin/../libs/connect-json-2.1.0.jar:/opt/kafka/bin/../libs/connect-runtime-2.1.0.jar:/opt/kafka/bin/../libs/connect-transforms-2.1.0.jar:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.7.jar:/opt/kafka/bin/../libs/jackson-core-2.9.7.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.7.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.7.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-client-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-http-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-io-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-security-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-server-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-util-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0-sources.jar:/opt/kafka/bin/../libs/kafka-clients-2.1.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.1.0.jar:/opt/kafka/bin/../libs/kafka-tools-2.1.0.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.5.4.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.14.2.jar:/opt/kafka/bin/../libs/scala-library-2.12.7.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.7.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.
> kafka 9806 1 2 14:22 ? 00:02:41 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Xloggc:/opt/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9997 -Dkafka.logs.dir=/opt/kafka/bin/../logs -Dlog4j.configuration=file:/opt/kafka/bin/../config/log4j.properties -cp /opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.5.jar:/opt/kafka/bin/../libs/compileScala.mapping:/opt/kafka/bin/../libs/compileScala.mapping.asc:/opt/kafka/bin/../libs/connect-api-2.1.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.1.0.jar:/opt/kafka/bin/../libs/connect-file-2.1.0.jar:/opt/kafka/bin/../libs/connect-json-2.1.0.jar:/opt/kafka/bin/../libs/connect-runtime-2.1.0.jar:/opt/kafka/bin/../libs/connect-transforms-2.1.0.jar:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.7.jar:/opt/kafka/bin/../libs/jackson-core-2.9.7.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.7.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.7.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-client-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-http-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-io-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-security-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-server-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-util-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0-sources.jar:/opt/kafka/bin/../libs/kafka-clients-2.1.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.1.0.jar:/opt/kafka/bin/../libs/kafka-tools-2.1.0.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.5.4.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.14.2.jar:/opt/kafka/bin/../libs/scala-library-2.12.7.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.7.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar
> [root@litest-gateway1 ~]$ netstat -tupnl | grep 7381
> tcp 0 0 0.0.0.0:48097 0.0.0.0:* LISTEN 7381/java
> tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 7381/java
> tcp 0 0 172.31.32.194:3888 0.0.0.0:* LISTEN 7381/java
> [root@litest-gateway1 ~]$ netstat -tupnl | grep 9806
> tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN 9806/java
> tcp 0 0 0.0.0.0:9093 0.0.0.0:* LISTEN 9806/java
> tcp 0 0 0.0.0.0:37062 0.0.0.0:* LISTEN 9806/java
> tcp 0 0 0.0.0.0:9997 0.0.0.0:* LISTEN 9806/java
> tcp 0 0 0.0.0.0:37948 0.0.0.0:* LISTEN 9806/java
> [root@litest-gateway1 ~]$ cat /etc/kafka/server.properties
> broker.id=1
> listeners=PLAINTEXT://:9092,SSL://:9093
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> auto.create.topics.enable=false
> log.dirs=/var/kafka
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> transaction.state.log.min.isr=2
> log.retention.hours=336
> log.segment.bytes=1073741824
> log.roll.hours=24
> log.cleaner.enable=false
> offsets.retention.minutes=20160
> zookeeper.connect=litest-gateway1:2181,litest-gateway2:2181,litest-gateway3:2181
> zookeeper.connection.timeout.ms=6000
> min.insync.replicas=2
> group.initial.rebalance.delay.ms=0
> ssl.keystore.location=<some location>
> ssl.keystore.password=<some pwd>
> ssl.key.password=<some pwd>
> ssl.truststore.location=<some location>
> ssl.truststore.password=<some pwd>
> ssl.client.auth=requested
> ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
> ssl.keystore.type=JKS
> ssl.truststore.type=JKS
> messages, plus replication traffic, per-broker)
> ssl.secure.random.implementation=SHA1PRNG
> security.inter.broker.protocol=SSL
> [root@litest-gateway1 ~]$ cat /etc/zookeeper/zookeeper.properties
> tickTime=2000
> initLimit=10
> syncLimit=5
> maxClientCnxns=200
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=24
> dataDir = /var/lib/zookeeper
> clientPort = 2181
> maxClientCnxns = 60
> logDir = /var/log/zookeeper
> server.1=litest-gateway1:2888:3888
> server.2=litest-gateway2:2888:3888
> server.3=litest-gateway3:2888:3888
> [root@litest-gateway1 ~]$ cat /etc/system-release
> Oracle Linux Server release 7.2
> [root@litest-gateway1 ~]$ uname -a
> Linux litest-gateway1 3.8.13-118.10.2.el7uek.x86_64 #2 SMP Fri Aug 12 15:08:18 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)