You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Rufus Skyfii (Jira)" <ji...@apache.org> on 2021/03/26 00:30:00 UTC
[jira] [Created] (KAFKA-12560) Accidental delete of some log files
kafka-authorizer.log and kafka-request.log can break topics in cluster
Rufus Skyfii created KAFKA-12560:
------------------------------------
Summary: Accidental delete of some log files kafka-authorizer.log and kafka-request.log can break topics in cluster
Key: KAFKA-12560
URL: https://issues.apache.org/jira/browse/KAFKA-12560
Project: Kafka
Issue Type: Bug
Components: log
Affects Versions: 2.2.1
Environment: AWS EC2 i3.xlarge
Reporter: Rufus Skyfii
These two log files have 0 byte size and last created modified or updated along time ago and these files seem to be created on startup.
When disks fill up, sometimes admins go through old log files and delete them. In this case, these two .log were also picked up. Note that I acknowledge this is a mistake as it should be filtered by extension but nevertheless anything in /var/log should not cause process to have issues and if the file doesn't exist, the process should simply recreate it.
The observed effect of this on a HA kafka cluster with 3 nodes, the replica members were out of sync with just one member being leader and not in sync with rest of the nodes while some topics had two in sync instead of 3 and the leader had a -1 status. Restarting kafka did not resolve so in order to resolve, kafka was flushed in one of the node where it had lots of -1 leader status and then rest of the nodes restarted. However, this should not happen even if someone deleted kafka-authorizer.log and kafka-request.log and should be more resistant of breaking down due to file system changes.
{code:java}
root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime +30 -print | xargs ls -ll |grep '.log$'
-rw-r--r-- 1 root root 0 Nov 26 05:38 /var/log/kafka/kafka-authorizer.log
-rw-r--r-- 1 root root 0 Nov 26 05:38 /var/log/kafka/kafka-request.log
{code}
{code:java}
root@ip-172-17-1-63:/var/log/kafka# stat kafka-request.log
File: kafka-request.log
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 34h/52d Inode: 768131 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-11-26 05:38:04.724561118 +0000
Modify: 2020-11-26 05:38:04.724561118 +0000
Change: 2020-11-26 05:38:04.724561118 +0000
Birth: -
root@ip-172-17-1-63:/var/log/kafka# stat kafka-authorizer.log
File: kafka-authorizer.log
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 34h/52d Inode: 768132 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-11-26 05:38:04.724561118 +0000
Modify: 2020-11-26 05:38:04.724561118 +0000
Change: 2020-11-26 05:38:04.724561118 +0000
Birth: -
{code}
Checking file handles in staging, I can see these files while not updated in 30 days are part of java process and the pid was current:
{code:java}
root@ip-172-17-1-63:/var/log/kafka# lsof kafka-authorizer.log
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 28 root 86w REG 0,52 0 768132 kafka-authorizer.log
root@ip-172-17-1-63:/var/log/kafka# lsof kafka-request.log
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 28 root 85w REG 0,52 0 768131 kafka-request.log
root@ip-172-17-1-63:/var/log/kafka# ps -ef | grep java
root 28 11 83 20:12 ? 02:32:37 /usr/local/openjdk-11/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9999 -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/opt/kafka_2.12-2.2.1/bin/../config/log4j.properties -cp /opt/kafka_2.12-2.2.1/bin/../libs/activation-1.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-api-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-file-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-json-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/guava-20.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-client-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-common-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-server-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-client-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/log4j-1.2.17.jar:/opt/kafka_2.12-2.2.1/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/reflections-0.9.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-library-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zkclient-0.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zstd-jni-1.3.8-1.jar kafka.Kafka /opt/kafka_2.12-2.2.1/config/server.properties
root 29 10 0 20:12 ? 00:00:38 /usr/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp /etc/zookeeper/conf:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=true org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg
{code}
Below confirms that all extensions that do are not current log (identified by just ".log") and match these ".log.x", ".log.x.gz", ".backup" or ".gz" do not have open file handles and can be safely cleaned. To verify, I checked lsof for all files matching logrotate extensions modified in last 2 days:
{code:java}
root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$'
768472 136 -rw-r--r-- 1 root root 137614 Mar 25 06:25 /var/log/kafka/server.log.1.gz
768517 16 -rw-r--r-- 1 root root 13823 Mar 24 04:58 /var/log/kafka/log-cleaner.log.2.gz
768431 10248 -rw-r--r-- 1 root root 10485918 Mar 24 01:27 /var/log/kafka/server.log.1-2021032406.backup
768507 688 -rw-r--r-- 1 root root 701207 Mar 24 06:24 /var/log/kafka/server.log.2.gz
768401 51508 -rw-r--r-- 1 root root 52742072 Mar 24 23:59 /var/log/kafka/controller.log.1-2021032506.backup
768459 10248 -rw-r--r-- 1 root root 10485913 Mar 25 05:51 /var/log/kafka/server.log.1-2021032506.backup
768432 1664 -rw-r--r-- 1 root root 1701962 Mar 24 05:54 /var/log/kafka/state-change.log.2.gz
768071 51208 -rw-r--r-- 1 root root 52429175 Mar 24 20:16 /var/log/kafka/state-change.log.1-2021032506.backup
768522 20 -rw-r--r-- 1 root root 16598 Mar 25 06:23 /var/log/kafka/log-cleaner.log.1.gz
768465 1052 -rw-r--r-- 1 root root 1075433 Mar 25 06:08 /var/log/kafka/state-change.log.1.gz
768562 3544 -rw-r--r-- 1 root root 3625106 Mar 25 06:24 /var/log/kafka/controller.log.1.gz
root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$' | awk '{print $11}' | xargs lsof
{code}
However, I still think accidental deletion of any log files should not affect operational state of the Kafka cluster.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)