You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Rufus Skyfii (Jira)" <ji...@apache.org> on 2021/03/26 00:30:00 UTC

[jira] [Created] (KAFKA-12560) Accidental delete of some log files kafka-authorizer.log and kafka-request.log can break topics in cluster

Rufus Skyfii created KAFKA-12560:
------------------------------------

             Summary: Accidental delete of some log files kafka-authorizer.log and kafka-request.log can break topics in cluster
                 Key: KAFKA-12560
                 URL: https://issues.apache.org/jira/browse/KAFKA-12560
             Project: Kafka
          Issue Type: Bug
          Components: log
    Affects Versions: 2.2.1
         Environment: AWS EC2 i3.xlarge
            Reporter: Rufus Skyfii


These two log files have 0 byte size and last created modified or updated  along time ago and these files seem to be created on startup.

When disks fill up, sometimes admins go through old log files and delete them. In this case, these two .log were also picked up. Note that I acknowledge this is a mistake as it should be filtered by extension but nevertheless anything in /var/log should not cause process to have issues and if the file doesn't exist, the process should simply recreate it.

The observed effect of this on a HA kafka cluster with 3 nodes, the replica members were out of sync with just one member being leader and not in sync with rest of the nodes while some topics had two in sync instead of 3 and the leader had a -1 status. Restarting kafka did not resolve so in order to resolve, kafka was flushed in one of the node where it had lots of -1 leader status and then rest of the nodes restarted.  However, this should not happen even if someone deleted kafka-authorizer.log and kafka-request.log and should be more resistant of breaking down due to file system changes.

 
{code:java}
root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime +30 -print | xargs ls -ll |grep '.log$'
-rw-r--r-- 1 root root        0 Nov 26 05:38 /var/log/kafka/kafka-authorizer.log
-rw-r--r-- 1 root root        0 Nov 26 05:38 /var/log/kafka/kafka-request.log
{code}
{code:java}
root@ip-172-17-1-63:/var/log/kafka# stat kafka-request.log                                                                                                                                                         
  File: kafka-request.log                                                                                                                                                                                          
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file                                                                                                                                     
Device: 34h/52d Inode: 768131      Links: 1                                                                                                                                                                        
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)                                                                                                                                           
Access: 2020-11-26 05:38:04.724561118 +0000                                                                                                                                                                        
Modify: 2020-11-26 05:38:04.724561118 +0000                                                                                                                                                                        
Change: 2020-11-26 05:38:04.724561118 +0000                                                                                                                                                                        
 Birth: -                                                                                                                                                                                                          
root@ip-172-17-1-63:/var/log/kafka# stat kafka-authorizer.log                                                                                                                                                      
  File: kafka-authorizer.log                                                                                                                                                                                       
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file                                                                                                                                     
Device: 34h/52d Inode: 768132      Links: 1                                                                                                                                                                        
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)                                                                                                                                           
Access: 2020-11-26 05:38:04.724561118 +0000                                                              
Modify: 2020-11-26 05:38:04.724561118 +0000                                                              
Change: 2020-11-26 05:38:04.724561118 +0000                                                              
 Birth: -                                                                                                
{code}
Checking file handles in staging, I can see these files while not updated in 30 days are part of java process and the pid was current:
{code:java}
root@ip-172-17-1-63:/var/log/kafka# lsof kafka-authorizer.log
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
java     28 root   86w   REG   0,52        0 768132 kafka-authorizer.log
root@ip-172-17-1-63:/var/log/kafka# lsof kafka-request.log
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
java     28 root   85w   REG   0,52        0 768131 kafka-request.log
root@ip-172-17-1-63:/var/log/kafka# ps -ef | grep java
root        28    11 83 20:12 ?        02:32:37 /usr/local/openjdk-11/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9999 -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/opt/kafka_2.12-2.2.1/bin/../config/log4j.properties -cp /opt/kafka_2.12-2.2.1/bin/../libs/activation-1.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-api-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-file-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-json-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/guava-20.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-client-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-common-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jersey-server-2.27.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-client-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka_2.12-2.2.1/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka_2.12-2.2.1/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/log4j-1.2.17.jar:/opt/kafka_2.12-2.2.1/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka_2.12-2.2.1/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/reflections-0.9.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-library-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka_2.12-2.2.1/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka_2.12-2.2.1/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka_2.12-2.2.1/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zkclient-0.11.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka_2.12-2.2.1/bin/../libs/zstd-jni-1.3.8-1.jar kafka.Kafka /opt/kafka_2.12-2.2.1/config/server.properties
root        29    10  0 20:12 ?        00:00:38 /usr/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp /etc/zookeeper/conf:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/netty.jar:/usr/share/java/slf4j-api.jar:/usr/share/java/slf4j-log4j12.jar:/usr/share/java/zookeeper.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=true org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/zookeeper/conf/zoo.cfg
{code}
Below confirms that all extensions that do are not current log (identified by just ".log") and match these ".log.x", ".log.x.gz", ".backup" or ".gz" do not have open file handles and can be safely cleaned. To verify, I checked lsof for all files matching logrotate extensions modified in last 2 days:
{code:java}
root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$'
   768472    136 -rw-r--r--   1 root     root       137614 Mar 25 06:25 /var/log/kafka/server.log.1.gz
   768517     16 -rw-r--r--   1 root     root        13823 Mar 24 04:58 /var/log/kafka/log-cleaner.log.2.gz
   768431  10248 -rw-r--r--   1 root     root     10485918 Mar 24 01:27 /var/log/kafka/server.log.1-2021032406.backup
   768507    688 -rw-r--r--   1 root     root       701207 Mar 24 06:24 /var/log/kafka/server.log.2.gz
   768401  51508 -rw-r--r--   1 root     root     52742072 Mar 24 23:59 /var/log/kafka/controller.log.1-2021032506.backup
   768459  10248 -rw-r--r--   1 root     root     10485913 Mar 25 05:51 /var/log/kafka/server.log.1-2021032506.backup
   768432   1664 -rw-r--r--   1 root     root      1701962 Mar 24 05:54 /var/log/kafka/state-change.log.2.gz
   768071  51208 -rw-r--r--   1 root     root     52429175 Mar 24 20:16 /var/log/kafka/state-change.log.1-2021032506.backup
   768522     20 -rw-r--r--   1 root     root        16598 Mar 25 06:23 /var/log/kafka/log-cleaner.log.1.gz
   768465   1052 -rw-r--r--   1 root     root      1075433 Mar 25 06:08 /var/log/kafka/state-change.log.1.gz
   768562   3544 -rw-r--r--   1 root     root      3625106 Mar 25 06:24 /var/log/kafka/controller.log.1.gz
root@ip-172-17-1-63:/var/log/kafka# find /var/log/kafka -mtime -2 -ls | egrep -e '.log.$|.backup$|.gz$' | awk '{print $11}' | xargs lsof
{code}
However, I still think accidental deletion of any log files should not affect operational state of the Kafka cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)