You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "songyingshuan (JIRA)" <ji...@apache.org> on 2019/06/26 13:46:00 UTC

[jira] [Created] (KAFKA-8604) kafka log dir was marked as offline because of deleting segments of __consumer_offsets failed

songyingshuan created KAFKA-8604:
------------------------------------

             Summary: kafka log dir was marked as offline because of deleting segments of __consumer_offsets failed
                 Key: KAFKA-8604
                 URL: https://issues.apache.org/jira/browse/KAFKA-8604
             Project: Kafka
          Issue Type: Bug
          Components: log cleaner
    Affects Versions: 1.0.1
            Reporter: songyingshuan


We encountered a problem in our product env without any foresight. When kafka broker trying to clean __consumer_offsets-38 (and only happents to this partition), the log shows
it failed, and marking the whole disk/log dir offline, and this leads to a negative impact on some normal partitions (because of the ISR list of those partitions decrease).
we had to restart the broker server to reuse the disk/dir which was marked as offline. BUT!! this problem occurs periodically with the same reason so we have to restart broker periodically.

we read some source code of kafka-1.0.1, but cannot make sure why this happends. And The cluster status had been good until this problem suddenly attacked us.

the error log is something like this :

 
{code:java}
2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000012855596978.timeindex.deleted
2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error while deleting segments for __consumer_offsets-38 in dir /data6/kafka/data
java.io.IOException: Delete of log 00000000000000000000.log.deleted failed.
at kafka.log.LogSegment.delete(LogSegment.scala:496)
at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
at kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-06-25 00:11:26,265 ERROR kafka.utils.KafkaScheduler: Uncaught exception in scheduled task 'delete-file'
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for __consumer_offsets-38 in dir /data6/kafka/data
Caused by: java.io.IOException: Delete of log 00000000000000000000.log.deleted failed.
at kafka.log.LogSegment.delete(LogSegment.scala:496)
at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
at kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-06-25 00:11:26,268 INFO kafka.server.ReplicaManager: [ReplicaManager broker=1001] Stopping serving replicas in dir /data6/kafka/data
{code}
 and we also find that the cleaning of __consumer_offsets-38 is so frequently that almost 85% of output log is related. something like this :

 
{code:java}
2019-06-25 20:29:03,222 INFO kafka.log.OffsetIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000008457474982.index.deleted 2019-06-25 20:29:03,222 INFO kafka.log.TimeIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000008457474982.timeindex.deleted 2019-06-25 20:29:03,295 INFO kafka.log.Log: Deleting segment 10514999959 from log __consumer_offsets-38. 2019-06-25 20:29:03,298 INFO kafka.log.OffsetIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000010514999959.index.deleted 2019-06-25 20:29:03,298 INFO kafka.log.TimeIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000010514999959.timeindex.deleted 2019-06-25 20:29:03,380 INFO kafka.log.Log: Deleting segment 11374174390 from log __consumer_offsets-38. 2019-06-25 20:29:03,382 INFO kafka.log.OffsetIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000011374174390.index.deleted 2019-06-25 20:29:03,382 INFO kafka.log.TimeIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000011374174390.timeindex.deleted 2019-06-25 20:29:03,455 INFO kafka.log.Log: Deleting segment 13507582773 from log __consumer_offsets-38. 2019-06-25 20:29:03,461 INFO kafka.log.OffsetIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000013507582773.index.deleted 2019-06-25 20:29:03,461 INFO kafka.log.TimeIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000013507582773.timeindex.deleted 2019-06-25 20:29:03,531 INFO kafka.log.Log: Deleting segment 15655066421 from log __consumer_offsets-38. 2019-06-25 20:29:03,531 INFO kafka.log.OffsetIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000015655066421.index.deleted 2019-06-25 20:29:03,531 INFO kafka.log.TimeIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000015655066421.timeindex.deleted 2019-06-25 20:29:03,606 INFO kafka.log.Log: Deleting segment 17802355567 from log __consumer_offsets-38. 2019-06-25 20:29:03,606 INFO kafka.log.Log: Deleting segment 18177617859 from log __consumer_offsets-38. 2019-06-25 20:29:03,606 INFO kafka.log.OffsetIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000017802355567.index.deleted 2019-06-25 20:29:03,606 INFO kafka.log.TimeIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000017802355567.timeindex.deleted 2019-06-25 20:29:03,606 INFO kafka.log.OffsetIndex: Deleting index /data6/kafka/data/__consumer_offsets-38/00000000018177617859.index.deleted
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [jira] [Created] (KAFKA-8604) kafka log dir was marked as offline because of deleting segments of __consumer_offsets failed

Posted by "M. Manna" <ma...@gmail.com>.
This is a known issue for Windows. See KAFKA-6188.

Thanks,

On Wed, 26 Jun 2019 at 14:46, songyingshuan (JIRA) <ji...@apache.org> wrote:

> songyingshuan created KAFKA-8604:
> ------------------------------------
>
>              Summary: kafka log dir was marked as offline because of
> deleting segments of __consumer_offsets failed
>                  Key: KAFKA-8604
>                  URL: https://issues.apache.org/jira/browse/KAFKA-8604
>              Project: Kafka
>           Issue Type: Bug
>           Components: log cleaner
>     Affects Versions: 1.0.1
>             Reporter: songyingshuan
>
>
> We encountered a problem in our product env without any foresight. When
> kafka broker trying to clean __consumer_offsets-38 (and only happents to
> this partition), the log shows
> it failed, and marking the whole disk/log dir offline, and this leads to a
> negative impact on some normal partitions (because of the ISR list of those
> partitions decrease).
> we had to restart the broker server to reuse the disk/dir which was marked
> as offline. BUT!! this problem occurs periodically with the same reason so
> we have to restart broker periodically.
>
> we read some source code of kafka-1.0.1, but cannot make sure why this
> happends. And The cluster status had been good until this problem suddenly
> attacked us.
>
> the error log is something like this :
>
>
> {code:java}
> 2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000012855596978.timeindex.deleted
> 2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error
> while deleting segments for __consumer_offsets-38 in dir /data6/kafka/data
> java.io.IOException: Delete of log 00000000000000000000.log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> at
> kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
> at
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-25 00:11:26,265 ERROR kafka.utils.KafkaScheduler: Uncaught
> exception in scheduled task 'delete-file'
> org.apache.kafka.common.errors.KafkaStorageException: Error while deleting
> segments for __consumer_offsets-38 in dir /data6/kafka/data
> Caused by: java.io.IOException: Delete of log
> 00000000000000000000.log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> at
> kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
> at
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-25 00:11:26,268 INFO kafka.server.ReplicaManager: [ReplicaManager
> broker=1001] Stopping serving replicas in dir /data6/kafka/data
> {code}
>  and we also find that the cleaning of __consumer_offsets-38 is so
> frequently that almost 85% of output log is related. something like this :
>
>
> {code:java}
> 2019-06-25 20:29:03,222 INFO kafka.log.OffsetIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000008457474982.index.deleted
> 2019-06-25 20:29:03,222 INFO kafka.log.TimeIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000008457474982.timeindex.deleted
> 2019-06-25 20:29:03,295 INFO kafka.log.Log: Deleting segment 10514999959
> from log __consumer_offsets-38. 2019-06-25 20:29:03,298 INFO
> kafka.log.OffsetIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000010514999959.index.deleted
> 2019-06-25 20:29:03,298 INFO kafka.log.TimeIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000010514999959.timeindex.deleted
> 2019-06-25 20:29:03,380 INFO kafka.log.Log: Deleting segment 11374174390
> from log __consumer_offsets-38. 2019-06-25 20:29:03,382 INFO
> kafka.log.OffsetIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000011374174390.index.deleted
> 2019-06-25 20:29:03,382 INFO kafka.log.TimeIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000011374174390.timeindex.deleted
> 2019-06-25 20:29:03,455 INFO kafka.log.Log: Deleting segment 13507582773
> from log __consumer_offsets-38. 2019-06-25 20:29:03,461 INFO
> kafka.log.OffsetIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000013507582773.index.deleted
> 2019-06-25 20:29:03,461 INFO kafka.log.TimeIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000013507582773.timeindex.deleted
> 2019-06-25 20:29:03,531 INFO kafka.log.Log: Deleting segment 15655066421
> from log __consumer_offsets-38. 2019-06-25 20:29:03,531 INFO
> kafka.log.OffsetIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000015655066421.index.deleted
> 2019-06-25 20:29:03,531 INFO kafka.log.TimeIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000015655066421.timeindex.deleted
> 2019-06-25 20:29:03,606 INFO kafka.log.Log: Deleting segment 17802355567
> from log __consumer_offsets-38. 2019-06-25 20:29:03,606 INFO kafka.log.Log:
> Deleting segment 18177617859 from log __consumer_offsets-38. 2019-06-25
> 20:29:03,606 INFO kafka.log.OffsetIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000017802355567.index.deleted
> 2019-06-25 20:29:03,606 INFO kafka.log.TimeIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000017802355567.timeindex.deleted
> 2019-06-25 20:29:03,606 INFO kafka.log.OffsetIndex: Deleting index
> /data6/kafka/data/__consumer_offsets-38/00000000018177617859.index.deleted
> {code}
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>