Posted to dev@kafka.apache.org by "Tom Crayford (JIRA)" <ji...@apache.org> on 2016/06/23 13:57:16 UTC

[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

    [ https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346463#comment-15346463 ] 

Tom Crayford commented on KAFKA-3894:
-------------------------------------

(disclaimer: I work with Tim)

It feels like there are a few pieces of work to do here:

1. Expose the log cleaner's state as a JMX metric (similar to BrokerState); a rough sketch of what that could look like is below this list.
2. Mark logs we've failed to clean as "busted" somewhere and stop trying to clean them, so that when this error occurs the broker isn't left completely busted but keeps compacting every other partition (second sketch below).
3. I'm unsure, but is it possible to fix the underlying issue by compacting only part of the log's dirty segments when the buffer is too small for the desired offset map? This seems like the hardest but most valuable fix here (third sketch below).
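
For (1), here's a minimal sketch of the kind of gauge we have in mind, in the same style as the gauges {{LogCleaner}} already registers (e.g. "cleaner-recopy-percent"). The class name, metric name, and the way the threads are passed in are all hypothetical, not existing Kafka code:

{code}
import com.yammer.metrics.core.Gauge
import kafka.metrics.KafkaMetricsGroup

// Hypothetical sketch only: report how many cleaner threads are still alive,
// alongside the existing LogCleaner gauges. A value of 0 would mean compaction
// has stopped entirely on this broker.
class LogCleanerHealth(cleanerThreads: Seq[Thread]) extends KafkaMetricsGroup {
  newGauge("log-cleaner-live-thread-count", new Gauge[Int] {
    def value: Int = cleanerThreads.count(_.isAlive)
  })
}
{code}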

We're happy to pick up at least some of these, but we'd love feedback from the community on priorities and on the ease/appropriateness of these steps (and suggestions for anything else worth adding).
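
To make (2) concrete, the rough shape could be a small piece of shared state recording partitions whose cleaning has blown up, plus a catch in the cleaner loop that records the failure and moves on instead of letting the thread die. Everything here (the class, the method names, and where the catch lives) is a hypothetical sketch, not the current cleaner code:

{code}
import java.util.concurrent.ConcurrentHashMap
import kafka.common.TopicAndPartition

// Hypothetical sketch: quarantine partitions that fail to clean so one bad
// partition no longer kills compaction for the whole broker.
class UncleanablePartitions {
  private val failed = ConcurrentHashMap.newKeySet[TopicAndPartition]()
  def markUncleanable(tp: TopicAndPartition): Unit = failed.add(tp)
  def isUncleanable(tp: TopicAndPartition): Boolean = failed.contains(tp)
}

// In the cleaner thread's loop (sketch): skip quarantined partitions when
// choosing what to clean, and quarantine on failure instead of crashing:
//
//   try cleaner.clean(tp)
//   catch { case e: IllegalArgumentException =>
//     uncleanable.markUncleanable(tp)
//     // log loudly and/or bump a metric here so operators still notice
//   }
{code}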

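And for (3), the idea would be: fill the offset map until it's full, remember the last offset that made it in, and compact only up to that offset this pass, leaving the rest of the dirty range for later. A self-contained toy sketch of that loop (segments are just (key, offset) sequences here; none of this is the real cleaner code):

{code}
import scala.collection.mutable

// Toy sketch: build the dedupe map only as far as capacity allows and report
// how far this cleaning pass can safely go.
def buildPartialOffsetMap(dirtySegments: Seq[Seq[(String, Long)]],
                          capacity: Int): (Map[String, Long], Long) = {
  val map = mutable.Map.empty[String, Long]
  var cleanableUpTo = -1L
  var full = false
  for (segment <- dirtySegments; (key, offset) <- segment if !full) {
    if (map.contains(key) || map.size < capacity) {
      map(key) = offset        // latest offset per key wins, as in compaction
      cleanableUpTo = offset   // this pass can dedupe up to (and including) here
    } else {
      full = true              // map is full: stop here, clean the covered range,
                               // and leave the remaining dirty segments for later
    }
  }
  (map.toMap, cleanableUpTo)
}
{code}

The cleaner would then advance its checkpoint to the returned offset, so it still makes forward progress even when the whole dirty range doesn't fit in the buffer.
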
> Log Cleaner thread crashes and never restarts
> ---------------------------------------------
>
>                 Key: KAFKA-3894
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3894
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.2.2, 0.9.0.1
>         Environment: Oracle JDK 8
> Ubuntu Precise
>            Reporter: Tim Carey-Smith
>              Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner \[kafka-log-cleaner-thread-0\], Error due to  java.lang.IllegalArgumentException: requirement failed: 9750860 messages in segment MY_FAVORITE_TOPIC-2/00000000000047580165.log but offset map can fit only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason for this upper bound, or if this is a value which must be tuned for each specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM
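
As a sanity check on the quoted error above, the "can fit only 5033164" figure is exactly what you'd get from a 128 MiB dedupe buffer shared by one cleaner thread, 24 bytes per map entry (a 16-byte MD5 hash plus an 8-byte offset) and a 0.9 load factor. The inputs below are assumed defaults, not values read from the reporter's config:

{code}
// Back-of-the-envelope check of "offset map can fit only 5033164"
object DedupeCapacity extends App {
  val bufferBytes  = 134217728L   // log.cleaner.dedupe.buffer.size (assumed)
  val threads      = 1            // log.cleaner.threads (assumed)
  val bytesPerSlot = 16 + 8       // MD5 hash + stored offset
  val loadFactor   = 0.9

  val slots    = bufferBytes / threads / bytesPerSlot   // 5592405
  val capacity = (slots * loadFactor).toInt             // 5033164, as in the log line
  println(s"slots=$slots capacity=$capacity")
}
{code}

That also shows why the 9,750,860-message segment can't fit: either the dedupe buffer roughly doubles, or the cleaner gains the ability to clean part of a segment at a time (point 3 above).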



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)