Posted to jira@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2020/01/09 19:29:00 UTC

[jira] [Commented] (KAFKA-9393) DeleteRecords may cause extreme lock contention for large partition directories

    [ https://issues.apache.org/jira/browse/KAFKA-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012170#comment-17012170 ] 

Guozhang Wang commented on KAFKA-9393:
--------------------------------------

Thanks for filing this, Lucas. This is good to know.

Regarding the fix, what you've proposed looks good to me. In the meantime, I think another "workaround" is to have Streams apps delete records on changelogs less frequently. cc [~ableegoldman] [~mjsax]

> DeleteRecords may cause extreme lock contention for large partition directories
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-9393
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9393
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: Lucas Bradstreet
>            Priority: Major
>
> DeleteRecords, which is frequently used by KStreams, triggers a Log.maybeIncrementLogStartOffset call, which calls kafka.log.ProducerStateManager.listSnapshotFiles, which in turn calls java.io.File.listFiles on the partition dir. Listing this directory can take multiple seconds for partitions with many small segments (e.g. 20000). This causes lock contention for the log and, if produce requests are also occurring for the same log, can cause a majority of request handler threads to become blocked waiting for the DeleteRecords call to finish.
> I believe this is a problem going back to the initial implementation of the transactional producer, but I need to confirm how far back it goes.
> One possible solution is to maintain a producer state snapshot aligned to the log segment, and simply delete it whenever we delete a segment. This would ensure that we never have to perform a directory scan.
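
The segment-aligned snapshot idea in the quoted description could be sketched roughly as below. This is a minimal illustration in Java, not Kafka's actual ProducerStateManager code: the class SegmentAlignedSnapshots, its method names, and the snapshot file naming are all assumptions made for the example. The point it tries to show is that, once snapshots are keyed by segment base offset, deleting a segment can remove its snapshot by direct lookup instead of listing the partition directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch only; names and file layout are assumptions, not Kafka's API.
public class SegmentAlignedSnapshots {
    // Segment base offset -> snapshot file written for that segment.
    private final ConcurrentSkipListMap<Long, Path> snapshots = new ConcurrentSkipListMap<>();
    private final Path partitionDir;

    public SegmentAlignedSnapshots(Path partitionDir) {
        this.partitionDir = partitionDir;
    }

    // Write a snapshot named after the segment's base offset and remember it in memory.
    public Path takeSnapshot(long segmentBaseOffset, byte[] producerState) throws IOException {
        // Assumed naming scheme: zero-padded base offset plus a ".snapshot" suffix.
        Path file = partitionDir.resolve(String.format("%020d.snapshot", segmentBaseOffset));
        Files.write(file, producerState);
        snapshots.put(segmentBaseOffset, file);
        return file;
    }

    // Delete the snapshot that belongs to a deleted segment.
    // Uses a map lookup by base offset, so no directory scan is needed.
    public boolean onSegmentDeleted(long segmentBaseOffset) throws IOException {
        Path file = snapshots.remove(segmentBaseOffset);
        return file != null && Files.deleteIfExists(file);
    }

    // Number of snapshots currently tracked in memory.
    public int trackedSnapshots() {
        return snapshots.size();
    }
}
```

In this sketch the cost of removing a snapshot depends on the map lookup and a single file delete, not on how many files the partition directory holds. A real implementation would also have to rebuild the in-memory map on broker startup and coordinate with the log's own locking, which is left out here.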



--
This message was sent by Atlassian Jira
(v8.3.4#803005)