Posted to jira@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2020/04/24 21:02:00 UTC
[jira] [Commented] (KAFKA-9450) Decouple inner state flushing from committing

    [ https://issues.apache.org/jira/browse/KAFKA-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091899#comment-17091899 ] 

Guozhang Wang commented on KAFKA-9450:
--------------------------------------

We had some more discussion with [~desai.p.rohan] and [~ableegoldman], and there are a few more ideas that we can consider:

1) Simply increase the commit interval to a larger value for EOS while also enabling speculative EOS execution, so that we can flush larger files. One side effect is that once we fix the issue of querying uncommitted data, IQ may see longer staleness as a result.

2) Do NOT necessarily flush state stores upon committing, and similarly do not update checkpoint files upon each commit. Instead, flush state stores at a potentially larger interval and write checkpoint files only when flushing. The motivation is that even without EOS, under ALOS, if we have a large number of tasks (and hence more finely partitioned rocksdb instances) the flushing IO may still not be batched enough. For EOS we can simply not call flush at all, since we do not write checkpoint files. The side effect is that in fail-over scenarios we may need to restore longer tails of the changelog under ALOS. But given how infrequent such scenarios are, I think it may be a good trade to make.
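
To make idea 2) concrete, here is a minimal sketch of the bookkeeping it implies. This is not Kafka Streams code: the class DecoupledFlushSketch, the commitsPerFlush knob, and all method names are hypothetical, and offsets are plain longs standing in for changelog positions. It only illustrates that commits happen every interval while the inner-store flush and the checkpoint write happen together at a coarser interval, and that the changelog tail to restore after a fail-over is whatever was committed after the last checkpoint.

```java
/** Hypothetical illustration only; none of these names exist in Kafka Streams. */
class DecoupledFlushSketch {
    private final int commitsPerFlush;     // hypothetical knob: flush once every N commits
    private int commitsSinceFlush = 0;
    private long committedOffset = -1L;    // last changelog offset covered by a commit
    private long checkpointedOffset = -1L; // last offset recorded in the checkpoint file
    private int flushCount = 0;

    DecoupledFlushSketch(int commitsPerFlush) {
        if (commitsPerFlush < 1) {
            throw new IllegalArgumentException("commitsPerFlush must be >= 1");
        }
        this.commitsPerFlush = commitsPerFlush;
    }

    /** Called on every commit: records the commit, flushes only occasionally. */
    void commit(long changelogOffset) {
        committedOffset = changelogOffset;
        commitsSinceFlush++;
        if (commitsSinceFlush >= commitsPerFlush) {
            flushAndCheckpoint();
        }
    }

    /** Stands in for flushing the inner store and writing the checkpoint file together. */
    void flushAndCheckpoint() {
        flushCount++;
        checkpointedOffset = committedOffset;
        commitsSinceFlush = 0;
    }

    /** Changelog range that would have to be replayed after a fail-over. */
    long restoreTail() {
        return committedOffset - checkpointedOffset;
    }

    int flushCount() {
        return flushCount;
    }

    long checkpointedOffset() {
        return checkpointedOffset;
    }

    public static void main(String[] args) {
        DecoupledFlushSketch task = new DecoupledFlushSketch(5);
        // Twelve commits, each advancing the changelog by 100 offsets.
        for (int i = 1; i <= 12; i++) {
            task.commit(i * 100L);
        }
        System.out.println("flushes=" + task.flushCount()
                + " restoreTail=" + task.restoreTail());
    }
}
```

With commitsPerFlush set to 1 this degenerates to the current flush-on-every-commit behavior. Larger values trade fewer, larger flushes for a longer restore tail, which is the ALOS trade-off described above.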

> Decouple inner state flushing from committing
> ---------------------------------------------
>
>                 Key: KAFKA-9450
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9450
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> When EOS is turned on, the commit interval is set quite low (100ms) and all the store layers are flushed during a commit. This is necessary for forwarding records in the cache to the changelog, but unfortunately also forces rocksdb to flush the current memtable before it's full. The result is a large number of small writes to disk, losing the benefits of batching, and a large number of very small L0 files that are likely to slow compaction.
> Since we have to delete the stores to recreate from scratch anyways during an unclean shutdown with EOS, we may as well skip flushing the innermost StateStore during a commit and only do so during a graceful shutdown, before a rebalance, etc. This is currently blocked on a refactoring of the state store layers to allow decoupling the flush of the caching layer from the actual state store.
> Note that this is especially problematic with EOS due to the necessarily-low commit interval, but still hurts even with at-least-once and a much larger commit interval. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)