Posted to dev@kafka.apache.org by "Boyang Chen (Jira)" <ji...@apache.org> on 2020/07/06 19:03:00 UTC

[jira] [Created] (KAFKA-10237) Properly handle in-memory stores OOM

Boyang Chen created KAFKA-10237:
-----------------------------------

             Summary: Properly handle in-memory stores OOM
                 Key: KAFKA-10237
                 URL: https://issues.apache.org/jira/browse/KAFKA-10237
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Boyang Chen


We have seen in-memory stores buffer too much data and eventually trigger an OutOfMemoryError. Generally speaking, an OOM gives no real indication of the underlying problem and makes user debugging harder, since the thread that fails may not be the actual culprit behind the memory explosion. If we could add protection against hitting the memory limit, or at least give out clear guidance, end-user debugging would be much simpler.

To make this work, we need to enforce a memory limit below the heap size and take action when it is hit. The first question is whether the limit should be numeric, such as 100MB or 500MB, or a percentage, such as 60% or 80% of total memory.
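To illustrate the two ways of expressing such a limit, here is a minimal sketch; `StoreMemoryLimit` and its methods are hypothetical names for illustration, not an existing Kafka Streams API:

```java
// Hypothetical sketch: a byte limit for in-memory stores, derived either
// from an absolute cap or from a fraction of the JVM max heap.
public class StoreMemoryLimit {
    private final long limitBytes;

    private StoreMemoryLimit(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Numeric cap, e.g. 100MB or 500MB.
    public static StoreMemoryLimit ofBytes(long bytes) {
        return new StoreMemoryLimit(bytes);
    }

    // Percentage cap, e.g. 0.6 or 0.8 of the configured max heap.
    public static StoreMemoryLimit ofHeapFraction(double fraction) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        return new StoreMemoryLimit((long) (maxHeap * fraction));
    }

    public boolean exceeded(long currentUsageBytes) {
        return currentUsageBytes > limitBytes;
    }

    public long limitBytes() {
        return limitBytes;
    }
}
```

Either form reduces to a byte threshold that a store can check before accepting more data; the percentage form has the advantage of scaling with whatever heap the user configures.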

The second question is about the action itself. One approach is to crash the store immediately and inform the user to increase their application capacity. The other is to open an on-disk store on the fly and offload the data to it.

Personally I'm in favor of approach #2 because it has minimal impact on the running application. However, it is more complex and potentially requires significant work to define the proper behavior, such as the default store configuration, how to manage its lifecycle, etc.
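The two proposed actions can be sketched side by side. This is a toy illustration, not a real Kafka Streams store: `BoundedInMemoryStore` and its byte accounting are made up, and the on-disk delegate is simulated with a second map where a real implementation would use something like a RocksDB store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper tracking approximate byte usage; when the limit is
// hit it either fails fast with a clear message (approach #1) or offloads
// new entries to a disk-backed delegate (approach #2).
public class BoundedInMemoryStore {
    public enum OverflowAction { FAIL_FAST, SPILL_TO_DISK }

    private final long limitBytes;
    private final OverflowAction action;
    private final Map<String, byte[]> inMemory = new HashMap<>();
    private final Map<String, byte[]> onDisk = new HashMap<>(); // stand-in for a persistent store
    private long usedBytes = 0;

    public BoundedInMemoryStore(long limitBytes, OverflowAction action) {
        this.limitBytes = limitBytes;
        this.action = action;
    }

    public void put(String key, byte[] value) {
        long entryBytes = key.length() + value.length; // rough size estimate
        if (usedBytes + entryBytes > limitBytes) {
            if (action == OverflowAction.FAIL_FAST) {
                // Approach #1: crash with a message that points at the real cause,
                // instead of an opaque OutOfMemoryError on an arbitrary thread.
                throw new IllegalStateException(
                    "In-memory store exceeded its limit of " + limitBytes
                    + " bytes; increase application capacity or use a persistent store");
            }
            // Approach #2: offload to disk instead of growing the heap.
            onDisk.put(key, value);
            return;
        }
        usedBytes += entryBytes;
        inMemory.put(key, value);
    }

    public byte[] get(String key) {
        byte[] v = inMemory.get(key);
        return v != null ? v : onDisk.get(key);
    }

    public int spilledCount() {
        return onDisk.size();
    }
}
```

Even in this toy form, the trade-off is visible: fail-fast is trivial to implement but stops the application, while spilling keeps it running at the cost of managing a second store's configuration and lifecycle.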

--
This message was sent by Atlassian Jira
(v8.3.4#803005)