Posted to jira@kafka.apache.org by "Martin Schneppenheim (Jira)" <ji...@apache.org> on 2021/05/17 12:48:00 UTC

[jira] [Created] (KAFKA-12797) Quota to mitigate impact of clients that leak Fetch session slots

Martin Schneppenheim created KAFKA-12797:
--------------------------------------------

             Summary: Quota to mitigate impact of clients that leak Fetch session slots
                 Key: KAFKA-12797
                 URL: https://issues.apache.org/jira/browse/KAFKA-12797
             Project: Kafka
          Issue Type: New Feature
    Affects Versions: 2.8.0
            Reporter: Martin Schneppenheim


*Motivation*

KIP-227 introduced fetch sessions, and with them a fetch session cache that is maintained per broker and is limited to 1,000 slots by default (broker setting max.incremental.fetch.session.cache.slots). The slots in this cache are therefore shared among all clients that fetch from that broker.

In a multi-tenant environment with hundreds or thousands of different clients, misbehaving clients (e.g. Sarama v1.26.0) may leak fetch sessions excessively. This can lead to high eviction rates of fetch sessions on the broker side. Other clients will likely be impacted because their fetch sessions can no longer be found in the fetch session cache; in practice, log messages like these will appear:
{noformat}
Node <number> was unable to process the fetch request with (sessionId=<some-number>, epoch=<some-other-number>): FETCH_SESSION_ID_NOT_FOUND.{noformat}
As an operator, I don't know how to identify the clients or SASL users that use the most sessions, nor do I have a way to mitigate the impact of clients that create many fetch sessions. Attackers can exploit the absence of a quota in untrusted multi-tenant environments.

*Proposal*

While I'm not really familiar with the Kafka code, I assume that a new quota could be introduced that limits how many fetch session slots a client can maintain (or create within a certain time window).
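
To make the idea concrete, here is a minimal sketch of a per-principal slot limit. It is not based on the Kafka code base: the class FetchSessionSlotQuota, its methods, and the choice of a plain string as the principal are all hypothetical, and it only covers the "slots maintained" variant, not the time-window variant.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not Kafka code: caps the fetch session slots one principal may hold. */
final class FetchSessionSlotQuota {
    private final int maxSlotsPerPrincipal;
    // Number of slots currently held, keyed by principal (e.g. SASL user or client ID).
    private final Map<String, Integer> slotsInUse = new HashMap<>();

    FetchSessionSlotQuota(int maxSlotsPerPrincipal) {
        if (maxSlotsPerPrincipal < 1) {
            throw new IllegalArgumentException("maxSlotsPerPrincipal must be >= 1");
        }
        this.maxSlotsPerPrincipal = maxSlotsPerPrincipal;
    }

    /** Reserves a slot and returns true if the principal is still below its limit. */
    synchronized boolean tryAcquire(String principal) {
        int used = slotsInUse.getOrDefault(principal, 0);
        if (used >= maxSlotsPerPrincipal) {
            return false; // the caller would refuse to create a new session here
        }
        slotsInUse.put(principal, used + 1);
        return true;
    }

    /** Gives one slot back, e.g. when a session is closed or evicted. */
    synchronized void release(String principal) {
        // Returning null from the remapping function removes the entry once it reaches zero.
        slotsInUse.computeIfPresent(principal, (p, used) -> used > 1 ? used - 1 : null);
    }

    /** Returns how many slots the principal currently holds. */
    synchronized int slotsInUse(String principal) {
        return slotsInUse.getOrDefault(principal, 0);
    }
}
```

With a limit of 2, a third tryAcquire for the same principal returns false until release is called, while other principals are unaffected. A leaking client would then exhaust only its own budget instead of evicting everyone else's sessions.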

Additionally, I believe it would be nice to have a way to monitor the number of fetch session slots created/maintained per SASL user and/or client ID. This way, operators can inform the owners of misbehaving clients about their fetch session problems, which are likely caused by improper client implementations.
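
The kind of per-owner view I have in mind could look roughly like the sketch below. It assumes the broker could expose a mapping from session ID to the owning SASL user or client ID; that mapping, the class FetchSessionUsage, and the method topOwners are hypothetical and do not correspond to an existing Kafka API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch, not Kafka code: ranks owners by the number of fetch sessions they hold. */
final class FetchSessionUsage {
    /**
     * Counts sessions per owner and returns at most `limit` owners,
     * ordered by descending session count (ties broken by owner name).
     */
    static List<Map.Entry<String, Long>> topOwners(Map<Integer, String> ownerBySessionId, int limit) {
        return ownerBySessionId.values().stream()
            .collect(Collectors.groupingBy(owner -> owner, Collectors.counting()))
            .entrySet().stream()
            .sorted((a, b) -> {
                int byCount = Long.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            })
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```

Exported as a metric or a periodic log line, such a ranking would let an operator see which user or client ID occupies most of the cache before other tenants start hitting FETCH_SESSION_ID_NOT_FOUND.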

 

cc [~dajac] [~gwenshap]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)