Posted to dev@kafka.apache.org by Stanislav Kozlovski <st...@confluent.io> on 2019/08/01 07:49:48 UTC

Re: [DISCUSS] KIP-498: Add client-side configuration for maximum response size to protect against OOM

Hey Alexandre, thanks for the KIP!

I had personally hit the same problem and found it very annoying.
Have you considered detecting such a scenario in the client and simply
throwing an exception, rather than allocating any memory?
I have an open PR that does just that -
https://github.com/apache/kafka/pull/5940
<https://github.com/apache/kafka/pull/5940/files>

Best,
Stanislav
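A minimal sketch of the detection idea above, under stated assumptions (this is a hypothetical helper, not the code from PR 5940): when a plaintext client connects to an SSL port, the first bytes it reads back are the start of a TLS record rather than a Kafka size prefix, so the client can recognize the TLS signature (content type 20-23 followed by major version 3, e.g. the alert bytes 15 03 03 00 seen later in this thread) and fail instead of allocating a huge buffer.

```java
import java.nio.ByteBuffer;

public class TlsMismatchCheck {
    // TLS record content types are 20-23 (change_cipher_spec, alert, handshake,
    // application_data), followed by the protocol version whose major byte is 3.
    // A legitimate Kafka response size field is very unlikely to start this way.
    static boolean looksLikeTlsRecord(ByteBuffer sizeField) {
        int contentType = sizeField.get(0) & 0xFF;
        int versionMajor = sizeField.get(1) & 0xFF;
        return contentType >= 20 && contentType <= 23 && versionMajor == 3;
    }

    public static void main(String[] args) {
        // First four bytes of the fatal alert shown later in this thread: 15 03 03 00
        ByteBuffer alert = ByteBuffer.wrap(new byte[] {0x15, 0x03, 0x03, 0x00});
        System.out.println(looksLikeTlsRecord(alert));  // true

        // A plausible Kafka size prefix (300 bytes) is not flagged.
        ByteBuffer normal = ByteBuffer.wrap(new byte[] {0x00, 0x00, 0x01, 0x2C});
        System.out.println(looksLikeTlsRecord(normal)); // false
    }
}
```

Such a check inspects only bytes the client has already read, so no memory is allocated before the mismatch is detected.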

On Wed, Jul 31, 2019 at 10:35 AM Gokul Ramanan Subramanian <
gokul2411s@gmail.com> wrote:

> Hi Alex.
>
> A rejected alternative is to try to do SSL handshake from the plaintext
> transport layer. This couples various transport layers and is inflexible to
> adding new transport layers in the future. Further, if the plaintext
> transport layer does SSL handshake, it will always get an error,
> irrespective of whether or not the peer supports SSL. Therefore, the
> plaintext transport layer would have to determine if the peer uses SSL or
> not based on the type of error it gets back from the SSL handshake. This
> fits right into the general anti-pattern of using exceptions as control
> flow mechanisms.
>
> Another rejected alternative would be to have a single byte at the
> transport layer represent the security protocol in use. This byte would
> have to be present in every single message between clients and brokers and
> between brokers and brokers. This change is backwards-incompatible and adds
> overhead to Kafka. It is likely never going to get accepted by the
> community.
>
> In the absence of a fool-proof way to do a handshake across all security
> protocols (plaintext, SSL, other future ones), I think that the proposed
> KIP provides a good solution. Therefore, you have my +1. (Not sure if my +1
> counts, since I am a Kafka noob).
>
> Thanks.
> Gokul Ramanan Subramanian
> Senior SDE, Amazon AWS
>
> On 28/Jul/19 05:43:19PM +0100, Alexandre Dupriez wrote:
>  Hello,
>
>  I have created the KIP-498
>  <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-498%3A+Add+client-side+configuration+for+maximum+response+size+to+protect+against+OOM
> >
>  to
>  propose a minor addition to the producer configuration properties, in
> order
>  to protect against OOM when the response message decoded by a client
>  indicates an "abnormally high" size to be allocated.
>
>  This follows this discussion thread
>  <https://www.mail-archive.com/dev@kafka.apache.org/msg55658.html> and is
>  tracked by KAFKA-4090 <https://issues.apache.org/jira/browse/KAFKA-4090>.
>
>  Please let me know what you think. I created this KIP even though the
>  change seems minimal because it affects client configuration, which is
>  mentioned as a type of change eligible for a KIP on this main wiki page
>  <
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >
>  .
>
>  Thanks,
>  Alexandre
>
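As a concrete illustration of the proposal, a producer might set the new property alongside the existing ones. Note that "max.response.size" is only the name discussed in this thread for the proposed property; it is not an existing Kafka client configuration.

```java
import java.util.Properties;

public class ProposedConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Existing producer config: caps the size of requests the client sends.
        props.put("max.request.size", String.valueOf(1024 * 1024));
        // Proposed (hypothetical) config: caps the size the client will accept,
        // mirroring the broker's 100 MB default for socket.request.max.bytes.
        props.put("max.response.size", String.valueOf(100 * 1024 * 1024));
        System.out.println(props.getProperty("max.response.size"));
    }
}
```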

Re: [DISCUSS] KIP-498: Add client-side configuration for maximum response size to protect against OOM

Posted by Alexandre Dupriez <al...@gmail.com>.
Hello,

Thanks, David, for proposing a new PR [1] for KIP-498 [2] to address KAFKA-4090 [3].

I find it interesting; I wonder how it stands with respect to avoiding
inter-layer violations.

[1] https://github.com/apache/kafka/pull/8066
[2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-498%3A+Add+client-side+configuration+for+maximum+response+size+to+protect+against+OOM
[3] https://issues.apache.org/jira/browse/KAFKA-4090

Le mar. 6 août 2019 à 15:37, Gokul Ramanan Subramanian
<go...@gmail.com> a écrit :
>
> Hi Alexandre.
>
> Thanks for this analysis.
>
> IMHO, there are four ways to go about this:
>
> 1. We don't fix the bug directly but instead update the Kafka documentation
> telling users to configure their clients correctly - Silly but easy to
> achieve.
> 2. We adopt Stanislav's solution, which fixes the problem - Easy to achieve,
> but potentially adds inflexibility if we later want to change the handshake
> protocol. However, changing the handshake protocol is going to be a
> backwards-incompatible change anyway, with or without Stanislav's solution.
> 3. We adopt Alexandre's solution - Easy to achieve, but has shortcomings
> Alexandre has highlighted.
> 4. We pivot KIP-498 to focus on standardizing the handshake protocol - Not
> easy to achieve, since this will be a backwards-incompatible change and
> will require client and server redeployments in the correct order. Further,
> this can be a hard problem to solve given that various transport layer
> protocols have different headers. In order for the "new handshake" protocol
> to work, it would have to work in the same namespace as those headers,
> which will require careful tuning of handshake constants.
>
> Any thoughts from the community on how we can proceed?
>
> Thanks.
>
> On Tue, Aug 6, 2019 at 12:41 PM Alexandre Dupriez <
> alexandre.dupriez@gmail.com> wrote:
>
> > Hello,
> >
> > I wrote a small change [1] to make clients validate the size of messages
> > received from a broker at the protocol-level.
> > However I don't like the change. It does not really solve the problem which
> > originally motivated the KIP - that is, protocol mismatch (plaintext to SSL
> > endpoint). A few problems I can see are listed below. This is a
> > non-exhaustive list. They have also been added to KIP-498 [2].
> >
> > 1) Incorrect failure mode
> > As experimented, and as can be seen in the integration tests, when an
> > invalid size is detected and an InvalidReceiveException is thrown,
> > consumers and producers keep retrying until the poll timeout expires or
> > the maximum number of retries is reached. This is incorrect for the
> > protocol-mismatch use case which motivated this change: producers and
> > consumers need to fail fast, as retries only prolong the time to
> > remediation for users/administrators.
> >
> > 2) Incomplete remediation
> > The proposed change cannot provide a definite guarantee against OOM in all
> > scenarios - for instance, OOM will still occur if the maximum size is set
> > to 100 MB but the JVM is under memory pressure and has less than 100 MB of
> > allocatable memory.
> >
> > 3) Illegitimate message rejection
> > Even worse: what if the property is incorrectly configured and prevents
> > legitimate messages from reaching the client?
> >
> > 4) Unclear configuration parameter
> > 4.a) The name max.response.size intends to mirror the existing
> > max.request.size from the producer's configuration properties. However,
> > max.request.size checks the size of producer records as provided by a
> > client, while max.response.size checks the size decoded directly from
> > the network according to Kafka's binary protocol.
> > 4.b) On the broker, the property socket.request.max.bytes is used to
> > validate the size of messages received by the server. The new property
> > serves the same purpose, which introduces duplicated semantics, even if
> > one property is characterised with the keyword "request" and the other
> > with "response", in both cases reflecting the perspective of either a
> > client or a server.
> >
> > Please let me know what you think. An alternative mitigation may be worth
> > investigating for the detection of protocol mismatch in the client.
> >
> > [1] https://github.com/apache/kafka/pull/7160
> > [2]
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-498%3A+Add+client-side+configuration+for+maximum+response+size+to+protect+against+OOM
> >
> > Le jeu. 1 août 2019 à 09:42, Alexandre Dupriez <
> > alexandre.dupriez@gmail.com>
> > a écrit :
> >
> > > Thanks Gokul and Stanislav for your answers.
> > >
> > > I went through PR 5940 [1]. Indeed Gokul, your reasoning echoes the
> > > observations of Ismael about a potential inter-protocol layering
> > violation.
> > >
> > > As you said Stanislav, the server-side SSL engine responds with an alert
> > > with code 80 (internal_error) from what I saw when reproducing the OOM.
> > > Since the Alert is generated below the application layer, I am not sure
> > > what could be done on the broker to handle the scenario more gracefully.
> > > Interestingly, the SSL engine emits the possibility of receiving a
> > > plaintext message in debug logs [2].
> > >
> > > The idea was indeed to perform a simple check on the message size decoded
> > > in NetworkReceive against a configurable value, and throw
> > > an InvalidReceiveException in a similar fashion as what happens on the
> > > broker, where a non-unlimited maxSize can be provided. Basically, mimic
> > the
> > > behaviour on the broker. The default value for the maximal request size
> > on
> > > the broker is 100 MB which you are suggesting to use client-side.
> > >
> > > Adding a client configuration property may be overkill here. What I am
> > > going to ask is naive, but - is it theoretically possible
> > > for the broker to legitimately send responses over 100 MB in size?
> > >
> > > Thanks,
> > > Alexandre
> > >
> > > [1] https://github.com/apache/kafka/pull/5940
> > > [2]
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, fatal error: 80: problem
> > > unwrapping net record
> > >
> > > javax.net.ssl.SSLException: Unrecognized SSL message, plaintext
> > connection?
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, SEND TLSv1.2 ALERT:
> > fatal,
> > > description = internal_error
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, WRITE: TLSv1.2 Alert,
> > > length = 2
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, called closeOutbound()
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, closeOutboundInternal()
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, called closeInbound()
> > >
> > > kafka-network-thread-0-ListenerName(SSL)-SSL-4, fatal: engine already
> > > closed.  Rethrowing javax.net.ssl.SSLException: Inbound closed before
> > > receiving peer's close_notify: possible truncation attack?
> > >
> > > [Raw write]: length = 7
> > >
> > > 0000: 15 03 03 00 02 02 50                               ......P
> > >
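The size check described above can be sketched as follows (a hypothetical helper, not the actual NetworkReceive code). Note how the TLS alert from the debug log, whose first bytes are 15 03 03 00, decodes as a 4-byte size of 0x15030300 = 352,518,912 bytes, comfortably above the broker's 100 MB default, so the check would trip before any allocation.

```java
import java.nio.ByteBuffer;

public class SizeCheckSketch {
    // Stand-in for org.apache.kafka.common.network.InvalidReceiveException.
    static class InvalidReceiveException extends RuntimeException {
        InvalidReceiveException(String message) { super(message); }
    }

    static final int UNLIMITED = -1;

    // Mimics the broker-side guard: read the 4-byte size prefix and refuse to
    // allocate when it is negative or exceeds the configured maximum.
    static ByteBuffer allocateForResponse(ByteBuffer sizePrefix, int maxSize) {
        int size = sizePrefix.getInt(0);
        if (size < 0)
            throw new InvalidReceiveException("Invalid response size " + size);
        if (maxSize != UNLIMITED && size > maxSize)
            throw new InvalidReceiveException(
                "Response size " + size + " larger than configured maximum " + maxSize);
        return ByteBuffer.allocate(size);
    }

    public static void main(String[] args) {
        int maxSize = 100 * 1024 * 1024; // the broker's 100 MB default
        // A normal 300-byte response passes the check.
        System.out.println(allocateForResponse(
            ByteBuffer.wrap(new byte[] {0x00, 0x00, 0x01, 0x2C}), maxSize).capacity());
        // The TLS alert prefix 15 03 03 00 decodes to 352,518,912 and is rejected.
        try {
            allocateForResponse(ByteBuffer.wrap(new byte[] {0x15, 0x03, 0x03, 0x00}), maxSize);
        } catch (InvalidReceiveException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

As discussed in this thread, this only bounds the allocation; it does not tell the client that the root cause was a protocol mismatch, which is why the retry behaviour remains wrong.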


Re: [DISCUSS] KIP-498: Add client-side configuration for maximum response size to protect against OOM

Posted by Alexandre Dupriez <al...@gmail.com>.
Thanks Gokul and Stanislav for your answers.

I went through PR 5940 [1]. Indeed Gokul, your reasoning echoes Ismael's
observations about a potential inter-protocol layering violation.

As you said Stanislav, from what I saw when reproducing the OOM, the
server-side SSL engine responds with an alert with code 80 (internal_error).
Since the alert is generated below the application layer, I am not sure
what could be done on the broker to handle the scenario more gracefully.
Interestingly, the SSL engine does mention the possibility of having
received a plaintext message in its debug logs [2].

The idea was indeed to perform a simple check on the message size decoded
in NetworkReceive against a configurable value, and throw
an InvalidReceiveException in a similar fashion to what happens on the
broker, where a non-unlimited maxSize can be provided. Basically, mimic
the broker's behaviour. The broker's default maximum request size is
100 MB, which you are suggesting to reuse client-side.
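For what it's worth, here is a minimal, self-contained sketch of such a
check. The names `MAX_RESPONSE_SIZE` and `checkReceiveSize` are purely
illustrative (not actual Kafka identifiers), and I throw a plain
IllegalStateException where the real change would throw
InvalidReceiveException from inside NetworkReceive:

```java
import java.nio.ByteBuffer;

public class ReceiveSizeCheck {

    // Illustrative cap, mirroring the broker's 100 MB default
    // (socket.request.max.bytes). Not an actual Kafka constant.
    static final int MAX_RESPONSE_SIZE = 100 * 1024 * 1024;

    // Validate the 4-byte size prefix of a response before any
    // buffer of that size is allocated.
    static int checkReceiveSize(ByteBuffer sizeHeader) {
        int size = sizeHeader.getInt(0);
        if (size < 0) {
            throw new IllegalStateException(
                "Invalid negative response size: " + size);
        }
        if (size > MAX_RESPONSE_SIZE) {
            throw new IllegalStateException(
                "Response size " + size + " exceeds the maximum of "
                    + MAX_RESPONSE_SIZE);
        }
        return size;
    }

    public static void main(String[] args) {
        // A sane size passes through unchanged.
        ByteBuffer ok = ByteBuffer.allocate(4).putInt(0, 1024);
        System.out.println(checkReceiveSize(ok)); // prints 1024
    }
}
```

The point is simply that the check runs on the 4-byte prefix alone, before
any allocation proportional to the decoded size takes place.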

Adding a new client configuration property may be overkill here.
What I am going to ask is naive, but: is it theoretically possible for the
broker to legitimately send responses over 100 MB in size?

Thanks,
Alexandre

[1] https://github.com/apache/kafka/pull/5940
[2]

kafka-network-thread-0-ListenerName(SSL)-SSL-4, fatal error: 80: problem
unwrapping net record

javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?

kafka-network-thread-0-ListenerName(SSL)-SSL-4, SEND TLSv1.2 ALERT:  fatal,
description = internal_error

kafka-network-thread-0-ListenerName(SSL)-SSL-4, WRITE: TLSv1.2 Alert,
length = 2

kafka-network-thread-0-ListenerName(SSL)-SSL-4, called closeOutbound()

kafka-network-thread-0-ListenerName(SSL)-SSL-4, closeOutboundInternal()

kafka-network-thread-0-ListenerName(SSL)-SSL-4, called closeInbound()

kafka-network-thread-0-ListenerName(SSL)-SSL-4, fatal: engine already
closed.  Rethrowing javax.net.ssl.SSLException: Inbound closed before
receiving peer's close_notify: possible truncation attack?

[Raw write]: length = 7

0000: 15 03 03 00 02 02 50                               ......P


On Thu, Aug 1, 2019 at 08:50, Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hey Alexandre, thanks for the KIP!
>
> I had personally hit the same problem and found it very annoying.
> Have you considered detecting such a scenario in the client and simply
> throwing an exception, rather than allocating any memory?
> I have an open PR that does just that -
> https://github.com/apache/kafka/pull/5940
> <https://github.com/apache/kafka/pull/5940/files>
>
> Best,
> Stanislav
>
> On Wed, Jul 31, 2019 at 10:35 AM Gokul Ramanan Subramanian <
> gokul2411s@gmail.com> wrote:
>
> > Hi Alex.
> >
> > A rejected alternative is to try to do SSL handshake from the plaintext
> > transport layer. This couples various transport layers and is inflexible
> to
> > adding new transport layers in the future. Further, if the plaintext
> > transport layer does SSL handshake, it will always get an error,
> > irrespective of whether or not the peer supports SSL. Therefore, the
> > plaintext transport layer would have to determine if the peer uses SSL or
> > not based on the type of error it gets back from the SSL handshake. This
> > fits right into the general anti-pattern of using exceptions as control
> > flow mechanisms.
> >
> > Another rejected alternative would be to have a single byte at the
> > transport layer represent the security protocol in use. This byte would
> > have to be present in every single message between clients and brokers
> and
> > between brokers and brokers. This change is backwards-incompatible and
> adds
> > overhead to Kafka. It is likely never going to get accepted by the
> > community.
> >
> > In the absence of a fool-proof way to do a handshake across all security
> > protocols (plaintext, SSL, other future ones), I think that the proposed
> > KIP provides a good solution. Therefore, you have my +1. (Not sure if my
> +1
> > counts, since I am a Kafka noob).
> >
> > Thanks.
> > Gokul Ramanan Subramanian
> > Senior SDE, Amazon AWS
> >
> > On 28/Jul/19 05:43:19PM +0100, Alexandre Dupriez wrote:
> >  Hello,
> >
> >  I have created the KIP-498
> >  <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-498%3A+Add+client-side+configuration+for+maximum+response+size+to+protect+against+OOM
> > >
> >  to
> >  propose a minor addition to the producer configuration properties, in
> > order
> >  to protect against OOM when the response message decoded by a client
> >  indicates an "abnormally high" size to be allocated.
> >
> >  This follows this discussion thread
> >  <https://www.mail-archive.com/dev@kafka.apache.org/msg55658.html> and
> is
> >  tracked by KAFKA-4090 <https://issues.apache.org/jira/browse/KAFKA-4090
> >.
> >
> >  Please let me know what you think. I created this KIP even though the
> >  change seems minimal because it affects client configuration, which is
> >  mentioned as a type of change eligible for a KIP on this main wiki page
> >  <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > >
> >  .
> >
> >  Thanks,
> >  Alexandre
> >
>