Posted to jira@kafka.apache.org by "RivenSun (Jira)" <ji...@apache.org> on 2022/01/05 08:36:00 UTC

[jira] [Created] (KAFKA-13576) Processor.ConnectionQueueSize provides configuration & metrics, SelectorMetrics adds connection-register related metrics

RivenSun created KAFKA-13576:
--------------------------------

             Summary: Processor.ConnectionQueueSize provides configuration & metrics, SelectorMetrics adds connection-register related metrics
                 Key: KAFKA-13576
                 URL: https://issues.apache.org/jira/browse/KAFKA-13576
             Project: Kafka
          Issue Type: Improvement
          Components: metrics, network
    Affects Versions: 3.0.0
            Reporter: RivenSun
            Assignee: Luke Chen


h1. Problem:


After all client machines were switched to the company's private BYOIP addresses, producers that send messages frequently saw a significant increase in send latency, and producers that send messages infrequently often threw exceptions because fetching metadata timed out during a send. Everything was normal before the switch.


h1. Root cause:


1. The clients' BYOIP addresses have no PTR (reverse DNS) records.

2. When a listener uses the SASL_SSL protocol, SaslChannelBuilder#buildTransportLayer, which is called underneath Processor#configureNewConnections, calls socketChannel.socket().getInetAddress().getHostName(), which triggers a reverse DNS lookup. If the client IP has no PTR record, getHostName() becomes slow.

3. The steps in the processor's run method execute serially. If configureNewConnections is slow, completed responses cannot be sent to clients in time, which increases the ack latency that producers see when sending messages.

4. A slow configureNewConnections also means that elements are not removed from Processor.newConnections in time, which makes Acceptor#assignNewConnection slower. assignNewConnection can even block in newConnections.put(socketChannel), and while it is blocked the Acceptor thread may reject new TCP connection requests.
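
The reverse lookup described in item 2 can be illustrated with plain JDK calls. This is only a minimal sketch, not Kafka code: the class and method names (ReverseLookupSketch, withoutName, withName, timeHostNameNanos) are hypothetical. An InetAddress built from raw bytes has no name attached, so getHostName() has to ask the name service, while getHostAddress() only formats the bytes.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration class; not part of Kafka.
final class ReverseLookupSketch {

    // An address built from raw bytes carries no host name yet.
    // getHostName() on it would trigger a reverse (PTR) lookup.
    static InetAddress withoutName(byte[] rawIp) {
        try {
            return InetAddress.getByAddress(rawIp);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("IP address must be 4 or 16 bytes", e);
        }
    }

    // An address built with an explicit name remembers that name,
    // so getHostName() returns it without consulting the name service.
    static InetAddress withName(String host, byte[] rawIp) {
        try {
            return InetAddress.getByAddress(host, rawIp);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("IP address must be 4 or 16 bytes", e);
        }
    }

    // Measures how long getHostName() takes for the given address.
    static long timeHostNameNanos(InetAddress address) {
        long start = System.nanoTime();
        address.getHostName();
        return System.nanoTime() - start;
    }
}
{code}

Calling timeHostNameNanos(withoutName(ip)) for a client IP without a PTR record should show the delay, whereas getHostAddress() on the same object returns immediately because it never touches DNS.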


h1. Solution:


1. Add PTR records for the clients' BYOIP addresses.


2. Later Kafka versions have already fixed this problem:

https://issues.apache.org/jira/browse/KAFKA-8562

https://github.com/apache/kafka/pull/10059

3. Add *connection-register* related metrics to the SelectorMetrics of each processor’s selector.
These metrics would be updated in Selector#register(String id, SocketChannel socketChannel). The metric is expected to be a histogram created with newHistogram, similar to the existing *responseQueueTimeMs* field.
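
As a rough sketch of what the connection-register measurement could record, the class below times a registration call and keeps the count, maximum and mean. It is a JDK-only stand-in, not the Kafka or Yammer metrics API: the names RegisterTimeSketch, timeRegister and record are hypothetical, and a real implementation would feed a histogram instead.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a connection-register time metric; not Kafka code.
final class RegisterTimeSketch {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    // Runs the registration work and records how long it took,
    // even if the work throws.
    void timeRegister(Runnable register) {
        long start = System.nanoTime();
        try {
            register.run();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    // Records one observation, in nanoseconds.
    void record(long elapsedNanos) {
        count.incrementAndGet();
        totalNanos.addAndGet(elapsedNanos);
        maxNanos.accumulateAndGet(elapsedNanos, Math::max);
    }

    long count() {
        return count.get();
    }

    long maxNanos() {
        return maxNanos.get();
    }

    // Mean of all observations, or 0.0 if nothing was recorded yet.
    double meanNanos() {
        long n = count.get();
        return n == 0 ? 0.0 : (double) totalNanos.get() / n;
    }
}
{code}

With such a metric, a slow reverse DNS lookup during registration would show up as a high maximum or mean register time on the affected processor.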

4. Improvements to the Processor.newConnections queue:

1) Make the queue size of Processor.newConnections configurable.

Source code:
{code:scala}
private[kafka] object Processor {
  val IdlePercentMetricName = "IdlePercent"
  val NetworkProcessorMetricTag = "networkProcessor"
  val ListenerMetricTag = "listener"
  val ConnectionQueueSize = 20
}
{code}

The value is currently hard-coded to 20, perhaps for design reasons, but it would still be better to make it configurable, either through a single *queued.max.connections* setting that applies to the processors of all listeners,

or through an independent setting for the processors of each listener:
*listener.name.\{listenerName}.queued.max.connections*
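
One possible way to resolve the two proposed settings is sketched below: the per-listener key wins, then the global key, then the current hard-coded default of 20. Neither config key exists in Kafka today; both are the proposals above. The class name QueuedMaxConnectionsSketch, the lower-casing of the listener name in the key, and the rejection of values below 1 are assumptions made for illustration.

{code:java}
import java.util.Locale;
import java.util.Map;

// Hypothetical resolution logic for the proposed settings; not Kafka code.
final class QueuedMaxConnectionsSketch {
    // Matches the value currently hard-coded in Processor.ConnectionQueueSize.
    static final int DEFAULT_CONNECTION_QUEUE_SIZE = 20;
    static final String GLOBAL_KEY = "queued.max.connections";

    // Builds the per-listener key. Lower-casing the listener name is an assumption.
    static String listenerKey(String listenerName) {
        return "listener.name." + listenerName.toLowerCase(Locale.ROOT) + "." + GLOBAL_KEY;
    }

    // Per-listener override first, then the global setting, then the default.
    static int resolve(Map<String, String> props, String listenerName) {
        String value = props.get(listenerKey(listenerName));
        if (value == null) {
            value = props.get(GLOBAL_KEY);
        }
        if (value == null) {
            return DEFAULT_CONNECTION_QUEUE_SIZE;
        }
        int size = Integer.parseInt(value.trim());
        if (size < 1) {
            throw new IllegalArgumentException("Queue size must be at least 1, got " + size);
        }
        return size;
    }
}
{code}

For example, with queued.max.connections=50 and listener.name.internal.queued.max.connections=100, the processors of a listener named INTERNAL would get a queue of 100 and all other listeners a queue of 50.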

2) Expose a metric for the size of each processor’s newConnections queue: {*}ConnectionQueueSize{*}. It can be modeled on the *ResponseQueueSize* metric maintained in RequestChannel.
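
The queue-size metric could be as simple as a gauge that reads the queue's current size. The sketch below is not Kafka code: it uses a bounded ArrayBlockingQueue as a stand-in for Processor.newConnections, and the names ConnectionQueueGaugeSketch, tryAssign, takeNext and connectionQueueSizeGauge are hypothetical. It uses the non-blocking offer() so the example never blocks the way newConnections.put(socketChannel) can.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.IntSupplier;

// Hypothetical stand-in for a processor's newConnections queue; not Kafka code.
final class ConnectionQueueGaugeSketch<T> {
    private final BlockingQueue<T> newConnections;

    ConnectionQueueGaugeSketch(int capacity) {
        this.newConnections = new ArrayBlockingQueue<>(capacity);
    }

    // Acceptor side: returns false instead of blocking when the queue is full.
    boolean tryAssign(T connection) {
        return newConnections.offer(connection);
    }

    // Processor side: removes the next queued connection, or returns null if none.
    T takeNext() {
        return newConnections.poll();
    }

    // Gauge that reports the current queue size each time it is read.
    IntSupplier connectionQueueSizeGauge() {
        return newConnections::size;
    }
}
{code}

A gauge value that stays at the configured capacity would indicate that the processor is not draining new connections fast enough, which is the situation described in the root cause.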



--
This message was sent by Atlassian Jira
(v8.20.1#820001)