Posted to users@kafka.apache.org by Dhruv Patel <dh...@gmail.com> on 2021/04/08 17:35:23 UTC

Kafka Producer Send Error Rate High when one of the nodes is down in the cluster

Hi,
  We are seeing high producer send error rates when one of the nodes in
the cluster is down for maintenance. When this happens, the brokers also
log many java.nio exceptions (ClosedSelectorException, see the server logs
below). Any idea what could be causing this? We use min.insync.replicas=2
and at-least-once delivery semantics. Moreover, the cluster has one extra
node, so one node going down should not have any effect on the cluster.

Kafka producer configs:
acks=all
retries=5
request.timeout.ms=10000
linger.ms=500
batch.size=32768
buffer.memory=67108864

The rest of the settings are at their defaults.
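
For reference, here is a minimal sketch of the same settings as a plain java.util.Properties object, plus some rough arithmetic on how long retries=5 could last. The class and method names are hypothetical, the 100 ms value assumes the client default for retry.backoff.ms, and the model ignores linger time, metadata refreshes and delivery.timeout.ms, so treat the numbers as estimates only.

```java
import java.util.Properties;

// Hypothetical helper class; the names are illustrative and not from the original post.
class ProducerConfigSketch {

    // Producer settings exactly as listed above; everything else stays at client defaults.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("acks", "all");
        props.setProperty("retries", "5");
        props.setProperty("request.timeout.ms", "10000");
        props.setProperty("linger.ms", "500");
        props.setProperty("batch.size", "32768");
        props.setProperty("buffer.memory", "67108864");
        return props;
    }

    // Rough lower bound: every attempt fails immediately (for example on a
    // disconnect), so only the backoff between retries elapses.
    static long fastFailWindowMs(int retries, long retryBackoffMs) {
        return retries * retryBackoffMs;
    }

    // Rough upper bound: the first attempt and every retry each wait the full
    // request timeout before failing, with a backoff before each retry.
    static long slowFailWindowMs(int retries, long requestTimeoutMs, long retryBackoffMs) {
        return (retries + 1) * requestTimeoutMs + retries * retryBackoffMs;
    }

    public static void main(String[] args) {
        Properties props = build();
        int retries = Integer.parseInt(props.getProperty("retries"));
        long requestTimeoutMs = Long.parseLong(props.getProperty("request.timeout.ms"));
        long retryBackoffMs = 100L; // assumption: client default for retry.backoff.ms

        System.out.println("fast-fail window ms: "
                + fastFailWindowMs(retries, retryBackoffMs));
        System.out.println("slow-fail window ms: "
                + slowFailWindowMs(retries, requestTimeoutMs, retryBackoffMs));
    }
}
```

Under these assumptions the sketch prints 500 and 60500. If the model holds, all five retries could be used up in about half a second when a broker drops connections right away, which may be shorter than the time the cluster needs to move partition leadership off the node that went down.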

*Errors we are seeing on the producer clients are as follows*

2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx]
o.a.k.c.p.i.Sender  [Producer clientId=xxxxxxx] Got error produce
response with correlation id 9224633 on topic-partition
device_telemetry-29, retrying (4 attempts left). Error:
NETWORK_EXCEPTION
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx]
o.a.k.c.p.i.Sender  [Producer clientId=xxxxxxx] Received invalid
metadata error in produce request on partition device_telemetry-29 due
to org.apache.kafka.common.errors.NetworkException: The server
disconnected before a response was received.. Going to request
metadata update now
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx]
o.a.k.c.p.i.Sender  [Producer
clientId=networking-monitoring-01002.node.ad1.r2] Got error produce
response with correlation id 9224633 on topic-partition
device_telemetry-1, retrying (4 attempts left). Error:
NETWORK_EXCEPTION
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx]
o.a.k.c.p.i.Sender  [Producer clientId=xxxxxxx] Received invalid
metadata error in produce request on partition device_telemetry-1 due
to org.apache.kafka.common.errors.NetworkException: The server
disconnected before a response was received.. Going to request
metadata update now
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx]
o.a.k.c.p.i.Sender  [Producer clientId=xxxxxxx] Got error produce
response with correlation id 9224633 on topic-partition
device_telemetry-33, retrying (4 attempts left). Error:
NETWORK_EXCEPTION
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx]
o.a.k.c.p.i.Sender  [Producer clientId=xxxxxxx] Received invalid
metadata error in produce request on partition device_telemetry-33 due
to org.apache.kafka.common.errors.NetworkException: The server
disconnected before a response was received.. Going to request
metadata update now
2021-03-19 16:30:06,654 WARN [kafka-producer-network-thread | xxxxxxx]
o.a.k.c.p.i.Sender  [Producer clientId=xxxxxxx] Got error produce
response with correlation id 9224633 on topic-partition
device_telemetry-30, retrying (4 attempts left). Error:
NETWORK_EXCEPTION


Errors seen on the Kafka Server

[2021-03-19 16:12:53,075] INFO [ReplicaFetcher replicaId=1,
leaderId=2, fetcherId=2] Error sending fetch request
(sessionId=1651726962, epoch=293641949) to node 2:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,629] INFO [ReplicaFetcher replicaId=1,
leaderId=3, fetcherId=3] Error sending fetch request
(sessionId=527382701, epoch=5538701) to node 3:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,934] INFO [ReplicaFetcher replicaId=1,
leaderId=3, fetcherId=2] Error sending fetch request
(sessionId=745744629, epoch=409235096) to node 3:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,935] INFO [ReplicaFetcher replicaId=1,
leaderId=4, fetcherId=2] Error sending fetch request
(sessionId=270968958, epoch=4871069) to node 4:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,937] INFO [ReplicaFetcher replicaId=1,
leaderId=4, fetcherId=0] Error sending fetch request
(sessionId=248799819, epoch=208504323) to node 4:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,938] INFO [ReplicaFetcher replicaId=1,
leaderId=3, fetcherId=1] Error sending fetch request
(sessionId=624148312, epoch=212419334) to node 3:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,940] INFO [ReplicaFetcher replicaId=1,
leaderId=4, fetcherId=3] Error sending fetch request
(sessionId=289201163, epoch=1264570088) to node 4:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,941] INFO [ReplicaFetcher replicaId=1,
leaderId=4, fetcherId=1] Error sending fetch request
(sessionId=2006778606, epoch=412437276) to node 4:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:30:06,942] INFO [ReplicaFetcher replicaId=1,
leaderId=3, fetcherId=0] Error sending fetch request
(sessionId=192606775, epoch=1246960140) to node 3:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:48:19,988] INFO [ReplicaFetcher replicaId=1,
leaderId=2, fetcherId=0] Error sending fetch request
(sessionId=2022443912, epoch=25872) to node 2:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)
[2021-03-19 16:48:19,990] INFO [ReplicaFetcher replicaId=1,
leaderId=2, fetcherId=2] Error sending fetch request
(sessionId=1499198229, epoch=110928) to node 2:
java.nio.channels.ClosedSelectorException.
(org.apache.kafka.clients.FetchSessionHandler)

-- 
*Regards,*
*Dhruv*