Posted to users@kafka.apache.org by Wannes De Smet <wa...@gmail.com> on 2016/09/05 14:37:17 UTC
Re: Issue adding server (0.10.0.0)
Hi all,
We keep running into this issue. After increasing the fetch threads, we
cleared the entire cluster, upgraded to 0.10.0.1, and started all nodes, and
all was well. We cannot reduce the fetch size, as it is equal to our
message.max.bytes. Raising the number of replica fetcher threads further
increased memory usage too much, causing countless out-of-heap-space /
out-of-direct-buffer-memory exceptions. We have now settled on two fetch
threads, which leaves some headroom. Our full config is pasted below [2].
Today, to make sure this issue was resolved, we added a fourth server to the
cluster and then reassigned all partitions. Unfortunately, the fourth node
will not sync up. This is a snippet from its log file:
...
[2016-09-05 16:13:52,296] WARN [ReplicaFetcherThread-0-2], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@318e6f11 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'responses': Error reading field 'partition_responses': Error reading field 'record_set': Error reading bytes of size 104856899, only 19862997 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:136)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-09-05 16:13:58,227] WARN [ReplicaFetcherThread-0-0], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@5dfb502b (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 0 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:137)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-09-05 16:14:03,228] WARN [ReplicaFetcherThread-1-0], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@1831418c (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 0 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:137)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-09-05 16:14:14,350] WARN [ReplicaFetcherThread-1-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@560e37d5 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 1 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:137)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-09-05 16:14:20,803] WARN [ReplicaFetcherThread-0-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@4ebf97a4 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 1 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:137)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-09-05 16:14:27,274] WARN [ReplicaFetcherThread-1-2], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@e16a83c (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 2 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:137)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-09-05 16:14:32,202] WARN [ReplicaFetcherThread-0-2], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@75a17632 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'responses': Error reading field 'partition_responses': Error reading field 'record_set': Error reading bytes of size 104856899, only 23442710 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:136)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
[2016-09-05 16:14:38,232] WARN [ReplicaFetcherThread-0-0], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@3b531475 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 0 was disconnected before the response was read
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
    at scala.Option.foreach(Option.scala:257)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
    at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
    at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:137)
    at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
    at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
    at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
...
These messages appear continuously, and the fourth node never becomes an
in-sync replica (ISR) for topics to which messages are being actively
produced. Some further searching brought me to a JIRA issue [1], but the fix
seems to be scheduled for 0.10.1.0 rather than 0.10.0.2.
We are happy to build and test Kafka ourselves, should the fix already be
available somewhere. Any advice is appreciated, as we are a bit stuck.
Thanks
Wannes
[1] https://issues.apache.org/jira/browse/KAFKA-3916
[2] server.properties:
auto.create.topics.enable=true
auto.leader.rebalance.enable=true
background.threads=4
broker.id=3
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
controller.message.queue.size=10
controller.socket.timeout.ms=30000
default.replication.factor=3
fetch.purgatory.purge.interval.requests=10000
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
log.cleaner.backoff.ms=15000
log.cleaner.dedupe.buffer.size=524288000
log.cleaner.delete.retention.ms=86400000
log.cleaner.enable=false
log.cleaner.io.buffer.load.factor=0.9
log.cleaner.io.buffer.size=524288
log.cleaner.min.cleanable.ratio=0.5
log.cleaner.threads=1
log.cleanup.policy=delete
log.delete.delay.ms=60000
log.dirs=/data/kafka
log.flush.offset.checkpoint.interval.ms=60000
log.flush.scheduler.interval.ms=3000
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.retention.bytes=-1
log.retention.check.interval.ms=300000
log.retention.hours=168
log.retention.minutes=10080
log.roll.hours=168
log.segment.bytes=104857600
message.max.bytes=105906176
num.io.threads=8
num.network.threads=3
num.partitions=16
num.replica.fetchers=2
offset.metadata.max.bytes=1024
port=9092
producer.purgatory.purge.interval.requests=10000
queued.max.requests=500
replica.fetch.backoff.ms=5000
replica.fetch.max.bytes=105906176
replica.fetch.min.bytes=1
replica.fetch.wait.max.ms=5000
replica.high.watermark.checkpoint.interval.ms=5000
replica.lag.max.messages=4000
replica.lag.time.max.ms=60000
replica.socket.receive.buffer.bytes=65536
replica.socket.timeout.ms=30000
retention.ms=3600000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
zookeeper.session.timeout.ms=6000
zookeeper.sync.time.ms=2000
On Wed, Aug 17, 2016 at 10:38 PM, Jun Rao <ju...@confluent.io> wrote:
> Jarko,
>
> Do you have many topic partitions? Currently, if #partitions *
> fetched_bytes in the response exceeds 2GB, we will get an integer overflow
> and weird things can happen. We are trying to address this better in
> KIP-74. If this is the issue, for now, you can try reducing the fetch size
> or increasing the replica fetch threads to work around the issue.
>
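[Editor's note: the 32-bit overflow Jun describes can be sketched as below. The value of replica.fetch.max.bytes is taken from the config in [2]; the partition count of 24 is purely hypothetical, chosen only to push the product past 2 GB.]

```java
public class FetchSizeOverflow {
    public static void main(String[] args) {
        int fetchMaxBytes = 105_906_176; // replica.fetch.max.bytes from [2]
        int partitions = 24;             // hypothetical partition count in one fetch response

        // Computed in 32-bit arithmetic, the total response size wraps past
        // Integer.MAX_VALUE (2,147,483,647) and comes out negative:
        int overflowed = partitions * fetchMaxBytes;

        // Widening to 64 bits before multiplying gives the real size (~2.37 GiB):
        long actual = (long) partitions * fetchMaxBytes;

        System.out.println(overflowed); // -1753219072
        System.out.println(actual);     // 2541748224
    }
}
```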
> Thanks,
>
> Jun
>
> On Wed, Aug 17, 2016 at 3:04 AM, J Mes <ja...@gmail.com> wrote:
>
> > Hello,
> >
> > I have a cluster of 3 nodes running Kafka v0.10.0.0. The cluster was
> > started about a week ago with no data, and there were no issues starting
> > up. Today we noticed that one of the servers in the cluster was no longer
> > working and that all of its data was old.
> >
> > We restarted the node without data, expecting it to sync up and then
> > rejoin the cluster, but we keep getting the following error:
> >
> > [2016-08-17 12:02:23,620] WARN [ReplicaFetcherThread-0-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@62b3e70c (kafka.server.ReplicaFetcherThread)
> > org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'responses': Error reading field 'partition_responses': Error reading field 'record_set': Error reading bytes of size 104856430, only 18764961 bytes available
> >     at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
> >     at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
> >     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
> >     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
> >     at kafka.utils.NetworkClientBlockingOps$.recursivePoll$2(NetworkClientBlockingOps.scala:136)
> >     at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
> >     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
> >     at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:244)
> >     at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:229)
> >     at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> >     at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
> >     at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
> >     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> >
> > All nodes are running the exact same version of ZooKeeper/Kafka.
> >
> > When we clear all data from all nodes and start again, everything
> > works...
> >
> > Any ideas, anyone?
> >
> > Kr,
> > Jarko Mesuere
>