Posted to users@kafka.apache.org by "Vanlerberghe, Luc" <Lu...@bvdinfo.com> on 2017/11/08 16:19:16 UTC

RE: [Possibly spoofed] Re: 0.11.0.1: ReplicaFetcherThread exceptions after clearing corrupted broker data and restarting

Thanks Ismael,

I hope this will solve the issue in the future.
For now, unless we find a workaround soon, we'll probably back up the data we have and rebuild the cluster from scratch...

Luc

-----Original Message-----
From: ismaelj@gmail.com [mailto:ismaelj@gmail.com] On Behalf Of Ismael Juma
Sent: Wednesday, 8 November 2017 16:57
To: Kafka Users <us...@kafka.apache.org>
Subject: [Possibly spoofed] Re: 0.11.0.1: ReplicaFetcherThread exceptions after clearing corrupted broker data and restarting

Hi Luc,

The first RC for 0.11.0.2 will be released this week.

Ismael

On Wed, Nov 8, 2017 at 3:33 PM, Vanlerberghe, Luc < Luc.Vanlerberghe@bvdinfo.com> wrote:

> Hi,
>
> We have a Kafka setup with 6 brokers and topics having replication
> factor 3 (single partition).
>
> After an improper shutdown, we had corrupted index files on two of our 
> production servers, causing "WARN Found a corrupted index file due to 
> requirement failed: Corrupt index found," messages and kafka shutting 
> down on startup with a "FATAL Exiting Kafka.(kafka.server.KafkaServerStartable)"
> message.
>
> All topics are still accessible, but unfortunately the most important 
> one has only a single ISR left.
>
> We decided to clear all Kafka data and restart the brokers, believing
> they would fetch all needed data back from the leader to become
> in-sync again, but on startup we see the following messages in the log
> (repeating at an alarming rate):
>
> WARN [ReplicaFetcherThread-0-3]: Replica 4 for partition <topic>-0 reset its fetch offset from 0 to current leader 3's start offset 0 (kafka.server.ReplicaFetcherThread)
> ERROR [ReplicaFetcherThread-0-3]: Current offset 0 for partition [<topic>,0] out of range; reset offset to 0 (kafka.server.ReplicaFetcherThread)
>
> This looks to me like a problem similar to
> https://issues.apache.org/jira/browse/KAFKA-6003
>
> While trying to reassign a topic that had lost one of its ISRs (I kept 
> the existing ISRs, but deleted the failing broker and added an 
> existing one) we got the same messages on that existing broker.
>
> [2017-11-08 16:21:30,893] WARN [ReplicaFetcherThread-0-1]: Replica 5 for partition <topic>-0 reset its fetch offset from 0 to current leader 1's start offset 0 (kafka.server.ReplicaFetcherThread)
> [2017-11-08 16:21:30,893] ERROR [ReplicaFetcherThread-0-1]: Current offset 0 for partition [<topic>,0] out of range; reset offset to 0 (kafka.server.ReplicaFetcherThread)
>
> This is even more annoying since I don't want to shut down that broker
> as well, and it generates about 800M of logs per hour (fortunately only
> about 100M compressed).
>
> Does anybody have a clue what's going on and how to fix it?
> If the fix in 0.11.0.2 would solve our issue, how soon can we expect
> the release (if at all)?
>
> Thanks,
>
> Luc
>
>
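To see which replicas are caught in the reset loop quoted above, and how often, the broker log can be tallied per fetcher thread and partition. The following is only a rough sketch: the regular expression is modelled on the ERROR line quoted in this thread, the function name `count_offset_resets` is made up for illustration, and the real log layout depends on the broker's log4j configuration.

```python
import re
from collections import Counter

# Pattern modelled on the ERROR line quoted in this thread. The exact layout
# of a real broker log depends on its log4j configuration (assumption).
RESET_ERROR = re.compile(
    r"ERROR \[(?P<thread>ReplicaFetcherThread-\d+-\d+)\]: "
    r"Current offset (?P<offset>\d+) for partition "
    r"\[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"out of range; reset offset to (?P<reset>\d+)"
)


def count_offset_resets(lines):
    """Count 'out of range' reset errors per (fetcher thread, topic, partition).

    `lines` is any iterable of log lines, for example an open file object.
    Lines that do not match the pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = RESET_ERROR.search(line)
        if match:
            key = (match["thread"], match["topic"], int(match["partition"]))
            counts[key] += 1
    return counts
```

Passing an open log file to `count_offset_resets` and printing the result of `most_common()` would list the noisiest thread and partition combinations first.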