Posted to users@kafka.apache.org by Wes Chow <we...@chartbeat.com> on 2015/04/03 19:06:24 UTC

partition reassignment

I'm in the process of reassigning partitions away from failing machines
and the reassignment appears to be stuck. My guess is that our machines
are failing at such a high rate that some partitions no longer have any
live replicas at all. At this point I don't care about the data; I just
want to get all partitions onto the set of machines that I know work. Is
there some way I can do this? I am happy to manipulate ZooKeeper and
bounce nodes if need be.
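
For concreteness, here is a minimal sketch of the kind of reassignment
plan in question, assuming the JSON layout accepted by
kafka-reassign-partitions.sh via --reassignment-json-file. The helper
name build_reassignment, the broker ids 30 and 31, the partition count
and the replication factor are all hypothetical placeholders. Only the
topic name "pings" and broker id 29 come from this message.

```python
import json


def build_reassignment(topic, num_partitions, healthy_brokers, replication_factor):
    """Spread each partition's replicas round-robin over the healthy brokers.

    Hypothetical helper: it only builds the plan as a dict and does not
    talk to Kafka or ZooKeeper.
    """
    if replication_factor > len(healthy_brokers):
        raise ValueError("replication factor exceeds number of healthy brokers")
    n = len(healthy_brokers)
    partitions = []
    for p in range(num_partitions):
        # Start each partition at a different broker so leaders are spread out.
        replicas = [healthy_brokers[(p + i) % n] for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Assumed values: 4 partitions, brokers 29/30/31 healthy, 2 replicas each.
plan = build_reassignment("pings", 4, [29, 30, 31], 2)
print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file and passed to the reassignment
tool with --execute. This only describes the target assignment; it does
not by itself explain or clear a reassignment that is already stuck.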

And a warning... this is due to Amazon EC2 d2 instance type failures. We 
spun up 9 d2.xlarge instances and within a few hours 6 have failed under 
a Kafka workload. So yeah, bleeding edge.

One thing I've done is rebuild one of these nodes with the same broker
id and name but on a known working instance type. It came up and is now
spewing this in the logs:

[2015-04-03 13:05:30,275] 805497 [kafka-request-handler-0] WARN  
kafka.server.KafkaApis  - [KafkaApi-29] Produce request with correlation 
id 5849 from client ping_partitioner on partition [pings,245] failed due 
to Topic pings either doesn't exist or is in the process of being deleted

The topic most certainly should exist; however, I'm guessing the broker
is complaining because there are no live replicas for that partition. Is
there some way to get the rebuilt broker to just become leader?

Wes