Posted to users@kafka.apache.org by "Yu Bo (技术中心)" <bo...@sohu-inc.com> on 2014/01/13 15:18:13 UTC

[Kafka-users] badversion exception and unable to recover

Hi all,
Because of a BadVersion exception, broker id 1 is no longer a leader. The Kafka process on that server is still running, but broker id 1 now serves only as a replica.
I want to recover broker id 1 so that it can become a leader again.
Here is what I have done:

1. Used "kafka-reassign-partitions.sh" to remove broker id 1 from the replica list, so that broker id 1 would be detached from the Kafka cluster.

However, the server.log files on the other brokers are printing many errors like this:

[2014-01-13 21:56:58,360] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
[2014-01-13 21:56:58,362] WARN [ReplicaFetcherThread-0-1], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 35321893; ClientId: ReplicaFetcherThread-0-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [co,4] -> PartitionFetchInfo(0,1048576) (kafka.server.ReplicaFetcherThread)
java.net.ConnectException: Connection refused
        at sun.nio.ch.Net.connect(Native Method)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:500)
        at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
        at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
        at kafka.consumer.SimpleConsumer.reconnect(SimpleConsumer.scala:57)
        at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)

2. Killed the Kafka process running on broker id 1.

3. Started the Kafka service on broker id 1 again, but it never rejoined the replica list, let alone became a leader.
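
For reference, the reassignment tool reads its target assignment from a JSON file that lists the desired replicas for each partition. Below is a minimal Python sketch that builds such a file. The topic `co` and partition 4 come from the log above; the replica list `[1, 2, 3]` and the helper name `build_reassignment` are assumptions for illustration only. The command-line flag used to pass the file differs between Kafka versions, so it is not shown here.

```python
import json


def build_reassignment(topic, partition_replicas):
    """Build a reassignment plan: one entry per partition.

    partition_replicas maps partition number -> list of broker ids.
    """
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": list(r)}
            for p, r in sorted(partition_replicas.items())
        ],
    }


# Hypothetical target: put broker 1 back into the replica list of co-4.
# The broker ids 2 and 3 are assumed; use the real replica set of the cluster.
plan = build_reassignment("co", {4: [1, 2, 3]})
plan_json = json.dumps(plan, indent=2)
print(plan_json)
```

The first broker in each `replicas` list is the preferred replica, so listing broker 1 first is what would let a later preferred replica election move leadership back to it, provided it has caught up and is in the in-sync replica set.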

The question is: what should we do in order to restore the Kafka cluster to its original state?

Thanks.



Best regards,
--
Yu Bo (于博)  |  Sohu Media Plaza, Precision Advertising R&D Center, Technical Group  |  tel:61135569  15101511427 (H)  |  boyu@sohu-inc.com


Re: [Kafka-users] badversion exception and unable to recover

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Bo,

Doing 1 should be good enough under normal operations. Did you see any
errors/exceptions on the controller log regarding partition reassignment?
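
One rough way to do that check is to filter the controller log for WARN/ERROR entries that mention reassignment. The Python sketch below assumes the log uses the same `[timestamp] LEVEL message` layout as the server.log lines quoted in the original post. The function name and the second sample line are made up for illustration; the sample is not a real Kafka message.

```python
import re

# Matches lines such as "[2014-01-13 21:56:58,362] WARN some message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def find_reassignment_problems(lines):
    """Return (timestamp, level, message) for WARN/ERROR/FATAL lines
    whose message mentions reassignment."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # stack-trace lines and blanks carry no level
        if m.group("level") in ("WARN", "ERROR", "FATAL") and \
                "reassign" in m.group("msg").lower():
            hits.append((m.group("ts"), m.group("level"), m.group("msg")))
    return hits


sample = [
    # Taken from the original post:
    "[2014-01-13 21:56:58,360] INFO Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)",
    # Made-up placeholder, NOT a real Kafka log message:
    "[2014-01-13 21:57:00,000] ERROR placeholder text about partition reassignment",
]
hits = find_reassignment_problems(sample)
for ts, level, msg in hits:
    print(ts, level, msg)
```

To run it against a real file, pass the open file object instead of `sample`, for example `find_reassignment_problems(open(path))`, where `path` points at the controller log on the broker that is currently the controller.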

Guozhang




