Posted to users@kafka.apache.org by Qi Xu <sh...@gmail.com> on 2016/03/23 02:14:15 UTC

Fallout from upgrading to kafka 0.9.0.0 from 0.8.2.1

Hi folks, Rajiv, Jun,
I'd like to bring up this thread again from Rajiv Kurian 3 months ago.
Basically we did the same thing as Rajiv did. I upgraded two machines (out
of 10) from 0.8.2.1 to 0.9, so after the upgrade there were 2 machines
on 0.9 and 8 machines on 0.8.2.1. Initially it all worked fine. But
after about 2 hours, all the old uploaders and consumers broke because no
leader could be found for any partition of any topic. The producer just
complains "unknown error for topic xxx when it tries to refresh the
metadata". And on the server side there are errors complaining about no
leader for a partition.
Is there any known issue with 0.9 and 0.8.2 brokers co-existing in the
same cluster? Thanks a lot.


Below is the original thread:

We had to revert to 0.8.3 because three of our topics seem to have gotten
corrupted during the upgrade. As soon as we did the upgrade, producers to
the three topics I mentioned stopped being able to do writes. The clients
complained (occasionally) about leader not found exceptions. We restarted
our clients and brokers but that didn't seem to help. Actually, even after
reverting to 0.8.3 these three topics were broken. To fix it we had to stop
all clients, delete the topics, create them again and then restart the
clients.

I realize this is not a lot of info. I couldn't wait to get more debug info
because the cluster was actually being used. Has anyone run into something
like this? Are there any known issues with old consumers/producers? The
topics that got busted had clients writing to them using the old Java
wrapper over the Scala producer.

Here are the steps I took to upgrade.

For each broker:

1. Stop the broker.
2. Restart with the 0.9 broker running with
   inter.broker.protocol.version=0.8.2.X
3. Wait for under replicated partitions to go down to 0.
4. Go to step 1.

Once all the brokers were running the 0.9 code with
inter.broker.protocol.version=0.8.2.X we restarted them one by one with
inter.broker.protocol.version=0.9.0.0
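
The two-phase rolling restart described above can be sketched roughly as
follows. This is only an illustration of the control flow, not the tooling
that was actually used: restart_broker and under_replicated are hypothetical
callables that an operator would supply (for example, wrappers around a
deployment script and a metrics query), and no real Kafka API is called.

```python
import time
from typing import Callable, Iterable


def rolling_restart(
    broker_ids: Iterable[int],
    restart_broker: Callable[[int, str], None],  # hypothetical: stop + restart one broker
    under_replicated: Callable[[], int],         # hypothetical: current under-replicated count
    protocol_version: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Restart brokers one at a time, waiting for replication to catch up."""
    for broker_id in broker_ids:
        # Steps 1 and 2: stop the broker and restart it with the given
        # inter.broker.protocol.version.
        restart_broker(broker_id, protocol_version)
        # Step 3: wait for under-replicated partitions to go down to 0.
        waited = 0.0
        while under_replicated() > 0:
            if waited >= timeout_seconds:
                raise TimeoutError(
                    f"broker {broker_id}: partitions still under-replicated"
                )
            sleep(poll_seconds)
            waited += poll_seconds
        # Step 4: continue with the next broker.


def two_phase_upgrade(broker_ids, restart_broker, under_replicated, **kwargs):
    """First pass with the old protocol version, second pass with the new one."""
    ids = list(broker_ids)
    rolling_restart(ids, restart_broker, under_replicated, "0.8.2.X", **kwargs)
    rolling_restart(ids, restart_broker, under_replicated, "0.9.0.0", **kwargs)
```

The timeout is an addition for the sketch, so that a broker that never
catches up stops the rollout instead of blocking it forever.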

When reverting I did the following.

For each broker:

1. Stop the broker.
2. Restart with the 0.9 broker running with
   inter.broker.protocol.version=0.8.2.X
3. Wait for under replicated partitions to go down to 0.
4. Go to step 1.

Once all the brokers were running the 0.9 code with
inter.broker.protocol.version=0.8.2.X, I restarted them one by one with
the 0.8.2.3 broker code. This, however, like I mentioned, did not fix the
three broken topics.

Re: Fallout from upgrading to kafka 0.9.0.0 from 0.8.2.1

Posted by Qi Xu <sh...@gmail.com>.
More information about the issue:
When the issue happens, the controller is always on a broker running
Kafka 0.9.
In server.log on the other brokers, we can see this kind of error:
[2016-03-23 22:36:02,814] ERROR [ReplicaFetcherThread-0-5], Error for
partition [topic,208] to broker
5:org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.
(kafka.server.ReplicaFetcherThread)

And after restarting that controller, everything works again.
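
For anyone who wants to check how widespread this error is in their own
server.log, a small script along these lines could count the affected
partitions. It is only a sketch: the pattern is derived from the single
entry quoted above, it assumes each entry sits on one line in the actual log
file (the mail client wrapped it here), and count_not_leader_errors is a
name made up for this example.

```python
import re
from collections import Counter

# Pattern based on the one log entry quoted in this thread; other Kafka
# versions may format the message differently.
PATTERN = re.compile(
    r"ERROR \[ReplicaFetcherThread-\d+-\d+\], "
    r"Error for partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"to broker (?P<broker>\d+):(?P<error>[\w.]+)"
)


def count_not_leader_errors(lines):
    """Count NotLeaderForPartitionException entries per (topic, partition, broker)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match and match.group("error").endswith("NotLeaderForPartitionException"):
            key = (
                match.group("topic"),
                int(match.group("partition")),
                int(match.group("broker")),
            )
            counts[key] += 1
    return counts
```

Calling count_not_leader_errors(open("server.log")) would then give one
counter entry per topic-partition and source broker, which makes it easier
to see whether every partition is affected or only those led by one broker.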


