Posted to users@kafka.apache.org by Frederic Girard <ex...@lotsys.com> on 2016/11/08 11:05:11 UTC

[kafka] Errors during failover

Hello,

We're planning to use Kafka (0.10.1), so we have been testing it. I ran some failover tests and got unexpected results.

We have 3 servers, each running a Kafka broker. We created 3 topics used as message queues (MSG01, MSG02, MSG03).
Each topic has only 1 partition and a replication factor of 3.

Topic:MSG01     PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: MSG01    Partition: 0    Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
Topic:MSG02     PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: MSG02    Partition: 0    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
Topic:MSG03     PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: MSG03    Partition: 0    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
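
To compare the topic state before and after each failover, the describe output above can be checked with a small script. This is only a sketch: the regular expression is written against the exact text pasted above, and the field names (fully_in_sync, leader_is_first_replica) are made up for illustration.

```python
import re

# Describe output copied verbatim from the listing above.
DESCRIBE_OUTPUT = """\
Topic:MSG01     PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: MSG01    Partition: 0    Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
Topic:MSG02     PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: MSG02    Partition: 0    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
Topic:MSG03     PartitionCount:1        ReplicationFactor:3     Configs:
        Topic: MSG03    Partition: 0    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
"""

# Matches only the per-partition lines; the "PartitionCount" summary lines are skipped.
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]+)"
)


def parse_partitions(text):
    """Return one dict per partition line found in the describe output."""
    rows = []
    for line in text.splitlines():
        m = PARTITION_RE.search(line)
        if m is None:
            continue
        replicas = [int(b) for b in m["replicas"].split(",")]
        isr = [int(b) for b in m["isr"].split(",")]
        leader = int(m["leader"])
        rows.append({
            "topic": m["topic"],
            "partition": int(m["partition"]),
            "leader": leader,
            "replicas": replicas,
            "isr": isr,
            # Hypothetical helper fields, not part of Kafka's output.
            "fully_in_sync": set(isr) == set(replicas),
            "leader_is_first_replica": leader == replicas[0],
        })
    return rows


for row in parse_partitions(DESCRIBE_OUTPUT):
    print(row["topic"], row["leader"], row["fully_in_sync"], row["leader_is_first_replica"])
```

Running the same check on the describe output captured right after a restart and again after the error burst would show whether the leader or the ISR changed in between.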


Then we start sending messages to Kafka and receiving them using a JMeter script (~200 messages sent per second).

*  11:06:22 : PB02 kafka server is killed. (SIGTERM)
*  11:08:10 : PB02 kafka server is restarted 2 minutes later.
*  11:09:23 : 204 errors
*  11:13:35 : PB02 kafka server is killed (SIGTERM): 2 errors
*  11:30:27 : PB02 kafka server is restarted 17 minutes later.
*  11:34:23 : 202 errors
*  11:47:20 : PB01 kafka server is killed. (SIGTERM)
*  11:52:05 : PB01 kafka server is restarted 5 minutes later.
*  11:56:02 : 15 errors
*  11:56:02 : PB02 kafka server is killed. (SIGTERM)
*  12:00:02 : PB02 kafka server is restarted 4 minutes later.
*  12:02:28 : 207 errors
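
The delay between each restart and the error burst that follows it can be computed from the timeline above. The pairing of each restart with the next "errors" entry is my own assumption about which events belong together; the times and counts are copied from the list.

```python
from datetime import datetime

# (restart time, time of the next error burst, error count), taken from the timeline.
# Pairing each restart with the following "errors" line is an assumption.
EVENTS = [
    ("11:08:10", "11:09:23", 204),
    ("11:30:27", "11:34:23", 202),
    ("11:52:05", "11:56:02", 15),
    ("12:00:02", "12:02:28", 207),
]


def delay_seconds(restart, errors):
    """Seconds elapsed between a restart and the error burst (same day)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(errors, fmt) - datetime.strptime(restart, fmt)
    return int(delta.total_seconds())


delays = [delay_seconds(restart, errors) for restart, errors, _ in EVENTS]

for (restart, errors, count), delay in zip(EVENTS, delays):
    print(f"restart {restart} -> errors {errors}: {delay} s later ({count} errors)")
```

Under that pairing, the errors show up between roughly 1 and 4 minutes after each restart, never immediately.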

When we shut down a broker and then restart it, nothing happens at first (maybe a few errors). But a few minutes later, we get a lot of errors.
I've run this test many times, and it always behaves this way.

When these errors happen, here's the log we get:

[2016-10-20 11:09:23,833] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [MSG01,0] (kafka.server.ReplicaFetcherManager)
[2016-10-20 11:09:23,833] INFO Truncating log MSG01-0 to offset 30117. (kafka.log.Log)
[2016-10-20 11:09:23,837] INFO [ReplicaFetcherManager on broker 2] Added fetcher for partitions List([[MSG01,0], initOffset 30117 to broker BrokerEndPoint(1,perf-bench-02,9092)] ) (kafka.server.ReplicaFetcherManager)
[2016-10-20 11:09:23,838] INFO [ReplicaFetcherThread-0-1], Starting  (kafka.server.ReplicaFetcherThread)
[2016-10-20 11:09:23,839] INFO [ReplicaFetcherThread-0-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-10-20 11:09:23,840] INFO [ReplicaFetcherThread-0-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-10-20 11:09:23,840] INFO [ReplicaFetcherThread-0-0], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2016-10-20 11:09:23,845] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [MSG03,0] (kafka.server.ReplicaFetcherManager)
[2016-10-20 11:09:23,845] INFO Truncating log MSG03-0 to offset 29935. (kafka.log.Log)
[2016-10-20 11:09:23,847] INFO [ReplicaFetcherManager on broker 2] Added fetcher for partitions List([[MSG03,0], initOffset 29935 to broker BrokerEndPoint(1,perf-bench-02,9092)] ) (kafka.server.ReplicaFetcherManager)
[2016-10-20 11:09:23,851] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [MSG02,0] (kafka.server.ReplicaFetcherManager)
[2016-10-20 11:09:23,851] INFO Truncating log MSG02-0 to offset 30041. (kafka.log.Log)
[2016-10-20 11:09:23,852] INFO [ReplicaFetcherManager on broker 2] Added fetcher for partitions List([[MSG02,0], initOffset 30041 to broker BrokerEndPoint(1,perf-bench-02,9092)] ) (kafka.server.ReplicaFetcherManager)
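
To line up the truncations with the error bursts, the "Truncating log" entries can be pulled out of the broker log with a short script. This is a sketch that assumes the log lines keep exactly the format shown above; the sample lines are copied from that excerpt, and find_truncations is a name chosen for illustration.

```python
import re

# Matches lines such as:
# [2016-10-20 11:09:23,833] INFO Truncating log MSG01-0 to offset 30117. (kafka.log.Log)
TRUNCATE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] INFO Truncating log (?P<topic>.+)-(?P<partition>\d+) "
    r"to offset (?P<offset>\d+)\."
)


def find_truncations(lines):
    """Return (timestamp, topic, partition, offset) for every truncation line."""
    events = []
    for line in lines:
        m = TRUNCATE_RE.match(line)
        if m:
            events.append((m["ts"], m["topic"], int(m["partition"]), int(m["offset"])))
    return events


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "[2016-10-20 11:09:23,833] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [MSG01,0] (kafka.server.ReplicaFetcherManager)",
    "[2016-10-20 11:09:23,833] INFO Truncating log MSG01-0 to offset 30117. (kafka.log.Log)",
    "[2016-10-20 11:09:23,845] INFO Truncating log MSG03-0 to offset 29935. (kafka.log.Log)",
    "[2016-10-20 11:09:23,851] INFO Truncating log MSG02-0 to offset 30041. (kafka.log.Log)",
]

for ts, topic, partition, offset in find_truncations(SAMPLE):
    print(f"{ts}  {topic}-{partition} truncated to offset {offset}")
```

On a real broker log, the same function could be fed the lines of the log file instead of SAMPLE.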

It seems there is some kind of reorganization of the topics/partitions, with each log truncated to a given offset. Could that be the cause of all these errors?


Regards,
Frederic Girard.
