You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "John Fung (JIRA)" <ji...@apache.org> on 2012/11/11 23:45:11 UTC

[jira] [Commented] (KAFKA-607) System Test Transient Failure (case 4011 Log Retention) - ConsoleConsumer receives less data

    [ https://issues.apache.org/jira/browse/KAFKA-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495024#comment-13495024 ] 

John Fung commented on KAFKA-607:
---------------------------------

• This issue is transient and is showing randomly in this functional test group. The test case is failing because ConsoleConsumer is receiving less data than producer produced.

• Transient Failure Testcase: 4011

• Functional Test Group: Log Retention (Size) / 2 topics / 2 partitions / Replica Factor = 2 / No. of brokers = 3 / One Leader bouncing
     Unique messages from consumer on [test_1]  :  500
     Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log  :  500
     Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log  :  0
     Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log  :  500
     Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log  :  500
     Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log  :  500
     Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log  :  0
     Unique messages from consumer on [test_2]  :  295
     Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r1.log  :  500
     Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r2.log  :  0
     Unique messages from consumer on [test_2] at simple_consumer_test_2-0_r3.log  :  500
     Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r1.log  :  500
     Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r2.log  :  500
     Unique messages from consumer on [test_2] at simple_consumer_test_2-1_r3.log  :  0
     Unique messages from producer on [test_1]  :  1000
     Unique messages from producer on [test_2]  :  1000
     Validate for data matched on topic [test_1]  :  FAILED
     Validate for data matched on topic [test_1] across replicas  :  PASSED
     Validate for data matched on topic [test_2]  :  FAILED
     Validate for data matched on topic [test_2] across replicas  :  PASSED
     Validate leader election successful  :  PASSED


• By checking the broker and ConsoleConsumer log4j messages, it appears that when leader is stopped, ConsoleConsumer is disconnected from Zookeeper.

• Broker log4j messages:

2012-11-10 04:52:52,857 - INFO - found leader in entity [3] with brokerid [3] for partition [0] (kafka_system_test_utils)
2012-11-10 04:52:52,857 - INFO - stopping leader in entity 3 with pid 16146 (kafka_system_test_utils)
2012-11-10 04:52:53,036 - DEBUG - executing command [ssh ela4-app0999.prod 'kill -15 16148'] (system_test_utils)
2012-11-10 04:52:53,055 - DEBUG - executing command [ssh ela4-app0999.prod 'kill -15 16146'] (system_test_utils)

• Log4j messages in the Broker that was bounced:

[2012-11-10 04:52:53,200] INFO Replica Manager on Broker 3: Shutted down completely (kafka.server.ReplicaManager)
[2012-11-10 04:52:53,201] INFO [Socket Server on Broker 3], shutting down (kafka.network.SocketServer)
[2012-11-10 04:52:53,205] INFO [Socket Server on Broker 3], shutted down completely (kafka.network.SocketServer)
[2012-11-10 04:52:53,212] INFO [Kafka Server 3], shut down completed (kafka.server.KafkaServer)

[2012-11-10 04:53:09,497] INFO Started Kafka CSV metrics reporter with polling period 5 seconds (kafka.metrics.KafkaCSVMetricsReporter)
[2012-11-10 04:53:09,506] INFO [Kafka Server 3], starting (kafka.server.KafkaServer)
[2012-11-10 04:53:09,535] INFO [Log Manager on Broker 3] Loading log 'test_2-0' (kafka.log.LogManager)
[2012-11-10 04:53:09,579] INFO [Log Manager on Broker 3] Loading log 'test_1-0' (kafka.log.LogManager)
[2012-11-10 04:53:09,583] INFO [Log Manager on Broker 3] Starting log cleaner every 60000 ms (kafka.log.LogManager)
[2012-11-10 04:53:09,588] INFO [Log Manager on Broker 3] Starting log flusher every 1000 ms with the following overrides Map() (kafka.log.LogManager)
. . .
[2012-11-10 04:53:09,692] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2012-11-10 04:53:09,694] INFO Registering broker /brokers/ids/3 (kafka.server.KafkaZooKeeper)
[2012-11-10 04:53:09,705] INFO Registering broker /brokers/ids/3 succeeded with id:3,creatorId:172.17.166.125-1352523189694,host:172.17.166.125,port:9093 (kafka.utils.ZkUtils$)
[2012-11-10 04:53:09,705] INFO [Kafka Server 3], Connecting to ZK: ela4-app0996.prod:2188 (kafka.server.KafkaServer)
[2012-11-10 04:53:09,772] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$)
[2012-11-10 04:53:09,774] INFO [Controller 3]: Controller starting up (kafka.controller.KafkaController)
[2012-11-10 04:53:09,781] ERROR [KafkaApi-3] error when processing request (test_2,0,150,1048576) (kafka.server.KafkaApis)
kafka.common.UnknownTopicOrPartitionException: Topic test_2 partition 0 doesn't exist on 3
        at kafka.server.ReplicaManager.getLeaderReplicaIfLocal(ReplicaManager.scala:142)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:363)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:329)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:325)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
        at scala.collection.immutable.Map$Map1.map(Map.scala:93)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:325)
        at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:293)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:58)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:41)
        at java.lang.Thread.run(Thread.java:619)[2012-11-10 04:53:09,781] ERROR [KafkaApi-3] error when processing request (test_1,0,250,1048576) (kafka.server.KafkaApis)
kafka.common.UnknownTopicOrPartitionException: Topic test_1 partition 0 doesn't exist on 3
        at kafka.server.ReplicaManager.getLeaderReplicaIfLocal(ReplicaManager.scala:142)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:363)
        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:329)        at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:325)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)        at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
        at scala.collection.immutable.Map$Map1.map(Map.scala:93)
        at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:325)        at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:293)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:58)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:41)
        at java.lang.Thread.run(Thread.java:619)
[2012-11-10 04:53:09,806] INFO conflict in /controller data: 3 stored data: 1 (kafka.utils.ZkUtils$)
[2012-11-10 04:53:09,808] INFO [Controller 3]: Controller startup complete (kafka.controller.KafkaController)
[2012-11-10 04:53:09,825] INFO [Kafka Server 3], started (kafka.server.KafkaServer)
[2012-11-10 04:53:09,826] INFO Verifying properties (kafka.utils.VerifiableProperties)

• ConsoleConsumer log4j messages:

[2012-11-10 04:52:53,550] INFO Reconnect due to socket error:  (kafka.consumer.SimpleConsumer)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
        at kafka.utils.Utils$.read(Utils.scala:388)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
. . .
[2012-11-10 04:52:53,558] INFO Fetching metadata for topic Set(test_1) (kafka.client.ClientUtils$)
[2012-11-10 04:52:53,558] INFO Connected to 172.17.166.123:9091 for producing (kafka.producer.SyncProducer)
[2012-11-10 04:52:53,582] INFO Disconnecting from 172.17.166.123:9091 (kafka.producer.SyncProducer)
[2012-11-10 04:53:36,054] ERROR Error processing message, stopping consumer:  (kafka.consumer.ConsoleConsumer$)
kafka.consumer.ConsumerTimeoutException
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:67)
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
. . .
[2012-11-10 04:53:36,055] INFO [console-consumer-84076_ela4-app0997.prod-1352523123516-afb2c948], ZKConsumerConnector shutting down (kafka.consumer.ZookeeperConsum
erConnector)
[2012-11-10 04:53:36,055] INFO Shutting down topic event watcher. (kafka.consumer.ZookeeperTopicEventWatcher)
[2012-11-10 04:53:36,055] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)[2012-11-10 04:53:36,057] INFO Session: 0x13ae8a88aae000a closed (org.apache.zookeeper.ZooKeeper)
[2012-11-10 04:53:36,057] INFO EventThread shut down (org.apache.zookeeper.ClientCnxn)[2012-11-10 04:53:36,058] INFO Forcing shutdown of Kafka scheduler (kafka.utils.KafkaScheduler)
[2012-11-10 04:53:36,059] INFO [ConsumerFetcherManager-1352523123614] shutting down (kafka.consumer.ConsumerFetcherManager)
[2012-11-10 04:53:36,059] INFO [console-consumer-84076_ela4-app0997.prod-1352523123516-afb2c948-leader-finder-thread], Shutting down (kafka.consumer.ConsumerFetche
rManager$$anon$1)
[2012-11-10 04:53:36,703] INFO [console-consumer-84076_ela4-app0997.prod-1352523123516-afb2c948], stopping watcher executor thread for consumer console-consumer-84
076_ela4-app0997.prod-1352523123516-afb2c948 (kafka.consumer.ZookeeperConsumerConnector)
[2012-11-10 04:55:02,312] INFO Reconnect due to socket error:  (kafka.consumer.SimpleConsumer)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
        at kafka.utils.Utils$.read(Utils.scala:388)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
. . .
[2012-11-10 04:55:03,252] INFO Unable to read additional data from server sessionid 0x13ae8a88aae0005, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2012-11-10 04:55:03,353] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)

                
> System Test Transient Failure (case 4011 Log Retention) - ConsoleConsumer receives less data
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-607
>                 URL: https://issues.apache.org/jira/browse/KAFKA-607
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: testcase_4011_data_and_log4j.tar.gz
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira