You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "John Fung (JIRA)" <ji...@apache.org> on 2012/11/11 23:03:12 UTC

[jira] [Commented] (KAFKA-606) System Test Transient Failure (case 0302 GC Pause) - Log segments mismatched across replicas

    [ https://issues.apache.org/jira/browse/KAFKA-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495012#comment-13495012 ] 

John Fung commented on KAFKA-606:
---------------------------------

• This issue is transient and is showing randomly in this functional test group. The test case is failing because the log segment checksums are not matching across the replicas. When a DumpLogSegment is run on one of the log segment files and indexes, it gives error:
    Non-secutive offsets in :00000000000000000405.log
    Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer

• Transient Failure Testcase: 0302, 0308

• Functional Test Group: Replication with Simulated Leader GC Pause / 1 topic / 3 partitions / Replica Factor = 3 / No. of brokers = 3

• validation_status  : 
     Leader Election Latency MAX  :  None
     Leader Election Latency MIN  :  None
     Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log  :  971
     Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log  :  967
     Unique messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log  :  971
     Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log  :  961
     Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log  :  961
     Unique messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log  :  961
     Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log  :  825
     Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log  :  825
     Unique messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log  :  825
     Validate for data matched on topic [test_1] across replicas  :  FAILED
     Validate for merged log segment checksum in cluster [source]  :  FAILED

• Failure is due to log segment checksum doesn't match.

• Leader with brokerID 2 is GC paused:

2012-11-11 04:03:50,967 - DEBUG - brokerid: [2] entity_id: [2] (kafka_system_test_utils)
2012-11-11 04:03:50,968 - INFO - ======================================================
2012-11-11 04:03:50,968 - INFO - Found leader with entity id: 2
2012-11-11 04:03:50,968 - INFO - ======================================================
2012-11-11 04:03:50,968 - DEBUG - executing command [ssh ela4-app0998.prod 'pid=17256; prev_pid=""; echo $pid; while [[ "x$pid" != "x" ]]; do prev_pid=$pid;   for child in $(ps -o pid,ppid ax | awk "{ if ( \$2 == $pid ) { print \$1 }}");     do ech
o $child; pid=$child;   done;   if [ $prev_pid == $pid ]; then     break;   fi; done' 2> /dev/null (system_test_utils)
2012-11-11 04:03:51,120 - DEBUG - executing command [ssh ela4-app0998.prod 'kill -SIGSTOP 17258'] (system_test_utils)2012-11-11 04:03:51,136 - DEBUG - executing command [ssh ela4-app0998.prod 'kill -SIGSTOP 17256'] (system_test_utils)
2012-11-11 04:03:56,159 - DEBUG - executing command [ssh ela4-app0998.prod 'kill -SIGCONT 17256'] (system_test_utils)
2012-11-11 04:03:56,176 - DEBUG - executing command [ssh ela4-app0998.prod 'kill -SIGCONT 17258'] (system_test_utils)

• Log segment files in all brokers

kafka@ela4-app0996 [~/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs] ./ls_log_segments.sh 

-rw-rw-r-- 1 kafka kafka 105966 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000000.log
-rw-rw-r-- 1 kafka kafka 105444 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000203.log
-rw-rw-r-- 1 kafka kafka 103878 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000405.log
-rw-rw-r-- 1 kafka kafka 103878 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000604.log
-rw-rw-r-- 1 kafka kafka  87696 Nov 11 05:30 broker-1/kafka_server_1_logs/test_1-0/00000000000000000803.log
. . .
======================================================

-rw-rw-r-- 1 kafka kafka 105966 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000000.log
-rw-rw-r-- 1 kafka kafka 105444 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000203.log
-rw-rw-r-- 1 kafka kafka 105444 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.log
-rw-rw-r-- 1 kafka kafka 114840 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000611.log    
-rw-rw-r-- 1 kafka kafka  73080 Nov 11 05:30 broker-2/kafka_server_2_logs/test_1-0/00000000000000000831.log
. . .
======================================================

-rw-rw-r-- 1 kafka kafka 105966 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000000.log
-rw-rw-r-- 1 kafka kafka 105444 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000203.log
-rw-rw-r-- 1 kafka kafka 103878 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000405.log
-rw-rw-r-- 1 kafka kafka 103878 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000604.log
-rw-rw-r-- 1 kafka kafka  87696 Nov 11 05:30 broker-3/kafka_server_3_logs/test_1-0/00000000000000000803.log
. . .

• DumpLogSegment is showing Exception:

kafka@ela4-app0996 [/mnt/u001/kafka_08_replication_system_test] bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files /export/home/kafka/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs/broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.log,/export/home/kafka/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs/broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.index 
Dumping /export/home/kafka/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs/broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.log
Starting offset: 405
offset: 405 position: 0 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 2241529125
offset: 406 position: 522 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 2398559000
offset: 407 position: 1044 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 2470656351
offset: 408 position: 1566 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 3047120790
. . .
offset: 606 position: 102834 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1253807556
offset: 607 position: 103356 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1818751757
offset: 608 position: 103878 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1711972973
offset: 609 position: 104400 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1980328956
offset: 610 position: 104922 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 1566762723
Dumping /export/home/kafka/kafka_hudson_test_output_logs/test_1352592075/testcase_0302/logs/broker-2/kafka_server_2_logs/test_1-0/00000000000000000405.index
offset: 504 position: 49590
offset: 520 position: 57942
offset: 541 position: 68904
offset: 555 position: 76212
offset: 576 position: 87174
offset: 597 position: 98136
Non-secutive offsets in :00000000000000000405.log
Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
	at scala.runtime.BoxesRunTime.unboxToInt(Unknown Source)
	at scala.Tuple2._1$mcI$sp(Tuple2.scala:23)
	at kafka.tools.DumpLogSegments$$anonfun$main$3$$anonfun$apply$2.apply(DumpLogSegments.scala:85)
	at kafka.tools.DumpLogSegments$$anonfun$main$3$$anonfun$apply$2.apply(DumpLogSegments.scala:84)
	at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
	at scala.collection.immutable.List.foreach(List.scala:45)
	at kafka.tools.DumpLogSegments$$anonfun$main$3.apply(DumpLogSegments.scala:84)
	at kafka.tools.DumpLogSegments$$anonfun$main$3.apply(DumpLogSegments.scala:81)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
	at scala.collection.Iterator$class.foreach(Iterator.scala:631)
	at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
	at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:81)
	at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala)

                
> System Test Transient Failure (case 0302 GC Pause) - Log segments mismatched across replicas
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-606
>                 URL: https://issues.apache.org/jira/browse/KAFKA-606
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: testcase_0302_data_and_log4j.tar.gz
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira