You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "John Fung (JIRA)" <ji...@apache.org> on 2012/10/15 00:45:02 UTC

[jira] [Created] (KAFKA-573) System Test : Leader Failure Data Loss When request-num-acks is 1

John Fung created KAFKA-573:
-------------------------------

             Summary: System Test : Leader Failure Data Loss When request-num-acks is 1
                 Key: KAFKA-573
                 URL: https://issues.apache.org/jira/browse/KAFKA-573
             Project: Kafka
          Issue Type: Bug
            Reporter: John Fung


• Test Description:

1. Start a 3-broker cluster as source
2. Send messages to source cluster
3. Find leader and terminate it (kill -15)
4. Start the broker again
5. Start a consumer to consume data
6. Compare the MessageID in the data between producer log and consumer log.

• Issue: There will be data loss if request-num-acks is set to 1. 

• To reproduce this issue, please do the followings:

1. Download the latest 0.8 branch
2. Apply the patch attached to this JIRA
3. Build kafka by running "./sbt update package"
4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
5. This test will execute testcase_2 with the following settings:
    Replica factor : 3
    No. of partitions : 1
    No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479393#comment-13479393 ] 

Joel Koshy commented on KAFKA-573:
----------------------------------

+1

This must have been a pain to track down. Minor comment: In FileMessageSet, since we always open mutable, can you fix the comment (from javadoc) that says it can be opened immutable. Also, the info log reads a bit odd.
                
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>            Assignee: Jun Rao
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "Neha Narkhede (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479429#comment-13479429 ] 

Neha Narkhede commented on KAFKA-573:
-------------------------------------

+1, this one was fun to track down. Reminded me of the FileChannel truncate bug.

Minor comment - Probably best to delete the info statement - 
 info("After changed to position %d with size %d".format(channel.position(), channel.size()))

                
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>            Assignee: Jun Rao
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "John Fung (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475966#comment-13475966 ] 

John Fung commented on KAFKA-573:
---------------------------------

If ack is set to 1, we actually don't guarantee no data loss. This is because when the client receives an ack, data is only guaranteed to be in the leader, but not necessarily in other replicas. So, if a leader is bounced, some acked data could be lost. 

Nevertheless, merged checksums should still match among all replicas. 
                
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "John Fung (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-573:
----------------------------

    Summary: System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1  (was: System Test : Leader Failure Data Loss When request-num-acks is 1)
    
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Closed] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao closed KAFKA-573.
-------------------------

    
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>            Assignee: Jun Rao
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-573) System Test : Leader Failure Data Loss When request-num-acks is 1

Posted by "John Fung (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-573:
----------------------------

    Attachment: kafka-573-reproduce-issue.patch
    
> System Test : Leader Failure Data Loss When request-num-acks is 1
> -----------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "John Fung (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479092#comment-13479092 ] 

John Fung commented on KAFKA-573:
---------------------------------

Thanks Jun for the patch. The patch is applied together with the reproducing issue patch to the latest 0.8 branch and it is working correctly now.
                
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>            Assignee: Jun Rao
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-573:
--------------------------

    Attachment: kafka-573.patch

Attach a patch. There are 2 problems. The first one is the most severe one. We recently changed FileMessageSet to remove the mutable flag. As a result, everytime a new FileMessageSet is created, the constructor sets the file channel's position to the end of the file. What's happening is that while a file channel is being appended for newly produced data, the file position is moved by FileMessageSet created for fetch requests. Since they are not properly synchronized, occasionally, a message in the log is overwritten. The second issue is that in ByteBufferMessageSet.writeTo. We try to reset the buffer position after writing the data in the buffer to the channel. However, since there is no guarantee that the whole buffer will be written to the channel in a single write, resetting the buffer position could cause incorrect bytes being written to the channel.

The patch fixes both issues. The changes are: (1) Added a new flag in the constructor of FileMessageSet to control whether the channel position is set to the end of the file or not. (2) Changed ByteBufferMessageSet.writeTo so that we wait until the whole buffer is written to the channel before resetting the buffer position. (3) Added a few more logging that I found useful while investigating the issues.

The system test passes now.
                
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "John Fung (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479328#comment-13479328 ] 

John Fung commented on KAFKA-573:
---------------------------------

A full regression test (35 test cases) has been launched with this patch and the following is the summary of the results:

1. There are all together 5 failures in 5 test cases out of 35 cases.

    * testcase_0103 : 2 out of 2300 msg lost (checksum matched) - failure case with Ack == 1 in Sync mode (1 topic, 1 partition)
    * testcase_0105 : 2 out of 2300 msg lost (checksum matched) - failure case with Ack == 1 in Async mode (1 topic, 1 partition)
    * testcase_0114 : 11 out of 5100 msg lost (checksum matched) - failure case with Ack == 1 in Async mode (1 topic, 3 partitions)
    * testcase_0117 : 44 out of 5100 msg lost (checksum matched) - failure case with Ack == 1 in Sync mode (1 topic, 3 partitions)
    * testcase_0122 : 1524 out of 4600 msg lost (checksum matched) - failure case with Ack == 1 in Sync mode ( 2 topics, 3 partitions)

2. The results suggest that the fix for KAFKA-573 is working well (minor data loss in the first 4 cases are expected in leader failure cases when ack == 1).

3. The result in testcase_0122 could be related to a different issue and a new JIRA will be created to keep track of that.

                
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>            Assignee: Jun Rao
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-573:
--------------------------

        Fix Version/s: 0.8
    Affects Version/s: 0.8
               Status: Patch Available  (was: Open)
    
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao reassigned KAFKA-573:
-----------------------------

    Assignee: Jun Rao
    
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>            Assignee: Jun Rao
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-573) System Test : Leader Failure Data Loss When request-num-acks is 1

Posted by "John Fung (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475919#comment-13475919 ] 

John Fung commented on KAFKA-573:
---------------------------------

These are some manual verification of the differences in the log segment files:

• Merging the log segment files for each broker:
  $ for i in `find kafka_server_1_logs/ -name '0*.log' | sort`; do cat $i >> broker-1-merged/00000000000000000000.log; done
  $ for i in `find kafka_server_2_logs/ -name '0*.log' | sort`; do cat $i >> broker-2-merged/00000000000000000000.log; done
  $ for i in `find kafka_server_3_logs/ -name '0*.log' | sort`; do cat $i >> broker-3-merged/00000000000000000000.log; done

• Verify the checksum of each merged log segment and they are different:
  $ cksum broker-1-merged/00000000000000000000.log 
  1742950004 1638036 broker-1-merged/00000000000000000000.log
  $ cksum broker-2-merged/00000000000000000000.log 
  2050258314 1639080 broker-2-merged/00000000000000000000.log
  $ cksum broker-3-merged/00000000000000000000.log 
  1802214049 1639080 broker-3-merged/00000000000000000000.log

• Get the dump log segments of the merged files:
  $ bin/kafka-run-class.sh kafka.tools.DumpLogSegments broker-1-merged/00000000000000000000.log > broker-1-dump-log-segment.log
  $ bin/kafka-run-class.sh kafka.tools.DumpLogSegments broker-2-merged/00000000000000000000.log > broker-2-dump-log-segment.log
  $ bin/kafka-run-class.sh kafka.tools.DumpLogSegments broker-3-merged/00000000000000000000.log > broker-3-dump-log-segment.log

• Diff the dump log segment between broker-1 & broker-2:

$ diff broker-1-dump-log-segment.log broker-2-dump-log-segment.log 

1c1
< Dumping broker-1-merged/00000000000000000000.log
---
> Dumping broker-2-merged/00000000000000000000.log
113a114
> offset: 112 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 2581499653
168a170
> offset: 168 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 3880215630
387a390
> offset: 389 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 3744939326
2734d2736
< offset: 2737 isvalid: true payloadsize: 500 magic: 2 compresscodec: NoCompressionCodec crc: 314900536

                
> System Test : Leader Failure Data Loss When request-num-acks is 1
> -----------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-573) System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-573:
--------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Thanks for the review. Removed the unnecessary log statement and committed to 0.8.
                
> System Test : Leader Failure Log Segment Checksum Mismatched When request-num-acks is 1
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>            Assignee: Jun Rao
>             Fix For: 0.8
>
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573.patch, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-573) System Test : Leader Failure Data Loss When request-num-acks is 1

Posted by "John Fung (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-573:
----------------------------

    Attachment: acks1_leader_failure_data_loss.tar.gz
    
> System Test : Leader Failure Data Loss When request-num-acks is 1
> -----------------------------------------------------------------
>
>                 Key: KAFKA-573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-573
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: John Fung
>         Attachments: acks1_leader_failure_data_loss.tar.gz, kafka-573-reproduce-issue.patch
>
>
> • Test Description:
> 1. Start a 3-broker cluster as source
> 2. Send messages to source cluster
> 3. Find leader and terminate it (kill -15)
> 4. Start the broker again
> 5. Start a consumer to consume data
> 6. Compare the MessageID in the data between producer log and consumer log.
> • Issue: There will be data loss if request-num-acks is set to 1. 
> • To reproduce this issue, please do the followings:
> 1. Download the latest 0.8 branch
> 2. Apply the patch attached to this JIRA
> 3. Build kafka by running "./sbt update package"
> 4. Execute the test in directory "system_test" : "python -B system_test_runner.py"
> 5. This test will execute testcase_2 with the following settings:
>     Replica factor : 3
>     No. of partitions : 1
>     No. of bouncing : 1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira