Posted to dev@kafka.apache.org by "John Fung (JIRA)" <ji...@apache.org> on 2012/06/21 20:31:44 UTC

[jira] [Created] (KAFKA-372) Consumer doesn't receive all data if there are multiple segment files

John Fung created KAFKA-372:
-------------------------------

             Summary: Consumer doesn't receive all data if there are multiple segment files
                 Key: KAFKA-372
                 URL: https://issues.apache.org/jira/browse/KAFKA-372
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.8
            Reporter: John Fung


This issue happens inconsistently, but it can be reproduced by following the steps below (repeat step 4 a few times if necessary):

1. Check out 0.8 branch (currently reproducible with rev. 1352634)

2. Apply kafka-306-v4.patch

3. Please note that log.file.size is set to 10000000 in system_test/broker_failure/config/server_*.properties (small enough to trigger multiple segment files)

4. Under the directory <kafka home>/system_test/broker_failure, execute command:
$ bin/run-test.sh 20 0

5. After the test is completed, the result will probably look like the following:

========================================================
no. of messages published            : 14000
producer unique msg rec'd            : 14000
source consumer msg rec'd            : 7271
source consumer unique msg rec'd     : 7271
mirror consumer msg rec'd            : 6960
mirror consumer unique msg rec'd     : 6960
total source/mirror duplicate msg    : 0
source/mirror uniq msg count diff    : 311
========================================================

6. Checking the kafka log files shows that the sum of the sizes of the source cluster's segment files is equal to that of the target cluster's.

[/tmp] $  find kafka* -name *.kafka -ls

18620155 9860 -rw-r--r--   1 jfung    eng      10096535 Jun 21 11:09 kafka-source3-logs/test01-0/00000000000000000000.kafka
18620161 9772 -rw-r--r--   1 jfung    eng      10004418 Jun 21 11:11 kafka-source3-logs/test01-0/00000000000020105286.kafka
18620160 9776 -rw-r--r--   1 jfung    eng      10008751 Jun 21 11:10 kafka-source3-logs/test01-0/00000000000010096535.kafka
18620162 4708 -rw-r--r--   1 jfung    eng       4819067 Jun 21 11:11 kafka-source3-logs/test01-0/00000000000030109704.kafka
19406431 9920 -rw-r--r--   1 jfung    eng      10157685 Jun 21 11:10 kafka-target2-logs/test01-0/00000000000010335039.kafka
19406429 10096 -rw-r--r--   1 jfung    eng      10335039 Jun 21 11:09 kafka-target2-logs/test01-0/00000000000000000000.kafka
19406432 10300 -rw-r--r--   1 jfung    eng      10544850 Jun 21 11:11 kafka-target2-logs/test01-0/00000000000020492724.kafka
19406433 3800 -rw-r--r--   1 jfung    eng       3891197 Jun 21 11:12 kafka-target2-logs/test01-0/00000000000031037574.kafka
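
The equality claimed in step 6 can be checked directly from the size column (7th field of the `find -ls` output) in the listing above; a quick sketch, with the byte counts copied verbatim:

```shell
# Sum the segment byte sizes for each cluster, as reported by `find -ls`.
src=$((10096535 + 10008751 + 10004418 + 4819067))   # kafka-source3-logs
tgt=$((10335039 + 10157685 + 10544850 + 3891197))   # kafka-target2-logs
echo "source=$src target=$tgt"                      # both totals are 34928771
```

Against the live directories, something like `find kafka-source3-logs -name '*.kafka' -ls | awk '{s += $7} END {print s}'` would produce the same total.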

7. If the log.file.size in the target cluster is configured to a value large enough that there is only one data file, the result looks like this:

========================================================
no. of messages published            : 14000
producer unique msg rec'd            : 14000
source consumer msg rec'd            : 7302
source consumer unique msg rec'd     : 7302
mirror consumer msg rec'd            : 13750
mirror consumer unique msg rec'd     : 13750
total source/mirror duplicate msg    : 0
source/mirror uniq msg count diff    : -6448
========================================================
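
For reference, the "uniq msg count diff" line in both summaries is simply source-consumer uniques minus mirror-consumer uniques:

```shell
echo $((7271 - 6960))    # multi-segment target run: 311 (mirror side missed data)
echo $((7302 - 13750))   # single-segment target run: -6448 (source consumer missed data)
```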

8. The log files look like this:

[/tmp] $ find kafka* -name *.kafka -ls

18620160 9840 -rw-r--r--   1 jfung    eng      10075058 Jun 21 11:24 kafka-source2-logs/test01-0/00000000000010083679.kafka
18620155 9848 -rw-r--r--   1 jfung    eng      10083679 Jun 21 11:23 kafka-source2-logs/test01-0/00000000000000000000.kafka
18620162 4484 -rw-r--r--   1 jfung    eng       4589474 Jun 21 11:26 kafka-source2-logs/test01-0/00000000000030269045.kafka
18620161 9876 -rw-r--r--   1 jfung    eng      10110308 Jun 21 11:25 kafka-source2-logs/test01-0/00000000000020158737.kafka
19406429 34048 -rw-r--r--   1 jfung    eng      34858519 Jun 21 11:26 kafka-target3-logs/test01-0/00000000000000000000.kafka


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (KAFKA-372) Consumer doesn't receive all data if there are multiple segment files

Posted by "John Fung (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung resolved KAFKA-372.
-----------------------------

    Resolution: Fixed

Thanks Jun. It is working correctly after applying kafka-372-v1.patch.
                


[jira] [Updated] (KAFKA-372) Consumer doesn't receive all data if there are multiple segment files

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-372:
--------------------------

    Attachment: kafka-372_v1.patch

There were several issues that caused the problem.

1. Log.nextAppendOffset() calls flush each time. Since this method is called for every produce request, we force a disk flush for every produce request independent of the flush interval in the broker config. This makes producers very slow.

2. The default value for MaxFetchWaitMs in consumer config is 3 secs, which is too long.

3. The script runs the console consumer in the background and only waits 20 secs, which is too short. Instead, it should run the console consumer in the foreground and wait until it finishes (since the consumer has a timeout).

Attached patch v1, which fixes items 1 and 2. The test now passes. However, we should also address item 3 in the script.
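
Item 3 could be handled in the script along these lines (a sketch only; `slow_consumer` is a stand-in for the real bin/kafka-console-consumer.sh invocation, which exits on its own via consumer.timeout.ms):

```shell
# Stand-in for the console consumer: takes ~2s to drain, then exits,
# mimicking a consumer that stops when its consumer timeout fires.
slow_consumer() { sleep 2; printf 'all messages\n'; }

# Before (racy): background it and sleep a fixed 20s; if consumption runs
# longer than the sleep, the script moves on with a truncated log.
#   slow_consumer > "$CONSUMER_LOG" &
#   sleep 20

# After (deterministic): run in the foreground, so the script resumes only
# once the consumer has hit its timeout and exited.
out=$(slow_consumer)
echo "$out"
```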
                


[jira] [Updated] (KAFKA-372) Consumer doesn't receive all data if there are multiple segment files

Posted by "John Fung (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-372:
----------------------------

    Attachment: multi_seg_files_data_loss_debug.patch

** Uploaded a patch with a simplified scenario to reproduce the data loss with multiple segment files.

** This patch provides a script "run-test-debug.sh" to do the following:
1. Start 1 broker
2. Start a modified version of Producer to send 300 messages with a user-specified message string length (500 chars reproduces the issue while 50 chars does not). This producer tags each message with a sequence ID and sends the messages in order starting from 1, 2, 3, etc.
3. Start ConsoleConsumer to receive data

** To reproduce the issue, under <kafka home>/system_test/broker_failure, execute the following command:

$ bin/run-test-debug.sh 500 (which means each message string is 500 chars long)

The consumer only receives the first 120 messages (verified by checking kafka.tools.DumpLogSegments):
========================================================
no. of messages published            : 300
producer unique msg rec'd            : 300
source consumer msg rec'd            : 120
source consumer unique msg rec'd     : 120
========================================================

The segment files are:

$ ls -l /tmp/kafka-source1-logs/test01-0/
-rw-r--r--   1 jfung  wheel  10431 Jun 24 20:59:21 2012 00000000000000000000.kafka
-rw-r--r--   1 jfung  wheel  10440 Jun 24 20:59:22 2012 00000000000000010431.kafka
-rw-r--r--   1 jfung  wheel  10440 Jun 24 20:59:23 2012 00000000000000020871.kafka
-rw-r--r--   1 jfung  wheel  10440 Jun 24 20:59:24 2012 00000000000000031311.kafka
-rw-r--r--   1 jfung  wheel  10441 Jun 24 20:59:26 2012 00000000000000041751.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:27 2012 00000000000000052192.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:28 2012 00000000000000062652.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:29 2012 00000000000000073112.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:31 2012 00000000000000083572.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:32 2012 00000000000000094032.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:33 2012 00000000000000104492.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:34 2012 00000000000000114952.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:35 2012 00000000000000125412.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:37 2012 00000000000000135872.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:38 2012 00000000000000146332.kafka
-rw-r--r--   1 jfung  wheel      0 Jun 24 20:59:38 2012 00000000000000156792.kafka
-rw-r--r--   1 jfung  wheel      8 Jun 24 21:00:08 2012 highwatermark
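
Rough arithmetic from the listing above (an illustration using the byte offsets shown): the empty final segment starts at offset 156792, so the 300 messages occupy 156792 bytes, about 522 bytes each; a ~10440-byte segment therefore holds about 20 messages, and the 120 received messages correspond exactly to the first six segments (62652 / 522 ≈ 120), i.e. the consumer stalled at a segment boundary.

```shell
echo $((156792 / 300))   # approximate on-disk bytes per message: 522
echo $((62652 / 522))    # messages covered by the first six segments: 120
```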


** However, if the length of each message string is lowered to 50, the issue does not show up:

$ bin/run-test-debug.sh 50

The consumer receives all data:
========================================================
no. of messages published            : 300
producer unique msg rec'd            : 300
source consumer msg rec'd            : 300
source consumer unique msg rec'd     : 300
========================================================

The segment files are:

$  ls -l /tmp/kafka-source1-logs/test01-0
total 64
-rw-r--r--  1 jfung  wheel  10039 Jun 24 20:29:26 2012 00000000000000000000.kafka
-rw-r--r--  1 jfung  wheel  10001 Jun 24 20:29:34 2012 00000000000000010039.kafka
-rw-r--r--  1 jfung  wheel   1752 Jun 24 20:29:36 2012 00000000000000020040.kafka
-rw-r--r--  1 jfung  wheel      8 Jun 24 20:30:06 2012 highwatermark

                


[jira] [Commented] (KAFKA-372) Consumer doesn't receive all data if there are multiple segment files

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401491#comment-13401491 ] 

Jun Rao commented on KAFKA-372:
-------------------------------

Thanks John for reviewing the patch. Committed to 0.8.
                
