Posted to dev@kafka.apache.org by "John Fung (JIRA)" <ji...@apache.org> on 2012/06/25 06:11:42 UTC

[jira] [Updated] (KAFKA-372) Consumer doesn't receive all data if there are multiple segment files

     [ https://issues.apache.org/jira/browse/KAFKA-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Fung updated KAFKA-372:
----------------------------

    Attachment: multi_seg_files_data_loss_debug.patch

** Uploaded a patch with a simplified scenario that reproduces the data loss when there are multiple segment files.

** This patch provides a script "run-test-debug.sh" to do the following:
1. Start 1 broker
2. Start a modified version of Producer to send 300 messages with a user-specified message string length (500 chars reproduces the issue, while 50 chars does not). This producer tags each message with a sequence ID and sends the messages in order, starting from 1, 2, 3, and so on.
3. Start ConsoleConsumer to receive data

** To reproduce the issue, under <kafka home>/system_test/broker_failure, execute the following command:

$ bin/run-test-debug.sh 500 (which means each message string is 500 chars long)

The consumer receives only the first 120 messages. (This is verified by checking kafka.tools.DumpLogSegments.)
========================================================
no. of messages published            : 300
producer unique msg rec'd            : 300
source consumer msg rec'd            : 120
source consumer unique msg rec'd     : 120
========================================================

The segment files are:

$ ls -l /tmp/kafka-source1-logs/test01-0/
-rw-r--r--   1 jfung  wheel  10431 Jun 24 20:59:21 2012 00000000000000000000.kafka
-rw-r--r--   1 jfung  wheel  10440 Jun 24 20:59:22 2012 00000000000000010431.kafka
-rw-r--r--   1 jfung  wheel  10440 Jun 24 20:59:23 2012 00000000000000020871.kafka
-rw-r--r--   1 jfung  wheel  10440 Jun 24 20:59:24 2012 00000000000000031311.kafka
-rw-r--r--   1 jfung  wheel  10441 Jun 24 20:59:26 2012 00000000000000041751.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:27 2012 00000000000000052192.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:28 2012 00000000000000062652.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:29 2012 00000000000000073112.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:31 2012 00000000000000083572.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:32 2012 00000000000000094032.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:33 2012 00000000000000104492.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:34 2012 00000000000000114952.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:35 2012 00000000000000125412.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:37 2012 00000000000000135872.kafka
-rw-r--r--   1 jfung  wheel  10460 Jun 24 20:59:38 2012 00000000000000146332.kafka
-rw-r--r--   1 jfung  wheel      0 Jun 24 20:59:38 2012 00000000000000156792.kafka
-rw-r--r--   1 jfung  wheel      8 Jun 24 21:00:08 2012 highwatermark
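
In the listing above, each segment file name appears to be the byte offset at which that segment starts: every name equals the previous name plus the previous file size. The following is a minimal Python sketch for checking that property on a listing like this one. The function name check_contiguous and the hard-coded (name, size) pairs are illustrative only and are not part of the attached patch.

```python
def check_contiguous(segments):
    """Return (ok, next_offset) for a list of (file_name, size_in_bytes).

    Assumption: each segment file is named after its starting byte
    offset, which is what the listing above suggests.
    """
    expected = 0
    for name, size in segments:
        start = int(name.split(".")[0])  # leading zeros are ignored by int()
        if start != expected:
            # Gap or overlap: report the offset we expected to see here.
            return False, expected
        expected += size
    return True, expected


# (name, size) pairs copied from the 500-char run above.
SEGMENTS_500 = [
    ("00000000000000000000.kafka", 10431),
    ("00000000000000010431.kafka", 10440),
    ("00000000000000020871.kafka", 10440),
    ("00000000000000031311.kafka", 10440),
    ("00000000000000041751.kafka", 10441),
    ("00000000000000052192.kafka", 10460),
    ("00000000000000062652.kafka", 10460),
    ("00000000000000073112.kafka", 10460),
    ("00000000000000083572.kafka", 10460),
    ("00000000000000094032.kafka", 10460),
    ("00000000000000104492.kafka", 10460),
    ("00000000000000114952.kafka", 10460),
    ("00000000000000125412.kafka", 10460),
    ("00000000000000135872.kafka", 10460),
    ("00000000000000146332.kafka", 10460),
    ("00000000000000156792.kafka", 0),
]

ok, total_bytes = check_contiguous(SEGMENTS_500)
print(ok, total_bytes)
```

If the check passes, the segments on disk are contiguous, so a shortfall on the consumer side would not be explained by a gap between segment files.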


** However, if the length of each message string is reduced to 50, the issue does not occur:

$ bin/run-test-debug.sh 50

The consumer receives all data:
========================================================
no. of messages published            : 300
producer unique msg rec'd            : 300
source consumer msg rec'd            : 300
source consumer unique msg rec'd     : 300
========================================================

The segment files are:

$  ls -l /tmp/kafka-source1-logs/test01-0
total 64
-rw-r--r--  1 jfung  wheel  10039 Jun 24 20:29:26 2012 00000000000000000000.kafka
-rw-r--r--  1 jfung  wheel  10001 Jun 24 20:29:34 2012 00000000000000010039.kafka
-rw-r--r--  1 jfung  wheel   1752 Jun 24 20:29:36 2012 00000000000000020040.kafka
-rw-r--r--  1 jfung  wheel      8 Jun 24 20:30:06 2012 highwatermark

                
> Consumer doesn't receive all data if there are multiple segment files
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-372
>                 URL: https://issues.apache.org/jira/browse/KAFKA-372
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: John Fung
>         Attachments: multi_seg_files_data_loss_debug.patch
>
>
> This issue happens inconsistently but can be reproduced by following the steps below (repeat step 4 a few times to reproduce it):
> 1. Check out 0.8 branch (currently reproducible with rev. 1352634)
> 2. Apply kafka-306-v4.patch
> 3. Please note that the log.file.size is set to 10000000 in system_test/broker_failure/config/server_*.properties (small enough to trigger multi segment files)
> 4. Under the directory <kafka home>/system_test/broker_failure, execute command:
> $ bin/run-test.sh 20 0
> 5. After the test is completed, the result will probably look like the following:
> ========================================================
> no. of messages published            : 14000
> producer unique msg rec'd            : 14000
> source consumer msg rec'd            : 7271
> source consumer unique msg rec'd     : 7271
> mirror consumer msg rec'd            : 6960
> mirror consumer unique msg rec'd     : 6960
> total source/mirror duplicate msg    : 0
> source/mirror uniq msg count diff    : 311
> ========================================================
> 6. Checking the Kafka log files shows that the total size of the segment files in the source cluster equals the total size in the target cluster.
> [/tmp] $  find kafka* -name *.kafka -ls
> 18620155 9860 -rw-r--r--   1 jfung    eng      10096535 Jun 21 11:09 kafka-source3-logs/test01-0/00000000000000000000.kafka
> 18620161 9772 -rw-r--r--   1 jfung    eng      10004418 Jun 21 11:11 kafka-source3-logs/test01-0/00000000000020105286.kafka
> 18620160 9776 -rw-r--r--   1 jfung    eng      10008751 Jun 21 11:10 kafka-source3-logs/test01-0/00000000000010096535.kafka
> 18620162 4708 -rw-r--r--   1 jfung    eng       4819067 Jun 21 11:11 kafka-source3-logs/test01-0/00000000000030109704.kafka
> 19406431 9920 -rw-r--r--   1 jfung    eng      10157685 Jun 21 11:10 kafka-target2-logs/test01-0/00000000000010335039.kafka
> 19406429 10096 -rw-r--r--   1 jfung    eng      10335039 Jun 21 11:09 kafka-target2-logs/test01-0/00000000000000000000.kafka
> 19406432 10300 -rw-r--r--   1 jfung    eng      10544850 Jun 21 11:11 kafka-target2-logs/test01-0/00000000000020492724.kafka
> 19406433 3800 -rw-r--r--   1 jfung    eng       3891197 Jun 21 11:12 kafka-target2-logs/test01-0/00000000000031037574.kafka
> 7. If the log.file.size in target cluster is configured to a very large value such that there is only 1 data file, the result would look like this:
> ========================================================
> no. of messages published            : 14000
> producer unique msg rec'd            : 14000
> source consumer msg rec'd            : 7302
> source consumer unique msg rec'd     : 7302
> mirror consumer msg rec'd            : 13750
> mirror consumer unique msg rec'd     : 13750
> total source/mirror duplicate msg    : 0
> source/mirror uniq msg count diff    : -6448
> ========================================================
> 8. The log files look like this:
> [/tmp] $ find kafka* -name *.kafka -ls
> 18620160 9840 -rw-r--r--   1 jfung    eng      10075058 Jun 21 11:24 kafka-source2-logs/test01-0/00000000000010083679.kafka
> 18620155 9848 -rw-r--r--   1 jfung    eng      10083679 Jun 21 11:23 kafka-source2-logs/test01-0/00000000000000000000.kafka
> 18620162 4484 -rw-r--r--   1 jfung    eng       4589474 Jun 21 11:26 kafka-source2-logs/test01-0/00000000000030269045.kafka
> 18620161 9876 -rw-r--r--   1 jfung    eng      10110308 Jun 21 11:25 kafka-source2-logs/test01-0/00000000000020158737.kafka
> 19406429 34048 -rw-r--r--   1 jfung    eng      34858519 Jun 21 11:26 kafka-target3-logs/test01-0/00000000000000000000.kafka
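
The size comparison in steps 6 and 8 of the quoted description can be done mechanically. The following is a minimal Python sketch that sums the .kafka segment sizes per top-level log directory from "find ... -ls" output. The function name bytes_per_log_dir is hypothetical, and the sketch assumes the 11-column "find -ls" layout shown in the listings (size in the 7th column, path in the last).

```python
from collections import defaultdict


def bytes_per_log_dir(find_ls_output):
    """Sum segment sizes per top-level directory from `find -ls` output.

    Assumption: each line has 11 whitespace-separated columns, with the
    size in bytes in column 7 and the file path in the last column.
    """
    totals = defaultdict(int)
    for line in find_ls_output.splitlines():
        fields = line.split()
        if len(fields) < 11 or not fields[-1].endswith(".kafka"):
            continue  # skip blank lines and anything that is not a segment
        size = int(fields[6])
        log_dir = fields[-1].split("/")[0]
        totals[log_dir] += size
    return dict(totals)


# Listing copied from step 8 of the description.
STEP_8 = """\
18620160 9840 -rw-r--r--   1 jfung    eng      10075058 Jun 21 11:24 kafka-source2-logs/test01-0/00000000000010083679.kafka
18620155 9848 -rw-r--r--   1 jfung    eng      10083679 Jun 21 11:23 kafka-source2-logs/test01-0/00000000000000000000.kafka
18620162 4484 -rw-r--r--   1 jfung    eng       4589474 Jun 21 11:26 kafka-source2-logs/test01-0/00000000000030269045.kafka
18620161 9876 -rw-r--r--   1 jfung    eng      10110308 Jun 21 11:25 kafka-source2-logs/test01-0/00000000000020158737.kafka
19406429 34048 -rw-r--r--   1 jfung    eng      34858519 Jun 21 11:26 kafka-target3-logs/test01-0/00000000000000000000.kafka
"""

totals = bytes_per_log_dir(STEP_8)
for log_dir, total in sorted(totals.items()):
    print(log_dir, total)
```

Equal totals for the source and target directories indicate that the same number of bytes reached both clusters, even though the consumer counts differ.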
