Posted to dev@kafka.apache.org by "Sriram Subramanian (JIRA)" <ji...@apache.org> on 2013/06/04 01:46:20 UTC

[jira] [Updated] (KAFKA-905) Logs can have same offsets causing recovery failure

     [ https://issues.apache.org/jira/browse/KAFKA-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriram Subramanian updated KAFKA-905:
-------------------------------------

    Attachment: KAFKA-905-v2.patch

- made the logging changes
- added the missing file
                
> Logs can have same offsets causing recovery failure
> ---------------------------------------------------
>
>                 Key: KAFKA-905
>                 URL: https://issues.apache.org/jira/browse/KAFKA-905
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Sriram Subramanian
>            Assignee: Sriram Subramanian
>             Fix For: 0.8
>
>         Attachments: KAFKA-905.patch, KAFKA-905.rtf, KAFKA-905-v2.patch
>
>
> Consider the following scenario, where L is the leader's log and F is the follower's log:
> L                F
> 1  m1,m2         1  m1,m2
> 3  m3,m4         3  m3,m4
> 5  m5,m6         5  m5,m6
> HW = 6           HW = 4
> The follower goes down and comes back up. It truncates its log to its HW:
> L                F
> 1  m1,m2         1  m1,m2
> 3  m3,m4         3  m3,m4
> 5  m5,m6
> HW = 6           HW = 4
> Before the follower catches up with the leader, the leader goes down and the follower becomes the new leader. It then receives new messages (note the columns below are swapped: the old leader is now the follower F):
> F                L
> 1  m1,m2         1  m1,m2
> 3  m3,m4         3  m3,m4
> 5  m5,m6         10 m5-m10
> HW = 6           HW = 4
> The follower fetches from offset 7. Since offset 7 falls within the compressed message set at offset 10 on the leader, the whole compressed chunk is sent to the follower:
> F                L
> 1  m1,m2         1  m1,m2
> 3  m3,m4         3  m3,m4
> 5  m5,m6         10 m5-m10
> 10 m5-m10
> HW = 4           HW = 10
> The follower's log now contains duplicate offsets. On recovery, re-indexing will fail because of the repeated offsets.
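A minimal sketch of why the re-index fails (illustrative only, not Kafka's actual recovery code): rebuilding an index requires strictly increasing offsets, and after the duplicated m5-m10 chunk is appended the follower's offset sequence repeats 5 and 6, so a rebuild must reject the first non-increasing offset.

```java
// Hypothetical sketch: detect the first offset that violates the
// strictly-increasing invariant a log index rebuild relies on.
public class OffsetCheck {
    // Returns the position of the first offset that is not strictly greater
    // than its predecessor, or -1 if the sequence is strictly increasing.
    public static int firstDuplicate(long[] offsets) {
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] <= offsets[i - 1]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Follower's log after the bad append: m1..m6 followed by m5..m10.
        long[] followerOffsets = {1, 2, 3, 4, 5, 6, 5, 6, 7, 8, 9, 10};
        System.out.println("first bad offset at position "
            + firstDuplicate(followerOffsets));
    }
}
```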
> Possible ways to fix this:
> 1. The fetcher thread can do deep iteration instead of shallow iteration and drop the messages whose offsets are less than the follower's log end offset. This would, however, incur a performance hit.
> 2. To optimize option 1, we could do deep iteration until the logical offset of the fetched message set is greater than the log end offset of the follower's log, and then switch to shallow iteration.
> 3. On recovery, we just truncate the active segment and refetch the data.
> All three options above are hacky. The right fix is to ensure we never corrupt the logs; we can tolerate data loss but must not compromise consistency. For 0.8, option 3 is the easiest and simplest fix.
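Option 1 can be sketched as follows (names and shapes are illustrative, not Kafka's API): deep-iterate the fetched compressed chunk and keep only the messages whose offsets are at or beyond the follower's log end offset, so already-present offsets are never re-appended.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fix option 1: filter a deep-iterated chunk
// against the follower's log end offset before appending.
public class FetchFilter {
    // offsets: per-message logical offsets found by deep iteration of the chunk.
    // logEndOffset: the next offset the follower expects to write.
    public static List<Long> newOffsets(long[] offsets, long logEndOffset) {
        List<Long> keep = new ArrayList<>();
        for (long o : offsets) {
            if (o >= logEndOffset) { // skip offsets the follower already has
                keep.add(o);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // The chunk m5-m10 covers offsets 5..10; the follower's log ends at
        // offset 6, so its log end offset (next offset to write) is 7.
        System.out.println(newOffsets(new long[]{5, 6, 7, 8, 9, 10}, 7L));
    }
}
```

In this sketch only offsets 7 through 10 survive the filter, which is exactly the suffix of the chunk the follower is missing.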

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira