Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2012/10/31 17:49:12 UTC

[jira] [Created] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Jun Rao created KAFKA-596:
-----------------------------

             Summary: LogSegment.firstAppendTime not reset after truncate to
                 Key: KAFKA-596
                 URL: https://issues.apache.org/jira/browse/KAFKA-596
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.8
            Reporter: Jun Rao


Currently, we don't reset LogSegment.firstAppendTime after the segment is truncated. What can happen is that we truncate the segment to size 0 and on next append, a new log segment with the same starting offset is rolled because the time-based rolling is triggered.
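To make the failure mode concrete, here is a minimal sketch under stated assumptions. SegmentModel, RollBugDemo, truncateToEmpty and the rollMs parameter are hypothetical names for illustration only, not Kafka's actual classes or signatures; only firstAppendTime comes from the report. It shows a segment emptied by truncation that still carries a stale firstAppendTime, so a time-based check would roll a new segment at the same starting offset.

```scala
// Simplified, hypothetical model of a log segment; not Kafka's actual LogSegment.
final class SegmentModel(val startOffset: Long) {
  var messageCount: Long = 0L
  var firstAppendTime: Option[Long] = None

  // Record the time of the first append, then grow the segment.
  def append(count: Long, nowMs: Long): Unit = {
    if (firstAppendTime.isEmpty) firstAppendTime = Some(nowMs)
    messageCount += count
  }

  // Models the reported bug: the data is dropped but firstAppendTime is kept.
  def truncateToEmpty(): Unit = {
    messageCount = 0L
  }
}

object RollBugDemo {
  // Time-based roll check: true once the first append is older than rollMs.
  def shouldRollByTime(seg: SegmentModel, nowMs: Long, rollMs: Long): Boolean =
    seg.firstAppendTime.exists(t => nowMs - t > rollMs)

  // Starting offset a freshly rolled segment would receive (the next offset in the log).
  def nextStartOffset(seg: SegmentModel): Long = seg.startOffset + seg.messageCount
}
```

With a segment starting at offset 100 that is appended to, truncated to empty, and checked after rollMs has elapsed, shouldRollByTime is still true and nextStartOffset equals the existing segment's startOffset.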

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489286#comment-13489286 ] 

Swapnil Ghike commented on KAFKA-596:
-------------------------------------

Actually, the change to the conditions in maybeRoll() that decide whether to roll a new segment is enough to fix the issue mentioned above. The fix rolls a new segment only if the size of messageSet is > 0. So if we truncate the segment to size 0, maybeRoll() will not roll a new segment at the same starting offset.

I kept those lines in Log.maybeRoll(), LogSegment.truncateTo() and Log.markedDeletedWhile() as an optimization. Setting firstAppendTime to None whenever the size is found to be 0 postpones the next time-based roll and does not harm correctness.
                
> LogSegment.firstAppendTime not reset after truncate to
> ------------------------------------------------------
>
>                 Key: KAFKA-596
>                 URL: https://issues.apache.org/jira/browse/KAFKA-596
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>              Labels: bugs
>         Attachments: kafka-596.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, we don't reset LogSegment.firstAppendTime after the segment is truncated. What can happen is that we truncate the segment to size 0 and on next append, a new log segment with the same starting offset is rolled because the time-based rolling is triggered.


[jira] [Updated] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-596:
--------------------------------

    Attachment: kafka-596.patch

Yes, that's a pretty sharp observation. The current time based roll policy only checks whether the segment was not appended for its lifetime. This patch has three fixes: 

1. Initial assignment of firstAppendTime in LogSegment, because the non-primary constructor could potentially initialize the messageSet and set its size > 0. (I don't know if this fix will affect any other jiras.)

2. In maybeRoll(), a new condition makes sure that roll() happens based on time only if the messageSet size is > 0, so different segments cannot have identical starting offsets. It also makes sure that a new segment is not rolled if no messages have been appended to the last segment yet.

3. A segment is reborn at three places by setting its firstAppendTime to None if the message set size is 0 - 
i. Log.maybeRoll()
ii. LogSegment.truncateTo()
iii. Log.markedDeletedWhile()
                

[jira] [Updated] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-596:
--------------------------------

    Attachment: kafka-596-v2.patch

Yes, I see your point. Attached a new patch.
                
> LogSegment.firstAppendTime not reset after truncate to
> ------------------------------------------------------
>
>                 Key: KAFKA-596
>                 URL: https://issues.apache.org/jira/browse/KAFKA-596
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>              Labels: bugs
>         Attachments: kafka-596.patch, kafka-596-v2.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, we don't reset LogSegment.firstAppendTime after the segment is truncated. What can happen is that we truncate the segment to size 0 and on next append, a new log segment with the same starting offset is rolled because the time-based rolling is triggered.


[jira] [Commented] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489473#comment-13489473 ] 

Jun Rao commented on KAFKA-596:
-------------------------------

I agree that setting firstAppendTime to None in LogSegment.truncateTo() is necessary. However, I don't think this is necessary in Log.maybeRoll() and Log.markedDeletedWhile(). In both cases, we are not changing the log segment. So whoever last changed the segment to make its size 0 (either through truncation or creation) would have set firstAppendTime properly. Note that maybeRoll is called on every log append. We don't want to add unnecessary overhead.
                

[jira] [Closed] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao closed KAFKA-596.
-------------------------

    
> LogSegment.firstAppendTime not reset after truncate to
> ------------------------------------------------------
>
>                 Key: KAFKA-596
>                 URL: https://issues.apache.org/jira/browse/KAFKA-596
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-596.patch, kafka-596-v2.patch, kafka-596-v3.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, we don't reset LogSegment.firstAppendTime after the segment is truncated. What can happen is that we truncate the segment to size 0 and on next append, a new log segment with the same starting offset is rolled because the time-based rolling is triggered.


[jira] [Resolved] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved KAFKA-596.
---------------------------

       Resolution: Fixed
    Fix Version/s: 0.8

Thanks for patch v3. +1. Committed to 0.8.
                

[jira] [Commented] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487947#comment-13487947 ] 

Jun Rao commented on KAFKA-596:
-------------------------------

The fix is to set LogSegment.firstAppendTime to None if we truncate the segment to size 0. However, this brings up the deeper question of how we prevent segments with identical starting offsets from being created during log roll. Maybe we should add a check in log.roll to guard against this.
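One possible shape for such a guard, as a rough sketch only: GuardedLog and its members are hypothetical names, and the real Log.roll may track segments quite differently. The idea is simply to refuse to roll when a segment with the requested starting offset already exists.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a log that tracks its segments by starting offset.
final class GuardedLog {
  // Starting offsets of the existing segments, oldest first.
  private val startOffsets = mutable.ArrayBuffer[Long]()

  // Roll a new segment, rejecting a starting offset that is already in use.
  // require throws IllegalArgumentException when the condition is false.
  def roll(newStartOffset: Long): Unit = {
    require(
      !startOffsets.contains(newStartOffset),
      s"A segment with starting offset $newStartOffset already exists"
    )
    startOffsets += newStartOffset
  }

  def segmentCount: Int = startOffsets.size
}
```

Whether to throw, log a warning, or silently reuse the existing segment is a design choice this sketch does not settle; throwing just makes the duplicate visible.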
                
> LogSegment.firstAppendTime not reset after truncate to
> ------------------------------------------------------
>
>                 Key: KAFKA-596
>                 URL: https://issues.apache.org/jira/browse/KAFKA-596
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>              Labels: bugs
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, we don't reset LogSegment.firstAppendTime after the segment is truncated. What can happen is that we truncate the segment to size 0 and on next append, a new log segment with the same starting offset is rolled because the time-based rolling is triggered.


[jira] [Commented] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Neha Narkhede (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490298#comment-13490298 ] 

Neha Narkhede commented on KAFKA-596:
-------------------------------------

Good catch, Jun!

+1 on v3
                
> LogSegment.firstAppendTime not reset after truncate to
> ------------------------------------------------------
>
>                 Key: KAFKA-596
>                 URL: https://issues.apache.org/jira/browse/KAFKA-596
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>              Labels: bugs
>         Attachments: kafka-596.patch, kafka-596-v2.patch, kafka-596-v3.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, we don't reset LogSegment.firstAppendTime after the segment is truncated. What can happen is that we truncate the segment to size 0 and on next append, a new log segment with the same starting offset is rolled because the time-based rolling is triggered.


[jira] [Commented] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489266#comment-13489266 ] 

Jun Rao commented on KAFKA-596:
-------------------------------

Thanks for the patch. A couple of comments:

1.  Log.maybeRoll(): Are the following lines needed since we are not creating a new segment?
      if (segment.messageSet.sizeInBytes == 0)
        segment.firstAppendTime = None

2. Log.markedDeletedWhile(): Is the following line needed since we are not creating a new segment?
          view(numToDelete - 1).firstAppendTime = None


                

[jira] [Updated] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-596:
--------------------------------

    Attachment: kafka-596-v3.patch

After a discussion with Jun, reverted the conditions in maybeRoll() to the trunk version.

Setting firstAppendTime to None in LogSegment.truncateTo() when messageSet size becomes 0 is enough to make sure that Log.maybeRoll() will not roll a new segment at the same starting offset as the last segment.
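The behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of the patch: FixedSegment, FixedRollCheck, the message-count bookkeeping and the rollMs parameter are hypothetical, and only the idea of clearing firstAppendTime in truncateTo() when the segment becomes empty is taken from the comment.

```scala
// Hypothetical, simplified segment; not Kafka's actual LogSegment.
final class FixedSegment(val startOffset: Long) {
  private var messageCount: Long = 0L
  var firstAppendTime: Option[Long] = None

  def size: Long = messageCount

  // Record the time of the first append, then grow the segment.
  def append(count: Long, nowMs: Long): Unit = {
    if (firstAppendTime.isEmpty) firstAppendTime = Some(nowMs)
    messageCount += count
  }

  // Keep only messages below targetOffset; once nothing is left,
  // clear firstAppendTime so the time-based roll clock starts over.
  def truncateTo(targetOffset: Long): Unit = {
    val kept = math.min(messageCount, targetOffset - startOffset)
    messageCount = math.max(0L, kept)
    if (messageCount == 0L) firstAppendTime = None
  }
}

object FixedRollCheck {
  // Time-based roll check: an empty, reset segment has no firstAppendTime, so it never rolls.
  def shouldRollByTime(seg: FixedSegment, nowMs: Long, rollMs: Long): Boolean =
    seg.firstAppendTime.exists(t => nowMs - t > rollMs)
}
```

A partial truncation leaves firstAppendTime untouched in this sketch, so only a segment that has been emptied restarts its roll clock on the next append.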
                

[jira] [Assigned] (KAFKA-596) LogSegment.firstAppendTime not reset after truncate to

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike reassigned KAFKA-596:
-----------------------------------

    Assignee: Swapnil Ghike
    