You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2012/08/15 17:16:37 UTC

[jira] [Created] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Jun Rao created KAFKA-463:
-----------------------------

             Summary: log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
                 Key: KAFKA-463
                 URL: https://issues.apache.org/jira/browse/KAFKA-463
             Project: Kafka
          Issue Type: Bug
          Components: core
            Reporter: Jun Rao
             Fix For: 0.8


When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454947#comment-13454947 ] 

Jun Rao commented on KAFKA-463:
-------------------------------

Thanks for the patch. A few comments:

1. Log.truncateTo(): 
1.1 Should synchronize on the lock since we are touching the segment files.
1.2 It doesn't seem that we need to call segments.truncLast(truncatedSegmentIndex) since we are deleting those segments whose start is larger than targetOffset later. If so, we can get rid of of SegmentList.truncLast and the associated unit test. Also, it seems that the current logic will fail to delete some segment files (only the in memory segment list is updated) if there the startOffset is in the middle of a list of segments. 
1.3 The following piece of code is duplicated in 3 places: creating the first segment with offset 0, maybe roll and runcateTo(). Can we create a separate function (something like rollSegement(startOffset)) to share the code?
      val newFile = new File(dir, nameFromOffset(targetOffset))
      if (newFile.exists) {
        warn("newly rolled logsegment " + newFile.getName + " already exists; deleting it first")
        newFile.delete()
      }
      debug("Rolling log '" + name + "' to " + newFile.getName())
      segments.append(new LogSegment(newFile, new FileMessageSet(newFile, true), targetOffset, time))
1.4 Can we add a unit test to verify that if targetOffset is smaller that the smallest offset in the log, after tuncateTo is called, log size is 0 and logEndOffset is targetOffset?
1.5 The following message should say this is failure during truncateTo.
      error("Failed to delete some segments during log recovery")

2. FileMessageSet.truncateTo() : Condition     if(targetSize >= sizeInBytes())  should be changed to     if(targetSize > sizeInBytes())

                
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-463-v1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-463:
--------------------------------

    Attachment: kafka-463-v2.patch

1. Made the changes 1.1, 1.3, 1.5, 2 as recommended. 

2. Made an appropriate change for 1.2. If all segments are deleted then truncate entire segments.view and roll a new segment from that targetOffset, otherwise check for appropriate segment to truncate and change the last segment of segments.view or throw error if targetOffset is out of range. 

3. A unit test that exercises all cases for truncateTo.

Perhaps not super-crucial, but here it goes anyways:

i. Added a check for contents.get().size > 0 in while loop in  trunc and truncLast, otherwise it can throw illegalArgumentException in Array.copy for certain cases even when the passed argument is ok. Eg in trunc when newStart == 0 and curr.length == 0. There is most probably no danger of changing the value of contents between the while condition check and curr = contents.get(), because any calls to trunc and truncLast are synchronized with the same lock, thus there should be no other thread executing between while() and curr = contents.get().

ii. truncLast does not throw exception for argument==-1, but throws exception for anything less. Probably it should behave uniformly for all negative integers. Made a change so that truncLast(-ve no) will wipe out the entire segments.view. This is also similar to what trunc does when the argument is greater than (segments.view.length-1).

iii. Changed the condition for throwing exception in truncLast to > because an index which is equal to (segments.view.length-1) should not throw exception.

iv. Added max/min condition to avoid IllegalArgumentException in truncLast.

v. The unit test for truncLast did not check for all cases properly, fixed it.

vi. Combined the two different test cases for trunc by imitating the structure of unit test for truncLast.

A question:

Since all calls to SegmentList.trunc, truncLast and append (which are the only ways to modify contents) seem to be locked, so only one thread will be able to modify SegmentList.contents at any time. Do we really need a while(compareAndSet) for contents? I could be wrong, pls correct me whenever you have free time.
                
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-463-v1.patch, kafka-463-v2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-463:
--------------------------------

    Attachment: kafka-463-v1.patch

Truncates all existing log segments if targetOffset is smaller than the lowest offset in the log, creates and appends a new log segment at targetOffset.
                
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-463-v1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455942#comment-13455942 ] 

Jun Rao commented on KAFKA-463:
-------------------------------

Thanks for patch v2. Some more comments:

20. Log:
20.1 rollSegment can be used for rolling the very first segment with offset 0 in the constructor too. Also, add a new line after rollSegment()
20.2 The following log message is bit confusing since Log shouldn't understand hw. Let's changed it to something more generic.
              error("Last checkpointed hw %d cannot be greater than the latest message offset %d in the log %s".
                format(targetOffset, segments.view.last.absoluteEndOffset, segments.view.last.file.getAbsolutePath))

21. SegmentList: 
21.1 The comment of truncLast is wrong. It's truncating everything from newOffset+1 to the end.
21.2 truncLast(): It's better to make sure that newOffset is >=0. Then we don't need the code max(newEnd + 1, 0).
21.3 It is true that callers of trunc and truncLast are already sync-ed on log.lock. All we need is to create a memory boundary so that the updated reference of  contents are exposed to other threads. AtomicReference guarantees this. However, we could just use a volatile var. We can probably leave AtomicReference for now since the overhead shouldn't be too much. However, we probably don't need to use compareAndSet in trunc() and truncLast(). We can just use set and get rid of the loop.


 
                
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-463-v1.patch, kafka-463-v2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike reassigned KAFKA-463:
-----------------------------------

    Assignee: Swapnil Ghike
    
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Closed] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao closed KAFKA-463.
-------------------------

    
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-463-v1.patch, kafka-463-v2.patch, kafka-463-v3.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved KAFKA-463.
---------------------------

    Resolution: Fixed

Thanks for patch v3. Committed to 0.8 with a minor fix: use contents.get().length instead of contents.get().size in SegmetList.truncLast to be consistent with the rest of the code.
                
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-463-v1.patch, kafka-463-v2.patch, kafka-463-v3.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-463:
-----------------------------

    Priority: Blocker  (was: Major)
    
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (KAFKA-463) log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log

Posted by "Swapnil Ghike (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Ghike updated KAFKA-463:
--------------------------------

    Attachment: kafka-463-v3.patch

20.1. Letting the constructor be as it is as discussed. 
20.2. Made the changes.

21.1 Oops, had forgotten to update it, thanks.
21.2 Changed.
21.3 Removed compareAndSet loops. AtomicReference is unchanged.
                
> log.truncateTo needs to handle targetOffset smaller than the lowest offset in the log
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-463
>                 URL: https://issues.apache.org/jira/browse/KAFKA-463
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: kafka-463-v1.patch, kafka-463-v2.patch, kafka-463-v3.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When this happens, we should truncate all existing log segments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira