You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Aaron Morton (JIRA)" <ji...@apache.org> on 2012/09/03 00:39:07 UTC

[jira] [Created] (CASSANDRA-4601) Ensure unique commit log file names

Aaron Morton created CASSANDRA-4601:
---------------------------------------

             Summary: Ensure unique commit log file names
                 Key: CASSANDRA-4601
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
            Reporter: Aaron Morton
            Assignee: Aaron Morton
            Priority: Critical


The commit log segment name uses System.nanoTime() as part of the file name. There is no guarantee that successive calls to nanoTime() will return different values. And on less than optimal hypervisors this happens a lot. 

I observed the following in the wild:

{code:java}
ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.lang.AssertionError: attempted to delete non-existing file CommitLog-13926764209796414.log
        at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.lang.Thread.run(Unknown Source)
{code}

My _assumption_ is that it was because of duplicate file names. As this is on a hypervisor that is less than optimal. 
 
After a while (about 30 minutes) mutations stopped being processed and the pending count sky rocketed. I _think_ this was because log writing was blocked trying to get a new segment and writers could not submit to the commit log queue. The only way to stop the affected nodes was kill -9. 

Over about 24 hours this happened 5 times. I have deployed a patch that has been running for 12 hours without incident, will attach. 

The affected nodes could still read, and I'm checking logs to see how the other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4601) Ensure unique commit log file names

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448022#comment-13448022 ] 

Aaron Morton commented on CASSANDRA-4601:
-----------------------------------------

Thanks
                
> Ensure unique commit log file names
> -----------------------------------
>
>                 Key: CASSANDRA-4601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>            Priority: Critical
>             Fix For: 1.1.5
>
>         Attachments: cassandra-1.1-4601.patch
>
>
> The commit log segment name uses System.nanoTime() as part of the file name. There is no guarantee that successive calls to nanoTime() will return different values. And on less than optimal hypervisors this happens a lot. 
> I observed the following in the wild:
> {code:java}
> ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> java.lang.AssertionError: attempted to delete non-existing file CommitLog-13926764209796414.log
>         at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
>         at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> My _assumption_ is that it was because of duplicate file names. As this is on a hypervisor that is less than optimal. 
>  
> After a while (about 30 minutes) mutations stopped being processed and the pending count sky rocketed. I _think_ this was because log writing was blocked trying to get a new segment and writers could not submit to the commit log queue. The only way to stop the affected nodes was kill -9. 
> Over about 24 hours this happened 5 times. I have deployed a patch that has been running for 12 hours without incident, will attach. 
> The affected nodes could still read, and I'm checking logs to see how the other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4601) Ensure unique commit log file names

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447085#comment-13447085 ] 

Jonathan Ellis commented on CASSANDRA-4601:
-------------------------------------------

LGTM overall.

- I'd go ahead and use time-in-millis as base instead of nanotime
- Looks like this diff is reversed from what it "should" be?  Old lines are +, new are - instead of the inverse.
- patch 1.0 as well?
                
> Ensure unique commit log file names
> -----------------------------------
>
>                 Key: CASSANDRA-4601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.10, 1.0.11, 1.1.4
>         Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>            Priority: Critical
>         Attachments: cassandra-1.1-4601.patch
>
>
> The commit log segment name uses System.nanoTime() as part of the file name. There is no guarantee that successive calls to nanoTime() will return different values. And on less than optimal hypervisors this happens a lot. 
> I observed the following in the wild:
> {code:java}
> ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> java.lang.AssertionError: attempted to delete non-existing file CommitLog-13926764209796414.log
>         at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
>         at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> My _assumption_ is that it was because of duplicate file names. As this is on a hypervisor that is less than optimal. 
>  
> After a while (about 30 minutes) mutations stopped being processed and the pending count sky rocketed. I _think_ this was because log writing was blocked trying to get a new segment and writers could not submit to the commit log queue. The only way to stop the affected nodes was kill -9. 
> Over about 24 hours this happened 5 times. I have deployed a patch that has been running for 12 hours without incident, will attach. 
> The affected nodes could still read, and I'm checking logs to see how the other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4601) Ensure unique commit log file names

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-4601:
------------------------------------

    Attachment: cassandra-1.1-4601.patch
    
> Ensure unique commit log file names
> -----------------------------------
>
>                 Key: CASSANDRA-4601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.10, 1.0.11, 1.1.4
>         Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>            Priority: Critical
>         Attachments: cassandra-1.1-4601.patch
>
>
> The commit log segment name uses System.nanoTime() as part of the file name. There is no guarantee that successive calls to nanoTime() will return different values. And on less than optimal hypervisors this happens a lot. 
> I observed the following in the wild:
> {code:java}
> ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> java.lang.AssertionError: attempted to delete non-existing file CommitLog-13926764209796414.log
>         at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
>         at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> My _assumption_ is that it was because of duplicate file names. As this is on a hypervisor that is less than optimal. 
>  
> After a while (about 30 minutes) mutations stopped being processed and the pending count sky rocketed. I _think_ this was because log writing was blocked trying to get a new segment and writers could not submit to the commit log queue. The only way to stop the affected nodes was kill -9. 
> Over about 24 hours this happened 5 times. I have deployed a patch that has been running for 12 hours without incident, will attach. 
> The affected nodes could still read, and I'm checking logs to see how the other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4601) Ensure unique commit log file names

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4601:
--------------------------------------

             Reviewer: jbellis
    Affects Version/s:     (was: 1.1.4)
                           (was: 1.0.11)
                           (was: 0.8.10)
                       1.1.0
        Fix Version/s: 1.1.5

This actually doesn't affect < 1.1.0; it was introduced by CASSANDRA-3544.
                
> Ensure unique commit log file names
> -----------------------------------
>
>                 Key: CASSANDRA-4601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>            Priority: Critical
>             Fix For: 1.1.5
>
>         Attachments: cassandra-1.1-4601.patch
>
>
> The commit log segment name uses System.nanoTime() as part of the file name. There is no guarantee that successive calls to nanoTime() will return different values. And on less than optimal hypervisors this happens a lot. 
> I observed the following in the wild:
> {code:java}
> ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> java.lang.AssertionError: attempted to delete non-existing file CommitLog-13926764209796414.log
>         at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
>         at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> My _assumption_ is that it was because of duplicate file names. As this is on a hypervisor that is less than optimal. 
>  
> After a while (about 30 minutes) mutations stopped being processed and the pending count sky rocketed. I _think_ this was because log writing was blocked trying to get a new segment and writers could not submit to the commit log queue. The only way to stop the affected nodes was kill -9. 
> Over about 24 hours this happened 5 times. I have deployed a patch that has been running for 12 hours without incident, will attach. 
> The affected nodes could still read, and I'm checking logs to see how the other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4601) Ensure unique commit log file names

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-4601:
------------------------------------

    Affects Version/s: 0.8.10
                       1.0.11
                       1.1.4
    
> Ensure unique commit log file names
> -----------------------------------
>
>                 Key: CASSANDRA-4601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.10, 1.0.11, 1.1.4
>         Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>            Priority: Critical
>
> The commit log segment name uses System.nanoTime() as part of the file name. There is no guarantee that successive calls to nanoTime() will return different values. And on less than optimal hypervisors this happens a lot. 
> I observed the following in the wild:
> {code:java}
> ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> java.lang.AssertionError: attempted to delete non-existing file CommitLog-13926764209796414.log
>         at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
>         at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
>         at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> My _assumption_ is that it was because of duplicate file names. As this is on a hypervisor that is less than optimal. 
>  
> After a while (about 30 minutes) mutations stopped being processed and the pending count sky rocketed. I _think_ this was because log writing was blocked trying to get a new segment and writers could not submit to the commit log queue. The only way to stop the affected nodes was kill -9. 
> Over about 24 hours this happened 5 times. I have deployed a patch that has been running for 12 hours without incident, will attach. 
> The affected nodes could still read, and I'm checking logs to see how the other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira