Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2009/11/11 06:20:27 UTC

[jira] Created: (CASSANDRA-540) Commit log replays should be multithreaded

Commit log replays should be multithreaded
------------------------------------------

                 Key: CASSANDRA-540
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-540
             Project: Cassandra
          Issue Type: Improvement
         Environment: any
            Reporter: Brandon Williams
            Priority: Minor


Commit log replays are currently single-threaded. This limits log replay speed when restarting a Cassandra node.
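For illustration, here is a minimal sketch of what multithreaded replay could look like: each row read from the log is handed to a fixed-size thread pool, and the caller then waits on the returned Futures. The `RowApplier` interface and the string rows are hypothetical stand-ins for Cassandra's row mutation types, and the sketch assumes rows can safely be applied in any order. It is not the code from 540.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelReplaySketch {
    // Hypothetical stand-in for applying one deserialized row mutation.
    interface RowApplier {
        void apply(String row);
    }

    // Applies every row on a fixed-size pool and blocks until all are done.
    // An exception thrown by any task is rethrown here via Future.get().
    static void replay(List<String> rows, RowApplier applier, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String row : rows) {
                futures.add(pool.submit(() -> applier.apply(row)));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for this task and surfaces its failure, if any
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger applied = new AtomicInteger();
        replay(List.of("row1", "row2", "row3", "row4"), row -> applied.incrementAndGet(), 2);
        System.out.println("applied " + applied.get() + " rows"); // prints "applied 4 rows"
    }
}
```

Waiting on each Future, rather than on a counter, also means a row that fails to apply is reported to the caller instead of being silently dropped.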

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-540:
----------------------------------------

    Assignee: Jonathan Ellis

> Commit log replays should be multithreaded
> ------------------------------------------
>
>                 Key: CASSANDRA-540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-540
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: any
>            Reporter: Brandon Williams
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: 540.patch
>
>
> Commit log replays are currently single-threaded. This limits log replay speed when restarting a Cassandra node.



[jira] Updated: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-540:
-------------------------------------

    Attachment: 540.patch

Fixed the NPE: StageManager wasn't initialized in the test, because that was being done by StorageService. Moved the initialization into StageManager itself.

A CountDownLatch (CDL) isn't going to work here, since we don't know the row count until we're done, and the first rows will already have finished by then. The completed-task count will be fine.
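A minimal sketch of the completed-task approach, assuming a plain `java.util.concurrent.ThreadPoolExecutor` stands in for the replay stage (the real StageManager wiring is not shown): tasks are submitted while rows are counted, and the caller then polls `getCompletedTaskCount()` until it catches up. The `applied` counter is a hypothetical stand-in for actually applying a row, and the 10 ms sleep interval is arbitrary.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CompletedTaskWaitSketch {
    // Submits one task per row, counting rows as they are read, then waits
    // until the executor reports that many completed tasks. Nothing needs
    // to know the row count before the first task is submitted.
    static long replay(Iterable<String> rows, AtomicInteger applied)
            throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4, 4, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            // Subtracting a baseline matters if the executor is shared and has
            // already run other tasks; for a fresh pool it is 0.
            long baseline = executor.getCompletedTaskCount();
            long submitted = 0;
            for (String row : rows) {
                executor.execute(applied::incrementAndGet); // hypothetical "apply row"
                submitted++;
            }
            // Poll rather than block: the target is only known after the loop.
            while (executor.getCompletedTaskCount() - baseline < submitted) {
                Thread.sleep(10);
            }
            return submitted;
        } finally {
            executor.shutdown();
        }
    }
}
```

The sleep loop is the part later criticized as hacky in this thread; it trades a blocking primitive for not having to know the total up front.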



[jira] Commented: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777743#action_12777743 ] 

Jonathan Ellis commented on CASSANDRA-540:
------------------------------------------

The completed-task count always increments, even if a task throws an error, but you are probably right that a CDL is cleaner.
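To check the claim that the completed-task count increments even when a task fails, the following sketch runs a few tasks that throw on a standard `ThreadPoolExecutor` and reads the count after the pool terminates. The failing tasks are a hypothetical stand-in for corrupt commit log entries; their stack traces are printed to stderr by the default uncaught-exception handler.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FailedTaskCountSketch {
    // Runs `total` tasks, the first `failing` of which throw, and returns the
    // executor's completed-task count once the pool has fully terminated.
    static long completedCount(int total, int failing) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < total; i++) {
            final boolean fail = i < failing;
            executor.execute(() -> {
                if (fail) {
                    // Simulates a corrupt commit log entry.
                    throw new IllegalStateException("corrupt entry");
                }
            });
        }
        executor.shutdown();
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("pool did not terminate");
        }
        return executor.getCompletedTaskCount();
    }
}
```

`completedCount(5, 2)` should return 5: the executor counts a task as completed whether or not its `run()` throws, so a poll on this count would not hang on a bad entry.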



[jira] Updated: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-540:
-------------------------------------

      Component/s: Core
    Fix Version/s: 0.5


[jira] Commented: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777739#action_12777739 ] 

Stu Hood commented on CASSANDRA-540:
------------------------------------

Rather than sleeping until the number of completed items catches up with the number of rows, you could initialize a CountDownLatch (http://java.sun.com/javase/6/docs/api/index.html?java/util/concurrent/CountDownLatch.html) and then block until it completes.

Additionally, this approach will hang forever if one of the entries in the commit log is corrupt, won't it?
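A minimal sketch of the latch-based alternative, assuming the rows are first collected into a list so the count is known before any task runs (the limitation Jonathan Ellis points out for this issue). Counting down in a `finally` block addresses the hang concern: a row that throws still releases the latch. `applyRow` and the `"corrupt"` marker are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchReplaySketch {
    // Applies each row on a pool and blocks on a latch sized to the row count.
    // Returns the number of rows that failed to apply.
    static int replay(List<String> rows) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(rows.size());
        AtomicInteger failures = new AtomicInteger();
        try {
            for (String row : rows) {
                pool.execute(() -> {
                    try {
                        applyRow(row);
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    } finally {
                        done.countDown(); // always count down, so await() cannot hang
                    }
                });
            }
            done.await();
            return failures.get();
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical row application: a row named "corrupt" simulates a bad entry.
    static void applyRow(String row) {
        if (row.equals("corrupt")) {
            throw new IllegalArgumentException("corrupt commit log entry");
        }
    }
}
```

Actions before `countDown()` happen-before `await()` returns, so the failure count read afterwards is complete.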


[jira] Commented: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777750#action_12777750 ] 

Jun Rao commented on CASSANDRA-540:
-----------------------------------

From the Javadoc, it seems that getCompletedTaskCount() only "Returns the approximate total number of tasks that have completed execution." In addition to what Stu suggested, you could probably also check the task queue size.

Also, a couple of unit tests failed.

[junit] Testcase: testWithFlush(org.apache.cassandra.db.RecoveryManager2Test):      Caused an ERROR
[junit] null
[junit] java.lang.NullPointerException
[junit]     at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:291)
[junit]     at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:65)
[junit]     at org.apache.cassandra.db.RecoveryManager2Test.testWithFlush(RecoveryManager2Test.java:54)
[junit]
[junit]
[junit] Test org.apache.cassandra.db.RecoveryManager2Test FAILED
[junit] Testsuite: org.apache.cassandra.db.RecoveryManagerTest
[junit] Tests run: 2, Failures: 0, Errors: 1, Time elapsed: 0.656 sec
[junit]
[junit] Testcase: testOne(org.apache.cassandra.db.RecoveryManagerTest):     Caused an ERROR
[junit] null
[junit] java.lang.NullPointerException
[junit]     at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:291)
[junit]     at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:65)
[junit]     at org.apache.cassandra.db.RecoveryManagerTest.testOne(RecoveryManagerTest.java:66)
[junit]
[junit]
[junit] Test org.apache.cassandra.db.RecoveryManagerTest FAILED




[jira] Updated: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-540:
-------------------------------------

    Attachment: 540.patch

multithread row application



[jira] Updated: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-540:
-------------------------------------

    Attachment: 540.patch

rebased



[jira] Commented: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778408#action_12778408 ] 

Jun Rao commented on CASSANDRA-540:
-----------------------------------

The new patch looks fine to me.



[jira] Commented: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777803#action_12777803 ] 

Jonathan Ellis commented on CASSANDRA-540:
------------------------------------------

"Approximate" just means that if you check it twice in a row, the value may change, since the executor is multithreaded. The source makes it clear that it increments exactly once for each task that completes.



[jira] Commented: (CASSANDRA-540) Commit log replays should be multithreaded

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778423#action_12778423 ] 

Stu Hood commented on CASSANDRA-540:
------------------------------------

1. I still think using getCompletedTasks and an assertion is hacky. Rather than a CDL, could you increment an AtomicInteger?
2. What is the rationale for registering the database stages in the StageManager? Is there a more appropriate place?

Other than that, looks fine.
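One way to read the AtomicInteger suggestion, sketched with hypothetical names: each task increments a shared counter in a `finally` block, and the submitting thread, which only learns the total after reading the whole log, polls until the counter catches up. Unlike `getCompletedTaskCount()`, this counter belongs to the replay code, so unrelated tasks on a shared stage cannot affect it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterReplaySketch {
    // Returns the number of rows that failed to apply.
    static int replay(Iterable<String> rows) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger finished = new AtomicInteger(); // +1 per task, success or failure
        AtomicInteger failed = new AtomicInteger();
        int submitted = 0;
        try {
            for (String row : rows) {
                pool.execute(() -> {
                    try {
                        applyRow(row);
                    } catch (RuntimeException e) {
                        failed.incrementAndGet();
                    } finally {
                        finished.incrementAndGet(); // a corrupt row cannot stall the wait
                    }
                });
                submitted++;
            }
            // The target is only known once every row has been read, so poll.
            while (finished.get() < submitted) {
                Thread.sleep(10);
            }
            return failed.get();
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical row application: a row named "corrupt" simulates a bad entry.
    static void applyRow(String row) {
        if (row.equals("corrupt")) {
            throw new IllegalArgumentException("corrupt commit log entry");
        }
    }
}
```

Reading an `AtomicInteger` is a volatile read, so once `finished` reaches `submitted`, every task's earlier update to `failed` is visible to the waiting thread.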
