Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/11/09 00:08:10 UTC

[jira] Created: (CASSANDRA-1719) improve read repair

improve read repair
-------------------

                 Key: CASSANDRA-1719
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1719
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
            Priority: Minor
             Fix For: 0.7.0


Read repair recomputes the local digest for each replica, which is a lot of wasted CPU on large reads
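
To illustrate the inefficiency: the local digest is recomputed once per replica instead of once per read. Below is a minimal Java sketch of the difference. It assumes MD5 digests over raw row bytes. The class and method names (`DigestOnceSketch`, `compareNaive`, `compareOnce`) are hypothetical. This is not the code in the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of digest comparison during read repair;
// not Cassandra's actual resolver code. MD5 is an assumed digest algorithm.
public class DigestOnceSketch {
    // Counts local-digest computations so the saving is visible.
    static int localDigestCount = 0;

    // Plain MD5 helper, used here to build the replicas' digests.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new AssertionError(e);
        }
    }

    // Digest of the locally read data; counted on every call.
    static byte[] localDigest(byte[] localData) {
        localDigestCount++;
        return md5(localData);
    }

    // Wasteful: recomputes the local digest once per replica.
    static Map<String, Boolean> compareNaive(byte[] localData, Map<String, byte[]> replicaDigests) {
        Map<String, Boolean> matches = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : replicaDigests.entrySet()) {
            matches.put(e.getKey(), Arrays.equals(localDigest(localData), e.getValue()));
        }
        return matches;
    }

    // Improved: computes the local digest once and reuses it for every replica.
    static Map<String, Boolean> compareOnce(byte[] localData, Map<String, byte[]> replicaDigests) {
        byte[] local = localDigest(localData);
        Map<String, Boolean> matches = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : replicaDigests.entrySet()) {
            matches.put(e.getKey(), Arrays.equals(local, e.getValue()));
        }
        return matches;
    }

    public static void main(String[] args) {
        byte[] row = "current row".getBytes(StandardCharsets.UTF_8);
        Map<String, byte[]> replicaDigests = new LinkedHashMap<>();
        replicaDigests.put("replica-a", md5(row));
        replicaDigests.put("replica-b", md5("stale row".getBytes(StandardCharsets.UTF_8)));
        replicaDigests.put("replica-c", md5(row));

        localDigestCount = 0;
        // prints {replica-a=true, replica-b=false, replica-c=true} digests=3
        System.out.println(compareNaive(row, replicaDigests) + " digests=" + localDigestCount);

        localDigestCount = 0;
        // prints {replica-a=true, replica-b=false, replica-c=true} digests=1
        System.out.println(compareOnce(row, replicaDigests) + " digests=" + localDigestCount);
    }
}
```

Both variants report the same matches. Only the number of times the local data is hashed changes, and that saving grows with the size of the read and the number of replicas.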

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930346#action_12930346 ] 

Brandon Williams edited comment on CASSANDRA-1719 at 11/9/10 6:18 PM:
----------------------------------------------------------------------

I get odd behavior testing this patch.  Three node cluster, rf=2, HH disabled.  I start them up, take one down after the others see it, then insert data.  If I read it back with the node down, that works, but when I bring it up all kinds of keys are missing (even though I don't read from it directly.)  If I take it back down, everything works again.

> improve read repair
> -------------------
>
>                 Key: CASSANDRA-1719
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1719
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.6.8, 0.7.0
>
>         Attachments: 1719.txt
>
>
> Read repair recomputes the local digest for each replica, which is a lot of wasted CPU on large reads


[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930756#action_12930756 ] 

Brandon Williams commented on CASSANDRA-1719:
---------------------------------------------

Appears to not work in 0.6.7 either.  I got this stacktrace on the node that was missing data:


ERROR 20:14:11,591 Uncaught exception in thread Thread[CACHETABLE-TIMER-3,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:186)
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:141)
        at org.apache.cassandra.utils.ExpiringMap$CacheMonitor.run(ExpiringMap.java:105)
        at java.util.TimerThread.mainLoop(Timer.java:534)
        at java.util.TimerThread.run(Timer.java:484)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:571)
        at java.util.ArrayList.get(ArrayList.java:349)
        at org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:131)
        at org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:45)
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:182)
        ... 4 more



[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930660#action_12930660 ] 

Jonathan Ellis commented on CASSANDRA-1719:
-------------------------------------------

bq. when I bring it up all kinds of keys are missing

isn't that just "it was the closest replica, but it didn't have the keys in question since it was down?"  or do you mean they don't RR?


[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930657#action_12930657 ] 

Brandon Williams commented on CASSANDRA-1719:
---------------------------------------------

Actually, I do, so it's not this patch, but it makes testing this one difficult.


[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930672#action_12930672 ] 

Jonathan Ellis commented on CASSANDRA-1719:
-------------------------------------------

Does RR work in 0.6.7?


[jira] Updated: (CASSANDRA-1719) improve read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1719:
--------------------------------------

    Attachment: 1719.txt

rebased (against 0.6)


[jira] Issue Comment Edited: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930756#action_12930756 ] 

Brandon Williams edited comment on CASSANDRA-1719 at 11/10/10 3:24 PM:
-----------------------------------------------------------------------

Appears to not work in 0.6.7 either.  I got this stacktrace on two nodes:


ERROR 20:14:11,591 Uncaught exception in thread Thread[CACHETABLE-TIMER-3,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:186)
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:141)
        at org.apache.cassandra.utils.ExpiringMap$CacheMonitor.run(ExpiringMap.java:105)
        at java.util.TimerThread.mainLoop(Timer.java:534)
        at java.util.TimerThread.run(Timer.java:484)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:571)
        at java.util.ArrayList.get(ArrayList.java:349)
        at org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:131)
        at org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:45)
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:182)
        ... 4 more

And this slightly different one on the third:


ERROR 20:17:08,688 Uncaught exception in thread Thread[CACHETABLE-TIMER-8,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:186)
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:141)
        at org.apache.cassandra.utils.ExpiringMap$CacheMonitor.run(ExpiringMap.java:105)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:131)
        at org.apache.cassandra.service.ReadResponseResolver.resolve(ReadResponseResolver.java:45)
        at org.apache.cassandra.service.ConsistencyChecker$DataRepairHandler.callMe(ConsistencyChecker.java:182)
        ... 4 more


  

[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930613#action_12930613 ] 

Jonathan Ellis commented on CASSANDRA-1719:
-------------------------------------------

Just to rule out the obvious: you don't see this behavior, w/o this patch?


[jira] Updated: (CASSANDRA-1719) improve read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1719:
--------------------------------------

    Fix Version/s: 0.6.8


[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931145#action_12931145 ] 

Brandon Williams commented on CASSANDRA-1719:
---------------------------------------------

+1


[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930661#action_12930661 ] 

Brandon Williams commented on CASSANDRA-1719:
---------------------------------------------

They never RR.


[jira] Updated: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-1719:
----------------------------------------

    Comment: was deleted

(was: I just confirmed RR does not work in any release after 0.6.2 :()


[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930346#action_12930346 ] 

Brandon Williams commented on CASSANDRA-1719:
---------------------------------------------

I get odd behavior testing this patch.  Three node cluster, rf=2.  I start them up, take one down after the others see it, then insert data.  If I read it back with the node down, that works, but when I bring it up all kinds of keys are missing (even though I don't read from it directly.)  If I take it back down, everything works again.


[jira] Updated: (CASSANDRA-1719) improve read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1719:
--------------------------------------

    Attachment: 1719.txt

Fixes digest inefficiency, adds class-level docstring, and adds assert that it's being invoked from one of the replicas instead of including buggy code to pretend to handle when it's not.
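
The comment mentions an assert that read repair is invoked from one of the replicas. The following is a minimal Java sketch of that fail-fast idea. The names (`ReplicaCheckSketch`, `requireLocalReplica`) and the string endpoints are hypothetical. According to the comment, the patch used an `assert`. This sketch uses an explicit check instead, so it also fires when the JVM runs without `-ea`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative, not Cassandra's.
public class ReplicaCheckSketch {
    // Fail fast if the node running read repair is not a replica for the key,
    // rather than carrying code that pretends to handle that case.
    static void requireLocalReplica(String localEndpoint, List<String> replicas) {
        if (!replicas.contains(localEndpoint)) {
            throw new IllegalStateException(
                localEndpoint + " is not a replica; expected one of " + replicas);
        }
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.1", "10.0.0.2");
        requireLocalReplica("10.0.0.1", replicas); // passes silently
        try {
            requireLocalReplica("10.0.0.3", replicas);
        } catch (IllegalStateException e) {
            // prints: 10.0.0.3 is not a replica; expected one of [10.0.0.1, 10.0.0.2]
            System.out.println(e.getMessage());
        }
    }
}
```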


[jira] Resolved: (CASSANDRA-1719) improve read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1719.
---------------------------------------

    Resolution: Fixed
      Reviewer: brandon.williams

committed


[jira] Commented: (CASSANDRA-1719) improve read repair

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930804#action_12930804 ] 

Brandon Williams commented on CASSANDRA-1719:
---------------------------------------------

I just confirmed RR does not work in any release after 0.6.2 :(
