
Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/02/04 06:30:22 UTC

[jira] Created: (CASSANDRA-2109) Improve default window size for DES

Improve default window size for DES
-----------------------------------

                 Key: CASSANDRA-2109
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2109
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Stu Hood
            Priority: Minor
             Fix For: 0.8


The window size for the dynamic endpoint snitch (DES) is currently hardcoded at 100 requests. A larger window takes longer to react to a suddenly slow node, but gives a smoother transition between scores.

An example of bad behaviour: with a window of size 100, we saw a case with a failing node where, if enough requests could be answered quickly out of cache or bloom filters, the window might be momentarily filled with 10 ms requests, pushing out the requests that had to go to disk and took 10 seconds.
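
The eviction effect described above can be sketched with a toy model. This is only an illustration, not the snitch's actual scoring code: the plain mean over a fixed-size window, the function name `window_mean`, and the sample trace (5 disk reads followed by 100 cache hits) are all assumptions chosen to make the arithmetic visible.

```python
from collections import deque


def window_mean(samples, window_size):
    """Mean latency (ms) over the most recent `window_size` samples."""
    # A deque with maxlen silently drops the oldest sample once it is full,
    # which is the "pushing out" behaviour described in the ticket.
    window = deque(maxlen=window_size)
    for latency_ms in samples:
        window.append(latency_ms)
    return sum(window) / len(window)


# Hypothetical trace for a failing node: 5 reads that went to disk and took
# 10 s each, followed by 100 reads answered from cache in 10 ms each.
slow = [10_000] * 5
fast = [10] * 100
trace = slow + fast

small = window_mean(trace, 100)   # the 10 s samples have all been evicted
large = window_mean(trace, 1000)  # the 10 s samples still count

print(f"window=100:  mean latency {small:.1f} ms")
print(f"window=1000: mean latency {large:.1f} ms")
```

With the 100-sample window the node looks perfectly healthy once 100 fast responses have arrived, while the larger window still reflects the slow disk reads. The cost of the larger window is the slower reaction time mentioned above.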

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (CASSANDRA-2109) Improve default window size for DES

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990714#comment-12990714 ] 

Stu Hood commented on CASSANDRA-2109:
-------------------------------------

> We should most likely catch the IOError and throw a special error to the client
I think this is the case where a client thread decides to do a local read, which performs the read in the client thread rather than in the stages. Honestly, I'd prefer to remove this special casing, rather than letting the crash-only behaviour leak into the client threads and then needing to wrap it.

Opened CASSANDRA-2110

> Improve default window size for DES
> -----------------------------------
>
>                 Key: CASSANDRA-2109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2109
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>            Priority: Minor
>              Labels: des
>             Fix For: 0.8
>
>
> The window size for the dynamic endpoint snitch (DES) is currently hardcoded at 100 requests. A larger window takes longer to react to a suddenly slow node, but gives a smoother transition between scores.
> An example of bad behaviour: with a window of size 100, we saw a case with a failing node where, if enough requests could be answered quickly out of cache or bloom filters, the window might be momentarily filled with 10 ms requests, pushing out the requests that had to go to disk and took 10 seconds.


[jira] Commented: (CASSANDRA-2109) Improve default window size for DES

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990458#comment-12990458 ] 

Stu Hood commented on CASSANDRA-2109:
-------------------------------------

I'm not prescribing a solution here, but I'd be fine with a hardcoded window size that better minimized the chances of the above case.


[jira] [Resolved] (CASSANDRA-2109) Improve default window size for DES

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-2109.
-----------------------------------------

       Resolution: Duplicate
    Fix Version/s:     (was: 1.3)
                   1.2.0

Resolved by CASSANDRA-4038
                

[jira] [Updated] (CASSANDRA-2109) Improve default window size for DES

Posted by "Ryan King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan King updated CASSANDRA-2109:
---------------------------------

    Fix Version/s:     (was: 0.8)
                   1.0


[jira] Commented: (CASSANDRA-2109) Improve default window size for DES

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990658#comment-12990658 ] 

Chris Goffinet commented on CASSANDRA-2109:
-------------------------------------------

Thoughts about this:

Maybe a histogram? A few scenarios could happen:

1) Bloom Filter Misses
2) Row Caches
3) Data in page cache returning back quickly

We've seen disk failures fall into two scenarios: responses timing out because the disk never returned, and fast fail. We account for the first scenario but not the fast-fail case. When a fast fail happens, the bad node throws an IOError immediately, and the expired map on the coordinator eventually kicks in to adjust scores. If we do nothing on the bad node, we are assuming that people have smart clients (which I hope they do) that remove the bad node from their list after enough timeouts. We should most likely catch the IOError and throw a special error to the client so that it knows the node is Unavailable and the smart client can make a decision. Otherwise it will just get a generic error or a timeout.
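
A rough sketch of the "catch and rethrow" idea, for illustration only. Everything here is hypothetical: `NodeUnavailableError`, `read_with_fast_fail`, and the two stand-in read functions are invented names, and Python's `OSError` stands in for the IOError seen on the bad node. It is not how Cassandra's read path is structured.

```python
class NodeUnavailableError(Exception):
    """Hypothetical client-visible error: this node cannot serve the read."""


def read_with_fast_fail(read_fn, key):
    """Run a local read and translate a disk-level failure into a distinct error."""
    try:
        return read_fn(key)
    except OSError as exc:  # stand-in for the IOError thrown on fast fail
        # Chain the original exception so the root cause is not lost.
        raise NodeUnavailableError(f"local read failed for {key!r}") from exc


def healthy_read(key):
    # Stand-in for a read that succeeds.
    return f"value-for-{key}"


def failing_read(key):
    # Stand-in for a disk that fails fast.
    raise OSError("disk fast-failed")


print(read_with_fast_fail(healthy_read, "k1"))
try:
    read_with_fast_fail(failing_read, "k2")
except NodeUnavailableError as err:
    print("client sees:", err)
```

The point of the distinct error type is that a smart client can tell "this node is broken" apart from a generic error or a timeout and stop routing to the node sooner.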

I am a little inclined to say that if a node is seeing a series of IOErrors locally, it should put itself into a failed state and stop accepting traffic. That might be a little scary for some, though. Thoughts?
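
One possible shape for that self-fencing behaviour, as a minimal sketch. The class name `LocalFailureGuard`, the threshold of 3 consecutive errors, and the rule that a success resets the counter are all assumptions made for the example; the comment above only says "a series of IOErrors".

```python
class LocalFailureGuard:
    """Marks the node as failed after `threshold` consecutive local I/O errors."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed value; the ticket gives no number
        self.consecutive_errors = 0
        self.failed = False

    def record_success(self):
        # A successful read breaks the series, unless the node already fenced itself.
        if not self.failed:
            self.consecutive_errors = 0

    def record_io_error(self):
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.threshold:
            self.failed = True  # sticky: stays failed until an operator intervenes

    def accepting_traffic(self):
        return not self.failed


guard = LocalFailureGuard(threshold=3)
for _ in range(3):
    guard.record_io_error()
print("accepting traffic:", guard.accepting_traffic())
```

Making the failed state sticky is the conservative choice and also the scary one: a single burst of errors takes the node out of rotation until someone looks at it.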
