
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2015/08/19 11:57:45 UTC

[jira] [Commented] (CASSANDRA-10125) ReadFailure is thrown instead of ReadTimeout for range queries

    [ https://issues.apache.org/jira/browse/CASSANDRA-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702788#comment-14702788 ] 

Sylvain Lebresne commented on CASSANDRA-10125:
----------------------------------------------

I think the simplest solution is probably to just re-introduce the use of {{Verb.RANGE_SLICE}} for range queries so we get the proper timeout. Pushed a patch [here|https://github.com/pcmanus/cassandra/commits/10125] to do so. It adds a small amount of cruft but that will mostly go away once we drop backward compatibility with pre-3.0 (and it's not a big deal in the first place). I'll wait on CI to finish to make sure that patch doesn't break anything before calling this ready for review.

> ReadFailure is thrown instead of ReadTimeout for range queries
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-10125
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10125
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0 beta 2
>
>
> CASSANDRA-8099 merged the way single partition and range read messages were handled and has switched to using the same verb ({{Verb.READ}}) for both, effectively deprecating {{Verb.RANGE_SLICE}}. Unfortunately, we are relying on having 2 different verbs for timeouts. More precisely, when adding a callback in the expiring map of {{MessagingService}}, we use the timeout from the {{Verb}}. As a consequence, the callback timeout is currently set to the single partition read timeout (5s) even for range queries (which have a 10s timeout). And when a callback expires, the expiry is reported to the callback as a failure (which is debatable imo but a separate issue), which means range queries will generally send a ReadFailure (after 5s) instead of a ReadTimeout (since they do wait 10s before sending those).
> That is the reason for at least the failure of the {{nosetests replace_address_test:TestReplaceAddress.replace_first_boot_test}} dtest (the test has 3 nodes, kills one and expects a timeout at CL.THREE but gets a failure instead).
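
The mechanism described above can be illustrated with a minimal, hypothetical sketch. It is not Cassandra's actual code: the class {{VerbTimeoutSketch}}, its methods and the tie-breaking rule are assumptions made for illustration, and only the two verb names and the 5s/10s timeouts come from the description. It models a replica that never answers, with the callback expiry derived solely from the verb.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: names and structure are illustrative, not Cassandra's real classes.
class VerbTimeoutSketch {
    enum Verb { READ, RANGE_SLICE }

    // Timeouts quoted in the issue: 5s for single partition reads, 10s for range queries.
    static final Map<Verb, Long> TIMEOUT_MILLIS = new EnumMap<>(Verb.class);
    static {
        TIMEOUT_MILLIS.put(Verb.READ, 5_000L);
        TIMEOUT_MILLIS.put(Verb.RANGE_SLICE, 10_000L);
    }

    // The callback's expiry is looked up from the verb alone,
    // regardless of what kind of query the message actually carries.
    static long callbackExpiryMillis(Verb verb) {
        return TIMEOUT_MILLIS.get(verb);
    }

    // Outcome when the replica never answers. Assumption: if the callback
    // expires strictly before the coordinator stops waiting, the expiry is
    // reported as a failure first; otherwise the coordinator times out.
    static String outcome(Verb verbUsed, long coordinatorWaitMillis) {
        return callbackExpiryMillis(verbUsed) < coordinatorWaitMillis
                ? "ReadFailure"
                : "ReadTimeout";
    }

    public static void main(String[] args) {
        long rangeQueryWaitMillis = 10_000L;
        // Range query sent with Verb.READ: callback expires after 5s.
        System.out.println(outcome(Verb.READ, rangeQueryWaitMillis));
        // Range query sent with Verb.RANGE_SLICE: expiry matches the 10s wait.
        System.out.println(outcome(Verb.RANGE_SLICE, rangeQueryWaitMillis));
    }
}
```

Under these assumptions, a range query sent with {{Verb.READ}} yields a ReadFailure, while the same query sent with {{Verb.RANGE_SLICE}} yields the expected ReadTimeout, which is the behaviour the patch aims to restore.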


