Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2015/08/19 11:57:45 UTC
[jira] [Commented] (CASSANDRA-10125) ReadFailure is thrown instead
of ReadTimeout for range queries
[ https://issues.apache.org/jira/browse/CASSANDRA-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702788#comment-14702788 ]
Sylvain Lebresne commented on CASSANDRA-10125:
----------------------------------------------
I think the simplest solution is probably to just re-introduce the use of {{Verb.RANGE_SLICE}} for range queries so we get the proper timeout. Pushed a patch [here|https://github.com/pcmanus/cassandra/commits/10125] to do so. It adds a small amount of cruft but that will mostly go away once we drop backward compatibility with pre-3.0 (and it's not a big deal in the first place). I'll wait on CI to finish to make sure that patch doesn't break anything before calling this ready for review.
> ReadFailure is thrown instead of ReadTimeout for range queries
> --------------------------------------------------------------
>
> Key: CASSANDRA-10125
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10125
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 3.0 beta 2
>
>
> CASSANDRA-8099 merged the way single partition and range read messages were handled and switched to using the same verb ({{Verb.READ}}) for both, effectively deprecating {{Verb.RANGE_SLICE}}. Unfortunately, we are relying on having 2 different verbs for timeouts. More precisely, when adding a callback in the expiring map of {{MessagingService}}, we use the timeout from the {{Verb}}. As a consequence, the callback timeout is currently set to the single partition read timeout (5s) even for range queries (which have a 10s timeout). And when a callback expires, the expiration is reported to the callback as a failure (which is debatable imo but a separate issue), which means range queries will generally send a ReadFailure (after 5s) instead of a ReadTimeout (since they do wait 10s before sending those).
> That is the reason for at least the failure of the {{nosetests replace_address_test:TestReplaceAddress.replace_first_boot_test}} dtest (the test has 3 nodes, kills one and expects a timeout at CL.THREE but gets a failure instead).
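To make the mechanism in the description concrete, here is a minimal sketch of the interaction between the per-verb callback expiry and the read handler's own wait. It is a simplified model and not Cassandra's actual code: only the verb names {{READ}} and {{RANGE_SLICE}} and the 5s/10s values come from the description; the class, the {{Outcome}} enum and {{outcomeWhenReplicaSilent}} are hypothetical names used for illustration, and the model assumes a tie between the two timers surfaces as a timeout.

```java
// Simplified model of the issue, NOT Cassandra's real classes.
// Hypothetical names: VerbTimeoutSketch, Outcome, outcomeWhenReplicaSilent.
final class VerbTimeoutSketch {

    // Each verb carries the timeout used when its callback is registered
    // in the expiring map (values taken from the issue description).
    enum Verb {
        READ(5_000),         // single partition read timeout: 5s
        RANGE_SLICE(10_000); // range query timeout: 10s

        final long timeoutMillis;

        Verb(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }
    }

    enum Outcome { READ_FAILURE, READ_TIMEOUT }

    /**
     * What the client would see when a replica never answers.
     *
     * @param callbackVerb      verb under which the callback was registered;
     *                          it decides when the callback expires
     * @param handlerWaitMillis how long the read handler itself waits
     *                          before reporting a timeout
     */
    static Outcome outcomeWhenReplicaSilent(Verb callbackVerb, long handlerWaitMillis) {
        // An expired callback is reported as a failure. If that happens
        // before the handler's own wait elapses, the failure wins.
        // Assumption of this sketch: a tie is treated as a timeout.
        return callbackVerb.timeoutMillis < handlerWaitMillis
                ? Outcome.READ_FAILURE
                : Outcome.READ_TIMEOUT;
    }

    public static void main(String[] args) {
        long rangeWait = Verb.RANGE_SLICE.timeoutMillis;
        // Range query registered under READ: callback expires after 5s.
        System.out.println(outcomeWhenReplicaSilent(Verb.READ, rangeWait));
        // Range query registered under RANGE_SLICE: handler times out first.
        System.out.println(outcomeWhenReplicaSilent(Verb.RANGE_SLICE, rangeWait));
    }
}
```

In this model, a range query registered under {{READ}} yields {{READ_FAILURE}}, while registering it under {{RANGE_SLICE}} yields {{READ_TIMEOUT}}, which is the behaviour the dtest expects.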