Posted to commits@cassandra.apache.org by "Christian Spriegel (JIRA)" <ji...@apache.org> on 2015/01/02 16:22:14 UTC

[jira] [Comment Edited] (CASSANDRA-7886) Coordinator should not wait for read timeouts when replicas hit Exceptions

    [ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262943#comment-14262943 ] 

Christian Spriegel edited comment on CASSANDRA-7886 at 1/2/15 3:22 PM:
-----------------------------------------------------------------------

Hi [~thobbs],

uploaded new patch: V6

Here is what I did:
- Fixed logging of TombstoneOverwhelmingExceptions (TOEs)...
-- ... in StorageProxy for local reads
-- ... in MessageDeliveryTask for remote reads
- Added the partition key (as a DecoratedKey) and the last cell name to the TOE log message.
- Changed SliceQueryFilter not to throw TOEs for the system keyspace, since Cassandra does not seem to cope with TOEs in system queries. For the system keyspace, these cases are always logged as warnings instead.
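The abort-or-warn rule for the system keyspace can be sketched as follows. This is a minimal, hypothetical Java illustration of the behaviour described above, not the SliceQueryFilter code from the patch: the class TombstoneGuard, its onPartitionScanned method, the stand-in exception class, the "query not aborted" warning text and the assumption that only the keyspace named "system" is exempt are all illustrative. The abort message mirrors the TOE log format from this comment.

{code:java}
import java.util.logging.Logger;

// Hypothetical sketch of the tombstone threshold check; not Cassandra's SliceQueryFilter.
final class TombstoneGuard {
    private static final Logger LOG = Logger.getLogger(TombstoneGuard.class.getName());
    // Assumption: only the keyspace literally named "system" is exempt.
    private static final String SYSTEM_KEYSPACE = "system";

    /** Hypothetical stand-in for Cassandra's TombstoneOverwhelmingException. */
    static final class TombstoneOverwhelmingException extends RuntimeException {
        TombstoneOverwhelmingException(String message) { super(message); }
    }

    private final int failureThreshold; // plays the role of tombstone_failure_threshold

    TombstoneGuard(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /**
     * Called once a partition has been scanned. Returns normally if the read may
     * continue; throws if the read must be aborted.
     */
    void onPartitionScanned(String keyspace, String table, int tombstones,
                            int requestedColumns, String partitionKey, String lastCell) {
        if (tombstones <= failureThreshold) {
            return; // under the threshold: nothing to report
        }
        String scanned = String.format("Scanned over %d tombstones in %s.%s; %d columns were requested",
                tombstones, keyspace, table, requestedColumns);
        // The extra context that makes the aborted read traceable to one partition.
        String context = String.format("; partitionKey=%s; lastCell=%s", partitionKey, lastCell);
        if (SYSTEM_KEYSPACE.equals(keyspace)) {
            // System queries are never aborted; the overflow is only logged as a warning.
            LOG.warning(scanned + "; query not aborted (system keyspace)" + context);
            return;
        }
        throw new TombstoneOverwhelmingException(
                scanned + "; query aborted (see tombstone_failure_threshold)" + context);
    }
}
{code}

Keeping the partition key and last cell name in the exception message, rather than only the counts, is what lets an operator find the offending partition from the log alone.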


This is what TOEs look like in system.log:
{code}
ERROR [SharedPool-Worker-1] 2015-01-02 15:07:24,878 MessageDeliveryTask.java:81 - Scanned over 201 tombstones in test.test; 100 columns were requested; query aborted (see tombstone_failure_threshold); partitionKey=DecoratedKey(78703492656118554854272571946195123045, 31); lastCell=188; delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}; slices=[-]
{code}
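Because the message now carries the partition key and last cell name, the offending partition can be located from system.log. Below is a minimal Java sketch of pulling those fields out of a line in the format above; the class name TombstoneLogLine and the regular expression are hypothetical, not part of Cassandra, and assume the message layout stays exactly as shown.

{code:java}
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cassandra: extracts fields from a TOE log message.
// The pattern assumes the exact layout shown above and must change if the text changes.
final class TombstoneLogLine {
    private static final Pattern TOE = Pattern.compile(
            "Scanned over (\\d+) tombstones in ([^.\\s]+)\\.([^;\\s]+); "
            + "(\\d+) columns were requested; query aborted .*?; "
            + "partitionKey=(DecoratedKey\\([^)]*\\)); lastCell=([^;]*);");

    final int tombstones;
    final String keyspace;
    final String table;
    final int requestedColumns;
    final String partitionKey;
    final String lastCell;

    private TombstoneLogLine(Matcher m) {
        tombstones = Integer.parseInt(m.group(1));
        keyspace = m.group(2);
        table = m.group(3);
        requestedColumns = Integer.parseInt(m.group(4));
        partitionKey = m.group(5);
        lastCell = m.group(6);
    }

    /** Returns the parsed fields, or empty if the line is not a TOE message. */
    static Optional<TombstoneLogLine> parse(String line) {
        Matcher m = TOE.matcher(line);
        return m.find() ? Optional.of(new TombstoneLogLine(m)) : Optional.empty();
    }
}
{code}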

kind regards,
Christian




> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt, 7886_v4_trunk.txt, 7886_v5_trunk.txt, 7886_v6_trunk.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is simply dropped on every data node and no response is sent back to the coordinator. Instead, the coordinator waits for the full read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application has to wait out the timeout interval for every such request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval.
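The proposed fail-fast behaviour can be sketched as a coordinator-side callback that succeeds once enough replicas answer and fails as soon as enough replicas report an error that the consistency level can no longer be met, instead of waiting for read_request_timeout_in_ms. This is a minimal Java sketch under stated assumptions: FailFastReadCallback, blockFor, ReadFailedException and the onResponse/onFailure hooks are hypothetical names, not Cassandra's ReadCallback API, and it assumes replicas have some way to send an error reply.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical coordinator-side read callback; names are illustrative, not Cassandra's API.
final class FailFastReadCallback<T> {
    /** Hypothetical error raised when too many replicas reported a failure. */
    static final class ReadFailedException extends RuntimeException {
        ReadFailedException(String message) { super(message); }
    }

    private final int blockFor;      // responses required by the consistency level
    private final int totalReplicas; // replicas the read was sent to
    private final AtomicInteger successes = new AtomicInteger();
    private final AtomicInteger failures = new AtomicInteger();
    private final List<T> responses = new CopyOnWriteArrayList<>();
    private final CompletableFuture<List<T>> result = new CompletableFuture<>();

    FailFastReadCallback(int blockFor, int totalReplicas) {
        this.blockFor = blockFor;
        this.totalReplicas = totalReplicas;
    }

    /** Called when a replica returns data. */
    void onResponse(T response) {
        responses.add(response);
        if (successes.incrementAndGet() == blockFor) {
            result.complete(new ArrayList<>(responses));
        }
    }

    /** Called when a replica reports an error (e.g. a TOE) instead of staying silent. */
    void onFailure(String replica) {
        int failed = failures.incrementAndGet();
        // Fail fast once the replicas that have not failed can no longer reach blockFor.
        if (totalReplicas - failed < blockFor) {
            result.completeExceptionally(new ReadFailedException(
                    failed + " of " + totalReplicas + " replicas failed, last: " + replica));
        }
    }

    boolean isDone() { return result.isDone(); }

    boolean isFailed() { return result.isCompletedExceptionally(); }

    /** Blocks until success, fast failure, or the timeout (silent replicas still time out). */
    List<T> await(long timeoutMs) throws InterruptedException, ExecutionException, TimeoutException {
        return result.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
{code}

With blockFor = 2 out of 3 replicas, a single replica failure still lets the read succeed; only the second failure makes the quorum unreachable and completes the read with an error immediately. Replicas that never answer at all are still covered only by the timeout.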



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)