Posted to dev@cassandra.apache.org by Vladimir Yudovin <vl...@winguzone.com> on 2017/07/14 03:53:31 UTC

High CPU after read timeout

Hi,

On Cassandra 3.9, I found that after an ALLOW FILTERING request running on a huge partition fails with "Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)", the nodes continue to consume CPU in ReadStage-N threads, as if they were still performing the search despite the failed request and even after the client has disconnected.

Is this a known issue, or is it worth filing a JIRA?

Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting

Re: High CPU after read timeout

Posted by Vladimir Yudovin <vl...@winguzone.com>.

I've created a JIRA ticket: https://issues.apache.org/jira/browse/CASSANDRA-13695



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting







Re: High CPU after read timeout

Posted by Vladimir Yudovin <vl...@winguzone.com>.

> If a client disconnects from a coordinator there is also no way for the replicas to know that the client was disconnected.

Got it.

> There are internal mechanisms that don't really have a concept of a timeout and where we would want it to never time out

Can such a timeout be passed to the executing thread? For read requests it can be taken from the xxx_request_timeout_in_ms parameters.

As it stands, one bad SELECT can put nodes under high load for a very long time, and can actually paralyze the cluster in certain situations.
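
To make the question concrete, here is a minimal sketch of what passing a deadline to the executing thread could look like. It is not Cassandra code: the class DeadlineScanSketch, the method filterWithDeadline, and the check interval are all hypothetical, and the 5000 ms value is only an assumed stand-in for a configured read timeout. The scan checks the clock every few rows and gives up once the deadline has passed, instead of running to completion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.IntPredicate;

// Hypothetical sketch: none of these names come from the Cassandra code base.
final class DeadlineScanSketch {

    /** Thrown when a scan notices that its deadline has passed. */
    static final class ScanAbortedException extends RuntimeException {
        ScanAbortedException(String message) {
            super(message);
        }
    }

    /**
     * Filters rows [0, rowCount) and gives up once deadlineNanos is reached.
     * The deadline is checked every checkInterval rows (including row 0),
     * so the clock is not read for every single row.
     */
    static List<Integer> filterWithDeadline(int rowCount, IntPredicate filter,
                                            long deadlineNanos, int checkInterval) {
        List<Integer> matches = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            // Subtraction keeps the comparison correct even if nanoTime wraps around.
            if (row % checkInterval == 0 && System.nanoTime() - deadlineNanos >= 0) {
                throw new ScanAbortedException("deadline passed after " + row + " rows");
            }
            if (filter.test(row)) {
                matches.add(row);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed stand-in for a configured read timeout in milliseconds.
        long timeoutMs = 5000;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);

        // A tiny scan that finishes well within the deadline.
        System.out.println(filterWithDeadline(10, row -> row % 2 == 0, deadline, 4));

        // A scan whose deadline is already in the past is aborted at the first check.
        try {
            filterWithDeadline(10, row -> true, System.nanoTime() - 1, 4);
        } catch (ScanAbortedException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

The check is cooperative: the thread only stops at the points where it looks at the clock, so the interval trades abort latency against per-row overhead.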





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting







Re: High CPU after read timeout

Posted by Chris Lohfink <cl...@gmail.com>.

There is no mechanism for reads to time out once they have started. The
messaging service will drop the request when it's received on the ReadStage
or RequestResponseStage. This is how it has always operated, so it is not
unique to 3.9. If a client disconnects from a coordinator, there is also no
way for the replicas that received a read request from the coordinator to
know that the client was disconnected.

It would be an interesting JIRA but, as a note, it will likely not be a quick
fix. There are internal mechanisms that don't really have a concept of a
timeout and where we would want it to never time out (e.g. a compaction,
reading system tables to fill metadata, repairs, etc.), and currently there's
no way of differentiating between them.
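
As a rough illustration of the differentiation problem, the sketch below tags each read with its origin so that internal work carries no deadline while client requests do. All names here (ReadDeadlineSketch, Origin, Deadline, forOrigin) are hypothetical and do not correspond to anything in the Cassandra code base; the 5000 ms timeout is an assumed example value.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: these types are illustrative and are not Cassandra classes.
final class ReadDeadlineSketch {

    /** Who started the read. Internal work (compaction, repair, ...) must never time out. */
    enum Origin { CLIENT_REQUEST, INTERNAL }

    /** An optional deadline carried along with a read. */
    static final class Deadline {
        private final boolean bounded;
        private final long deadlineNanos;

        private Deadline(boolean bounded, long deadlineNanos) {
            this.bounded = bounded;
            this.deadlineNanos = deadlineNanos;
        }

        /** Client requests get now + timeout; internal reads get no deadline at all. */
        static Deadline forOrigin(Origin origin, long nowNanos, long timeoutMs) {
            if (origin == Origin.INTERNAL) {
                return new Deadline(false, 0L);
            }
            return new Deadline(true, nowNanos + TimeUnit.MILLISECONDS.toNanos(timeoutMs));
        }

        /** An unbounded deadline never expires, whatever the clock says. */
        boolean isExpired(long nowNanos) {
            return bounded && nowNanos - deadlineNanos >= 0;
        }
    }

    public static void main(String[] args) {
        long now = System.nanoTime();
        Deadline client = Deadline.forOrigin(Origin.CLIENT_REQUEST, now, 5000);
        Deadline internal = Deadline.forOrigin(Origin.INTERNAL, now, 5000);

        long later = now + TimeUnit.SECONDS.toNanos(10);
        System.out.println("client expired after 10s: " + client.isExpired(later));
        System.out.println("internal expired after 10s: " + internal.isExpired(later));
    }
}
```

The hard part in practice is not this type but plumbing the origin through every code path that issues a read, which is why this would not be a quick fix.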

Chris
