Posted to user@cassandra.apache.org by John Sanda <jo...@gmail.com> on 2018/03/23 03:44:17 UTC

Understanding Blocked and All Time Blocked columns in tpstats

I have been doing some work on a cluster that is impacted by
https://issues.apache.org/jira/browse/CASSANDRA-11363. Reading through the
ticket prompted me to take a closer look at
org.apache.cassandra.concurrent.SEPExecutor. I am looking at the 3.0.14
code. I am a little confused about the Blocked and All Time Blocked columns
reported in nodetool tpstats and reported by StatusLogger. I understand
that there is a queue for tasks. In the case of RequestThreadPoolExecutor,
the size of that queue can be controlled via the
cassandra.max_queued_native_transport_requests system property.
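
For anyone following along, here is a minimal sketch of how a JVM system property like this one is typically read. It is not the Cassandra source: the class name, the method name and the fallback value of 128 are assumptions for illustration, so check the code for your version.

```java
public class QueueLimitProperty {
    // Property name as given in this thread.
    static final String PROP = "cassandra.max_queued_native_transport_requests";

    // Assumed fallback, for illustration only; verify against your version's source.
    static final int ASSUMED_DEFAULT = 128;

    // Integer.getInteger reads the named system property and parses it as an int.
    // It returns the fallback when the property is unset or is not a valid integer.
    static int maxQueuedRequests() {
        return Integer.getInteger(PROP, ASSUMED_DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println(PROP + " = " + maxQueuedRequests());
    }
}
```

Running it with -Dcassandra.max_queued_native_transport_requests=1024 on the java command line should print the overridden value instead of the fallback.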

I have been looking at SEPExecutor.addTask(FutureTask<?> task), and here is
my question. If the queue is full, as defined by SEPExecutor.maxTasksQueued,
are tasks rejected? I do not fully grok the code, but it looks like it is
possible for tasks to be rejected here (some code and comments omitted for
brevity):

public void addTask(FutureTask<?> task)
{
    tasks.add(task);
    ...
    else if (taskPermits >= maxTasksQueued)
    {
        WaitQueue.Signal s = hasRoom.register();

        if (taskPermits(permits.get()) > maxTasksQueued)
        {
            if (takeWorkPermit(true))
                pool.schedule(new Work(this));

            metrics.totalBlocked.inc();
            metrics.currentBlocked.inc();
            s.awaitUninterruptibly();
            metrics.currentBlocked.dec();
        }
        else
            s.cancel();
    }
}

The first thing that happens is that the task is added to the tasks queue.
pool.schedule() only gets called if takeWorkPermit() returns true. I am
still studying the code, but can someone explain what exactly happens when
the queue is full?


- John

Re: Understanding Blocked and All Time Blocked columns in tpstats

Posted by John Sanda <jo...@gmail.com>.

We do small inserts. For a modest size environment we do about 90,000
inserts every 30 seconds. For a larger environment, we could be doing
300,000 or more inserts every 30 seconds. In earlier versions of the
project, each insert was a separate request as each insert targets a
different partition. In more recent versions though, we introduced micro
batching. We batch up to about 25 inserts where inserts are grouped by
token range. Even though batches are used, I assume that does not reduce
the overall number of inserts or mutations. Inserts are always async,
prepared statements. Client code is written with RxJava which makes doing
async, concurrent writes a lot easier.
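
To make the arithmetic concrete, here is a rough sketch of the batching described above. It is not our client code: the 16 ranges and the modulo grouping are hypothetical stand-ins for real token ranges, and only the 90,000 inserts and the batch size of 25 come from the numbers in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MicroBatchSketch {
    // Hypothetical stand-in for token-range grouping: insert i goes to range i % ranges.
    static Map<Integer, List<Integer>> groupByRange(int inserts, int ranges) {
        Map<Integer, List<Integer>> byRange = new TreeMap<>();
        for (int i = 0; i < inserts; i++) {
            byRange.computeIfAbsent(i % ranges, k -> new ArrayList<>()).add(i);
        }
        return byRange;
    }

    // Splits each group into batches holding at most maxBatchSize inserts.
    static <T> List<List<T>> toBatches(Map<Integer, List<T>> byRange, int maxBatchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (List<T> group : byRange.values()) {
            for (int from = 0; from < group.size(); from += maxBatchSize) {
                int to = Math.min(from + maxBatchSize, group.size());
                batches.add(new ArrayList<>(group.subList(from, to)));
            }
        }
        return batches;
    }

    // Returns {number of batches (requests), number of inserts carried by them}.
    static int[] summarize(int inserts, int ranges, int maxBatchSize) {
        List<List<Integer>> batches = toBatches(groupByRange(inserts, ranges), maxBatchSize);
        int mutations = 0;
        for (List<Integer> batch : batches) {
            mutations += batch.size();
        }
        return new int[] { batches.size(), mutations };
    }

    public static void main(String[] args) {
        int[] result = summarize(90_000, 16, 25);
        System.out.println("requests=" + result[0] + " inserts=" + result[1]);
    }
}
```

Under these assumptions the 90,000 inserts become 3,600 batch requests, while the number of inserts carried stays at 90,000. So batching cuts the number of requests entering the native transport queue, not the number of mutations.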

On Fri, Mar 23, 2018 at 1:29 PM, Chris Lohfink <cl...@apple.com> wrote:

> Increasing the queue size would increase the number of requests waiting.
> It could make GCs worse if the requests are something like large INSERTs,
> but for a lot of super tiny queries it helps to increase the queue size (to
> a point). You might want to look into what queries are being made and how,
> since there are possibly options to help with that (e.g. prepared queries,
> what the queries are, limiting the number of async in-flight queries).
>
> Chris


-- 

- John

Re: Understanding Blocked and All Time Blocked columns in tpstats

Posted by Chris Lohfink <cl...@apple.com>.

Increasing the queue size would increase the number of requests waiting. It could make GCs worse if the requests are something like large INSERTs, but for a lot of super tiny queries it helps to increase the queue size (to a point). You might want to look into what queries are being made and how, since there are possibly options to help with that (e.g. prepared queries, what the queries are, limiting the number of async in-flight queries).

Chris
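
As a rough illustration of limiting the number of async in-flight queries on the client side, here is a minimal sketch that uses a semaphore. It does not use any driver API: InFlightLimiter, runDemo and the thread pool standing in for the cluster are all hypothetical, and a real client would pass in its own async execute call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class InFlightLimiter {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the submitting thread until a permit is free, then starts the async call.
    // The permit is handed back when the returned future completes, normally or not.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> asyncCall) {
        permits.acquireUninterruptibly();
        maxObserved.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        CompletableFuture<T> started;
        try {
            started = asyncCall.get();
        } catch (RuntimeException e) {
            inFlight.decrementAndGet();
            permits.release();
            throw e;
        }
        return started.whenComplete((result, error) -> {
            inFlight.decrementAndGet();
            permits.release();
        });
    }

    int maxObserved() {
        return maxObserved.get();
    }

    // Hypothetical demo: a thread pool stands in for the cluster answering requests.
    static int runDemo(int limit, int requests) {
        ExecutorService fakeCluster = Executors.newFixedThreadPool(8);
        InFlightLimiter limiter = new InFlightLimiter(limit);
        List<CompletableFuture<Integer>> results = new ArrayList<>();
        try {
            for (int i = 0; i < requests; i++) {
                final int id = i;
                CompletableFuture<Integer> f =
                        limiter.submit(() -> CompletableFuture.supplyAsync(() -> id, fakeCluster));
                results.add(f);
            }
            CompletableFuture.allOf(results.toArray(new CompletableFuture[0])).join();
        } finally {
            fakeCluster.shutdown();
        }
        return limiter.maxObserved();
    }

    public static void main(String[] args) {
        System.out.println("max in flight observed: " + runDemo(4, 200));
    }
}
```

The point of the design is that the wait happens in the client, where the application can see it and slow down, rather than in the server's request queue.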

> On Mar 23, 2018, at 11:42 AM, John Sanda <jo...@gmail.com> wrote:
> 
> Thanks for the explanation. In the past when I have run into problems related to CASSANDRA-11363, I have increased the queue size via the cassandra.max_queued_native_transport_requests system property. If I find that the queue is frequently at capacity, would that be an indicator that the node is having trouble keeping up with the load? And if so, will increasing the queue size just exacerbate the problem?

Re: Understanding Blocked and All Time Blocked columns in tpstats

Posted by John Sanda <jo...@gmail.com>.

Thanks for the explanation. In the past when I have run into problems
related to CASSANDRA-11363, I have increased the queue size via the
cassandra.max_queued_native_transport_requests system property. If I find
that the queue is frequently at capacity, would that be an indicator that
the node is having trouble keeping up with the load? And if so, will
increasing the queue size just exacerbate the problem?

On Fri, Mar 23, 2018 at 11:51 AM, Chris Lohfink <cl...@apple.com> wrote:

> It blocks the caller attempting to add the task until there's room in the
> queue, applying back pressure. It does not reject it. It mimics the
> behavior of the pre-SEP DebuggableThreadPoolExecutor's
> RejectedExecutionHandler that the other thread pools use (with the
> exception of sampling/tracing, which just throw tasks away on rejection).
>
> Worth noting that this is only really possible in the native transport pool
> (SEP pool), last I checked. That has been the case since 2.1 at least;
> before that there were a few others. It changes from version to version.
> For (basically) all other thread pools the queue is limited only by memory.
>
> Chris


-- 

- John

Re: Understanding Blocked and All Time Blocked columns in tpstats

Posted by Chris Lohfink <cl...@apple.com>.

It blocks the caller attempting to add the task until there's room in the queue, applying back pressure. It does not reject it. It mimics the behavior of the pre-SEP DebuggableThreadPoolExecutor's RejectedExecutionHandler that the other thread pools use (with the exception of sampling/tracing, which just throw tasks away on rejection).

Worth noting that this is only really possible in the native transport pool (SEP pool), last I checked. That has been the case since 2.1 at least; before that there were a few others. It changes from version to version. For (basically) all other thread pools the queue is limited only by memory.

Chris
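
To illustrate the block-instead-of-reject idea, here is a minimal sketch built on a plain java.util.concurrent.ThreadPoolExecutor. It is not the SEPExecutor or DebuggableThreadPoolExecutor code: the class, its two counters (named after the metrics in the snippet from the original post) and the demo are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class BlockingSubmitSketch {
    final AtomicLong currentBlocked = new AtomicLong(); // analogous to "Blocked"
    final AtomicLong totalBlocked = new AtomicLong();   // analogous to "All time blocked"
    final ThreadPoolExecutor executor;

    BlockingSubmitSketch(int threads, int queueCapacity) {
        // Called by execute() when the bounded queue is full. Instead of dropping
        // the task, make the submitting thread wait until a slot frees up.
        RejectedExecutionHandler blockCaller = (task, pool) -> {
            if (pool.isShutdown()) {
                throw new RejectedExecutionException("executor is shut down");
            }
            totalBlocked.incrementAndGet();
            currentBlocked.incrementAndGet();
            try {
                pool.getQueue().put(task); // waits for room in the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException(e);
            } finally {
                currentBlocked.decrementAndGet();
            }
        };
        executor = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), blockCaller);
    }

    // One worker and two queue slots: the 4th submission finds the queue full.
    // Returns {tasks completed, totalBlocked, currentBlocked at the end}.
    static long[] runDemo() {
        BlockingSubmitSketch sketch = new BlockingSubmitSketch(1, 2);
        CountDownLatch gate = new CountDownLatch(1);
        AtomicLong completed = new AtomicLong();
        // Opens the gate only once a submitter is observed waiting.
        Thread releaser = new Thread(() -> {
            while (sketch.currentBlocked.get() == 0) {
                Thread.yield();
            }
            gate.countDown();
        });
        releaser.start();
        try {
            for (int i = 0; i < 4; i++) {
                sketch.executor.execute(() -> {
                    try {
                        gate.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
            sketch.executor.shutdown();
            sketch.executor.awaitTermination(10, TimeUnit.SECONDS);
            releaser.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { completed.get(), sketch.totalBlocked.get(), sketch.currentBlocked.get() };
    }

    public static void main(String[] args) {
        long[] r = runDemo();
        System.out.println("completed=" + r[0] + " allTimeBlocked=" + r[1] + " blocked=" + r[2]);
    }
}
```

In the demo every submitted task still runs. The submitter that found the queue full shows up once in the all-time counter, and the current counter drops back to zero once it is unblocked.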
