Posted to user@cassandra.apache.org by Guillermo Winkler <gw...@inconcertcc.com> on 2012/08/16 22:09:49 UTC

Why do the StageManager thread pools have a 60-second keepalive time?

Hi, I have a Cassandra cluster where I'm seeing a lot of thread thrashing
in the mutation pool.

MutationStage:72031

Threads get created and disposed of in batches of 100 every few minutes.
Since it's a 16-core server, concurrent_writes is set to 100 in
cassandra.yaml.

concurrent_writes: 100

I've seen in the StageManager class that these pools are created with a
60-second keepalive time.

DebuggableThreadPoolExecutor -> allowCoreThreadTimeOut(true);

StageManager-> public static final long KEEPALIVE = 60; // seconds to keep
"extra" threads alive for when idle

Is there a reason for it to be this way?

Why not use a fixed-size pool with Integer.MAX_VALUE as the keepalive,
since corePoolSize and maxPoolSize are set to the same value?
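
To make the question concrete, here is a rough sketch of the two configurations side by side. It uses a plain ThreadPoolExecutor rather than the actual DebuggableThreadPoolExecutor/StageManager code, and the class name and structure are just for illustration:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class KeepAliveSketch {
        public static void main(String[] args) {
            int concurrentWrites = 100; // mirrors concurrent_writes in cassandra.yaml

            // Roughly how the stage pools behave today: core == max == 100,
            // 60s keepalive, and core threads allowed to time out, so all 100
            // threads die after a minute of idleness and are recreated on the
            // next burst of writes.
            ThreadPoolExecutor current = new ThreadPoolExecutor(
                    concurrentWrites, concurrentWrites,
                    60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>());
            current.allowCoreThreadTimeOut(true);

            // The alternative: with core == max and core-thread timeout left at
            // its default (false), the keepalive value never kicks in, so the
            // pool stays at a fixed size across idle periods.
            ThreadPoolExecutor fixed = new ThreadPoolExecutor(
                    concurrentWrites, concurrentWrites,
                    Integer.MAX_VALUE, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>());

            current.shutdown();
            fixed.shutdown();
        }
    }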

Thanks,
Guille

Re: Why do the StageManager thread pools have a 60-second keepalive time?

Posted by aaron morton <aa...@thelastpickle.com>.
> One thing we did change in the past few weeks was memtable_flush_queue_size, in order to use less heap space for memtables; this was due to having received this warning message and some OOM exceptions:
Danger. 

> Do you know any strategy to diagnose whether memtables flushing to disk and locking on the switchLock are the main cause of the dropped messages? I've gone through the source code but haven't seen any metrics reporting on maybeSwitchMemtable blocking times.
As a matter of fact I do :)

It was the first thing in my Cassandra SF talk:
http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance/6
http://www.datastax.com/events/cassandrasummit2012/presentations

If you reduce memtable_flush_queue_size too far, writes will block. When this happens you will see the MeteredFlusher say it wants to flush X CFs, but you will only see a few messages that say "Enqueuing flush of …".

In a "FlushWriter-*" thread you will see the Memtable log "Writing…" when it starts flushing and "Completed flushing …" when done. If the MeteredFlusher is blocked it will immediately "Enqueuing flush of …" when the Memtable starts writing the next SStable. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: Why do the StageManager thread pools have a 60-second keepalive time?

Posted by Guillermo Winkler <gw...@inconcertcc.com>.
Aaron, thanks for your answer.

We do have big batch updates, not always with the columns belonging to the
same row (i.e. many threads are needed to handle the updates), but that did
not represent a problem when the CFs had less data in them.

One thing we did change in the past few weeks was memtable_flush_queue_size,
in order to use less heap space for memtables; this was due to having
received this warning message and some OOM exceptions:

            logger.warn(String.format("Reducing %s capacity from %d to %s to reduce memory pressure",
                                      cacheType, getCapacity(), newCapacity));

Do you know any strategy to diagnose whether memtables flushing to disk and
locking on the switchLock are the main cause of the dropped messages? I've
gone through the source code but haven't seen any metrics reporting on
maybeSwitchMemtable blocking times.

Thanks again,
Guille


Re: Why do the StageManager thread pools have a 60-second keepalive time?

Posted by aaron morton <aa...@thelastpickle.com>.
You're seeing dropped mutations reported from nodetool tpstats? 

Take a look at the logs. Look for messages from the MessagingService with the pattern "{} {} messages dropped in last {}ms". They will be followed by info about the TP stats.

First would be the workload. Are you sending very big batch_mutate or multiget requests? Each row in the requests turns into a command in the appropriate thread pool. This can result in other requests waiting a long time for their commands to get processed. 

Next would be looking for GC and checking that memtable_flush_queue_size is set high enough (check the yaml for docs). 
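
For reference, the knobs involved look roughly like this in cassandra.yaml; the defaults shown are approximate for the 1.1-era file, so check your own version:

    # cassandra.yaml (defaults approximate - check your version)
    concurrent_writes: 32            # this cluster is running 100
    memtable_flush_queue_size: 4     # lowering this too far makes writers block behind flushes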

After that I would look at winding concurrent_writes (and, I assume, concurrent_reads) back. Any time I see weirdness I look for config changes and see what happens when they are returned to the default or near-default. Do you have 16 _physical_ cores?

Hope that helps. 
  
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: Why do the StageManager thread pools have a 60-second keepalive time?

Posted by Guillermo Winkler <gw...@inconcertcc.com>.
Aaron, thanks for your answer.

I'm actually tracking a problem where mutations get dropped while cfstats
shows no activity whatsoever: I have 100 threads in the mutation pool, no
running or pending tasks, but some mutations get dropped nonetheless.

I'm thinking it may be some scheduling problem, but I'm not really sure yet.

Have you ever seen a case of dropped mutations with the system under light
load?

Thanks,
Guille



Re: Why do the StageManager thread pools have a 60-second keepalive time?

Posted by aaron morton <aa...@thelastpickle.com>.
That's some pretty old code. I would guess it was done that way to conserve resources. And _I think_ thread creation is pretty lightweight.
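
A quick, unscientific way to check that assumption on your own hardware is something like this (the class name and thread count are just for illustration):

    public class ThreadCreationCost {
        public static void main(String[] args) throws InterruptedException {
            final int count = 100; // same order of magnitude as the mutation pool
            long start = System.nanoTime();
            Thread[] threads = new Thread[count];
            for (int i = 0; i < count; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() { /* no-op */ }
                });
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("Created, started and joined " + count
                    + " threads in " + micros + " us");
        }
    }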

Jonathan / Brandon / others - opinions ? 

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
