Posted to user@cassandra.apache.org by Aoi Kadoya <ca...@gmail.com> on 2016/07/12 22:23:53 UTC

CPU high load

Hi,

I am running a 6-node vnode cluster with DSE 4.8.1, and since a few weeks
ago, all of the cluster nodes have been hitting an average CPU load of 15-20.
These nodes are running on VMs (VMware vSphere) that have 8 vCPUs
(1 core/socket) and 16 GB of RAM. (JVM options: -Xms8G -Xmx8G -Xmn800M)

At first I thought this was because of CPU iowait; however, iowait is
constantly low (in fact it's 0 almost all the time), and CPU steal time is
also 0%.

When I took a thread dump, I found that some of the "SharedPool-Worker"
threads are consuming CPU, and those threads seem to be waiting for
something, so I assume this is the cause of the CPU load.

"SharedPool-Worker-1" #240 daemon prio=5 os_prio=0
tid=0x00007fabf459e000 nid=0x39b3 waiting on condition
[0x00007faad7f02000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:85)
at java.lang.Thread.run(Thread.java:745)

The thread dump looks like this, but I am not sure what these
SharedPool-Worker threads are waiting for.
Would you please help me with further troubleshooting?
I am also reading the thread posted by Yuan, as the situation is very
similar to mine, but I didn't get any blocked, dropped, or pending counts
in my tpstats result.
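
For reference, one way to confirm that these workers really are the threads
burning CPU is to match per-thread CPU usage against the nid field in the
dump (a sketch; finding the pid via pgrep -f CassandraDaemon is an
assumption about how the process is named):

top -H -p "$(pgrep -f CassandraDaemon)"    # per-thread CPU usage; note the hot TID
printf '0x%x\n' 14771                      # convert that TID to hex, e.g. 14771 -> 0x39b3
jstack "$(pgrep -f CassandraDaemon)" | grep -A 8 'nid=0x39b3'   # find it in the dump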

Thanks,
Aoi

Re: CPU high load

Posted by Aoi Kadoya <ca...@gmail.com>.
Thank you, Alain.

There was no frequent GC or compaction, so it had been a mystery; however,
once I stopped chef-client (we're managing the cluster through a Chef
cookbook), the load eased on almost all of the servers.
So we're now refactoring our cookbook; in the meantime, we also decided to
rebuild the cluster with DSE 5.0.1.
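
(For anyone checking the same thing, a quick way to quantify an agent's CPU
cost before stopping it; a sketch, assuming pidstat from the sysstat package
and the process name chef-client:)

pidstat -u -p "$(pgrep -d, -x chef-client)" 5 12   # per-process %CPU, 5 s samples for 1 min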

Thank you very much for your advice on the debugging process,
Aoi


Re: CPU high load

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Aoi,


> since few weeks
> ago, all of the cluster nodes are hitting avg. 15-20 cpu load.
> These nodes are running on VMs(VMware vSphere) that have 8vcpu
> (1core/socket)-16 vRAM.(JVM options : -Xms8G -Xmx8G -Xmn800M)


I'll take my chance; a few ideas / questions below:

   - What Cassandra version are you running?
   - How is your GC doing? (See the sketch after this list.)
      - Run something like: grep "GC" /var/log/cassandra/system.log
      - If you have a lot of long CMS pauses, you might not be keeping
      things in the new gen long enough: Xmn800M looks too small to me; it
      has been a default, but I never saw a case where this setting worked
      better than a higher value (let's say 2G). Also, the tenuring
      threshold gives better results if set a bit higher than the default
      (let's say 16). Those options are in cassandra-env.sh.
   - Do you have other warnings or errors? Anything about tombstones or
   compacting wide rows incrementally?
   - What compaction strategy are you using?
   - How many concurrent compactors do you use? (If you have 8 cores, this
   value should probably be between 2 and 6; 4 is a good starting point.)
   - If your compaction is not fast enough and the disks are doing fine,
   consider increasing the compaction throughput from the default 16 to 32
   or 64 MB/s to mitigate the impact of the point above.
   - Do you use compression? What kind?
   - Did the request count increase recently? Do you plan to add capacity,
   or do you think you're hitting a new bug / issue that is worth
   investigating / solving?
   - Are you using the default configuration? What did you change?
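
A minimal sketch of what the GC and compaction changes above could look
like (assuming a stock cassandra-env.sh, where both lines already exist;
adjust to your file):

# cassandra-env.sh on the canary node (one change: NEWHEAP + tenuring together)
HEAP_NEWSIZE="2G"                                  # new gen size (-Xmn), was 800M
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"   # the stock file ships with 1

# concurrent_compactors is set in cassandra.yaml; throughput can be changed
# at runtime (persist it in cassandra.yaml afterwards):
nodetool setcompactionthroughput 32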

No matter what you try, do it as much as possible on one canary node first,
and incrementally (one change at a time; using NEWHEAP = 2GB +
tenuringThreshold = 16 would count as one change, since it makes sense to
move those 2 values together).


> I have enabled the auto repair service in OpsCenter and it's running in the background


Also, when did you start the repairs? Repair is an expensive operation that
consumes a lot of resources; it is often needed, but it is hard to tune
correctly. Are you sure you have enough CPU power to handle the load plus
repairs?
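
One quick way to see how much repair work is in flight at any moment (a
sketch; validation compactions are the repair-driven entries):

nodetool compactionstats   # look for Validation entries driven by repair
nodetool netstats          # active streaming sessions between nodes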

Some other comments probably not directly related:


> I also realized that my cluster isn't well balanced


Well, your cluster looks balanced to me; 7 GB isn't that far from 11 GB. For
more accurate information, use 'nodetool status mykeyspace'. This way
ownership will be displayed, replacing (?) with ownership (xx %). Total
ownership = 300% in your case (RF=3).


> I am running 6 nodes vnode cluster with DSE 4.8.1, and since few weeks
> ago, all of the cluster nodes are hitting avg. 15-20 cpu load.


By the way, from
https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/RNdse.html:

"Warning: DataStax does not recommend 4.8.1 or 4.8.2 versions for
production, see warning. Use 4.8.3 instead.".

I am not sure what happened there, but I would move to 4.8.3+ ASAP; the
DataStax people know their products, and I don't like this kind of orange
and bold warning :-).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


Re: CPU high load

Posted by Aoi Kadoya <ca...@gmail.com>.
Hi Romain,

No, I don't think we upgraded the Cassandra version or changed any of those
schema elements. After I noticed this high load issue, I found that some of
the tables had a shorter gc_grace_seconds (1 day) than the rest, and
because that seemed to be causing constant compaction cycles, I changed
them to 10 days. But again, that was after the load hit this high number.
Some of the nodes eased a little bit after changing the gc_grace_seconds
values and repairing, but since a few days ago, all of the nodes have been
constantly reporting a load of 15-20.
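
(For reference, the change was along these lines; the keyspace and table
names here are placeholders:)

cqlsh -e "ALTER TABLE mykeyspace.mytable WITH gc_grace_seconds = 864000;"   # 864000 s = 10 days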

Thank you for the suggestion about logging; let me change the log level and
see what I can get from it.

Thanks,
Aoi



Re: CPU high load

Posted by Romain Hardouin <ro...@yahoo.fr>.
Did you upgrade from a previous version? Did you make some schema changes
like compaction strategy, compression, bloom filter, etc.? What about the
R/W requests?
SharedPool Workers are... shared ;-) Put the logs in debug to see some
examples of which services are using this pool (many, actually).

Best,
Romain 
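
A sketch of how to do that at runtime, without a restart (the exact package
to raise is a guess; remember to set it back to INFO afterwards):

nodetool setlogginglevel org.apache.cassandra.concurrent DEBUG   # runtime change
nodetool setlogginglevel org.apache.cassandra.concurrent INFO    # revert when done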


Re: CPU high load

Posted by Aoi Kadoya <ca...@gmail.com>.
Hi Patrick,

In fact, I couldn't see any thread pool named "shared".
Here is the result of tpstats from one of my nodes.

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0      173237609         0                 0
ReadStage                         0         0       71266557         0                 0
RequestResponseStage              0         0       87617557         0                 0
ReadRepairStage                   0         0          51822         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
AntiEntropySessions               0         0           3828         0                 0
HintedHandoff                     0         0             23         0                 0
GossipStage                       0         0        2169599         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
CompactionExecutor                0         0        1353194         0                 0
ValidationExecutor                0         0        3337647         0                 0
MigrationStage                    0         0              5         0                 0
AntiEntropyStage                  0         0        7527026         0                 0
PendingRangeCalculator            0         0             24         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0         118019         0                 0
MemtablePostFlush                 0         0        3398738         0                 0
MemtableReclaimMemory             0         0         122249         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
MUTATION                     0
COUNTER_MUTATION             0
BINARY                       0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0


I have enabled the auto repair service in OpsCenter and it's running in the
background, but I also realized that my cluster isn't well balanced.
Other than the system/opscenter keyspaces, I only have one keyspace, and
its replication factor is 3 (NetworkTopologyStrategy).

Datacenter: xxx
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns  Host ID                               Rack
UN  xxxxxx   10.19 GB  256     ?     6bf8db87-d4cc-4a75-86a5-bc1b27ced32c  RAC1
UN  xxxxxx   10.59 GB  256     ?     2d407831-e10d-4a6b-86c0-26c7a60e613d  RAC1
UN  xxxxxx   7.99 GB   256     ?     1e05d70e-502e-4ac4-a6ed-bf912c332062  RAC1
UN  xxxxxx   7.67 GB   256     ?     41a8e12a-c8e8-42ff-b681-b74f493a2407  RAC1
UN  xxxxxx   11.13 GB  256     ?     67572986-99b8-4a78-9039-aaa0aca8c236  RAC1
UN  xxxxxx   9.54 GB   256     ?     3f22001b-f03d-4bd0-8608-dd467cbc17f0  RAC1

Thanks,
Aoi


Re: CPU high load

Posted by Patrick McFadin <pm...@gmail.com>.
It might be clearer to look at nodetool tpstats.

From there you can see all the thread pools and whether there are any
blocks. It could be something subtle, like the network.
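
A couple of quick checks along those lines (a sketch; sar comes from the
sysstat package and may not be installed everywhere):

nodetool netstats   # streaming sessions and pending messages between nodes
sar -n DEV 1 5      # per-interface network throughput, 5 one-second samples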
