Posted to user@cassandra.apache.org by "Alaa Zubaidi (PDF)" <al...@pdf.com> on 2015/10/16 03:04:51 UTC

Cassandra 2.2.1 stuck at 100% on Windows

Hi,
We are running Cassandra 2.2.1 on Windows 2008 R2, and we see multiple
nodes stuck at 99% CPU, bringing the whole VM to a halt.
We suspect that another process run by IT/Windows is causing the CPU
issue, but the problem is that Cassandra does NOT recover: CPU utilization
keeps climbing until the VM is unusable. If we restart Cassandra, things
go back to normal.
Has anyone seen this before?

Thanks
-- Alaa

-- 
*This message may contain confidential and privileged information. If it 
has been sent to you in error, please reply to advise the sender of the 
error and then immediately permanently delete it and all attachments to it 
from your systems. If you are not the intended recipient, do not read, 
copy, disclose or otherwise use this message or any attachments to it. The 
sender disclaims any liability for such unauthorized use. PLEASE NOTE that 
all incoming e-mails sent to PDF e-mail accounts will be archived and may 
be scanned by us and/or by external service providers to detect and prevent 
threats to our systems, investigate illegal or inappropriate behavior, 
and/or eliminate unsolicited promotional e-mails (“spam”). If you have any 
concerns about this process, please contact us at *
*legal.department@pdf.com* <le...@pdf.com>*.*

Re: Cassandra 2.2.1 stuck at 100% on Windows

Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
It was a process installed by IT that triggered this. After we disabled the
process, everything went back to normal.

Thanks.
Alaa

-- 

Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110  USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zubaidi@pdf.com

Re: Cassandra 2.2.1 stuck at 100% on Windows

Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
Thanks guys,
I will look into this further and post an update here if I find anything.

Re: Cassandra 2.2.1 stuck at 100% on Windows

Posted by Josh McKenzie <jm...@apache.org>.
One option: use Process Explorer to find the TIDs of the threads in the java
process (instructions
<https://superuser.com/questions/462969/how-can-i-view-the-active-threads-of-a-running-program>),
take a screenshot of that, then run jstack against the running Cassandra
process a few times, writing the output to a file (instructions
<https://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html>).

We should at least be able to match each TID to the hex thread ID in the
jstack output to figure out who/what is spinning.
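
The matching step described above could be scripted roughly as follows. This
is a minimal sketch, not something posted in the thread: it assumes a HotSpot
jstack dump, where each thread header line carries the native thread ID as a
hex `nid=0x...` field, and that Process Explorer reports TIDs in decimal. The
function name and the sample dump are hypothetical and only illustrate the
conversion.

```python
import re

# HotSpot jstack prints the native (OS) thread ID per thread as nid=0x<hex>.
# The separate tid=0x... field is a JVM-internal address, not the OS TID.
NID_RE = re.compile(r'nid=0x([0-9a-fA-F]+)')

# Hypothetical dump excerpt, for illustration only.
SAMPLE_DUMP = '''\
"CompactionExecutor:3" #45 daemon prio=1 os_prio=0 tid=0x000000001c2f4000 nid=0x1a2c runnable [0x000000002a5fe000]
   java.lang.Thread.State: RUNNABLE
"GossipStage:1" #31 daemon prio=5 os_prio=0 tid=0x000000001b9d1800 nid=0xf04 waiting on condition [0x0000000025dfe000]
   java.lang.Thread.State: WAITING (parking)
'''


def find_threads_by_tid(jstack_text, tids):
    """Map each decimal OS TID in `tids` to its jstack thread header line."""
    wanted = set(tids)
    found = {}
    for line in jstack_text.splitlines():
        # Thread header lines start with the quoted thread name.
        if not line.startswith('"'):
            continue
        match = NID_RE.search(line)
        if match is None:
            continue
        tid = int(match.group(1), 16)  # hex nid -> decimal TID
        if tid in wanted:
            found[tid] = line
    return found


if __name__ == "__main__":
    # 6700 decimal is 0x1a2c, so this selects the CompactionExecutor thread.
    for tid, header in find_threads_by_tid(SAMPLE_DUMP, [6700]).items():
        print(tid, hex(tid), header)
```

Running it over several dumps taken a few seconds apart shows whether the same
thread stays RUNNABLE in the same stack frames, which is usually what a
spinning thread looks like.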

Re: Cassandra 2.2.1 stuck at 100% on Windows

Posted by Michael Shuler <mi...@pbandjelly.org>.
On 10/16/2015 12:02 PM, Alaa Zubaidi (PDF) wrote:
> No OOM in any of the log files, and NO long GC at that time.
> I attached the log from the last 2 minutes before it hangs, up to when we
> restarted Cassandra an hour and a half later.

Your logs show gossip issues with some seed nodes. `nodetool gossipinfo` 
on all nodes might be an interesting place to start.

-- 
Michael

Re: Cassandra 2.2.1 stuck at 100% on Windows

Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
Thanks Rob,

No OOM in any of the log files, and NO long GC at that time.
I attached the log from the last 2 minutes before it hangs, up to when we
restarted Cassandra an hour and a half later.
Regards,
Alaa

Re: Cassandra 2.2.1 stuck at 100% on Windows

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Oct 15, 2015 at 6:04 PM, Alaa Zubaidi (PDF) <al...@pdf.com>
wrote:

> We are running Cassandra 2.2.1 on Windows 2008 R2, and we see multiple
> nodes stuck at 99% CPU, bringing the whole VM to a halt.
> We suspect that another process run by IT/Windows is causing the CPU
> issue, but the problem is that Cassandra does NOT recover: CPU
> utilization keeps climbing until the VM is unusable. If we restart
> Cassandra, things go back to normal.
>

Most cases where a JVM does not recover and churns at maxed CPU are the
result of GC failure and/or OOM.

Check your logs for OOM and long GCs.
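
A quick way to run that check over a system.log might look like the sketch
below. It is only an illustration under assumptions: the function name, the
1000 ms threshold and the sample lines are made up, and the regular expression
assumes GCInspector lines of the form "... GCInspector.java:NNN - <collector>
GC in <N>ms. ...", which may differ between Cassandra versions.

```python
import re

# Assumed GCInspector line shape: "... GCInspector... - <collector> GC in <N>ms."
GC_RE = re.compile(r'GCInspector.*? GC in (\d+)ms')

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    "INFO  [Service Thread] 2015-10-15 17:58:01,123 GCInspector.java:252 - ParNew GC in 310ms.",
    "WARN  [Service Thread] 2015-10-15 17:59:10,456 GCInspector.java:252 - ConcurrentMarkSweep GC in 5230ms.",
    "ERROR [SharedPool-Worker-1] 2015-10-15 18:00:00,000 java.lang.OutOfMemoryError: Java heap space",
]


def scan_log(lines, gc_threshold_ms=1000):
    """Return (OOM line numbers, [(line number, pause in ms)] for long GCs)."""
    ooms = []
    long_gcs = []
    for number, line in enumerate(lines, start=1):
        if "OutOfMemoryError" in line:
            ooms.append(number)
        match = GC_RE.search(line)
        if match and int(match.group(1)) >= gc_threshold_ms:
            long_gcs.append((number, int(match.group(1))))
    return ooms, long_gcs


if __name__ == "__main__":
    ooms, long_gcs = scan_log(SAMPLE_LOG)
    print("OOM at lines:", ooms)
    print("long GC pauses:", long_gcs)
```

To scan a real log, pass an open file object instead of the sample list, since
the function only iterates over lines.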

Also, FWIW, you are among a relatively small group of Windows operators.
Apart from the people working at DataStax to support Windows, there is
not a whole lot of well-understood operational best practice for Cassandra
on Windows.

=Rob