Posted to user@cassandra.apache.org by "Alaa Zubaidi (PDF)" <al...@pdf.com> on 2015/10/16 03:04:51 UTC
Cassandra 2.2.1 stuck at 100% on Windows
Hi,
We are running Cassandra 2.2.1 on Windows 2008R2, and we see multiple
nodes stuck at 99% CPU, bringing the whole VM to a halt.
We suspect that another process from IT/Windows is causing the CPU
issue, but the problem is that Cassandra does NOT recover: CPU utilization
keeps climbing until the VM is unusable. If we restart Cassandra, things
go back to normal.
Has anyone seen this before?
Thanks
-- Alaa
--
*This message may contain confidential and privileged information. If it
has been sent to you in error, please reply to advise the sender of the
error and then immediately permanently delete it and all attachments to it
from your systems. If you are not the intended recipient, do not read,
copy, disclose or otherwise use this message or any attachments to it. The
sender disclaims any liability for such unauthorized use. PLEASE NOTE that
all incoming e-mails sent to PDF e-mail accounts will be archived and may
be scanned by us and/or by external service providers to detect and prevent
threats to our systems, investigate illegal or inappropriate behavior,
and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
concerns about this process, please contact us at *
*legal.department@pdf.com* <le...@pdf.com>*.*
Re: Cassandra 2.2.1 stuck at 100% on Windows
Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
It was a process installed by IT that triggered this. After we disabled the
process, everything went back to normal.
Thanks.
Alaa
--
Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110 USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zubaidi@pdf.com
Re: Cassandra 2.2.1 stuck at 100% on Windows
Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
Thanks guys,
I will look into this more and post an update here if I find anything.
Re: Cassandra 2.2.1 stuck at 100% on Windows
Posted by Josh McKenzie <jm...@apache.org>.
One option: use Process Explorer to find the TIDs of the java process
(instructions
<https://superuser.com/questions/462969/how-can-i-view-the-active-threads-of-a-running-program>),
screen cap that, then run jstack against the running Cassandra process a few
times, writing the output to a file (instructions
<https://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html>).
We should at least be able to link the TID to the hex thread # in the
jstack output to figure out who/what is spinning in there.
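A minimal sketch of that TID-to-thread lookup, offered as an illustration rather than anything posted in the thread. It assumes the TID is the decimal number shown in Process Explorer and that each thread header in the HotSpot jstack dump carries a hex `nid=0x...` field; the function names and the sample dump in the usage note are hypothetical.

```python
import re

# Header line of a thread in a HotSpot jstack dump, for example:
# "CompactionExecutor:1" #45 daemon prio=1 os_prio=0 tid=0x... nid=0x1a2c runnable [...]
# The nid field is the native (OS) thread id, printed in hex.
THREAD_HEADER = re.compile(r'^"(?P<name>.*)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def threads_by_native_id(jstack_text):
    """Map native thread id (as an int) to thread name for every header found."""
    threads = {}
    for line in jstack_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            threads[int(match.group("nid"), 16)] = match.group("name")
    return threads


def find_thread(jstack_text, tid):
    """Return the Java thread name whose nid equals the decimal TID, or None."""
    return threads_by_native_id(jstack_text).get(tid)
```

For example, if Process Explorer shows TID 6700 burning CPU, `find_thread(open("jstack-1.txt").read(), 6700)` would return the name of the thread whose header contains `nid=0x1a2c` (the file name here is made up). Running it against several dumps taken a few seconds apart shows whether the same thread stays hot.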
On Fri, Oct 16, 2015 at 1:28 PM, Michael Shuler <mi...@pbandjelly.org>
wrote:
> On 10/16/2015 12:02 PM, Alaa Zubaidi (PDF) wrote:
>
>> No OOM in any of the log files, and NO long GC at that time.
>> I attached the logs from the last 2 minutes before it hangs until we
>> restarted Cassandra an hour and a half later.
>>
>
> Your logs show gossip issues with some seed nodes. `nodetool gossipinfo`
> on all nodes might be an interesting place to start.
>
> --
> Michael
>
Re: Cassandra 2.2.1 stuck at 100% on Windows
Posted by Michael Shuler <mi...@pbandjelly.org>.
On 10/16/2015 12:02 PM, Alaa Zubaidi (PDF) wrote:
> No OOM in any of the log files, and NO long GC at that time.
> I attached the logs from the last 2 minutes before it hangs until we
> restarted Cassandra an hour and a half later.
Your logs show gossip issues with some seed nodes. `nodetool gossipinfo`
on all nodes might be an interesting place to start.
--
Michael
Re: Cassandra 2.2.1 stuck at 100% on Windows
Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
Thanks Rob,
No OOM in any of the log files, and NO long GC at that time.
I attached the logs from the last 2 minutes before it hangs until we
restarted Cassandra an hour and a half later.
Regards,
Alaa
Re: Cassandra 2.2.1 stuck at 100% on Windows
Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Oct 15, 2015 at 6:04 PM, Alaa Zubaidi (PDF) <al...@pdf.com>
wrote:
> We are running Cassandra 2.2.1 on Windows 2008R2, and we see multiple
> nodes stuck at 99% CPU, bringing the whole VM to a halt.
> We suspect that another process from IT/Windows is causing the
> CPU issue, but the problem is that Cassandra does NOT recover: CPU
> utilization keeps climbing until the VM is unusable. If we restart
> Cassandra, things go back to normal.
>
Most cases where a JVM does not recover and churns at maxed CPU are the
result of GC failure and/or OOM.
Check your logs for OOM and long GCs.
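As a rough illustration of that log check (not something from the original reply), the sketch below scans log lines for `OutOfMemoryError` and for GC pauses above a threshold. It assumes GC pauses are reported on lines that mention `GCInspector` and contain `GC in <N>ms`; that line shape, the function name, and the 1000 ms default threshold are assumptions to adjust against your own system.log.

```python
import re

# Assumed shape of a GC pause line (verify against your own logs):
# "... GCInspector.java:NNN - <collector> GC in <N>ms. ..."
GC_PAUSE = re.compile(r"GCInspector.*?\bGC in (?P<ms>\d+)ms")


def scan_log(lines, gc_threshold_ms=1000):
    """Return (oom_lines, long_gcs); long_gcs is a list of (pause_ms, line)."""
    oom_lines, long_gcs = [], []
    for line in lines:
        if "OutOfMemoryError" in line:
            oom_lines.append(line.rstrip())
        match = GC_PAUSE.search(line)
        if match and int(match.group("ms")) >= gc_threshold_ms:
            long_gcs.append((int(match.group("ms")), line.rstrip()))
    return oom_lines, long_gcs
```

Something like `scan_log(open("system.log"))` would then list any hits; empty results on both sides match the "no OOM, no long GC" situation, which points away from the heap and toward something else spinning.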
Also FWIW you are among a relatively small group of Windows operators.
Other than among the people working at DataStax to support Windows, there is
not a whole lot of well-understood operational best practice for Cassandra
on Windows.
=Rob