Posted to user@cassandra.apache.org by "Hiller, Dean" <De...@nrel.gov> on 2013/02/15 17:41:14 UTC

odd production issue today 1.1.4

We ran into an issue today where our website became around 10 times slower.  We found out that node 5 of our 6 nodes was hitting 2100% CPU (cat /proc/cpuinfo reveals a 16-processor machine).  I am really not sure how to hit 2100% unless we had 21 processors.  It bounced between 300% and 2100%, so I tried to do a thread dump, but I had to use -F, and then the HotSpot debugger hit a NullPointerException :(.
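For context on how one process can show far more than 100%: top reports a process's %CPU as the total across all of its threads, and each thread can use at most one logical processor (100%), so in steady state the ceiling is roughly 100% times the number of logical processors. The sketch below illustrates that arithmetic using the JDK's ThreadMXBean. The class name CpuPercent is hypothetical, and it only measures the JVM it runs in, not a separate Cassandra process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating top-style %CPU accounting; not a Cassandra or JDK tool.
final class CpuPercent {

    // %CPU over an interval = (sum of per-thread CPU-time deltas) / (wall-clock delta) * 100.
    // Each thread adds at most ~100%, so many busy threads can push the total far past 100%.
    static double percent(Map<Long, Long> beforeNanos, Map<Long, Long> afterNanos, long wallNanos) {
        long cpuNanos = 0;
        for (Map.Entry<Long, Long> e : afterNanos.entrySet()) {
            Long before = beforeNanos.get(e.getKey());
            if (before != null) { // skip threads that started during the interval
                cpuNanos += e.getValue() - before;
            }
        }
        return 100.0 * cpuNanos / wallNanos;
    }

    // CPU time (ns) of every live thread in *this* JVM only; -1 (unsupported or dead) is skipped.
    static Map<Long, Long> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Long> cpu = new HashMap<Long, Long>();
        for (long id : mx.getAllThreadIds()) {
            long t = mx.getThreadCpuTime(id);
            if (t >= 0) {
                cpu.put(id, t);
            }
        }
        return cpu;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, Long> before = snapshot();
        long start = System.nanoTime();
        Thread.sleep(1000);
        Map<Long, Long> after = snapshot();
        System.out.printf("CPU used by this JVM over ~1s: %.0f%%%n",
                percent(before, after, System.nanoTime() - start));
    }
}
```

For example, two threads that are each busy for the full second add up to 200%.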

I copied off all my logs after restarting (I should have done it before restarting).  Any ideas what I could even look for to find out what went wrong with this node?

Also, we know our Astyanax client is not set up properly yet.  Astyanax is supposed to measure the time per request and change which nodes it hits, but right now it only hits the nodes in our seed list because we have not fixed that yet, so we probably would not have seen an issue had all nodes been in the seed list (which we changed today).  Our Astyanax was hitting nodes 3, 4, 5, and 6 and did not have nodes 1 and 2 in the seed list.  We roll out a new version next Wednesday with the new seed list including those last two nodes, which lets us delay the dynamic discovery config we still need to look at.
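To make the routing point concrete, here is a minimal sketch of latency-aware host selection, assuming a client that keeps a moving average of request latency per host and sends each request to the host with the lowest average. The class LatencyAwarePicker, its EWMA scoring, and the node names are hypothetical illustrations, not Astyanax's actual API or scoring strategy; what it shows is that such a client can only steer between the hosts it knows about, which without discovery means the seed list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical latency-aware host picker; NOT Astyanax's API or its scoring strategy.
final class LatencyAwarePicker {
    private final double alpha; // weight of the newest sample in the moving average
    // Insertion-ordered so ties go to the earliest seed; a null score means "no samples yet".
    private final Map<String, Double> scoreMs = new LinkedHashMap<String, Double>();

    // The picker only ever knows the hosts it is given (here: the seed list, no discovery).
    LatencyAwarePicker(List<String> seeds, double alpha) {
        this.alpha = alpha;
        for (String host : seeds) {
            scoreMs.put(host, null);
        }
    }

    // Fold one request's latency into the host's exponentially weighted moving average.
    void record(String host, double latencyMs) {
        if (!scoreMs.containsKey(host)) {
            throw new IllegalArgumentException("unknown host: " + host);
        }
        Double old = scoreMs.get(host);
        scoreMs.put(host, old == null ? latencyMs : alpha * latencyMs + (1 - alpha) * old);
    }

    // Return the known host with the lowest score (untried hosts count as 0), or null if none.
    String pick() {
        String best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : scoreMs.entrySet()) {
            double s = e.getValue() == null ? 0.0 : e.getValue();
            if (s < bestScore) {
                best = e.getKey();
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Mirrors the setup described above: nodes 1 and 2 are absent, so they can never be picked.
        LatencyAwarePicker p = new LatencyAwarePicker(
                Arrays.asList("node3", "node4", "node5", "node6"), 0.3);
        p.record("node3", 5);
        p.record("node4", 6);
        p.record("node5", 60);
        p.record("node6", 7);
        System.out.println(p.pick()); // node3 (lowest average); slow node5 is avoided
    }
}
```

The moving average lets one slow response nudge a host's score without immediately evicting it; a real client would also need to discover hosts beyond the seeds for the scoring to cover the whole ring.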

Thanks,
Dean

Commands I ran with jstack that didn't work out too well….

[cassandra@a5 ~]$ jstack -l 20907 > threads.txt
20907: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
[cassandra@a5 ~]$ jstack -l -F  20907 > threads.txt
Attaching to process ID 20907, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.7-b02
java.lang.NullPointerException
at sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:426)
at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137)
at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93)
at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
at sun.tools.jstack.JStack.main(JStack.java:84)
[cassandra@a5 ~]$ java -version
java version "1.6.0_32"
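When a thread dump can be captured, a common way to tie the CPU usage back to code is to list per-thread CPU with `top -H -p <pid>` and match a hot thread's id (TID) against the dump: on Linux, HotSpot's regular jstack output prints each thread's native id in hex as `nid=0x...` on the thread's header line. The sketch below assumes that header format (the `-F` output above may use a different layout); the class HotThreadFinder and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the jstack entry for a Linux thread id reported by `top -H -p <pid>`.
final class HotThreadFinder {
    private static final Pattern NID = Pattern.compile("\\bnid=(0x[0-9a-fA-F]+)");

    // top -H shows decimal TIDs; the jstack header shows the same id in hex.
    static String toNid(int linuxTid) {
        return "0x" + Integer.toHexString(linuxTid);
    }

    // Returns the thread block (header line plus following lines up to the next blank line)
    // whose nid matches, or null if no thread in the dump has that id.
    static String findThread(String dump, int linuxTid) {
        String wanted = toNid(linuxTid);
        String[] lines = dump.split("\n");
        for (int i = 0; i < lines.length; i++) {
            Matcher m = NID.matcher(lines[i]);
            if (m.find() && m.group(1).equalsIgnoreCase(wanted)) {
                StringBuilder block = new StringBuilder(lines[i]);
                for (int j = i + 1; j < lines.length && !lines[j].trim().isEmpty(); j++) {
                    block.append('\n').append(lines[j]);
                }
                return block.toString();
            }
        }
        return null;
    }
}
```

For example, a hot TID of 20908 in top -H corresponds to nid=0x51ac in the dump.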

Re: odd production issue today 1.1.4

Posted by aaron morton <aa...@thelastpickle.com>.
There is always this old chestnut http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs

A
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 8:22 AM, Edward Capriolo <ed...@gmail.com> wrote:

> With hyper-threading, a core can show up as two or maybe even four
> logical processors; the kernel reports each one as a separate processor.


Re: odd production issue today 1.1.4

Posted by Edward Capriolo <ed...@gmail.com>.
With hyper-threading, a core can show up as two or maybe even four
logical processors; the kernel reports each one as a separate processor.
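To check how many of the 16 processors in /proc/cpuinfo are hyper-threads, one can count the `processor` entries (logical processors) and the distinct (`physical id`, `core id`) pairs (physical cores). The sketch below assumes the x86 Linux /proc/cpuinfo field names (other architectures may omit them, in which case physicalCores comes out as 0); the class name CpuInfo is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: counts logical processors vs physical cores in /proc/cpuinfo text.
final class CpuInfo {
    final int logical;
    final int physicalCores;

    private CpuInfo(int logical, int physicalCores) {
        this.logical = logical;
        this.physicalCores = physicalCores;
    }

    static CpuInfo parse(String cpuinfo) {
        int logical = 0;
        Set<String> cores = new HashSet<String>();
        String physicalId = "?"; // socket of the entry currently being read
        for (String line : cpuinfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank separator between processor entries
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (key.equals("processor")) {
                logical++; // one entry per logical processor
            } else if (key.equals("physical id")) {
                physicalId = value;
            } else if (key.equals("core id")) {
                cores.add(physicalId + "/" + value); // same (socket, core) = hyper-thread siblings
            }
        }
        return new CpuInfo(logical, cores.size());
    }
}
```

If logical comes out at twice physicalCores, hyper-threading is on, and top's 100%-per-processor scale counts each hyper-thread as its own processor.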
