Posted to user@cassandra.apache.org by Riccardo Ferrari <fe...@gmail.com> on 2016/08/11 09:28:35 UTC

JVM Crash on 3.0.6

Hi C* users,

Recently a couple of my nodes crashed (on different dates). I don't have
core dumps, but the JVM crash log looks like this:
===========================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f8f608c8e40, pid=6916, tid=140253195458304
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [liblz4-java6471621810388748482.so+0x5e40]  LZ4_decompress_fast+0xa0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
...
---------------  T H R E A D  ---------------


Current thread (0x00007f8f5c7b2d50):  JavaThread "CompactionExecutor:11952" daemon [_thread_in_native, id=16219, stack(0x00007f8f3de0d000,0x00007f8f3de4e000)]
...
Stack: [0x00007f8f3de0d000,0x00007f8f3de4e000],  sp=0x00007f8f3de4c0e0,  free space=252k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [liblz4-java6471621810388748482.so+0x5e40]  LZ4_decompress_fast+0xa0

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 4150  net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast([BLjava/nio/ByteBuffer;I[BLjava/nio/ByteBuffer;II)I (0 bytes) @ 0x00007f8f791e4723 [0x00007f8f791e4680+0xa3]
J 19836 C2 org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap()V (354 bytes) @ 0x00007f8f7b714930 [0x00007f8f7b714320+0x610]
J 6662 C2 org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(Lorg/apache/cassandra/io/sstable/format/SSTableReader;Lorg/apache/cassandra/io/util/FileDataInput;Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/RowIndexEntry;Lorg/apache/cassandra/db/filter/ColumnFilter;Z)V (389 bytes) @ 0x00007f8f79c1cdb8 [0x00007f8f79c1c500+0x8b8]
J 22393 C2 org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(Lorg/apache/cassandra/db/ColumnFamilyStore;Z)Lorg/apache/cassandra/db/rows/UnfilteredRowIterator; (818 bytes) @ 0x00007f8f7c1d4364 [0x00007f8f7c1d2f40+0x1424]
J 22166 C1 org.apache.cassandra.db.Keyspace.indexPartition(Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFamilyStore;Ljava/util/Set;)V (274 bytes) @ 0x00007f8f7beb6304 [0x00007f8f7beb5420+0xee4]
j  org.apache.cassandra.index.SecondaryIndexBuilder.build()V+46
j  org.apache.cassandra.db.compaction.CompactionManager$11.run()V+18
J 22293 C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007f8f7b17727c [0x00007f8f7b176da0+0x4dc]
J 21302 C2 java.lang.Thread.run()V (17 bytes) @ 0x00007f8f79fe59f8 [0x00007f8f79fe59a0+0x58]
v  ~StubRoutines::call_stub
...
VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap:
 par new generation   total 368640K, used 123009K [0x00000006d5e00000, 0x00000006eee00000, 0x00000006eee00000)
  eden space 327680K,  34% used [0x00000006d5e00000, 0x00000006dcaf35c8, 0x00000006e9e00000)
  from space 40960K,  27% used [0x00000006e9e00000, 0x00000006ea92cf00, 0x00000006ec600000)
  to   space 40960K,   0% used [0x00000006ec600000, 0x00000006ec600000, 0x00000006eee00000)
 concurrent mark-sweep generation total 3426304K, used 1288977K [0x00000006eee00000, 0x00000007c0000000, 0x00000007c0000000)
 Metaspace       used 41685K, capacity 42832K, committed 43156K, reserved 1087488K
  class space    used 4455K, capacity 4702K, committed 4756K, reserved 1048576K
...
OS:DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.1 LTS"

uname:Linux 3.2.0-35-virtual #55-Ubuntu SMP Wed Dec 5 18:02:05 UTC 2012 x86_64
libc:glibc 2.15 NPTL 2.15
rlimit: STACK 8192k, CORE 0k, NPROC 119708, NOFILE 100000, AS infinity
load average:2.96 1.08 0.60

What am I missing?
Both crashes seem to have happened during compaction, while running native
code (LZ4).
Both crashes happened while the nodes were running a scheduled repair (so
under increased load).
The machines have 4 vCPUs and 15 GB of RAM (m1.xlarge).
Any hints?
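
To check that two crashes really share the same signal, problematic frame and thread, the hs_err logs can be summarized side by side. The following is only a minimal sketch: the function name summarize_hs_err is made up for illustration, and it assumes the log lines are not hard-wrapped the way they are in this mail.

```python
import re
import sys


def summarize_hs_err(text):
    """Extract the signal, problematic frame and crashing thread name
    from the text of a HotSpot hs_err log (hypothetical helper)."""
    # e.g. "#  SIGSEGV (0xb) at pc=..."
    signal = re.search(r"^#\s+(SIG\w+)\s+\(0x[0-9a-f]+\)", text, re.MULTILINE)
    # The frame is on the line right after "# Problematic frame:"
    frame = re.search(r"^# Problematic frame:\s*\n#\s*(.+)$", text, re.MULTILINE)
    # e.g. 'Current thread (0x...):  JavaThread "CompactionExecutor:11952" ...'
    thread = re.search(
        r'^Current thread \(0x[0-9a-f]+\):\s+\w+\s+"([^"]+)"', text, re.MULTILINE
    )
    return {
        "signal": signal.group(1) if signal else None,
        "frame": frame.group(1).strip() if frame else None,
        "thread": thread.group(1) if thread else None,
    }


if __name__ == "__main__":
    # Usage: python summarize_hs_err.py hs_err_pid1.log hs_err_pid2.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            print(path, summarize_hs_err(handle.read()))
```

If both logs report the same frame in the LZ4 library and a CompactionExecutor thread, that would support the pattern described above, though it does not by itself identify the cause.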

Best,

Re: JVM Crash on 3.0.6

Posted by Stefano Ortolani <os...@gmail.com>.

Not really related, but note that on 12.04 I had to disable jemalloc;
otherwise nodes would randomly die at startup (
https://issues.apache.org/jira/browse/CASSANDRA-11723)

Regards,
Stefano
