You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jose Martinez Poblete (JIRA)" <ji...@apache.org> on 2014/09/22 16:10:34 UTC

[jira] [Issue Comment Deleted] (CASSANDRA-7983) nodetool repair triggers OOM

     [ https://issues.apache.org/jira/browse/CASSANDRA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jose Martinez Poblete updated CASSANDRA-7983:
---------------------------------------------
    Comment: was deleted

(was: The rest of the system logs)

> nodetool repair triggers OOM
> ----------------------------
>
>                 Key: CASSANDRA-7983
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7983
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment:  
> {noformat}
>  INFO [main] 2014-09-16 14:23:14,621 DseDaemon.java (line 368) DSE version: 4.5.0
>  INFO [main] 2014-09-16 14:23:14,622 DseDaemon.java (line 369) Hadoop version: 1.0.4.13
>  INFO [main] 2014-09-16 14:23:14,627 DseDaemon.java (line 370) Hive version: 0.12.0.3
>  INFO [main] 2014-09-16 14:23:14,628 DseDaemon.java (line 371) Pig version: 0.10.1
>  INFO [main] 2014-09-16 14:23:14,629 DseDaemon.java (line 372) Solr version: 4.6.0.2.4
>  INFO [main] 2014-09-16 14:23:14,630 DseDaemon.java (line 373) Sqoop version: 1.4.4.14.1
>  INFO [main] 2014-09-16 14:23:14,630 DseDaemon.java (line 374) Mahout version: 0.8
>  INFO [main] 2014-09-16 14:23:14,631 DseDaemon.java (line 375) Appender version: 3.0.2
>  INFO [main] 2014-09-16 14:23:14,632 DseDaemon.java (line 376) Spark version: 0.9.1
>  INFO [main] 2014-09-16 14:23:14,632 DseDaemon.java (line 377) Shark version: 0.9.1.1
>  INFO [main] 2014-09-16 14:23:20,270 CassandraDaemon.java (line 160) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_51
>  INFO [main] 2014-09-16 14:23:20,270 CassandraDaemon.java (line 188) Heap size: 6316621824/6316621824
> {noformat}
>            Reporter: Jose Martinez Poblete
>         Attachments: gc.log.0, nbcqa-chc-a01_systemlog.tar.Z, nbcqa-chc-a03_systemlog.tar.Z, system.log
>
>
> Customer has a 3 node cluster with 500Mb data on each node
> {noformat}
> [cassandra@nbcqa-chc-a02 ~]$ nodetool status
> Note: Ownership information does not include topology; for complete information, specify a keyspace
> Datacenter: CH2
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns   Host ID                               Rack
> UN  162.150.4.234  255.26 MB  256     33.2%  4ad1b6a8-8759-4920-b54a-f059126900df  RAC1
> UN  162.150.4.235  318.37 MB  256     32.6%  3eb0ec58-4b81-442e-bee5-4c91da447f38  RAC1
> UN  162.150.4.167  243.7 MB   256     34.2%  5b2c1900-bf03-41c1-bb4e-82df1655b8d8  RAC1
> [cassandra@nbcqa-chc-a02 ~]$
> {noformat}
> After we run repair command, system runs into OOM after some 45 minutes
> Nothing else is running
> {noformat}
> [cassandra@nbcqa-chc-a02 ~]$ date
> Fri Sep 19 15:55:33 UTC 2014
> [cassandra@nbcqa-chc-a02 ~]$ nodetool repair -st -9220354588320251877 -et -9220354588320251873
> Sep 19, 2014 4:06:08 PM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read timed out
> {noformat}
> Herer is when we run OOM
> {noformat}
> ERROR [ReadStage:28914] 2014-09-19 16:34:50,381 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:28914,5,main]
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:69)
>         at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
>         at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:43)
>         at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:48)
>         at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
>         at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1195)
>         at org.apache.cassandra.db.columniterator.SimpleSliceReader.<init>(SimpleSliceReader.java:57)
>         at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65)
>         at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42)
>         at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:167)
>         at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
>         at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:250)
>         at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
>         at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
>         at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:333)
>         at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
>         at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1363)
>         at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1927)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> {noformat}
> Cassandra process pegs 1 of the 8 CPU's 100% 
> {noformat}
> top - 16:50:07 up 11 days,  2:01,  2 users,  load average: 0.54, 0.60, 0.65
> Tasks: 175 total,   1 running, 174 sleeping,   0 stopped,   0 zombie
> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  :  0.7%us,  0.3%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu5  :  0.3%us,  0.3%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.3%st
> Cpu6  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  16332056k total, 16167212k used,   164844k free,   149956k buffers
> Swap:        0k total,        0k used,        0k free,  8360056k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2161 cassandr  20   0 11.5g 6.9g 227m S 107.8 44.0 281:59.49 java
>  9942 root      20   0  106m 2320 1344 S  1.0  0.0   0:00.03 dhclient-script
> 28969 opscente  20   0 4479m 188m  12m S  0.7  1.2  56:24.24 java
>  5123 cassandr  20   0 1788m 107m  28m S  0.3  0.7   0:08.09 java
>     1 root      20   0 19228 1352 1072 S  0.0  0.0   0:00.82 init
>     2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 kthreadd
>     3 root      RT   0     0    0    0 S  0.0  0.0   0:05.52 migration/0
>     4 root      20   0     0    0    0 S  0.0  0.0   0:13.15 ksoftirqd/0
>     5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
>     6 root      RT   0     0    0    0 S  0.0  0.0   0:03.33 watchdog/0
>     7 root      RT   0     0    0    0 S  0.0  0.0   0:04.88 migration/1
>     8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
>     9 root      20   0     0    0    0 S  0.0  0.0   0:19.21 ksoftirqd/1
>    10 root      RT   0     0    0    0 S  0.0  0.0   0:03.24 watchdog/1
>    11 root      RT   0     0    0    0 S  0.0  0.0   0:05.46 migration/2
>    12 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/2
>    13 root      20   0     0    0    0 S  0.0  0.0   0:16.87 ksoftirqd/2
>    14 root      RT   0     0    0    0 S  0.0  0.0   0:03.49 watchdog/2
>    15 root      RT   0     0    0    0 S  0.0  0.0   0:05.31 migration/3
>    16 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/3
>    17 root      20   0     0    0    0 S  0.0  0.0   0:19.33 ksoftirqd/3
>    18 root      RT   0     0    0    0 S  0.0  0.0   0:03.43 watchdog/3
>    19 root      RT   0     0    0    0 S  0.0  0.0   0:05.36 migration/4
>    20 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/4
>    21 root      20   0     0    0    0 S  0.0  0.0   0:17.64 ksoftirqd/4
>    22 root      RT   0     0    0    0 S  0.0  0.0   0:03.18 watchdog/4
>    23 root      RT   0     0    0    0 S  0.0  0.0   0:05.31 migration/5
> {noformat}
> There is a 12Gb HeapDump, the memory leak suspects show the following trace
> {noformat}
>  Thread Stack
> RMI TCP Connection(1621)-162.150.4.235
>   at org.apache.cassandra.service.StorageService.createRepairRangeFrom(Ljava/lang/String;Ljava/lang/String;)Ljava/util/Collection; (StorageService.java:2612)
>   at org.apache.cassandra.service.StorageService.forceRepairRangeAsync(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;ZLjava/util/Collection;Ljava/util/Collection;[Ljava/lang/String;)I (StorageService.java:2541)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.misc.Trampoline.invoke(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor8.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.misc.MethodUtil.invoke(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at com.sun.jmx.mbeanserver.PerInterface.invoke(Ljava/lang/Object;Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;)Ljava/lang/Object; (Unknown Source)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Ljavax/management/ObjectName;Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;)Ljava/lang/Object; (Unknown Source)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Ljavax/management/ObjectName;Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;)Ljava/lang/Object; (Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.doOperation(I[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.access$300(Ljavax/management/remote/rmi/RMIConnectionImpl;I[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run()Ljava/lang/Object; (Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(I[Ljava/lang/Object;Ljavax/security/auth/Subject;)Ljava/lang/Object; (Unknown Source)
>   at javax.management.remote.rmi.RMIConnectionImpl.invoke(Ljavax/management/ObjectName;Ljava/lang/String;Ljava/rmi/MarshalledObject;[Ljava/lang/String;Ljavax/security/auth/Subject;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
>   at sun.rmi.server.UnicastServerRef.dispatch(Ljava/rmi/Remote;Ljava/rmi/server/RemoteCall;)V (Unknown Source)
>   at sun.rmi.transport.Transport$1.run()Ljava/lang/Void; (Unknown Source)
>   at sun.rmi.transport.Transport$1.run()Ljava/lang/Object; (Unknown Source)
>   at java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (Native Method)
>   at sun.rmi.transport.Transport.serviceCall(Ljava/rmi/server/RemoteCall;)Z (Unknown Source)
>   at sun.rmi.transport.tcp.TCPTransport.handleMessages(Lsun/rmi/transport/Connection;Z)V (Unknown Source)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0()V (Unknown Source)
>   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run()V (Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (Unknown Source)
>   at java.lang.Thread.run()V (Unknown Source)
> {noformat}
> The file is way too large to be attached here.  
> It's currently held at my SCP server
> Let us know if there is any other info you may need
> I will be posting the other 2 system logs as soon as I get them 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)