You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jose Martinez Poblete (JIRA)" <ji...@apache.org> on 2014/09/22 16:10:34 UTC
[jira] [Issue Comment Deleted] (CASSANDRA-7983) nodetool repair
triggers OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jose Martinez Poblete updated CASSANDRA-7983:
---------------------------------------------
Comment: was deleted
(was: The rest of the system logs)
> nodetool repair triggers OOM
> ----------------------------
>
> Key: CASSANDRA-7983
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7983
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment:
> {noformat}
> INFO [main] 2014-09-16 14:23:14,621 DseDaemon.java (line 368) DSE version: 4.5.0
> INFO [main] 2014-09-16 14:23:14,622 DseDaemon.java (line 369) Hadoop version: 1.0.4.13
> INFO [main] 2014-09-16 14:23:14,627 DseDaemon.java (line 370) Hive version: 0.12.0.3
> INFO [main] 2014-09-16 14:23:14,628 DseDaemon.java (line 371) Pig version: 0.10.1
> INFO [main] 2014-09-16 14:23:14,629 DseDaemon.java (line 372) Solr version: 4.6.0.2.4
> INFO [main] 2014-09-16 14:23:14,630 DseDaemon.java (line 373) Sqoop version: 1.4.4.14.1
> INFO [main] 2014-09-16 14:23:14,630 DseDaemon.java (line 374) Mahout version: 0.8
> INFO [main] 2014-09-16 14:23:14,631 DseDaemon.java (line 375) Appender version: 3.0.2
> INFO [main] 2014-09-16 14:23:14,632 DseDaemon.java (line 376) Spark version: 0.9.1
> INFO [main] 2014-09-16 14:23:14,632 DseDaemon.java (line 377) Shark version: 0.9.1.1
> INFO [main] 2014-09-16 14:23:20,270 CassandraDaemon.java (line 160) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_51
> INFO [main] 2014-09-16 14:23:20,270 CassandraDaemon.java (line 188) Heap size: 6316621824/6316621824
> {noformat}
> Reporter: Jose Martinez Poblete
> Attachments: gc.log.0, nbcqa-chc-a01_systemlog.tar.Z, nbcqa-chc-a03_systemlog.tar.Z, system.log
>
>
> Customer has a 3 node cluster with 500Mb data on each node
> {noformat}
> [cassandra@nbcqa-chc-a02 ~]$ nodetool status
> Note: Ownership information does not include topology; for complete information, specify a keyspace
> Datacenter: CH2
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns Host ID Rack
> UN 162.150.4.234 255.26 MB 256 33.2% 4ad1b6a8-8759-4920-b54a-f059126900df RAC1
> UN 162.150.4.235 318.37 MB 256 32.6% 3eb0ec58-4b81-442e-bee5-4c91da447f38 RAC1
> UN 162.150.4.167 243.7 MB 256 34.2% 5b2c1900-bf03-41c1-bb4e-82df1655b8d8 RAC1
> [cassandra@nbcqa-chc-a02 ~]$
> {noformat}
> After we run repair command, system runs into OOM after some 45 minutes
> Nothing else is running
> {noformat}
> [cassandra@nbcqa-chc-a02 ~]$ date
> Fri Sep 19 15:55:33 UTC 2014
> [cassandra@nbcqa-chc-a02 ~]$ nodetool repair -st -9220354588320251877 -et -9220354588320251873
> Sep 19, 2014 4:06:08 PM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read timed out
> {noformat}
> Herer is when we run OOM
> {noformat}
> ERROR [ReadStage:28914] 2014-09-19 16:34:50,381 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:28914,5,main]
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:69)
> at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
> at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:43)
> at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:48)
> at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
> at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1195)
> at org.apache.cassandra.db.columniterator.SimpleSliceReader.<init>(SimpleSliceReader.java:57)
> at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65)
> at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42)
> at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:167)
> at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
> at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:250)
> at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
> at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
> at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
> at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:333)
> at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
> at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1363)
> at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1927)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> Cassandra process pegs 1 of the 8 CPU's 100%
> {noformat}
> top - 16:50:07 up 11 days, 2:01, 2 users, load average: 0.54, 0.60, 0.65
> Tasks: 175 total, 1 running, 174 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu3 : 0.7%us, 0.3%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu4 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu5 : 0.3%us, 0.3%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
> Cpu6 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 16332056k total, 16167212k used, 164844k free, 149956k buffers
> Swap: 0k total, 0k used, 0k free, 8360056k cached
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 2161 cassandr 20 0 11.5g 6.9g 227m S 107.8 44.0 281:59.49 java
> 9942 root 20 0 106m 2320 1344 S 1.0 0.0 0:00.03 dhclient-script
> 28969 opscente 20 0 4479m 188m 12m S 0.7 1.2 56:24.24 java
> 5123 cassandr 20 0 1788m 107m 28m S 0.3 0.7 0:08.09 java
> 1 root 20 0 19228 1352 1072 S 0.0 0.0 0:00.82 init
> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
> 3 root RT 0 0 0 0 S 0.0 0.0 0:05.52 migration/0
> 4 root 20 0 0 0 0 S 0.0 0.0 0:13.15 ksoftirqd/0
> 5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
> 6 root RT 0 0 0 0 S 0.0 0.0 0:03.33 watchdog/0
> 7 root RT 0 0 0 0 S 0.0 0.0 0:04.88 migration/1
> 8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
> 9 root 20 0 0 0 0 S 0.0 0.0 0:19.21 ksoftirqd/1
> 10 root RT 0 0 0 0 S 0.0 0.0 0:03.24 watchdog/1
> 11 root RT 0 0 0 0 S 0.0 0.0 0:05.46 migration/2
> 12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
> 13 root 20 0 0 0 0 S 0.0 0.0 0:16.87 ksoftirqd/2
> 14 root RT 0 0 0 0 S 0.0 0.0 0:03.49 watchdog/2
> 15 root RT 0 0 0 0 S 0.0 0.0 0:05.31 migration/3
> 16 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/3
> 17 root 20 0 0 0 0 S 0.0 0.0 0:19.33 ksoftirqd/3
> 18 root RT 0 0 0 0 S 0.0 0.0 0:03.43 watchdog/3
> 19 root RT 0 0 0 0 S 0.0 0.0 0:05.36 migration/4
> 20 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/4
> 21 root 20 0 0 0 0 S 0.0 0.0 0:17.64 ksoftirqd/4
> 22 root RT 0 0 0 0 S 0.0 0.0 0:03.18 watchdog/4
> 23 root RT 0 0 0 0 S 0.0 0.0 0:05.31 migration/5
> {noformat}
> There is a 12Gb HeapDump, the memory leak suspects show the following trace
> {noformat}
> Thread Stack
> RMI TCP Connection(1621)-162.150.4.235
> at org.apache.cassandra.service.StorageService.createRepairRangeFrom(Ljava/lang/String;Ljava/lang/String;)Ljava/util/Collection; (StorageService.java:2612)
> at org.apache.cassandra.service.StorageService.forceRepairRangeAsync(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;ZLjava/util/Collection;Ljava/util/Collection;[Ljava/lang/String;)I (StorageService.java:2541)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at sun.reflect.misc.Trampoline.invoke(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at sun.reflect.misc.MethodUtil.invoke(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Ljava/lang/Object;Ljava/lang/Object;[Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(Ljava/lang/Object;Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;)Ljava/lang/Object; (Unknown Source)
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Ljavax/management/ObjectName;Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;)Ljava/lang/Object; (Unknown Source)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Ljavax/management/ObjectName;Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;)Ljava/lang/Object; (Unknown Source)
> at javax.management.remote.rmi.RMIConnectionImpl.doOperation(I[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at javax.management.remote.rmi.RMIConnectionImpl.access$300(Ljavax/management/remote/rmi/RMIConnectionImpl;I[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run()Ljava/lang/Object; (Unknown Source)
> at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(I[Ljava/lang/Object;Ljavax/security/auth/Subject;)Ljava/lang/Object; (Unknown Source)
> at javax.management.remote.rmi.RMIConnectionImpl.invoke(Ljavax/management/ObjectName;Ljava/lang/String;Ljava/rmi/MarshalledObject;[Ljava/lang/String;Ljavax/security/auth/Subject;)Ljava/lang/Object; (Unknown Source)
> at sun.reflect.GeneratedMethodAccessor37.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
> at sun.rmi.server.UnicastServerRef.dispatch(Ljava/rmi/Remote;Ljava/rmi/server/RemoteCall;)V (Unknown Source)
> at sun.rmi.transport.Transport$1.run()Ljava/lang/Void; (Unknown Source)
> at sun.rmi.transport.Transport$1.run()Ljava/lang/Object; (Unknown Source)
> at java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (Native Method)
> at sun.rmi.transport.Transport.serviceCall(Ljava/rmi/server/RemoteCall;)Z (Unknown Source)
> at sun.rmi.transport.tcp.TCPTransport.handleMessages(Lsun/rmi/transport/Connection;Z)V (Unknown Source)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0()V (Unknown Source)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run()V (Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (Unknown Source)
> at java.lang.Thread.run()V (Unknown Source)
> {noformat}
> The file is way too large to be attached here.
> It's currently held at my SCP server
> Let us know if there is any other info you may need
> I will be posting the other 2 system logs as soon as I get them
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)