You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Eduard Tudenhoefner (JIRA)" <ji...@apache.org> on 2016/03/17 16:36:33 UTC

[jira] [Created] (CASSANDRA-11373) nodetool repair might get stalled when running major compactions at the same time with LCS

Eduard Tudenhoefner created CASSANDRA-11373:
-----------------------------------------------

             Summary: nodetool repair might get stalled when running major compactions at the same time with LCS
                 Key: CASSANDRA-11373
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11373
             Project: Cassandra
          Issue Type: Bug
          Components: Compaction
            Reporter: Eduard Tudenhoefner
            Assignee: Marcus Eriksson
         Attachments: jstack.txt

Our test is basically running *nodetool repair* on specific keyspaces (such as keyspace1) and the test is also triggering *nodetool compact keyspace1 standard1* in the background. 
And so it looks like running major compactions & repairs lead to that issue when using *LCS*.


Below is an excerpt from the *thread dump* (the rest is attached)
{code}
"CompactionExecutor:2" #33 daemon prio=1 os_prio=4 tid=0x00007f5363e64f10 nid=0x3c4e waiting for monitor entry [0x00007f53340d8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:252)
	- waiting to lock <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager)
	at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableRepairedStatusChanged(Tracker.java:434)
	at org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:550)
	at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:465)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x00000006c9362ca8> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"CompactionExecutor:1" #32 daemon prio=1 os_prio=4 tid=0x00007f5363e618b0 nid=0x3c4d runnable [0x00007f5334119000]
   java.lang.Thread.State: RUNNABLE
	at com.google.common.collect.Iterators$7.computeNext(Iterators.java:650)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
	at com.google.common.collect.Iterators.addAll(Iterators.java:361)
	at com.google.common.collect.Iterables.addAll(Iterables.java:354)
	at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:589)
	at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:349)
	- locked <0x00000006d0f7a6a8> (a org.apache.cassandra.db.compaction.LeveledManifest)
	at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:98)
	- locked <0x00000006d0f7a568> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
	at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:95)
	- locked <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager)
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:257)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

*CPU usage is at 100%*
{code}
top -p 15386
top - 12:12:40 up  1:28,  1 user,  load average: 1.08, 1.11, 1.16
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy, 12.4 ni, 87.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  16433792 total,  8947336 used,  7486456 free,    89552 buffers
KiB Swap:        0 total,        0 used,        0 free.  3326796 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
15386 automat+  20   0 7891448 5.004g 290184 S 102.9 31.9  80:07.06 java

{code}

*ttop shows that the compaction thread consumes all the CPU*
{code}
$ java -jar sjk.jar ttop -p 15386
Monitoring threads ...

2016-03-17T12:17:13.514+0000 Process summary
  process cpu=126.34%
  application cpu=102.81% (user=102.46% sys=0.35%)
  other: cpu=23.53%
  heap allocation rate 375mb/s
[000002] user= 0.00% sys= 0.00% alloc=     0b/s - Reference Handler
[000003] user= 0.00% sys= 0.00% alloc=     0b/s - Finalizer
[000005] user= 0.00% sys= 0.00% alloc=     0b/s - Signal Dispatcher
[000012] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-7199
[000013] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-0
[000015] user= 0.00% sys= 0.00% alloc=   476b/s - AsyncAppender-Worker-ASYNCDEBUGLOG
[000016] user= 0.00% sys= 0.05% alloc=  1070b/s - ScheduledTasks:1
[000017] user= 0.00% sys= 0.00% alloc=    33b/s - EXPIRING-MAP-REAPER:1
[000018] user= 0.00% sys= 0.02% alloc=  1932b/s - Background_Reporter:1
[000022] user= 0.00% sys= 0.00% alloc=     0b/s - MemtablePostFlush:1
[000023] user= 0.00% sys= 0.00% alloc=     0b/s - MemtableReclaimMemory:1
[000026] user= 0.00% sys= 0.00% alloc=     0b/s - SlabPoolCleaner
[000027] user= 0.00% sys= 0.00% alloc=     0b/s - PERIODIC-COMMIT-LOG-SYNCER
[000028] user= 0.00% sys= 0.00% alloc=     0b/s - COMMIT-LOG-ALLOCATOR
[000029] user= 0.00% sys= 0.01% alloc=  7086b/s - OptionalTasks:1
[000030] user= 0.00% sys= 0.00% alloc=     0b/s - Reference-Reaper:1
[000031] user= 0.00% sys= 0.00% alloc=     0b/s - Strong-Reference-Leak-Detector:1
[000032] user=99.45% sys= 0.07% alloc=  374mb/s - CompactionExecutor:1
[000033] user= 0.00% sys= 0.00% alloc=     0b/s - CompactionExecutor:2
[000036] user= 0.00% sys= 0.00% alloc=     0b/s - NonPeriodicTasks:1
[000037] user= 0.00% sys= 0.00% alloc=     0b/s - LocalPool-Cleaner:1
[000041] user= 0.00% sys= 0.00% alloc=     0b/s - IndexSummaryManager:1
[000043] user= 0.00% sys= 0.01% alloc=  2705b/s - GossipTasks:1
[000044] user= 0.00% sys= 0.00% alloc=     0b/s - ACCEPT-/10.200.182.146
[000045] user= 0.00% sys= 0.01% alloc=  2283b/s - BatchlogTasks:1
[000055] user= 0.00% sys= 0.02% alloc=  9494b/s - GossipStage:1
[000056] user= 0.00% sys= 0.00% alloc=     0b/s - AntiEntropyStage:1
[000057] user= 0.00% sys= 0.00% alloc=     0b/s - MigrationStage:1
[000058] user= 0.00% sys= 0.00% alloc=     0b/s - MiscStage:1
[000067] user= 0.00% sys= 0.02% alloc=  2445b/s - MessagingService-Incoming-/10.200.182.144
[000068] user= 0.00% sys= 0.01% alloc=   968b/s - MessagingService-Outgoing-/10.200.182.144
[000069] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.144
[000070] user= 0.00% sys= 0.02% alloc=   512b/s - MessagingService-Outgoing-/10.200.182.144
[000072] user= 0.00% sys= 0.00% alloc=     0b/s - NanoTimeToCurrentTimeMillis updater
[000073] user= 0.00% sys= 0.02% alloc=  3113b/s - MessagingService-Incoming-/10.200.182.144
[000074] user= 0.00% sys= 0.00% alloc=     0b/s - PendingRangeCalculator:1
[000075] user= 0.00% sys= 0.41% alloc=   66kb/s - SharedPool-Worker-1
[000076] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-2
[000077] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-3
[000078] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-4
[000079] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-5
[000080] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-6
[000081] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-7
[000082] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-8
[000084] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-2
[000085] user= 0.00% sys= 0.00% alloc=   181b/s - HintsWriteExecutor:1
[000091] user= 0.00% sys= 0.00% alloc=     0b/s - PO-thread-0
[000092] user= 0.00% sys= 0.00% alloc=     0b/s - NodeHealthPlugin-Scheduler-thread-0
[000093] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-1
[000094] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-2
[000097] user= 0.00% sys= 0.00% alloc=     0b/s - Lease RemoteMessageServer acceptor-2-1
[000104] user= 0.00% sys= 0.00% alloc=     0b/s - RemoteMessageClient worker-4-1
[000120] user= 0.00% sys= 0.00% alloc=     0b/s - RemoteMessageClient connection limiter - 0
[000121] user= 0.00% sys= 0.00% alloc=     0b/s - threadDeathWatcher-5-1
[000122] user= 0.00% sys= 0.00% alloc=     0b/s - PO-thread scheduler
[000123] user= 0.00% sys= 0.00% alloc=     0b/s - JOB-TRACKER
[000124] user= 0.00% sys= 0.01% alloc=  1276b/s - TASK-TRACKER
[000127] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-1
[000128] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-2
[000129] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-3
[000130] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-4
[000131] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-5
[000132] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-6
[000133] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-7
[000134] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-8
[000135] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-9
[000136] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-10
[000137] user= 0.19% sys=-0.18% alloc=     0b/s - epollEventLoopGroup-6-11
[000138] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-12
[000139] user= 0.19% sys=-0.19% alloc=     0b/s - epollEventLoopGroup-6-13
[000140] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-14
[000141] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-15
[000142] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-16
[000143] user= 0.19% sys=-0.04% alloc=   13kb/s - Thread-7
[000144] user= 0.00% sys= 0.00% alloc=     0b/s - taskCleanup
[000145] user= 0.00% sys= 0.00% alloc=     0b/s - DseGossipStateUpdater
[000146] user= 0.00% sys= 0.00% alloc=     0b/s - DestroyJavaVM
[000149] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-10
[000150] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-11
[000151] user= 0.00% sys= 0.00% alloc=     0b/s - Directory/File cleanup thread
[000153] user= 0.00% sys= 0.00% alloc=     0b/s - pool-15-thread-1
[000190] user= 0.00% sys= 0.00% alloc=     0b/s - pool-18-thread-1
[000215] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-3
[000217] user= 0.00% sys= 0.00% alloc=     0b/s - RMI Scheduler(0)
[000220] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Connection(335)-10.200.182.146
[000222] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-4
[000223] user= 0.00% sys= 0.00% alloc=     0b/s - taskCleanup
[000224] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-69
[000225] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-70
[000227] user= 0.00% sys= 0.00% alloc=     0b/s - pool-19-thread-1
[000254] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-9
[000255] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-11
[000256] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-10
[000269] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-13
[000270] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-12
[000272] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-14
[000273] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-15
[000274] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-18
[000275] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-19
[000276] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-20
[000277] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-17
[000278] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-16
[000279] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-22
[000280] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-21
[000281] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-23
[000282] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-24
[000283] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-25
[000284] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-26
[000285] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-27
[000286] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-28
[000287] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-29
[000288] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-30
[000289] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-31
[000290] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-32
[000296] user= 0.00% sys= 0.00% alloc=  1970b/s - pool-2-thread-1
[000297] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-33
[000298] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-34
[000302] user= 0.00% sys= 0.01% alloc=  1576b/s - MessagingService-Incoming-/10.200.182.145
[000303] user= 0.00% sys= 0.00% alloc=   451b/s - MessagingService-Outgoing-/10.200.182.145
[000304] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.145
[000305] user= 0.00% sys= 0.01% alloc=   206b/s - MessagingService-Outgoing-/10.200.182.145
[000308] user= 0.00% sys= 0.00% alloc=   424b/s - MessagingService-Incoming-/10.200.182.145
[000314] user= 0.00% sys= 0.00% alloc=     0b/s - StreamingTransferTaskTimeouts:1
[000324] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
[000325] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
[000326] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
[000328] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.146
[000329] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-35
[000330] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-37
[000331] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-36
[000332] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-39
[000333] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-38
[000334] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-42
[000335] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-41
[000336] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-40
[000337] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-46
[000338] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-44
[000339] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-45
[000340] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-43
[000341] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-47
[000342] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-48
[000343] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-50
[000344] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-49
[000375] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:1
[000376] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:2
[000406] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.145
[000408] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.146
[000409] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.144
[000415] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:3
[000418] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:4
[000435] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:5
[000438] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:6
[000439] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:7
[000444] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:8
[000687] user= 0.00% sys= 0.00% alloc=     0b/s - JMX server connection timeout 687
[000688] user= 2.44% sys= 0.16% alloc= 1380kb/s - RMI TCP Connection(401)-10.200.182.146
[000694] user= 0.00% sys= 0.00% alloc=     0b/s - Attach Listener
[000726] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Connection(400)-10.200.182.146
[000743] user=-0.00% sys=-0.16% alloc=-109800b/s - MemtableFlushWriter:112
[000745] user= 0.00% sys= 0.00% alloc=     0b/s - MemtableFlushWriter:113
[000746] user= 0.00% sys= 0.03% alloc=  4295b/s - JMX server connection timeout 746
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)