You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2016/03/17 16:39:33 UTC

[jira] [Commented] (CASSANDRA-11373) nodetool repair might get stalled when running major compactions at the same time with LCS

    [ https://issues.apache.org/jira/browse/CASSANDRA-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199733#comment-15199733 ] 

Yuki Morishita commented on CASSANDRA-11373:
--------------------------------------------

Is this related to CASSANDRA-10099 [~krummas]?

> nodetool repair might get stalled when running major compactions at the same time with LCS
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11373
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11373
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Eduard Tudenhoefner
>            Assignee: Marcus Eriksson
>         Attachments: jstack.txt
>
>
> Our test is basically running *nodetool repair* on specific keyspaces (such as keyspace1) and the test is also triggering *nodetool compact keyspace1 standard1* in the background. 
> And so it looks like running major compactions & repairs lead to that issue when using *LCS*.
> Below is an excerpt from the *thread dump* (the rest is attached)
> {code}
> "CompactionExecutor:2" #33 daemon prio=1 os_prio=4 tid=0x00007f5363e64f10 nid=0x3c4e waiting for monitor entry [0x00007f53340d8000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:252)
> 	- waiting to lock <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager)
> 	at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableRepairedStatusChanged(Tracker.java:434)
> 	at org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:550)
> 	at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:465)
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- <0x00000006c9362ca8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "CompactionExecutor:1" #32 daemon prio=1 os_prio=4 tid=0x00007f5363e618b0 nid=0x3c4d runnable [0x00007f5334119000]
>    java.lang.Thread.State: RUNNABLE
> 	at com.google.common.collect.Iterators$7.computeNext(Iterators.java:650)
> 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> 	at com.google.common.collect.Iterators.addAll(Iterators.java:361)
> 	at com.google.common.collect.Iterables.addAll(Iterables.java:354)
> 	at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:589)
> 	at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:349)
> 	- locked <0x00000006d0f7a6a8> (a org.apache.cassandra.db.compaction.LeveledManifest)
> 	at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:98)
> 	- locked <0x00000006d0f7a568> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
> 	at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:95)
> 	- locked <0x00000006c9362c80> (a org.apache.cassandra.db.compaction.CompactionStrategyManager)
> 	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:257)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> *CPU usage is at 100%*
> {code}
> top -p 15386
> top - 12:12:40 up  1:28,  1 user,  load average: 1.08, 1.11, 1.16
> Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.3 us,  0.0 sy, 12.4 ni, 87.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem:  16433792 total,  8947336 used,  7486456 free,    89552 buffers
> KiB Swap:        0 total,        0 used,        0 free.  3326796 cached Mem
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 15386 automat+  20   0 7891448 5.004g 290184 S 102.9 31.9  80:07.06 java
> {code}
> *ttop shows that the compaction thread consumes all the CPU*
> {code}
> $ java -jar sjk.jar ttop -p 15386
> Monitoring threads ...
> 2016-03-17T12:17:13.514+0000 Process summary
>   process cpu=126.34%
>   application cpu=102.81% (user=102.46% sys=0.35%)
>   other: cpu=23.53%
>   heap allocation rate 375mb/s
> [000002] user= 0.00% sys= 0.00% alloc=     0b/s - Reference Handler
> [000003] user= 0.00% sys= 0.00% alloc=     0b/s - Finalizer
> [000005] user= 0.00% sys= 0.00% alloc=     0b/s - Signal Dispatcher
> [000012] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-7199
> [000013] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Accept-0
> [000015] user= 0.00% sys= 0.00% alloc=   476b/s - AsyncAppender-Worker-ASYNCDEBUGLOG
> [000016] user= 0.00% sys= 0.05% alloc=  1070b/s - ScheduledTasks:1
> [000017] user= 0.00% sys= 0.00% alloc=    33b/s - EXPIRING-MAP-REAPER:1
> [000018] user= 0.00% sys= 0.02% alloc=  1932b/s - Background_Reporter:1
> [000022] user= 0.00% sys= 0.00% alloc=     0b/s - MemtablePostFlush:1
> [000023] user= 0.00% sys= 0.00% alloc=     0b/s - MemtableReclaimMemory:1
> [000026] user= 0.00% sys= 0.00% alloc=     0b/s - SlabPoolCleaner
> [000027] user= 0.00% sys= 0.00% alloc=     0b/s - PERIODIC-COMMIT-LOG-SYNCER
> [000028] user= 0.00% sys= 0.00% alloc=     0b/s - COMMIT-LOG-ALLOCATOR
> [000029] user= 0.00% sys= 0.01% alloc=  7086b/s - OptionalTasks:1
> [000030] user= 0.00% sys= 0.00% alloc=     0b/s - Reference-Reaper:1
> [000031] user= 0.00% sys= 0.00% alloc=     0b/s - Strong-Reference-Leak-Detector:1
> [000032] user=99.45% sys= 0.07% alloc=  374mb/s - CompactionExecutor:1
> [000033] user= 0.00% sys= 0.00% alloc=     0b/s - CompactionExecutor:2
> [000036] user= 0.00% sys= 0.00% alloc=     0b/s - NonPeriodicTasks:1
> [000037] user= 0.00% sys= 0.00% alloc=     0b/s - LocalPool-Cleaner:1
> [000041] user= 0.00% sys= 0.00% alloc=     0b/s - IndexSummaryManager:1
> [000043] user= 0.00% sys= 0.01% alloc=  2705b/s - GossipTasks:1
> [000044] user= 0.00% sys= 0.00% alloc=     0b/s - ACCEPT-/10.200.182.146
> [000045] user= 0.00% sys= 0.01% alloc=  2283b/s - BatchlogTasks:1
> [000055] user= 0.00% sys= 0.02% alloc=  9494b/s - GossipStage:1
> [000056] user= 0.00% sys= 0.00% alloc=     0b/s - AntiEntropyStage:1
> [000057] user= 0.00% sys= 0.00% alloc=     0b/s - MigrationStage:1
> [000058] user= 0.00% sys= 0.00% alloc=     0b/s - MiscStage:1
> [000067] user= 0.00% sys= 0.02% alloc=  2445b/s - MessagingService-Incoming-/10.200.182.144
> [000068] user= 0.00% sys= 0.01% alloc=   968b/s - MessagingService-Outgoing-/10.200.182.144
> [000069] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.144
> [000070] user= 0.00% sys= 0.02% alloc=   512b/s - MessagingService-Outgoing-/10.200.182.144
> [000072] user= 0.00% sys= 0.00% alloc=     0b/s - NanoTimeToCurrentTimeMillis updater
> [000073] user= 0.00% sys= 0.02% alloc=  3113b/s - MessagingService-Incoming-/10.200.182.144
> [000074] user= 0.00% sys= 0.00% alloc=     0b/s - PendingRangeCalculator:1
> [000075] user= 0.00% sys= 0.41% alloc=   66kb/s - SharedPool-Worker-1
> [000076] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-2
> [000077] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-3
> [000078] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-4
> [000079] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-5
> [000080] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-6
> [000081] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-7
> [000082] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-8
> [000084] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-2
> [000085] user= 0.00% sys= 0.00% alloc=   181b/s - HintsWriteExecutor:1
> [000091] user= 0.00% sys= 0.00% alloc=     0b/s - PO-thread-0
> [000092] user= 0.00% sys= 0.00% alloc=     0b/s - NodeHealthPlugin-Scheduler-thread-0
> [000093] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-1
> [000094] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-2
> [000097] user= 0.00% sys= 0.00% alloc=     0b/s - Lease RemoteMessageServer acceptor-2-1
> [000104] user= 0.00% sys= 0.00% alloc=     0b/s - RemoteMessageClient worker-4-1
> [000120] user= 0.00% sys= 0.00% alloc=     0b/s - RemoteMessageClient connection limiter - 0
> [000121] user= 0.00% sys= 0.00% alloc=     0b/s - threadDeathWatcher-5-1
> [000122] user= 0.00% sys= 0.00% alloc=     0b/s - PO-thread scheduler
> [000123] user= 0.00% sys= 0.00% alloc=     0b/s - JOB-TRACKER
> [000124] user= 0.00% sys= 0.01% alloc=  1276b/s - TASK-TRACKER
> [000127] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-1
> [000128] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-2
> [000129] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-3
> [000130] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-4
> [000131] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-5
> [000132] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-6
> [000133] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-7
> [000134] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-8
> [000135] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-9
> [000136] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-10
> [000137] user= 0.19% sys=-0.18% alloc=     0b/s - epollEventLoopGroup-6-11
> [000138] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-12
> [000139] user= 0.19% sys=-0.19% alloc=     0b/s - epollEventLoopGroup-6-13
> [000140] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-14
> [000141] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-15
> [000142] user= 0.00% sys= 0.00% alloc=     0b/s - epollEventLoopGroup-6-16
> [000143] user= 0.19% sys=-0.04% alloc=   13kb/s - Thread-7
> [000144] user= 0.00% sys= 0.00% alloc=     0b/s - taskCleanup
> [000145] user= 0.00% sys= 0.00% alloc=     0b/s - DseGossipStateUpdater
> [000146] user= 0.00% sys= 0.00% alloc=     0b/s - DestroyJavaVM
> [000149] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-10
> [000150] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-11
> [000151] user= 0.00% sys= 0.00% alloc=     0b/s - Directory/File cleanup thread
> [000153] user= 0.00% sys= 0.00% alloc=     0b/s - pool-15-thread-1
> [000190] user= 0.00% sys= 0.00% alloc=     0b/s - pool-18-thread-1
> [000215] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-3
> [000217] user= 0.00% sys= 0.00% alloc=     0b/s - RMI Scheduler(0)
> [000220] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Connection(335)-10.200.182.146
> [000222] user= 0.00% sys= 0.00% alloc=     0b/s - pool-10-thread-4
> [000223] user= 0.00% sys= 0.00% alloc=     0b/s - taskCleanup
> [000224] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-69
> [000225] user= 0.00% sys= 0.00% alloc=     0b/s - Thread-70
> [000227] user= 0.00% sys= 0.00% alloc=     0b/s - pool-19-thread-1
> [000254] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-9
> [000255] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-11
> [000256] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-10
> [000269] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-13
> [000270] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-12
> [000272] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-14
> [000273] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-15
> [000274] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-18
> [000275] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-19
> [000276] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-20
> [000277] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-17
> [000278] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-16
> [000279] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-22
> [000280] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-21
> [000281] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-23
> [000282] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-24
> [000283] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-25
> [000284] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-26
> [000285] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-27
> [000286] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-28
> [000287] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-29
> [000288] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-30
> [000289] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-31
> [000290] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-32
> [000296] user= 0.00% sys= 0.00% alloc=  1970b/s - pool-2-thread-1
> [000297] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-33
> [000298] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-34
> [000302] user= 0.00% sys= 0.01% alloc=  1576b/s - MessagingService-Incoming-/10.200.182.145
> [000303] user= 0.00% sys= 0.00% alloc=   451b/s - MessagingService-Outgoing-/10.200.182.145
> [000304] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.145
> [000305] user= 0.00% sys= 0.01% alloc=   206b/s - MessagingService-Outgoing-/10.200.182.145
> [000308] user= 0.00% sys= 0.00% alloc=   424b/s - MessagingService-Incoming-/10.200.182.145
> [000314] user= 0.00% sys= 0.00% alloc=     0b/s - StreamingTransferTaskTimeouts:1
> [000324] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
> [000325] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
> [000326] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Outgoing-/10.200.182.146
> [000328] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.146
> [000329] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-35
> [000330] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-37
> [000331] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-36
> [000332] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-39
> [000333] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-38
> [000334] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-42
> [000335] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-41
> [000336] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-40
> [000337] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-46
> [000338] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-44
> [000339] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-45
> [000340] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-43
> [000341] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-47
> [000342] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-48
> [000343] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-50
> [000344] user= 0.00% sys= 0.00% alloc=     0b/s - SharedPool-Worker-49
> [000375] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:1
> [000376] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:2
> [000406] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.145
> [000408] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.146
> [000409] user= 0.00% sys= 0.00% alloc=     0b/s - MessagingService-Incoming-/10.200.182.144
> [000415] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:3
> [000418] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:4
> [000435] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:5
> [000438] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:6
> [000439] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:7
> [000444] user= 0.00% sys= 0.00% alloc=     0b/s - StreamConnectionEstablisher:8
> [000687] user= 0.00% sys= 0.00% alloc=     0b/s - JMX server connection timeout 687
> [000688] user= 2.44% sys= 0.16% alloc= 1380kb/s - RMI TCP Connection(401)-10.200.182.146
> [000694] user= 0.00% sys= 0.00% alloc=     0b/s - Attach Listener
> [000726] user= 0.00% sys= 0.00% alloc=     0b/s - RMI TCP Connection(400)-10.200.182.146
> [000743] user=-0.00% sys=-0.16% alloc=-109800b/s - MemtableFlushWriter:112
> [000745] user= 0.00% sys= 0.00% alloc=     0b/s - MemtableFlushWriter:113
> [000746] user= 0.00% sys= 0.03% alloc=  4295b/s - JMX server connection timeout 746
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)