Posted to commits@cassandra.apache.org by "Loic Lambiel (JIRA)" <ji...@apache.org> on 2017/11/01 17:03:01 UTC

[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

    [ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234385#comment-16234385 ] 

Loic Lambiel commented on CASSANDRA-13948:
------------------------------------------

I tried your patch on 3.11.2 and got the following errors:


{code:java}
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,397 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@51f52b91) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@1358582595:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103504-big-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,413 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@70a08046) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@632323950:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103503-big-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,413 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5d6161ea) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1594052942:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103502-big-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2bd55858) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@230164803:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103502-big-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5b00472f) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@508355616:Memory@[7f6b54130b10..7f6b54136f10) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,429 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1f5a7829) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@1390774416:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@b8594dc) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1913719912:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103503-big-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3ec6a933) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1083770739:Memory@[7f6b5453ff30..7f6b54546330) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@671d48c8) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@496335375:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1611d7bf) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@515635345:Memory@[7f6b540f7dc0..7f6b540fe1c0) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@136db886) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@622788070:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3daa7ad5) to class org.apache.cassandra.io.util.FileHandle$Cleanup@2090103425:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103504-big-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2435c1d) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1348493438:Memory@[7f6b546ebd80..7f6b546f2180) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,430 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3e72d475) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@1638775104:[Memory@[0..20), Memory@[0..240)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,431 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@7916a2dc) to class org.apache.cassandra.io.util.FileHandle$Cleanup@715546128:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103505-big-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-11-01 17:51:35,446 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@25fe64d) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@238299391:/var/lib/cassandra/data/datadisk7/blobstore/block-ad8329f0740d11e68fe6cba3b122d983/mc-103505-big-Data.db was not released before the reference was garbage collected
ERROR [CompactionExecutor:71] 2017-11-01 17:51:36,872 CassandraDaemon.java:228 - Exception in thread Thread[CompactionExecutor:71,1,main]
java.lang.AssertionError: null
	at org.apache.cassandra.io.compress.CompressionMetadata$Chunk.<init>(CompressionMetadata.java:474) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:239) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:163) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:61) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.util.MmappedRegions.map(MmappedRegions.java:104) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:362) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly(BigTableWriter.java:290) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:179) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:134) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:98) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:141) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:201) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:85) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:268) ~[apache-cassandra-3.11.2.jar:3.11.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_131]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
	at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$t
{code}

Since we are heavily impacted by this bug (see also CASSANDRA-13980), I'm happy to test again as soon as you have an updated version of the patch.


> Reload compaction strategies when JBOD disk boundary changes
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13948
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Major
>             Fix For: 3.11.x, 4.x
>
>         Attachments: debug.log
>
>
> The thread dump below shows a race between an sstable replacement by the {{IndexSummaryRedistribution}} and {{AbstractCompactionStrategy.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=836 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=943 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, java.util.Collection, org.apache.cassandra.db.compaction.OperationType, java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) @bci=157, line=227 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) @bci=61, line=116 (Compiled frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() @bci=2, line=200 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() @bci=5, line=185 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() @bci=559, line=130 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=9, line=1420 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=4, line=250 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Interpreted frame)
>  - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=294 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> {noformat}
> Thread 94573: (state = IN_JAVA)
>  - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1441 (Compiled frame; information may be imprecise)
>  - java.util.HashMap$KeyIterator.next() @bci=1, line=1461 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.View$3.apply(org.apache.cassandra.db.lifecycle.View) @bci=20, line=268 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.View$3.apply(java.lang.Object) @bci=5, line=265 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.apply(com.google.common.base.Predicate, com.google.common.base.Function) @bci=13, line=133 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.tryModify(java.lang.Iterable, org.apache.cassandra.db.compaction.OperationType) @bci=31, line=99 (Compiled frame)
>  - org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int) @bci=84, line=139 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int) @bci=105, line=119 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run() @bci=84, line=265 (Interpreted frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> This particular node remained in this state forever, indicating {{LeveledCompactionStrategy.getNextBackgroundTask}} was looping indefinitely.
> What happened is that the {{IndexSummaryRedistribution}} thread replaced sstable references on the tracker, so {{AbstractCompactionStrategy.getNextBackgroundTask}} could not create a transaction with the old references. At the same time, {{IndexSummaryRedistribution}} could not update the sstable references in the compaction strategy, because {{AbstractCompactionStrategy.getNextBackgroundTask}} was holding the {{CompactionStrategyManager}} lock.
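
The interaction described in the issue can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not Cassandra code: the class {{StaleReferenceRace}}, the sets {{trackerLive}} and {{strategyView}}, and the sstable names are all hypothetical, and the choice of a read lock for the compaction side is an assumption made for illustration. The real threads block and loop indefinitely; here the lock wait and the retry loop are both bounded so the program terminates.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model only: every name below is hypothetical, not Cassandra's API.
final class StaleReferenceRace {

    static final class Outcome {
        final int failedAttempts;       // compaction attempts rejected by the tracker
        final boolean strategyNotified; // whether the strategy's view was ever updated

        Outcome(int failedAttempts, boolean strategyNotified) {
            this.failedAttempts = failedAttempts;
            this.strategyNotified = strategyNotified;
        }
    }

    static Outcome simulate(int maxAttempts) {
        // Stand-in for the tracker's set of live sstable references.
        Set<String> trackerLive = ConcurrentHashMap.newKeySet();
        // Stand-in for the compaction strategy's own view of the sstables.
        Set<String> strategyView = new HashSet<>();
        trackerLive.add("sstable-1");
        strategyView.add("sstable-1");

        ReentrantReadWriteLock managerLock = new ReentrantReadWriteLock();
        AtomicBoolean strategyNotified = new AtomicBoolean(false);

        // The calling thread plays the compaction thread and holds the manager lock.
        managerLock.readLock().lock();
        int failedAttempts = 0;
        try {
            Thread redistribution = new Thread(() -> {
                // Replace the reference on the tracker first.
                trackerLive.remove("sstable-1");
                trackerLive.add("sstable-1-resampled");
                try {
                    // Then try to update the strategy; this needs the write lock.
                    // The real code blocks here; the sketch gives up after 100 ms.
                    if (managerLock.writeLock().tryLock(100, TimeUnit.MILLISECONDS)) {
                        try {
                            strategyView.remove("sstable-1");
                            strategyView.add("sstable-1-resampled");
                            strategyNotified.set(true);
                        } finally {
                            managerLock.writeLock().unlock();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            redistribution.start();
            redistribution.join();

            // Compaction keeps picking the stale candidate, which the tracker rejects.
            for (int i = 0; i < maxAttempts; i++) {
                String candidate = strategyView.iterator().next();
                if (trackerLive.contains(candidate)) {
                    break; // a transaction could be created
                }
                failedAttempts++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while simulating", e);
        } finally {
            managerLock.readLock().unlock();
        }
        return new Outcome(failedAttempts, strategyNotified.get());
    }

    public static void main(String[] args) {
        Outcome o = simulate(5);
        System.out.println("failed attempts: " + o.failedAttempts);
        System.out.println("strategy notified: " + o.strategyNotified);
    }
}
```

In this model every compaction attempt fails and the strategy is never notified, because each side waits on state that only the other side can change. Removing the bounds on the retry loop and the lock wait would reproduce the hang shown in the thread dump.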


