Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2017/12/06 14:36:01 UTC

[jira] [Comment Edited] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

    [ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280241#comment-16280241 ] 

Paulo Motta edited comment on CASSANDRA-13948 at 12/6/17 2:35 PM:
------------------------------------------------------------------

bq. +1 on the 3.11 patch

Thanks for the review!

bq. need to check the trunk patch as well, cancelling "ready to commit" (i hope)

The merge went smoothly; most of the conflicts were related to CASSANDRA-9143, so I [updated|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13948#diff-f9c882c974db60a710cf1f195cfdb801R113] {{CompactionStrategyManagerTest}} to mark a subset of sstables as repaired and pending repair, to make sure repaired and pending-repair sstables are assigned the correct strategies.
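
To illustrate what the updated test verifies, here is a minimal sketch (illustrative names only, not the real {{CompactionStrategyManager}} API) of routing each sstable to a separate strategy bucket according to its repair state:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: the strategy manager keeps separate strategy buckets for
// repaired, unrepaired and pending-repair sstables, and each sstable must
// land in the bucket matching its repair state.
public class RepairBucketsSketch {
    enum RepairState { REPAIRED, UNREPAIRED, PENDING_REPAIR }

    record SSTable(String name, RepairState state) {}

    // Routes each sstable to the bucket its repair state selects.
    static Map<RepairState, List<SSTable>> groupByStrategy(List<SSTable> sstables) {
        Map<RepairState, List<SSTable>> buckets = new HashMap<>();
        for (SSTable s : sstables)
            buckets.computeIfAbsent(s.state(), k -> new java.util.ArrayList<>()).add(s);
        return buckets;
    }
}
```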

However, there were 2 test failures in the trunk branch after the merge:
1. [testSetLocalCompactionStrategy|https://github.com/pauloricardomg/cassandra/blob/41416af426c41cdd38e157f38fb440342bca4dd0/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java#L163]
2. [disk_balance_bootstrap_test|https://github.com/pauloricardomg/cassandra-dtest/blob/73d7a8e1deb5eab05867d804933621062c2f6762/disk_balance_test.py#L34]

1. was failing [here|https://github.com/pauloricardomg/cassandra/blob/41416af426c41cdd38e157f38fb440342bca4dd0/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java#L175] because {{ALTER TABLE t WITH gc_grace_seconds = 1000}} caused the manually set compaction strategy to be replaced by the strategy defined in the schema. After investigation, it turned out that the disk boundaries were being invalidated by the schema reload (introduced by CASSANDRA-9425), and {{maybeReload(TableMetadata)}} was reloading the compaction strategies with the schema settings instead of the manually set ones. To fix this, I split {{maybeReload}} into the original {{maybeReload(TableMetadata)}}, which is called externally by {{ColumnFamilyStore}} and reloads the strategies only when the schema's table parameters change, and {{maybeReloadDiskBoundaries}}, which is used internally and reloads the compaction strategies with the same table settings when the disk boundaries are invalidated [here|https://github.com/apache/cassandra/commit/de5916e7c4f37736d5e1d06f0fc2b9c082b6bb99].
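
The split can be sketched roughly as follows (a toy model with string-valued "params"; names are stand-ins, not the real {{CompactionStrategyManager}} API):

```java
// Toy model of the split: schema-driven reloads rebuild only when the
// schema's compaction params actually changed, while boundary-driven
// reloads keep whatever params are currently in effect (possibly manual).
public class StrategySplitSketch {
    String schemaParams = "STCS";   // compaction params recorded from the schema
    String activeParams = "STCS";   // params the live strategies were built with
    boolean boundariesValid = true;

    // Stand-in for setting a strategy manually (e.g. via JMX).
    public void setLocalCompactionStrategy(String params) { activeParams = params; }

    // Called externally on schema reloads: rebuilds the strategies only when
    // the schema's *compaction* parameters changed, so an unrelated change
    // (e.g. gc_grace_seconds) no longer clobbers a manually set strategy.
    public void maybeReload(String newSchemaParams) {
        if (newSchemaParams.equals(schemaParams))
            return;
        schemaParams = newSchemaParams;
        activeParams = newSchemaParams; // rebuild with the new schema params
    }

    // Used internally: when only the disk boundaries were invalidated,
    // rebuild with the params currently in effect, manual or not.
    public void maybeReloadDiskBoundaries() {
        if (boundariesValid)
            return;
        boundariesValid = true; // rebuild strategies per disk with activeParams
    }
}
```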

2. Since the local ranges are not defined when the bootstrapping node starts, the disk boundaries are empty. Before CASSANDRA-9425 the boundaries were invalidated during keyspace construction ([here|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/config/Schema.java#L388]), so the correct boundaries were used during streaming. After CASSANDRA-9425 the boundaries were no longer reloaded during keyspace creation ([here|https://github.com/apache/cassandra/blob/4c80eeece37d79f434078224a0504400ae10a20d/src/java/org/apache/cassandra/schema/Schema.java#L138]), so the empty boundaries were used during streaming and the disks became imbalanced. I hit exactly the same problem in CASSANDRA-14083 ([here|https://issues.apache.org/jira/browse/CASSANDRA-14083?focusedCommentId=16272918&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16272918]). The solution is to invalidate the disk boundaries after the tokens are set during bootstrap ([here|https://github.com/apache/cassandra/commit/a37bbda45142e1b351908a4ff5196eb08e92082b]).
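
The invalidate-on-token-change idea can be sketched like this (illustrative types, not the real Cassandra disk-boundary API): boundaries computed before the node owns tokens are empty, so they must be marked stale once tokens are set, forcing the next reader (e.g. streaming) to recompute them from the now-known local ranges.

```java
import java.util.List;

// Toy model of cached disk boundaries with explicit invalidation.
public class BootstrapBoundariesSketch {
    private List<Long> localTokens = List.of(); // empty before bootstrap sets tokens
    private List<Long> cachedBoundaries;
    private boolean invalidated = true;

    // Readers recompute the boundaries lazily when the cache is stale.
    public List<Long> getDiskBoundaries() {
        if (invalidated) {
            cachedBoundaries = localTokens; // stand-in for the real per-disk position computation
            invalidated = false;
        }
        return cachedBoundaries;
    }

    // The fix: setting tokens during bootstrap invalidates the cached
    // (empty) boundaries so streaming sees the recomputed ones.
    public void setTokens(List<Long> tokens) {
        localTokens = tokens;
        invalidated = true;
    }
}
```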

After these two fixes the tests pass (the remaining failures appear unrelated; test screenshots from internal CI below):

||3.11||trunk||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...pauloricardomg:3.11-13948]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-13948]|[branch|https://github.com/apache/cassandra-dtest/compare/master...pauloricardomg:13948]|
|[testall|https://issues.apache.org/jira/secure/attachment/12900864/3.11-13948-testall.png]|[testall|https://issues.apache.org/jira/secure/attachment/12900862/trunk-13948-testall.png]|
|[dtest|https://issues.apache.org/jira/secure/attachment/12900865/3.11-13948-dtest.png]|[dtest|https://issues.apache.org/jira/secure/attachment/12900863/trunk-13948-dtest.png]|



> Reload compaction strategies when JBOD disk boundary changes
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13948
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>             Fix For: 3.11.x, 4.x
>
>         Attachments: 13948dtest.png, 13948testall.png, 3.11-13948-dtest.png, 3.11-13948-testall.png, debug.log, dtest13948.png, dtest2.png, threaddump-cleanup.txt, threaddump.txt, trace.log, trunk-13948-dtest.png, trunk-13948-testall.png
>
>
> The thread dump below shows a race between an sstable replacement by the {{IndexSummaryRedistribution}} and {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=836 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, line=943 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, java.util.Collection, org.apache.cassandra.db.compaction.OperationType, java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) @bci=157, line=227 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) @bci=61, line=116 (Compiled frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() @bci=2, line=200 (Interpreted frame)
>  - org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() @bci=5, line=185 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() @bci=559, line=130 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=9, line=1420 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) @bci=4, line=250 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Interpreted frame)
>  - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=294 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> {noformat}
> Thread 94573: (state = IN_JAVA)
>  - java.util.HashMap$HashIterator.nextNode() @bci=95, line=1441 (Compiled frame; information may be imprecise)
>  - java.util.HashMap$KeyIterator.next() @bci=1, line=1461 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.View$3.apply(org.apache.cassandra.db.lifecycle.View) @bci=20, line=268 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.View$3.apply(java.lang.Object) @bci=5, line=265 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.apply(com.google.common.base.Predicate, com.google.common.base.Function) @bci=13, line=133 (Compiled frame)
>  - org.apache.cassandra.db.lifecycle.Tracker.tryModify(java.lang.Iterable, org.apache.cassandra.db.compaction.OperationType) @bci=31, line=99 (Compiled frame)
>  - org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int) @bci=84, line=139 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int) @bci=105, line=119 (Interpreted frame)
>  - org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run() @bci=84, line=265 (Interpreted frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
> {noformat}
> This particular node remained in this state forever, indicating {{LeveledCompactionStrategy.getNextBackgroundTask}} was looping indefinitely.
> What happened is that the sstable references were replaced in the tracker by the {{IndexSummaryRedistribution}} thread, so {{AbstractCompactionStrategy.getNextBackgroundTask}} could not create a transaction with the old references, while {{IndexSummaryRedistribution}} could not update the sstable references in the compaction strategy because {{AbstractCompactionStrategy.getNextBackgroundTask}} was holding the {{CompactionStrategyManager}} lock.
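
The hang described in the ticket can be modeled structurally as follows (a toy sketch with made-up names, not the real Cassandra classes; the loop is bounded here only so the example terminates, whereas the real bug had no bound):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the race: one thread holds the strategy-manager lock while
// repeatedly trying to start a transaction on sstable references it cached
// earlier; the redistribution thread has already swapped those references
// in the tracker and now blocks on the same lock to deliver the swap
// notification, so the cached references can never be refreshed.
public class LivelockSketch {
    static final ReentrantReadWriteLock csmLock = new ReentrantReadWriteLock();
    // Tracker view after redistribution: only the replacement reference is live.
    static final Set<String> liveSSTables = new HashSet<>(Set.of("sstable-gen2"));

    // Models getNextBackgroundTask: it only knows the stale "sstable-gen1"
    // reference and keeps retrying while holding the lock.
    static boolean tryNextBackgroundTask(int maxAttempts) {
        csmLock.writeLock().lock();
        try {
            for (int i = 0; i < maxAttempts; i++) {
                if (liveSSTables.contains("sstable-gen1")) // tryModify succeeds only on live refs
                    return true;
                // In the real bug there was no attempt bound: the notification
                // that would refresh the strategy's sstable set itself needed
                // this lock, so the loop never terminated.
            }
            return false;
        } finally {
            csmLock.writeLock().unlock();
        }
    }
}
```

Every attempt sees the stale reference and fails, which is why the node appeared stuck in {{getNextBackgroundTask}} forever.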



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
