You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Matt Fleming (Jira)" <ji...@apache.org> on 2021/05/13 21:14:00 UTC

[jira] [Created] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

Matt Fleming created CASSANDRA-16668:
----------------------------------------

             Summary: Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
                 Key: CASSANDRA-16668
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
             Project: Cassandra
          Issue Type: Bug
            Reporter: Matt Fleming


A difficult-to-hit race condition exists in changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool size from 0 -> 4 which results in the test failing like so:

{{junit.framework.AssertionFailedError: Test tasks did not hit max concurrency goal expected:<true> but was:<false>junit.framework.AssertionFailedError: Test tasks did not hit max concurrency goal expected:<true> but was:<false> at org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198) at org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}

I can hit this issue maybe 2/3 times for every 100 invocations of the unit test.

The issue that causes the failure is that if tasks are still enqueued when the maximum pool size is set to zero and if all of the SEPWorker threads enter the STOP state before the pool size is bumped to 4, then no SEPWorker threads will be spun up to service the task queue. This causes the above error.

Why don't we spin up SEPWorker threads when enqueing tasks? Because of the guard logic in addTask: [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]

In this scenario taskPermits will not be zero (because we have tasks on the queue) so we never call {{maybeStartSpinningWorker()}}.

A trick to make this issue much easier to hit is to insert a {{Thread.sleep(500)}} immediately after setting the pool size to zero. This has the effect of guaranteeing that all SEPWorker threads will be STOP'd before enqueueing more work.

Here's a fix that attempts to spin up an SEPWorker whenever we grow the number of work permits: https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org