Posted to commits@cassandra.apache.org by "Eric Evans (JIRA)" <ji...@apache.org> on 2016/07/22 21:20:21 UTC
[jira] [Created] (CASSANDRA-12281) Gossip blocks on startup when another node is bootstrapping

Eric Evans created CASSANDRA-12281:
--------------------------------------

             Summary: Gossip blocks on startup when another node is bootstrapping
                 Key: CASSANDRA-12281
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Eric Evans
            Priority: Minor
         Attachments: restbase1015-a_jstack.txt

In our cluster, a node normally starts up in less than 1 minute (after a drain on shutdown). However, while another node in the cluster is bootstrapping, startup of the same node takes nearly 30 minutes to complete, apparently because gossip blocks on pending range calculations. Note the GossipStage backlog (5602 pending) and the busy PendingRangeCalculator (1 active, 2 pending) in the output below.

{noformat}
$ nodetool-a tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0           1840         0                 0
ReadStage                         0         0           2350         0                 0
RequestResponseStage              0         0             53         0                 0
ReadRepairStage                   0         0              1         0                 0
CounterMutationStage              0         0              0         0                 0
HintedHandoff                     0         0             44         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                3         3            395         0                 0
MemtableReclaimMemory             0         0             30         0                 0
PendingRangeCalculator            1         2             29         0                 0
GossipStage                       1      5602            164         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlush                 0         0            111         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0             30         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
MUTATION                     0
COUNTER_MUTATION             0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
{noformat}
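
To spot a backlog like the GossipStage one without reading the table by eye, something along these lines could work. This is only a sketch: the class and method names are hypothetical, and it assumes pool rows have exactly the six whitespace-separated columns shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra or nodetool.
public class TpstatsBacklogSketch {

    /** Returns pool name -> pending count for every pool whose pending count exceeds threshold. */
    public static Map<String, Long> pendingOver(String tpstats, long threshold) {
        Map<String, Long> backlog = new LinkedHashMap<>();
        for (String line : tpstats.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // Assumed layout: name, active, pending, completed, blocked, all time blocked.
            // Header rows and the "Dropped" section have a different column count.
            if (cols.length != 6) {
                continue;
            }
            try {
                long pending = Long.parseLong(cols[2]);
                if (pending > threshold) {
                    backlog.put(cols[0], pending);
                }
            } catch (NumberFormatException e) {
                // Not a numeric pool row; ignore it.
            }
        }
        return backlog;
    }

    public static void main(String[] args) {
        // Rows copied from the tpstats output in this report.
        String sample =
              "MutationStage                     0         0           1840         0                 0\n"
            + "PendingRangeCalculator            1         2             29         0                 0\n"
            + "GossipStage                       1      5602            164         0                 0\n";
        System.out.println(pendingOver(sample, 100)); // prints {GossipStage=5602}
    }
}
```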

A full thread dump is attached, but the relevant bit seems to be here:

{noformat}
[ ... ]

"GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x00007fe4cd54b000 nid=0xea9 waiting on condition [0x00007fddcf883000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000004c1e922c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
	at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
	at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
	at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
	at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
	at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
	at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

[ ... ]
{noformat}
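
To illustrate the kind of stall the trace suggests, here is a minimal standalone sketch. It is not Cassandra's code. It assumes (my reading of the dump, not something I have confirmed) that a long-running task holds the read side of the same fair ReentrantReadWriteLock, so the thread asking for the write lock has to wait for it. All names in the sketch are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; the "reader" stands in for a long calculation,
// the calling thread stands in for the GossipStage thread.
public class FairLockStallSketch {

    /** Returns true if the write lock could not be acquired within waitMillis while a reader held the lock. */
    public static boolean writerStallsBehindReader(long waitMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair mode, as in the trace
        CountDownLatch readLockHeld = new CountDownLatch(1);
        CountDownLatch releaseReader = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readLockHeld.countDown();
                releaseReader.await(); // simulate a long computation under the read lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "LongReader");

        reader.start();
        try {
            readLockHeld.await();
            // The writer cannot proceed until every read lock is released.
            boolean acquired = lock.writeLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            releaseReader.countDown(); // let the reader finish
            try {
                reader.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("writer stalled: " + writerStallsBehindReader(200)); // prints "writer stalled: true"
    }
}
```

In the sketch the writer gives up after a timeout; the thread in the dump above uses a plain lock() call, so it parks until the lock is free, and everything queued behind it on that stage waits as well.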


