Posted to issues@ignite.apache.org by "Aleksey Plekhanov (Jira)" <ji...@apache.org> on 2020/11/24 10:07:00 UTC
[jira] [Created] (IGNITE-13747) Coordinator failure after node left: Unexpected rebalance on rebalanced cluster
Aleksey Plekhanov created IGNITE-13747:
------------------------------------------
Summary: Coordinator failure after node left: Unexpected rebalance on rebalanced cluster
Key: IGNITE-13747
URL: https://issues.apache.org/jira/browse/IGNITE-13747
Project: Ignite
Issue Type: Bug
Reporter: Aleksey Plekhanov
In some cases, the exchange worker on the coordinator terminates after a node leaves the cluster, with the following stack trace:
{noformat}
java.lang.AssertionError: Unexpected rebalance on rebalanced cluster: assignments=GridDhtPreloaderAssignments [exchangeId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=0f61f2f6-6ffb-4772-a6a6-d2f411600002, consistentId=127.0.0.1:47502, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47502], discPort=47502, order=66, intOrder=35, lastExchangeTime=1606212165542, loc=false, ver=2.10.0#20201124-sha1:00000000, isClient=false], topVer=67, msgTemplate=null, span=org.apache.ignite.internal.processors.tracing.NoopSpan@73893b7d, nodeId8=30b8d2de, msg=Node left, type=NODE_LEFT, tstamp=1606212166204], nodeId=0f61f2f6, evt=NODE_LEFT], topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], cancelled=false, affinityReassign=false, super={TcpDiscoveryNode [id=cfa40f59-ed19-4d5e-9d62-55f44a100001, consistentId=127.0.0.1:47501, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1606212120855, loc=false, ver=2.10.0#20201124-sha1:00000000, isClient=false]=GridDhtPartitionDemandMessage [rebalanceId=506, parts=IgniteDhtDemandedPartitionsMap [historical=null, full=HashSet [8]], timeout=0, workerId=-1, topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], partCnt=1, super=GridCacheGroupIdMessage [grpId=94416770]]}], locPart=[topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], lastChangeTopVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], waitRebalance=false, nodes=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 30b8d2de-d610-4dd4-aff2-fe4098b00000], locPart=GridDhtLocalPartition [rmvQueueMaxSize=1024, rmvdEntryTtl=10000, id=8, delayedRenting=true, finishFutRef=null, clearVer=1606261964138, grp=cache, state=MOVING, reservations=0, empty=false, createTime=11/24/2020 13:02:45, fullSize=68, cntr=Counter [init=0, val=2003]], ver4=AffinityTopologyVersion [topVer=67, minorTopVer=0], 
affOwners4=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 30b8d2de-d610-4dd4-aff2-fe4098b00000], ver3=AffinityTopologyVersion [topVer=66, minorTopVer=0], affOwners3=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 0f61f2f6-6ffb-4772-a6a6-d2f411600002], ver2=AffinityTopologyVersion [topVer=65, minorTopVer=0], affOwners2=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 30b8d2de-d610-4dd4-aff2-fe4098b00000], ver1=AffinityTopologyVersion [topVer=64, minorTopVer=0], affOwners1=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 259eb107-e17a-4a8f-9ad8-4653f8500002], ver0=AffinityTopologyVersion [topVer=63, minorTopVer=0], affOwners0=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 30b8d2de-d610-4dd4-aff2-fe4098b00000]]
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.generateAssignments(GridDhtPreloader.java:302)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3483)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3184)
{noformat}
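As a reading aid for the message above: its tail lists pairs ver0/affOwners0 through ver4/affOwners4, which appear to be the owners recorded for partition 8 at topology versions 63 through 67. The following is a minimal stdlib-only sketch that tabulates those pairs. The class AffinityHistory, its parse method, and the SAMPLE constant are hypothetical names and not part of Ignite; the sample text is abbreviated from the message, with node ids shortened to their first 8 characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ignite: tabulates the verN/affOwnersN
// pairs printed at the end of the assertion message.
class AffinityHistory {
    // Abbreviated copy of the message tail (assumption: node ids shortened to 8 characters).
    static final String SAMPLE =
        "ver4=AffinityTopologyVersion [topVer=67, minorTopVer=0], affOwners4=[cfa40f59, 30b8d2de], " +
        "ver3=AffinityTopologyVersion [topVer=66, minorTopVer=0], affOwners3=[cfa40f59, 0f61f2f6], " +
        "ver2=AffinityTopologyVersion [topVer=65, minorTopVer=0], affOwners2=[cfa40f59, 30b8d2de], " +
        "ver1=AffinityTopologyVersion [topVer=64, minorTopVer=0], affOwners1=[cfa40f59, 259eb107], " +
        "ver0=AffinityTopologyVersion [topVer=63, minorTopVer=0], affOwners0=[cfa40f59, 30b8d2de]]";

    // Matches "verN=AffinityTopologyVersion [topVer=T, minorTopVer=M], affOwnersN=[...]".
    // The backreference \1 ties affOwnersN to the same N; \s* tolerates a line break.
    private static final Pattern ENTRY = Pattern.compile(
        "ver(\\d+)=AffinityTopologyVersion \\[topVer=(\\d+), minorTopVer=\\d+\\],\\s*affOwners\\1=\\[([^\\]]*)\\]");

    // Returns topVer -> owner ids, sorted by topology version.
    static Map<Integer, List<String>> parse(String msg) {
        Map<Integer, List<String>> res = new TreeMap<>();
        Matcher m = ENTRY.matcher(msg);

        while (m.find()) {
            List<String> owners = new ArrayList<>();

            for (String id : m.group(3).split(",")) {
                String trimmed = id.trim();

                if (!trimmed.isEmpty())
                    owners.add(trimmed);
            }

            res.put(Integer.parseInt(m.group(2)), owners);
        }

        return res;
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((topVer, owners) ->
            System.out.println("topVer=" + topVer + " owners=" + owners));
    }
}
```

Read this way, the message shows cfa40f59 in the owner list at every version, while the second entry is 30b8d2de at versions 63, 65 and 67 and a different node id at versions 64 (259eb107) and 66 (0f61f2f6).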
Reproducer:
{code:java}
@Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
    return super.getConfiguration(igniteInstanceName)
        .setCacheConfiguration(new CacheConfiguration<>("cache")
            .setBackups(1)
            .setEvictionPolicyFactory(() -> new LruEvictionPolicy<>().setMaxSize(100))
            .setOnheapCacheEnabled(true)
            .setNearConfiguration(new NearCacheConfiguration<>())
            .setAffinity(new RendezvousAffinityFunction(false, 10))
        );
}

@Test
public void test() throws Exception {
    startGrids(4);

    try {
        AtomicInteger gridIdx = new AtomicInteger();

        long ts = U.currentTimeMillis();

        // Two put workers, bound to grid(0) and grid(1), write random keys for 150 seconds.
        GridTestUtils.runMultiThreadedAsync(() -> {
            IgniteCache<Integer, Integer> cache = grid(gridIdx.getAndIncrement()).cache("cache");

            while (U.currentTimeMillis() - ts < 150_000L)
                cache.put(ThreadLocalRandom.current().nextInt(100_000), 0);
        }, 2, "put-worker");

        // Meanwhile, restart node 2 in a loop.
        while (U.currentTimeMillis() - ts < 150_000L) {
            stopGrid(2);
            startGrid(2);
        }
    }
    finally {
        stopAllGrids();
    }
}
{code}
Also, the test GridCachePartitionedOptimisticTxNodeRestartTest#testRestartWithTxFourNodesOneBackupsOffheapEvict is flaky on TeamCity for this reason.