Posted to issues@ignite.apache.org by "Aleksey Plekhanov (Jira)" <ji...@apache.org> on 2020/11/24 10:07:00 UTC
[jira] [Created] (IGNITE-13747) Coordinator failure after node left: Unexpected rebalance on rebalanced cluster

Aleksey Plekhanov created IGNITE-13747:
------------------------------------------

             Summary: Coordinator failure after node left: Unexpected rebalance on rebalanced cluster
                 Key: IGNITE-13747
                 URL: https://issues.apache.org/jira/browse/IGNITE-13747
             Project: Ignite
          Issue Type: Bug
            Reporter: Aleksey Plekhanov


In some cases, the exchange worker on the coordinator terminates after a node leaves the cluster, with the following stack trace:

{noformat}
java.lang.AssertionError: Unexpected rebalance on rebalanced cluster: assignments=GridDhtPreloaderAssignments [exchangeId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=0f61f2f6-6ffb-4772-a6a6-d2f411600002, consistentId=127.0.0.1:47502, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47502], discPort=47502, order=66, intOrder=35, lastExchangeTime=1606212165542, loc=false, ver=2.10.0#20201124-sha1:00000000, isClient=false], topVer=67, msgTemplate=null, span=org.apache.ignite.internal.processors.tracing.NoopSpan@73893b7d, nodeId8=30b8d2de, msg=Node left, type=NODE_LEFT, tstamp=1606212166204], nodeId=0f61f2f6, evt=NODE_LEFT], topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], cancelled=false, affinityReassign=false, super={TcpDiscoveryNode [id=cfa40f59-ed19-4d5e-9d62-55f44a100001, consistentId=127.0.0.1:47501, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1606212120855, loc=false, ver=2.10.0#20201124-sha1:00000000, isClient=false]=GridDhtPartitionDemandMessage [rebalanceId=506, parts=IgniteDhtDemandedPartitionsMap [historical=null, full=HashSet [8]], timeout=0, workerId=-1, topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], partCnt=1, super=GridCacheGroupIdMessage [grpId=94416770]]}], locPart=[topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], lastChangeTopVer=AffinityTopologyVersion [topVer=67, minorTopVer=0], waitRebalance=false, nodes=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 30b8d2de-d610-4dd4-aff2-fe4098b00000], locPart=GridDhtLocalPartition [rmvQueueMaxSize=1024, rmvdEntryTtl=10000, id=8, delayedRenting=true, finishFutRef=null, clearVer=1606261964138, grp=cache, state=MOVING, reservations=0, empty=false, createTime=11/24/2020 13:02:45, fullSize=68, cntr=Counter [init=0, val=2003]], ver4=AffinityTopologyVersion [topVer=67, minorTopVer=0], 
affOwners4=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 30b8d2de-d610-4dd4-aff2-fe4098b00000], ver3=AffinityTopologyVersion [topVer=66, minorTopVer=0], affOwners3=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 0f61f2f6-6ffb-4772-a6a6-d2f411600002], ver2=AffinityTopologyVersion [topVer=65, minorTopVer=0], affOwners2=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 30b8d2de-d610-4dd4-aff2-fe4098b00000], ver1=AffinityTopologyVersion [topVer=64, minorTopVer=0], affOwners1=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 259eb107-e17a-4a8f-9ad8-4653f8500002], ver0=AffinityTopologyVersion [topVer=63, minorTopVer=0], affOwners0=[cfa40f59-ed19-4d5e-9d62-55f44a100001, 30b8d2de-d610-4dd4-aff2-fe4098b00000]]
  at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.generateAssignments(GridDhtPreloader.java:302)
  at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3483)
  at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3184)
{noformat}

Reproducer:

{code:java}
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        return super.getConfiguration(igniteInstanceName)
            .setCacheConfiguration(new CacheConfiguration<>("cache")
                .setBackups(1)
                .setEvictionPolicyFactory(() -> new LruEvictionPolicy<>().setMaxSize(100))
                .setOnheapCacheEnabled(true)
                .setNearConfiguration(new NearCacheConfiguration<>())
                .setAffinity(new RendezvousAffinityFunction(false, 10))
            );
    }

    @Test
    public void test() throws Exception {
        startGrids(4);

        try {
            // Each put-worker thread takes its own grid index (0 and 1).
            AtomicInteger gridIdx = new AtomicInteger();

            long ts = U.currentTimeMillis();

            // Two threads put random keys into the cache for 150 seconds.
            GridTestUtils.runMultiThreadedAsync(() -> {
                IgniteCache<Integer, Integer> cache = grid(gridIdx.getAndIncrement()).cache("cache");

                while (U.currentTimeMillis() - ts < 150_000L)
                    cache.put(ThreadLocalRandom.current().nextInt(100_000), 0);
            }, 2, "put-worker");

            // Meanwhile, keep restarting node 2 for the same 150 seconds.
            while (U.currentTimeMillis() - ts < 150_000L) {
                stopGrid(2);
                startGrid(2);
            }
        }
        finally {
            stopAllGrids();
        }
    }
{code}

The test GridCachePartitionedOptimisticTxNodeRestartTest#testRestartWithTxFourNodesOneBackupsOffheapEvict is also flaky on TeamCity for this reason.


