Posted to issues@ignite.apache.org by "Ignite TC Bot (Jira)" <ji...@apache.org> on 2021/02/08 22:26:00 UTC

[jira] [Commented] (IGNITE-14138) Historical rebalance kills cluster

    [ https://issues.apache.org/jira/browse/IGNITE-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281417#comment-17281417 ] 

Ignite TC Bot commented on IGNITE-14138:
----------------------------------------

{panel:title=Branch: [pull/8769/head] Base: [master] : Possible Blockers (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}JCache TCK 1.1{color} [[tests 0 TIMEOUT , Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=5863779]]

{panel}
{panel:title=Branch: [pull/8769/head] Base: [master] : New Tests (4217)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}JCache TCK 1.1{color} [[tests 4216|https://ci.ignite.apache.org/viewLog.html?buildId=5863779]]
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testContinuousQuery - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testReplacexAsyncOld - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testIterator - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testRemovexAsyncOld - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testOptimisticTxMissingKeyNoCommit - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testInvokeReturnValueGetOptimisticRepeatableRead - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testInvokeAllOptimisticRepeatableRead - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testPessimisticTxRepeatableRead - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testPutAll - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testRemove - PASSED{color}
* {color:#013220}InterceptorCacheConfigVariationsFullApiTestSuite: InterceptorCacheConfigVariationsFullApiTest_27.testPutIfAbsentAsyncConcurrent - PASSED{color}
... and 4205 new tests

{color:#00008b}PDS 4{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=5862892]]
* {color:#013220}IgnitePdsTestSuite4: CacheRebalanceWithRemovedWalSegment.test - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=5862919&buildTypeId=IgniteTests24Java8_RunAll]

> Historical rebalance kills cluster
> ----------------------------------
>
>                 Key: IGNITE-14138
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14138
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {noformat}
> [2021-01-12T05:11:02,142][ERROR][rebalance-#508%---%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Failed to continue supplying [grp=SQL_USAGES_EPE, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]]]
> org.apache.ignite.IgniteCheckedException: Failed to continue supplying [grp=SQL_1, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:571) [ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:398) [ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:489) [ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:474) [ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) [ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) [ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$800(GridCacheIoManager.java:109) [ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1707) [ignite-core.jar]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721) [ignite-core.jar]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:157) [ignite-core.jar]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:3011) [ignite-core.jar]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1662) [ignite-core.jar]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4900(GridIoManager.java:157) [ignite-core.jar]
> 	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1629) [ignite-core.jar]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
> 	at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Could not find start pointer for partition [part=4, partCntrSince=1115]
> 	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.searchEarliestWalPointer(CheckpointHistory.java:557) ~[ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.historicalIterator(GridCacheOffheapManager.java:1121) ~[ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.rebalanceIterator(IgniteCacheOffheapManagerImpl.java:1195) ~[ignite-core.jar]
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:322) ~[ignite-core.jar]
> 	... 16 more
> {noformat}
> I believe this code should throw IgniteHistoricalIteratorException instead of IgniteCheckedException, so that the failure can be handled properly and the rebalance can fall back to a full rebalance instead of killing nodes.
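
The idea in the description can be illustrated with a minimal, self-contained sketch. It does not use Ignite's classes: CheckedException, HistoricalIteratorException, searchStartPointer and supply are hypothetical stand-ins invented here, and the actual fix in the pull request may look quite different. The sketch only shows why a dedicated exception type helps: the caller can tell "history is unavailable" apart from other failures and switch to a full rebalance instead of escalating to the failure handler.

```java
class RebalanceFallbackSketch {
    /** Hypothetical stand-in for a generic checked failure (not Ignite's class). */
    static class CheckedException extends Exception {
        CheckedException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a dedicated "WAL history unavailable" failure. */
    static class HistoricalIteratorException extends CheckedException {
        HistoricalIteratorException(String msg) { super(msg); }
    }

    /** Which kind of rebalance the supplier ended up serving. */
    enum Outcome { HISTORICAL, FULL }

    /**
     * Pretends to look up the WAL start pointer for a partition.
     * Throws the dedicated exception type when the history is missing.
     */
    static long searchStartPointer(int part, long cntrSince, boolean historyAvailable)
        throws HistoricalIteratorException {
        if (!historyAvailable)
            throw new HistoricalIteratorException(
                "Could not find start pointer for partition [part=" + part +
                ", partCntrSince=" + cntrSince + ']');

        return cntrSince;
    }

    /**
     * Tries historical rebalance first. Because the lookup failure has its own
     * type, it is caught here and treated as recoverable rather than critical.
     */
    static Outcome supply(int part, long cntrSince, boolean historyAvailable) {
        try {
            searchStartPointer(part, cntrSince, historyAvailable);

            return Outcome.HISTORICAL;
        }
        catch (HistoricalIteratorException e) {
            // Recoverable: the demander can be served with a full rebalance.
            return Outcome.FULL;
        }
    }

    public static void main(String[] args) {
        System.out.println(supply(4, 1115, true));
        System.out.println(supply(4, 1115, false));
    }
}
```

In this sketch a missing start pointer leads to Outcome.FULL rather than an uncaught generic exception, which is the behaviour the description asks for.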



--
This message was sent by Atlassian Jira
(v8.3.4#803005)