Posted to issues@ignite.apache.org by "Anton Vinogradov (JIRA)" <ji...@apache.org> on 2018/07/12 14:10:00 UTC
[jira] [Commented] (IGNITE-8783) Failover tests periodically cause
hanging of the whole Data Structures suite on TC
[ https://issues.apache.org/jira/browse/IGNITE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541697#comment-16541697 ]
Anton Vinogradov commented on IGNITE-8783:
------------------------------------------
Found the reason for the hang.
In {{org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager#createClientLatch}}
you can see the following code:
{noformat}
// There is final ack for created latch.
if (pendingAcks.containsKey(latchId)) {
    latch.complete();
    pendingAcks.remove(latchId); // this causes loss of pending acks when the coordinator failure has not been handled yet (e.g. we are still handling another node's failure)
}
else
    clientLatches.put(latchId, latch);
{noformat}
So I propose replacing this code with simply:
{noformat}
clientLatches.put(latchId, latch);
{noformat}
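To make the difference between the two variants concrete, here is a minimal sketch. It is a hypothetical, heavily simplified model and not the actual Ignite code: the class {{ClientLatchSketch}}, the nested {{Latch}} type, the {{onEarlyAck}} helper, the string ids and the shape of the maps are all assumptions made for illustration. It only shows that the current branch consumes whatever sits in {{pendingAcks}} for the latch id, while the proposed variant leaves those acks in place.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the two variants; not the real ExchangeLatchManager. */
class ClientLatchSketch {
    /** Minimal stand-in for a latch: it only tracks completion. */
    static final class Latch {
        private boolean completed;

        void complete() {
            completed = true;
        }

        boolean isCompleted() {
            return completed;
        }
    }

    /** Acks received before a latch with this id existed (latch id -> sender node ids). */
    final Map<String, Set<String>> pendingAcks = new HashMap<>();

    /** Client latches that are still waiting for a final ack. */
    final Map<String, Latch> clientLatches = new HashMap<>();

    /** Assumed helper: records an ack for a latch that has not been created yet. */
    void onEarlyAck(String latchId, String fromNode) {
        pendingAcks.computeIfAbsent(latchId, k -> new HashSet<>()).add(fromNode);
    }

    /** Mirrors the quoted branch: any pending ack is treated as the final ack and removed. */
    Latch createClientLatchOriginal(String latchId) {
        Latch latch = new Latch();

        if (pendingAcks.containsKey(latchId)) {
            latch.complete();
            pendingAcks.remove(latchId); // The recorded acks are gone after this call.
        }
        else
            clientLatches.put(latchId, latch);

        return latch;
    }

    /** Mirrors the proposed replacement: register the latch, leave pending acks untouched. */
    Latch createClientLatchProposed(String latchId) {
        Latch latch = new Latch();

        clientLatches.put(latchId, latch);

        return latch;
    }

    public static void main(String[] args) {
        ClientLatchSketch original = new ClientLatchSketch();
        original.onEarlyAck("exchange-1", "node-B");
        Latch first = original.createClientLatchOriginal("exchange-1");
        System.out.println("original: completed=" + first.isCompleted()
            + ", pendingAcks=" + original.pendingAcks);

        ClientLatchSketch proposed = new ClientLatchSketch();
        proposed.onEarlyAck("exchange-1", "node-B");
        Latch second = proposed.createClientLatchProposed("exchange-1");
        System.out.println("proposed: completed=" + second.isCompleted()
            + ", pendingAcks=" + proposed.pendingAcks);
    }
}
```

In this model the original variant completes the latch immediately and empties {{pendingAcks}}, so if those acks were actually meant for a later step (for example, handling of a coordinator failure that has not been processed yet), they are no longer available. The proposed variant keeps the latch registered and the acks intact.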
[~Jokser],
Could you please explain the idea behind handling the final message from old_coordinator?
As far as I can see, latches will be recreated on each topology change and acks will be resent.
> Failover tests periodically cause hanging of the whole Data Structures suite on TC
> ----------------------------------------------------------------------------------
>
> Key: IGNITE-8783
> URL: https://issues.apache.org/jira/browse/IGNITE-8783
> Project: Ignite
> Issue Type: Bug
> Components: data structures
> Reporter: Ivan Rakov
> Assignee: Anton Vinogradov
> Priority: Major
> Labels: MakeTeamcityGreenAgain
>
> History of suite runs: https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures&tab=buildTypeHistoryList&branch_IgniteTests24Java8=%3Cdefault%3E
> The chance of a suite hang is 18% in master (based on the previous 50 runs).
> The hang is always caused by one of the following failover tests:
> {noformat}
> GridCacheReplicatedDataStructuresFailoverSelfTest#testAtomicSequenceConstantTopologyChange
> GridCachePartitionedDataStructuresFailoverSelfTest#testFairReentrantLockConstantTopologyChangeNonFailoverSafe
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)