Posted to issues@ignite.apache.org by "Oleg Ignatenko (JIRA)" <ji...@apache.org> on 2019/01/15 21:13:00 UTC
[jira] [Commented] (IGNITE-10518) MVCC: Update operation may hangs on backup on unstable topology.
[ https://issues.apache.org/jira/browse/IGNITE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743399#comment-16743399 ]
Oleg Ignatenko commented on IGNITE-10518:
-----------------------------------------
(x) The TeamCity history for the reproducer ([IgniteTxCachePrimarySyncTest0.testSingleKeyCommitFromPrimary|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=4989034880085631279&tab=testDetails]) suggests that the problem has not been fixed at all. I checked the last 100 execution results, covering about 30 days since Dec 16, 2018. Without exception, all of them show the same "muted failure" result:
{noformat}
Test status Duration Build Info Changes Agent
Muted failure 18ms … MVCC Cache 9 pull/5823/head #1023 Tests passed: 10, muted: 9 andrey.mashenk… (2) 14 Jan 19 17:34 publicagent17_9096
Muted failure 12ms … MVCC Cache 9 refs/heads/master #1020 Tests passed: 10, muted: 9 No changes 14 Jan 19 14:10 publicagent07_9092
Muted failure 24ms … MVCC Cache 9 refs/heads/master #1019 Tests passed: 10, muted: 9 No changes 14 Jan 19 13:06 publicagent13_9096
Muted failure 17ms … MVCC Cache 9 refs/heads/master #1018 Tests passed: 10, muted: 9 Changes (2) 14 Jan 19 12:17 publicagent10_9092
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1017 Tests passed: 10, muted: 9 Changes (2) 14 Jan 19 11:16 publicagent14_9096
Muted failure 14ms … MVCC Cache 9 refs/heads/master #1016 Tests passed: 10, muted: 9 No changes 14 Jan 19 10:06 publicagent11_9092
Muted failure 15ms … MVCC Cache 9 refs/heads/master #1015 Tests passed: 10, muted: 9 No changes 14 Jan 19 09:17 publicagent10_9096
Muted failure 12ms … MVCC Cache 9 refs/heads/master #1014 Tests passed: 10, muted: 9 No changes 14 Jan 19 08:28 publicagent11_9092
Muted failure 25ms … MVCC Cache 9 refs/heads/master #1013 Tests passed: 10, muted: 9 No changes 14 Jan 19 07:36 publicagent17_9091
Muted failure 16ms … MVCC Cache 9 refs/heads/master #1012 Tests passed: 10, muted: 9 No changes 14 Jan 19 06:46 publicagent11_9096
Muted failure 26ms … MVCC Cache 9 refs/heads/master #1011 Tests passed: 10, muted: 9 No changes 14 Jan 19 05:56 publicagent09_9094
Muted failure 8ms … MVCC Cache 9 refs/heads/master #1010 Tests passed: 10, muted: 9 No changes 14 Jan 19 05:07 publicagent17_9092
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1009 Tests passed: 10, muted: 9 No changes 14 Jan 19 04:16 publicagent15_9094
Muted failure 18ms … MVCC Cache 9 refs/heads/master #1008 Tests passed: 10, muted: 9 No changes 14 Jan 19 03:26 publicagent16_9096
Muted failure 25ms … MVCC Cache 9 refs/heads/master #1007 Tests passed: 10, muted: 9 No changes 14 Jan 19 01:56 publicagent14_9094
Muted failure 10ms … MVCC Cache 9 refs/heads/master #1006 Tests passed: 10, muted: 9 No changes 14 Jan 19 01:06 publicagent16_9093
Muted failure 20ms … MVCC Cache 9 pull/5814/head #1005 Tests passed: 9, ignored: 1, muted: 9 Oleg Ignatenko (79) 14 Jan 19 00:16 publicagent06_9092
Muted failure 13ms … MVCC Cache 9 refs/heads/master #1004 Tests passed: 10, muted: 9 No changes 13 Jan 19 23:47 publicagent16_9092
... etc{noformat}
----
I happened to discover this when re-running the TC bot to get a visa for IGNITE-10796, because my branch picked up the unmuted test from master. I re-ran the MVCC Cache 9 suite several times. Every run failed with an execution timeout, and the suite passed only after I suppressed the reproducer again.
A typical thread dump I observed from the timed-out test:
{noformat}
"sys-stripe-0-#557%distributed.IgniteTxCachePrimarySyncTest0%" #631 prio=5 os_prio=0 tid=0x00007f7861d06000 nid=0x73583 waiting on condition [0x00007f7817af9000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.StripedExecutor$StripeConcurrentQueue.take(StripedExecutor.java:672)
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:494)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748){noformat}
(i) Reopening the ticket because of the above. [~amashenkov], [~gvvinblade]: in case I am mistaken, please provide successful TeamCity execution results for this test case (or, better yet, a TC bot visa for this PR), and then feel free to close it again.
> MVCC: Update operation may hangs on backup on unstable topology.
> -----------------------------------------------------------------
>
> Key: IGNITE-10518
> URL: https://issues.apache.org/jira/browse/IGNITE-10518
> Project: Ignite
> Issue Type: Bug
> Components: mvcc
> Reporter: Andrew Mashenkov
> Assignee: Andrew Mashenkov
> Priority: Critical
> Labels: Hanging, failover, mvcc_stabilization_stage_1
> Fix For: 2.8
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> An update operation may hang on a backup node while awaiting the next topology version.
> Symptoms:
> # The exchange for topology version 6.1 has finished.
> # The exchange for topology version 6.2 is waiting for partition release.
> # DhtTxRemote is waiting for the exchange.
> It seems the tx is mapped on an outdated topology version.
> Reproducer: IgniteTxCachePrimarySyncTest.testSingleKeyCommit() in MVCC mode.
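The symptoms listed in the description read like a circular wait. The exchange for 6.2 cannot complete until partitions are released. Meanwhile, the remote tx, apparently mapped on the outdated version, does not release them until that exchange completes.

The following is a minimal, hedged Java sketch of that reading only. It uses two CountDownLatch objects to show why such a wait never resolves on its own and surfaces only as a timeout. The class ExchangeTxDeadlockSketch, its latches, and the thread names are hypothetical. They are not Ignite's API, and the real exchange and GridDhtTxRemote logic is far more involved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model (not Ignite code) of the suspected circular wait:
 * exchange 6.2 waits for partition release, while the remote tx mapped on
 * the outdated version holds its partitions and waits for exchange 6.2.
 */
final class ExchangeTxDeadlockSketch {
    /** Counted down when the remote tx releases its partitions. */
    private final CountDownLatch partitionsReleased = new CountDownLatch(1);

    /** Counted down when the exchange for 6.2 completes. */
    private final CountDownLatch exchangeDone = new CountDownLatch(1);

    /** Returns true if the exchange finished within timeoutMs, false if it hung. */
    boolean run(boolean txWaitsForNextExchange, long timeoutMs) throws InterruptedException {
        Thread exchange = new Thread(() -> {
            try {
                // Exchange 6.2 cannot finish until in-flight updates release partitions.
                partitionsReleased.await();
                exchangeDone.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "exchange-6.2");

        Thread tx = new Thread(() -> {
            try {
                if (txWaitsForNextExchange) {
                    // Tx mapped on the outdated version waits for exchange 6.2
                    // while still holding its partitions: the circular wait.
                    exchangeDone.await();
                }
                // Apply the update and release the partitions.
                partitionsReleased.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "dht-tx-remote");

        exchange.start();
        tx.start();

        boolean finished = exchangeDone.await(timeoutMs, TimeUnit.MILLISECONDS);

        // Interrupt both threads so that, in the hang case, they stop waiting and can be joined.
        exchange.interrupt();
        tx.interrupt();
        exchange.join();
        tx.join();

        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        // Latches are one-shot, so each scenario uses a fresh instance.
        System.out.println("tx waits for next exchange, finished = "
            + new ExchangeTxDeadlockSketch().run(true, 200));
        System.out.println("tx completes on its own version, finished = "
            + new ExchangeTxDeadlockSketch().run(false, 2000));
    }
}
```

In this model, the first scenario should report false, because neither latch is ever counted down. The second scenario should report true. This matches the observation that the reproducer ends in an execution timeout rather than an explicit error.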
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)