Posted to issues@ignite.apache.org by "Oleg Ignatenko (JIRA)" <ji...@apache.org> on 2019/01/15 21:13:00 UTC

[jira] [Commented] (IGNITE-10518) MVCC: Update operation may hangs on backup on unstable topology.

    [ https://issues.apache.org/jira/browse/IGNITE-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743399#comment-16743399 ] 

Oleg Ignatenko commented on IGNITE-10518:
-----------------------------------------

(x) TeamCity history for the reproducer ([IgniteTxCachePrimarySyncTest0.testSingleKeyCommitFromPrimary|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=4989034880085631279&tab=testDetails]) suggests that the problem has not been fixed: I checked the last 100 execution results, covering about 30 days since Dec 16 2018, and every one of them shows the same "muted failure" result:
{noformat}
Test status	Duration	 	Build Info	Changes		Agent
Muted failure	18ms		… MVCC Cache 9	pull/5823/head	#1023	Tests passed: 10, muted: 9 	andrey.mashenk… (2) 	14 Jan 19 17:34	publicagent17_9096
Muted failure	12ms		… MVCC Cache 9	refs/heads/master	#1020	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 14:10	publicagent07_9092
Muted failure	24ms		… MVCC Cache 9	refs/heads/master	#1019	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 13:06	publicagent13_9096
Muted failure	17ms		… MVCC Cache 9	refs/heads/master	#1018	Tests passed: 10, muted: 9 	Changes (2) 	14 Jan 19 12:17	publicagent10_9092
Muted failure	18ms		… MVCC Cache 9	refs/heads/master	#1017	Tests passed: 10, muted: 9 	Changes (2) 	14 Jan 19 11:16	publicagent14_9096
Muted failure	14ms		… MVCC Cache 9	refs/heads/master	#1016	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 10:06	publicagent11_9092
Muted failure	15ms		… MVCC Cache 9	refs/heads/master	#1015	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 09:17	publicagent10_9096
Muted failure	12ms		… MVCC Cache 9	refs/heads/master	#1014	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 08:28	publicagent11_9092
Muted failure	25ms		… MVCC Cache 9	refs/heads/master	#1013	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 07:36	publicagent17_9091
Muted failure	16ms		… MVCC Cache 9	refs/heads/master	#1012	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 06:46	publicagent11_9096
Muted failure	26ms		… MVCC Cache 9	refs/heads/master	#1011	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 05:56	publicagent09_9094
Muted failure	8ms		… MVCC Cache 9	refs/heads/master	#1010	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 05:07	publicagent17_9092
Muted failure	18ms		… MVCC Cache 9	refs/heads/master	#1009	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 04:16	publicagent15_9094
Muted failure	18ms		… MVCC Cache 9	refs/heads/master	#1008	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 03:26	publicagent16_9096
Muted failure	25ms		… MVCC Cache 9	refs/heads/master	#1007	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 01:56	publicagent14_9094
Muted failure	10ms		… MVCC Cache 9	refs/heads/master	#1006	Tests passed: 10, muted: 9 	No changes 	14 Jan 19 01:06	publicagent16_9093
Muted failure	20ms		… MVCC Cache 9	pull/5814/head	#1005	Tests passed: 9, ignored: 1, muted: 9 	Oleg Ignatenko (79) 	14 Jan 19 00:16	publicagent06_9092
Muted failure	13ms		… MVCC Cache 9	refs/heads/master	#1004	Tests passed: 10, muted: 9 	No changes 	13 Jan 19 23:47	publicagent16_9092
... etc{noformat}
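A check like the one described above can also be scripted. The sketch below is a minimal illustration, assuming the history rows are available as tab-separated text with the test status in the first column, as in the paste above. `HistoryStatusCount` and `countByStatus` are hypothetical names, not part of TeamCity or Ignite, and the two sample rows are shortened copies of the first two rows of the paste.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HistoryStatusCount {
    /** Counts rows by their first tab-separated column (assumed to be the test status). */
    static Map<String, Integer> countByStatus(List<String> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String row : rows) {
            if (row.trim().isEmpty())
                continue; // skip blank lines
            String status = row.split("\t", 2)[0].trim();
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened copies of rows #1023 and #1020 from the history above.
        List<String> rows = Arrays.asList(
            "Muted failure\t18ms\t\t\u2026 MVCC Cache 9\tpull/5823/head\t#1023",
            "Muted failure\t12ms\t\t\u2026 MVCC Cache 9\trefs/heads/master\t#1020");
        System.out.println(countByStatus(rows));
    }
}
```

If the resulting map contains any status other than "Muted failure", the history is not uniform and the rows in question deserve a closer look.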
----
I found this out while re-running the TC bot to get a visa for IGNITE-10796, because I had picked up the unmuted test from master. I re-ran the MVCC 9 suite several times; every run failed with an execution timeout, and the suite passed only after I suppressed execution of the reproducer again.

A typical thread dump I observed from the timed-out test:
{noformat}
"sys-stripe-0-#557%distributed.IgniteTxCachePrimarySyncTest0%" #631 prio=5 os_prio=0 tid=0x00007f7861d06000 nid=0x73583 waiting on condition [0x00007f7817af9000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
	at org.apache.ignite.internal.util.StripedExecutor$StripeConcurrentQueue.take(StripedExecutor.java:672)
	at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:494)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.lang.Thread.run(Thread.java:748){noformat}
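For reading the dump above: `WAITING (parking)` is the state the JVM reports for a thread blocked in `LockSupport.park`. Going only by the frames shown, the stripe thread is parked inside `StripeConcurrentQueue.take`, which would be consistent with a worker waiting for its next task rather than being stuck inside one. That reading is an assumption, not something this single stack proves. The standalone sketch below uses only the JDK (no Ignite classes; `ParkedWorkerDemo` is a hypothetical name) and merely shows how a parked thread ends up in that state.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class ParkedWorkerDemo {
    /** Starts a thread that parks until released and returns the state observed while it is parked. */
    static Thread.State observeParkedState() throws InterruptedException {
        AtomicBoolean released = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            // park() may return spuriously, so re-check the flag in a loop.
            while (!released.get())
                LockSupport.park();
        }, "idle-worker");
        worker.start();

        // Poll until the worker is parked (or give up after 5 seconds).
        Thread.State state = worker.getState();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = worker.getState();
        }

        // Release the worker so the demo terminates cleanly.
        released.set(true);
        LockSupport.unpark(worker);
        worker.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("state while parked: " + observeParkedState());
    }
}
```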
(i) Reopening the ticket because of the above. In case I am mistaken: [~amashenkov], [~gvvinblade], if you can provide successful TeamCity execution results for this test case (or better yet, a TC bot visa for this PR), please feel free to close it again.

> MVCC: Update operation may hangs on backup on unstable topology. 
> -----------------------------------------------------------------
>
>                 Key: IGNITE-10518
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10518
>             Project: Ignite
>          Issue Type: Bug
>          Components: mvcc
>            Reporter: Andrew Mashenkov
>            Assignee: Andrew Mashenkov
>            Priority: Critical
>              Labels: Hanging, failover, mvcc_stabilization_stage_1
>             Fix For: 2.8
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update operation may hang on backup while awaiting the next topology.
> Symptoms: 
>  # Exchange for topology version 6.1 has been finished.
>  # Exchange for topology version 6.2 awaits partition release.
>  # DhtTxRemote waits for exchange.
> It seems the tx is mapped on an outdated topology version.
> Reproducer: IgniteTxCachePrimarySyncTest.testSingleKeyCommit() in MVCC mode.
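
The symptoms quoted above read like a circular wait: the exchange waits for partitions to be released, while the remote transaction waits for the exchange. The sketch below is only a toy model of that reading, assuming that partition release requires the transaction to finish. It uses plain `CompletableFuture`s rather than Ignite classes, and `CircularWaitSketch`, `exchangeDone` and `txDone` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CircularWaitSketch {
    /** Returns true if the modelled transaction is still incomplete after the timeout. */
    static boolean hangs(long timeoutMs) throws InterruptedException, ExecutionException {
        CompletableFuture<Void> exchangeDone = new CompletableFuture<>();
        CompletableFuture<Void> txDone = new CompletableFuture<>();

        // Assumption: the exchange finishes only after partitions are released,
        // which in this model means after the transaction has finished.
        txDone.thenRun(() -> exchangeDone.complete(null));

        // The remote transaction proceeds only after the exchange has finished.
        exchangeDone.thenRun(() -> txDone.complete(null));

        try {
            txDone.get(timeoutMs, TimeUnit.MILLISECONDS);
            return false; // completed: no circular wait
        } catch (TimeoutException e) {
            return true; // neither side can make progress
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("hangs: " + hangs(200));
    }
}
```

In this model neither future is ever completed by an outside party, so the wait always times out; breaking either dependency lets both complete.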


