You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@yunikorn.apache.org by "Wilfred Spiegelenburg (Jira)" <ji...@apache.org> on 2022/03/09 02:32:00 UTC
[jira] [Reopened] (YUNIKORN-1108) Fix race conditions in dispatcher_test.go

     [ https://issues.apache.org/jira/browse/YUNIKORN-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg reopened YUNIKORN-1108:
---------------------------------------------

The race is created by the overwrite of the event channel in the {{TestEventWillNotBeLostWhenEventChannelIsFull()}}

Moving the Stop call around might have fixed it for now but the race is not fixed. The small change in the execution order has given the garbage collector a little more time to clean up the go routine that was created from Start. Adding an event handler to the map of event handlers is using a mutex lock. Changing the event channel as we do in that test is always cause for problems.

Everything in the dispatcher relies on the {{once.Do()}} which means we are re-using the dispatcher in the all tests. The real solution is to replace the dispatcher in each test so we know we are clean and cannot cause races between tests. That would require a small change in {{initDispatcher()}} to always create a new instance when called, i.e. remove the nil check. The structure with the {{once.Do()}} is the protection we should not double up. The nil check I see as a bug anyway and it looks like we do not trust {{once.Do()}}
After making that change we need to call {{initDispatcher()}} at the start of each dispatcher test which leaves us with a clean never used before dispatcher. We can still create a new clean dispatcher if we wilfully broke it using a defer

> Fix race conditions in dispatcher_test.go
> -----------------------------------------
>
>                 Key: YUNIKORN-1108
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1108
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes, test - unit
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> A dispatcher test failed locally and there was also a data race observed:
> {noformat}
> 2022-03-08T16:19:15.795+0100	INFO	log/logger.go:89	scheduler configuration, pretty print	{"configs": "{\n \"schedulerName\": \"yunikorn\",\n \"clusterId\": \"my-kube-cluster\",\n \"clusterVersion\": \"0.1\",\n \"policyGroup\": \"queues\",\n \"schedulingIntervalSecond\": 1000000000,\n \"absoluteKubeConfigFilePath\": \"\",\n \"loggingLevel\": 0,\n \"logEncoding\": \"console\",\n \"logFilePath\": \"\",\n \"volumeBindTimeout\": 10000000000,\n \"testMode\": false,\n \"eventChannelCapacity\": 1048576,\n \"dispatchTimeout\": 300000000000,\n \"kubeQPS\": 1000,\n \"kubeBurst\": 1000,\n \"predicates\": \"\",\n \"operatorPlugins\": \"general,yunikorn-app\",\n \"enableConfigHotRefresh\": false,\n \"disableGangScheduling\": false,\n \"userLabelKey\": \"yunikorn.apache.org/username\"\n}"}
> 2022-03-08T16:19:15.796+0100	INFO	dispatcher/dispatcher.go:80	Init dispatcher	{"EventChannelCapacity": 1048576, "AsyncDispatchLimit": 104857, "DispatchTimeoutInSeconds": 300}
> 2022-03-08T16:19:15.797+0100	INFO	dispatcher/dispatcher.go:183	starting the dispatcher
> 2022-03-08T16:19:15.798+0100	INFO	dispatcher/dispatcher.go:179	dispatcher is draining out
> --- FAIL: TestDispatcherStartStop (0.01s)
>     dispatcher_test.go:119: assertion failed: 1 (int) != 2 (int)
> ==================
> WARNING: DATA RACE
> Write at 0x00c0003d2140 by goroutine 37:
>   github.com/apache/incubator-yunikorn-k8shim/pkg/dispatcher.TestEventWillNotBeLostWhenEventChannelIsFull()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/dispatcher/dispatcher_test.go:145 +0xda
>   testing.tRunner()
>       /snap/go/9028/src/testing/testing.go:1259 +0x22f
>   testing.(*T).Run·dwrap·21()
>       /snap/go/9028/src/testing/testing.go:1306 +0x47
> Previous read at 0x00c0003d2140 by goroutine 35:
>   github.com/apache/incubator-yunikorn-k8shim/pkg/dispatcher.Start.func1()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/dispatcher/dispatcher.go:187 +0x6b
> Goroutine 37 (running) created at:
>   testing.(*T).Run()
>       /snap/go/9028/src/testing/testing.go:1306 +0x726
>   testing.runTests.func1()
>       /snap/go/9028/src/testing/testing.go:1598 +0x99
>   testing.tRunner()
>       /snap/go/9028/src/testing/testing.go:1259 +0x22f
>   testing.runTests()
>       /snap/go/9028/src/testing/testing.go:1596 +0x7ca
>   testing.(*M).Run()
>       /snap/go/9028/src/testing/testing.go:1504 +0x9d1
>   main.main()
>       _testmain.go:99 +0x324
> Goroutine 35 (running) created at:
>   github.com/apache/incubator-yunikorn-k8shim/pkg/dispatcher.Start()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/dispatcher/dispatcher.go:184 +0x84
>   github.com/apache/incubator-yunikorn-k8shim/pkg/dispatcher.TestDispatcherStartStop()
>       /home/bacskop/repos/incubator-yunikorn-k8shim/pkg/dispatcher/dispatcher_test.go:104 +0x189
>   testing.tRunner()
>       /snap/go/9028/src/testing/testing.go:1259 +0x22f
>   testing.(*T).Run·dwrap·21()
>       /snap/go/9028/src/testing/testing.go:1306 +0x47
> ==================
> {noformat}
> The problem is that even if we call {{dispatcher.drain()}} and the event channel is empty, processing an event might still be in progress. 
> The simplest solution seems to be placing {{Stop()}} right after {{dispatcher.drain()}}. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@yunikorn.apache.org
For additional commands, e-mail: issues-help@yunikorn.apache.org