Posted to dev@yunikorn.apache.org by "Wilfred Spiegelenburg (Jira)" <ji...@apache.org> on 2023/05/03 13:06:00 UTC

[jira] [Resolved] (YUNIKORN-1714) Fatal error: concurrent write/read when calling Queue.RemoveApplication()

     [ https://issues.apache.org/jira/browse/YUNIKORN-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg resolved YUNIKORN-1714.
---------------------------------------------
    Fix Version/s: 1.3.0
       Resolution: Fixed

Change committed, thanks for the quick fix.

> Fatal error: concurrent write/read when calling Queue.RemoveApplication()
> -------------------------------------------------------------------------
>
>                 Key: YUNIKORN-1714
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1714
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.3.0
>
>
> Encountered this problem when doing some local testing with a lot of running applications:
> {noformat}
> fatal error: concurrent map read and map write
> goroutine 8785 [running]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).RemoveApplication(0xc0002e0840, 0xc004a1cc40)
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/queue.go:697 +0x65
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).UnSetQueue(0xc004a1cc40)
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1493 +0x45
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).moveTerminatedApp(0xc0002aa600, {0xc00372e4e0, 0x16})
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/partition.go:1409 +0x73
> created by github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).executeTerminatedCallback
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1831 +0xaa
> ...
> goroutine 8782 [runnable]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).timeoutStateTimer.func1()
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:298
> created by time.goFunc
> 	/snap/go/current/src/time/sleep.go:176 +0x32
> goroutine 8623 [runnable]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).executeTerminatedCallback.func1()
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1831
> runtime.goexit()
> 	/snap/go/current/src/runtime/asm_amd64.s:1598 +0x1
> created by github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).executeTerminatedCallback
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1831 +0xaa
> goroutine 8786 [runnable]:
> go.uber.org/zap.(*stacktrace).Next(...)
> 	/home/bacskop/go/pkg/mod/go.uber.org/zap@v1.24.0/stacktrace.go:127
> go.uber.org/zap.(*Logger).check(0xc0003bb650, 0x0, {0x1e6c20c, 0x2c})
> 	/home/bacskop/go/pkg/mod/go.uber.org/zap@v1.24.0/logger.go:372 +0x7e5
> go.uber.org/zap.(*Logger).Info(0xc0002e0420?, {0x1e6c20c?, 0x1?}, {0xc005745680, 0x2, 0x2})
> 	/home/bacskop/go/pkg/mod/go.uber.org/zap@v1.24.0/logger.go:219 +0x3b
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).RemoveApplication(0xc0002e0840, 0xc004aa0380)
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/queue.go:742 +0xcc6
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).UnSetQueue(0xc004aa0380)
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1493 +0x45
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).moveTerminatedApp(0xc0002aa600, {0xc00372e498, 0x16})
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/partition.go:1409 +0x73
> created by github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).executeTerminatedCallback
> 	/home/bacskop/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1831 +0xaa
> {noformat}
> There is an unprotected access to {{sq.applications[]}}: the code checks whether an application exists without taking the queue lock. This can fail because another goroutine can modify the map concurrently, which the Go runtime detects and treats as a fatal error.
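
For illustration only, here is a minimal sketch of the locking pattern the description implies. It is not the change committed for YUNIKORN-1714, which is not shown in this message. The Application and Queue types, the string map key, and the NewQueue, AddApplication and Len helpers are simplified, hypothetical stand-ins for the real yunikorn-core objects. The point is that both the existence check and the delete happen while the write lock is held.

```go
package main

import (
	"fmt"
	"sync"
)

// Application is a hypothetical stand-in for the scheduler's application object.
type Application struct {
	ID string
}

// Queue is a simplified, hypothetical queue: only the application map and
// the lock that guards it are modelled here.
type Queue struct {
	sync.RWMutex
	applications map[string]*Application
}

// NewQueue returns an empty queue with an initialised map.
func NewQueue() *Queue {
	return &Queue{applications: make(map[string]*Application)}
}

// AddApplication registers the application under the write lock.
func (sq *Queue) AddApplication(app *Application) {
	sq.Lock()
	defer sq.Unlock()
	sq.applications[app.ID] = app
}

// RemoveApplication does the existence check and the delete under the same
// write lock, so no goroutine reads the map while another one writes to it.
// It reports whether the application was present.
func (sq *Queue) RemoveApplication(app *Application) bool {
	sq.Lock()
	defer sq.Unlock()
	if _, ok := sq.applications[app.ID]; !ok {
		return false
	}
	delete(sq.applications, app.ID)
	return true
}

// Len returns the number of registered applications under the read lock.
func (sq *Queue) Len() int {
	sq.RLock()
	defer sq.RUnlock()
	return len(sq.applications)
}

func main() {
	q := NewQueue()
	apps := make([]*Application, 1000)
	for i := range apps {
		apps[i] = &Application{ID: fmt.Sprintf("app-%d", i)}
		q.AddApplication(apps[i])
	}

	// Remove every application from its own goroutine, loosely mirroring the
	// per-application termination callbacks visible in the stack trace.
	var wg sync.WaitGroup
	for _, app := range apps {
		wg.Add(1)
		go func(a *Application) {
			defer wg.Done()
			q.RemoveApplication(a)
		}(app)
	}
	wg.Wait()

	fmt.Println("remaining applications:", q.Len()) // prints "remaining applications: 0"
}
```

If the existence check were moved in front of sq.Lock(), concurrent callers could read the map while another goroutine deletes from it, which is the situation reported above. Running a sketch like this with the race detector enabled (go run -race) is one way to confirm that the locked version is clean.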



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@yunikorn.apache.org
For additional commands, e-mail: dev-help@yunikorn.apache.org