You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@yunikorn.apache.org by "Craig Condit (Jira)" <ji...@apache.org> on 2022/06/01 23:18:00 UTC

[jira] [Comment Edited] (YUNIKORN-1228) Race condition when serializing K8s objects

    [ https://issues.apache.org/jira/browse/YUNIKORN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545194#comment-17545194 ] 

Craig Condit edited comment on YUNIKORN-1228 at 6/1/22 11:17 PM:
-----------------------------------------------------------------

The fix is to always clone objects that we are sending to K8s (events, pod / node updates, etc.)

The methods that serialize these objects actually modify them, so they must be copied first. Locking won't work, especially for events, as they may send data asynchronously.


was (Author: ccondit):
The fix is to always clone objects that we are sending to K8s (events, pod / node updates, etc.)

The methods that serialize these objects actually modify them, so they must be copied first.

> Race condition when serializing K8s objects
> -------------------------------------------
>
>                 Key: YUNIKORN-1228
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1228
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Craig Condit
>            Assignee: Craig Condit
>            Priority: Major
>
> A race was recently uncovered during testing:
>  
> {noformat}
> ==================
> WARNING: DATA RACE
> Read at 0x00c000581810 by goroutine 46:
>   k8s.io/apimachinery/pkg/apis/meta/v1.(*TypeMeta).GroupVersionKind()
>       /home/testuser/go/pkg/mod/k8s.io/apimachinery@v0.20.11/pkg/apis/meta/v1/meta.go:128 +0x64
>   k8s.io/client-go/tools/reference.GetReference()
>       /home/testuser/go/pkg/mod/k8s.io/client-go@v0.20.11/tools/reference/ref.go:59 +0x17d
>   k8s.io/client-go/tools/events.(*recorderImpl).Eventf()
>       /home/testuser/go/pkg/mod/k8s.io/client-go@v0.20.11/tools/events/event_recorder.go:46 +0x119
>   github.com/apache/yunikorn-k8shim/pkg/plugin/predicates.(*predicateManagerImpl).predicatesAllocate()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/plugin/predicates/predicate_manager.go:80 +0x36f
>   github.com/apache/yunikorn-k8shim/pkg/plugin/predicates.(*predicateManagerImpl).Predicates()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/plugin/predicates/predicate_manager.go:64 +0x52
>   github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).IsPodFitNode()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:341 +0x241
>   github.com/apache/yunikorn-k8shim/pkg/callback.(*AsyncRMCallback).Predicates()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/callback/scheduler_callback.go:187 +0xb7
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Node).preConditions()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/objects/node.go:386 +0x1c7
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Node).preAllocateConditions()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/objects/node.go:368 +0xe4
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNode()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1190 +0xe7
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryNodes()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:1112 +0x7c4
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).tryAllocate()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/objects/application.go:849 +0x7a4
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).TryAllocate()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/objects/queue.go:1070 +0x18c
>   github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).TryAllocate()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/objects/queue.go:1082 +0xf7
>   github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).tryAllocate()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/partition.go:831 +0x15c
>   github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).schedule()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/context.go:137 +0x1b6
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).internalSchedule()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/scheduler.go:77 +0x47
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.func2()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/scheduler.go:67 +0x39Previous write at 0x00c000581810 by goroutine 47:
>   k8s.io/apimachinery/pkg/apis/meta/v1.(*TypeMeta).SetGroupVersionKind()
>       /home/testuser/go/pkg/mod/k8s.io/apimachinery@v0.20.11/pkg/apis/meta/v1/meta.go:123 +0x190
>   k8s.io/apimachinery/pkg/runtime.WithVersionEncoder.Encode()
>       /home/testuser/go/pkg/mod/k8s.io/apimachinery@v0.20.11/pkg/runtime/helper.go:241 +0x408
>   k8s.io/apimachinery/pkg/runtime.(*WithVersionEncoder).Encode()
>       <autogenerated>:1 +0xfb
>   k8s.io/apimachinery/pkg/runtime.Encode()
>       /home/testuser/go/pkg/mod/k8s.io/apimachinery@v0.20.11/pkg/runtime/codec.go:50 +0xb3
>   k8s.io/client-go/rest.(*Request).Body()
>       /home/testuser/go/pkg/mod/k8s.io/client-go@v0.20.11/rest/request.go:453 +0x87c
>   k8s.io/client-go/kubernetes/typed/core/v1.(*pods).UpdateStatus()
>       /home/testuser/go/pkg/mod/k8s.io/client-go@v0.20.11/kubernetes/typed/core/v1/pod.go:152 +0x2ec
>   github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).updatePodCondition()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:821 +0x83c
>   github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).HandleContainerStateUpdate()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/cache/context.go:858 +0x2c6
>   github.com/apache/yunikorn-k8shim/pkg/callback.(*AsyncRMCallback).UpdateContainerSchedulingState()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/callback/scheduler_callback.go:198 +0x45
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).inspectOutstandingRequests()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/scheduler.go:179 +0x535
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).internalInspectOutstandingRequests()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/scheduler.go:94 +0x38
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService.func3()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/scheduler.go:68 +0x39Goroutine 46 (running) created at:
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/scheduler.go:67 +0x384
>   github.com/apache/yunikorn-core/pkg/entrypoint.startAllServicesWithParameters()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/entrypoint/entrypoint.go:90 +0x624
>   github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServices()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/entrypoint/entrypoint.go:44 +0x4f
>   github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServicesWithLogger()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/entrypoint/entrypoint.go:55 +0x3b
>   main.main()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/cmd/shim/main.go:50 +0x4c4Goroutine 47 (running) created at:
>   github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/scheduler/scheduler.go:68 +0x3f0
>   github.com/apache/yunikorn-core/pkg/entrypoint.startAllServicesWithParameters()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/entrypoint/entrypoint.go:90 +0x624
>   github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServices()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/entrypoint/entrypoint.go:44 +0x4f
>   github.com/apache/yunikorn-core/pkg/entrypoint.StartAllServicesWithLogger()
>       /home/testuser/repos/incubator-yunikorn-core/pkg/entrypoint/entrypoint.go:55 +0x3b
>   main.main()
>       /home/testuser/repos/incubator-yunikorn-k8shim/pkg/cmd/shim/main.go:50 +0x4c4
> =================={noformat}
> This seems to be due to the serialization that K8s does whens ending objects over the wire. If the version component of a GroupKindVersion object (which is a member of every K8s object) is not set, it is updated in-line. This transforms what would seemingly be a read operation into a read-write operation. We need to create defensive copies before calling methods which will serialize a K8s object over the wire, including event generation and updates to nodes or pods.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@yunikorn.apache.org
For additional commands, e-mail: issues-help@yunikorn.apache.org