You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2022/01/24 20:59:26 UTC

[GitHub] [incubator-yunikorn-k8shim] craigcondit opened a new pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

craigcondit opened a new pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359


   ### What is this PR for?
   Fix binding of dynamic volumes to pods. This is broken after YUNIKORN-872. Also add additional error logging.
   
   ### What type of PR is it?
   * [x] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-1041
   
   ### How should this be tested?
   Manual testing for now, e2e tests to be added to YUNIKORN-1039.
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There is breaking changes for older versions.
   * [ ] - It needs documentation.
   
   <img width="1260" alt="error-0 12 1" src="https://user-images.githubusercontent.com/12699633/150863728-544578da-0d64-43ff-8fdc-5e5b8a62928a.png">
   
   <img width="1270" alt="no-error-fix" src="https://user-images.githubusercontent.com/12699633/150863293-a98aae12-0936-485a-a891-ee41679e6e97.png">
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r792001553



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))

Review comment:
       We should still check it during binding to ensure that the status has not changed though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791158077



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {

Review comment:
       This is the actual fix (ensure these slices are empty rather than nil).

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       This is the actual fix -- don't allow the slices within volumes to be nil.

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}

Review comment:
       This is based on upstream K8s; if we have unbound immediate claims, we can't satisfy this.

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}

Review comment:
       This was added based on upstream K8s. If any reasons are returned from FindPodVolumes, there are conflicts and we must fail.

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       > We need tests for this code path.
   
   I don't disagree. However, currently, the entirety of volume binding is disabled in test mode. Mocking out the proper interactions is going to be a pretty major undertaking.

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}

Review comment:
       Exactly.

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))

Review comment:
       I agree, we probably need to test for this earlier. I'll work on trying to get this working earlier as well. Your suggestion as to where makes sense -- I will investigate that.

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)

Review comment:
       We probably do. I missed that one.

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       > We need tests for this code path.
   
   I don't disagree. However, currently, the entirety of volume binding is disabled in test mode. Mocking out the proper interactions is going to be a pretty major undertaking. Any ideas how best to proceed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#issuecomment-1020706767


   Do you plan to add any UT or e2e test for this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791303545



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       > We need tests for this code path.
   
   I don't disagree. However, currently, the entirety of volume binding is disabled in test mode. Mocking out the proper interactions is going to be a pretty major undertaking. Any ideas how best to proceed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] commented on pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#issuecomment-1020549509


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#359](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (17ecebb) into [master](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/commit/799a42ed9b247815fd344f70e0df27d6b83f90d2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (799a42e) will **decrease** coverage by `0.49%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359/graphs/tree.svg?width=650&height=150&src=pr&token=LZImIuvleR&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #359      +/-   ##
   ==========================================
   - Coverage   62.90%   62.41%   -0.50%     
   ==========================================
     Files          41       41              
     Lines        6104     6152      +48     
   ==========================================
     Hits         3840     3840              
   - Misses       2112     2160      +48     
     Partials      152      152              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [pkg/cache/context.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2NvbnRleHQuZ28=) | `34.69% <0.00%> (-2.52%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [799a42e...17ecebb](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r792000295



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))

Review comment:
       Looking at this a bit more, I don't believe we need to worry about this. The existing VolumeBinder plugin gets called as part of pod / node evaluation, so this logic (which comes from PreFilter) should already fire prior to us attempting to bind the node.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] wilfred-s commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791272906



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}

Review comment:
       looks good

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       This prevents the early return in the K8s code. The K8s API code expects an empty array and never a nil for any success case. Even in the K8s test code a nil seems to be replaced with an empty array for success cases if I read the code correctly.

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)

Review comment:
       This same call is made in `AssumePod` line 457 why do we not need the change there?

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       The nil checks for the `volumes.StaticBindings` and `volumes.DynamicProvisions` have been in the code since k8s 1.12. The real cause is a flow on effect of the change in the `FindPodVolumes` return value as part of a cleanup commit.  That was added in k8s 1.19 and later: 0 length slices are replaced with nil values:
   https://gitlab.cncf.ci/kubernetes/kubernetes/commit/2cdc63aeaab603c61b48e95c044f1e4bb2f32548
   
   We need tests for this code path.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r792028645



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       Given the major surgery required to get more of context.go tested, I'd like to move forward with this PR as-is to unblock 0.12.2 and add e2e tests in YUNIKORN-1039 targeting 1.0.0. @wilfred-s does that sound ok with you? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791159232



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}

Review comment:
       This is based on upstream K8s; if we have unbound immediate claims, we can't satisfy this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#issuecomment-1021507724


   > `bindPodVolumes` from the context is called from an event in the task. That task event should have dispatched a failure which I assumed should have failed the whole task which should have triggered a retry etc.
   > 
   > If there was no failure there is an error handling issue. These changes do not change this at all. Does that mean that volume bind failures are not handled at all? Having a warning event does not equate to a failure does it?
   
   `task.postTaskAllocated()` dispatches a `FailTaskEvent` if `err != nil` is returned from `context.bindPodVolumes()`, so I think we're good there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
yangwwei commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791274687



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}

Review comment:
       Good catch.
   
   ```
   volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
   ```
   Seems like there will be a case where `err==nil` but `len(reasons) > 0`
   
   > // It returns an error when something went wrong or a list of reasons why the node is
   	// (currently) not usable for the pod.
   

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       why does this fix the problem? 
   I did not see anywhere to complain NIL value

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))

Review comment:
       Is this the right place to check the unbound immediate claims? Maybe too late? The PVC relies on a PV controller to do the binding, before this happens, the default scheduler [skips scheduling the pod] in the PreFilter phase (https://github.com/kubernetes/kubernetes/blob/c592bd40f2df941aa4ea364592ce92fd5c669bfc/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go#L185-L187). Our code here is pretty late, and returning an error means we failed to schedule this pod, and user will get a "PodVolumesBindFailure" error in the pod. Would it make sense to do this check [here](https://github.com/apache/incubator-yunikorn-k8shim/blob/799a42ed9b247815fd344f70e0df27d6b83f90d2/pkg/cache/application.go#L414)? This is a sanity check to ensure that a pod is ready for scheduling.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] wilfred-s commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791276068



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       This prevents the early return in the K8s code. The K8s API code expects an empty array and never a nil for any success case. Even in the K8s test code a nil seems to be replaced with an empty array for success cases if I read the code correctly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791303545



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       > We need tests for this code path.
   
   I don't disagree. However, currently, the entirety of volume binding is disabled in test mode. Mocking out the proper interactions is going to be a pretty major undertaking.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791304588



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)

Review comment:
       We probably do. I missed that one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] wilfred-s commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791272906



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}

Review comment:
       looks good




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r792002515



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)

Review comment:
       actually, i don't think we do. This logic will have already run via the PreFilter() / Filter() calls on the upstream VolumeBinder plugin (we call these as part of node fitness checks).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#issuecomment-1020706767


   Do you plan to add any UT or e2e test for this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791304035



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))

Review comment:
       I agree, we probably need to test for this earlier. I'll work on trying to get this working earlier as well. Your suggestion as to where makes sense -- I will investigate that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#issuecomment-1020743460


   > Do you plan to add any UT or e2e test for this?
   
   Yes, e2e tests will be added in YUNIKORN-1039. Those will take time to develop and this issue is a blocker for 0.12.2 so I don't think we should hold up the fix for the e2e tests to be developed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791303676



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}

Review comment:
       Exactly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791158892



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       This is the actual fix -- don't allow the slices within volumes to be nil.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791159646



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}

Review comment:
       This was added based on upstream K8s. If any reasons are returned from FindPodVolumes, there are conflicts and we must fail.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] wilfred-s commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r793447909



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       yes that is good




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] commented on pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#issuecomment-1020549509


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#359](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (17ecebb) into [master](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/commit/799a42ed9b247815fd344f70e0df27d6b83f90d2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (799a42e) will **decrease** coverage by `0.49%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359/graphs/tree.svg?width=650&height=150&src=pr&token=LZImIuvleR&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #359      +/-   ##
   ==========================================
   - Coverage   62.90%   62.41%   -0.50%     
   ==========================================
     Files          41       41              
     Lines        6104     6152      +48     
   ==========================================
     Hits         3840     3840              
   - Misses       2112     2160      +48     
     Partials      152      152              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [pkg/cache/context.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2NvbnRleHQuZ28=) | `34.69% <0.00%> (-2.52%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [799a42e...17ecebb](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/359?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#issuecomment-1020743460


   > Do you plan to add any UT or e2e test for this?
   
   Yes, e2e tests will be added in YUNIKORN-1039. Those will take time to develop and this issue is a blocker for 0.12.2 so I don't think we should hold up the fix for the e2e tests to be developed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791158077



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {

Review comment:
       This is the actual fix (ensure these slices are empty rather than nil).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
yangwwei commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791274687



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}

Review comment:
       Good catch.
   
   ```
   volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
   ```
   Seems like there will be a case where `err==nil` but `len(reasons) > 0`
   
   > // It returns an error when something went wrong or a list of reasons why the node is
   	// (currently) not usable for the pod.
   

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       why does this fix the problem? 
   I did not see anywhere to complain NIL value

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))

Review comment:
       Is this the right place to check the unbound immediate claims? Maybe too late? The PVC relies on a PV controller to do the binding, before this happens, the default scheduler [skips scheduling the pod] in the PreFilter phase (https://github.com/kubernetes/kubernetes/blob/c592bd40f2df941aa4ea364592ce92fd5c669bfc/pkg/scheduler/framework/plugins/volumebinding/volume_binding.go#L185-L187). Our code here is pretty late, and returning an error means we failed to schedule this pod, and user will get a "PodVolumesBindFailure" error in the pod. Would it make sense to do this check [here](https://github.com/apache/incubator-yunikorn-k8shim/blob/799a42ed9b247815fd344f70e0df27d6b83f90d2/pkg/cache/application.go#L414)? This is a sanity check to ensure that a pod is ready for scheduling.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] wilfred-s commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r791278454



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)

Review comment:
       This same call is made in `AssumePod` line 457 why do we not need the change there?

##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
 				return err
 			}
 			node, err := ctx.schedulerCache.GetNodeInfo(assumedPod.Spec.NodeName)
 			if err != nil {
+				log.Logger().Error("Failed to get node info",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Error(err))
+				return err
+			}
+			volumes, reasons, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if err != nil {
+				log.Logger().Error("Failed to find pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
 				return err
 			}
-			volumes, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.FindPodVolumes(assumedPod, boundClaims, claimsToBind, node)
+			if len(reasons) > 0 {
+				sReasons := make([]string, 0)
+				for _, reason := range reasons {
+					sReasons = append(sReasons, string(reason))
+				}
+				sReason := strings.Join(sReasons, ", ")
+				err = fmt.Errorf("pod %s has conflicting volume claims: %s", pod.Name, sReason)
+				log.Logger().Error("Pod has conflicting volume claims",
+					zap.String("podName", assumedPod.Name),
+					zap.String("nodeName", assumedPod.Spec.NodeName),
+					zap.Int("claimsToBind", len(claimsToBind)),
+					zap.Error(err))
+				return err
+			}
+			if volumes.StaticBindings == nil {
+				// convert nil to empty array
+				volumes.StaticBindings = make([]*scheduling.BindingInfo, 0)
+			}
+			if volumes.DynamicProvisions == nil {
+				// convert nil to empty array
+				volumes.DynamicProvisions = make([]*v1.PersistentVolumeClaim, 0)
+			}

Review comment:
       The nil checks for the `volumes.StaticBindings` and `volumes.DynamicProvisions` have been in the code since k8s 1.12. The real cause is a flow on effect of the change in the `FindPodVolumes` return value as part of a cleanup commit.  That was added in k8s 1.19 and later: 0 length slices are replaced with nil values:
   https://gitlab.cncf.ci/kubernetes/kubernetes/commit/2cdc63aeaab603c61b48e95c044f1e4bb2f32548
   
   We need tests for this code path.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on a change in pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
yangwwei commented on a change in pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#discussion_r792014909



##########
File path: pkg/cache/context.go
##########
@@ -365,19 +366,68 @@ func (ctx *Context) bindPodVolumes(pod *v1.Pod) error {
 				zap.String("podName", pod.Name))
 		} else {
 			log.Logger().Info("Binding Pod Volumes", zap.String("podName", pod.Name))
-			boundClaims, claimsToBind, _, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
+			boundClaims, claimsToBind, unboundClaimsImmediate, err := ctx.apiProvider.GetAPIs().VolumeBinder.GetPodVolumes(assumedPod)
 			if err != nil {
+				log.Logger().Error("Failed to get pod volumes",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))
+				return err
+			}
+			if len(unboundClaimsImmediate) > 0 {
+				err = fmt.Errorf("pod %s has unbound immediate claims", pod.Name)
+				log.Logger().Error("Pod has unbound immediate claims",
+					zap.String("podName", assumedPod.Name),
+					zap.Error(err))

Review comment:
       I see, thanks. Looks like this is already covered in [predict manager](https://github.com/apache/incubator-yunikorn-k8shim/blob/799a42ed9b247815fd344f70e0df27d6b83f90d2/pkg/plugin/predicates/predicate_manager.go#L86), that's good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit commented on pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit commented on pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359#issuecomment-1022310441


   Moving forward with this to unblock 0.12.2 release. @wilfred-s, if you have additional concerns, let's address in a follow-up patch for 1.0.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] craigcondit closed pull request #359: [YUNIKORN-1041] Fix binding of dynamic volumes to pod

Posted by GitBox <gi...@apache.org>.
craigcondit closed pull request #359:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/359


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org