You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by GitBox <gi...@apache.org> on 2020/03/20 05:16:03 UTC

[GitHub] [beam] youngoli opened a new pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

youngoli opened a new pull request #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179
 
 
   Accidentally left a bug in DoFn validation. It was treating main inputs
   as if they could only be FnValues, when they could also be FnIter or
   FnReIter as well, such as in the case of a GBK or CoGBK, which meant
   valid pipelines were failing validation. This fixes that.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   XLang | --- | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) 
   Portable | --- | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] youngoli commented on issue #11179: [BEAM-3301] Bugfix in DoFn validation.

Posted by GitBox <gi...@apache.org>.
youngoli commented on issue #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179#issuecomment-601541864
 
 
   R: @lostluck 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

Posted by GitBox <gi...@apache.org>.
lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395444326
 
 

 ##########
 File path: sdks/go/pkg/beam/pcollection.go
 ##########
 @@ -60,6 +60,12 @@ func (p PCollection) Type() FullType {
 	return p.n.Type()
 }
 
+// OutputsKV returns whether the output of this PCollection are single value
+// elements or KV pairs.
+func (p PCollection) OutputsKV() bool {
 
 Review comment:
   1. No need to have this exported right now, since we can't make a breaking change later. It's only used in the same package. Let's not expand the user surface unless there are good user usages, as a broad API get confusing.
   
   2. IsKV would be a more precise name, since PCollections are a logical representation of all their data, not actually a source or a sink. They can represent KV type or they aren't.
   
   3. This isn't checking if it's a KV type, it's checking if it's a Keyed type or not, since it's also checking if it's a CoGBK.
   
   4. Since this is only used in the one place, it's reasonable to move the conditional there instead of adding the one off helper method.
   
   I'm always willing to hear other opinions!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] youngoli commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

Posted by GitBox <gi...@apache.org>.
youngoli commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395850295
 
 

 ##########
 File path: sdks/go/pkg/beam/core/graph/fn.go
 ##########
 @@ -446,23 +444,16 @@ func validateMainInputs(fn *Fn, method *funcx.Fn, methodName string, numMainIn m
 		return err
 	}
 
-	// Check that the first numMainIn inputs are not side inputs (Iters or
-	// ReIters). We aren't able to catch singleton side inputs here since
-	// they're indistinguishable from main inputs.
-	mainInputs := method.Param[pos : pos+int(numMainIn)]
-	for i, p := range mainInputs {
-		if p.Kind != funcx.FnValue {
-			err := errors.Errorf("expected main input parameter but found "+
-				"side input parameter in position %v",
-				pos+i)
-			err = errors.SetTopLevelMsgf(err,
-				"Method %v in DoFn %v should have all main inputs before side inputs, "+
-					"but a side input (as Iter or ReIter) appears as parameter %v when a "+
-					"main input was expected.",
-				methodName, fn.Name(), pos+i)
-			err = errors.WithContextf(err, "method %v", methodName)
-			return err
-		}
+	// Check that the first input is not an Iter or ReIter (those aren't valid
+	// as the first main input).
+	first := method.Param[pos].Kind
+	if first != funcx.FnValue {
+		err := errors.New("first main input parameter must be value type")
 
 Review comment:
   I'll just add it in real quick while squashing the commits.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

Posted by GitBox <gi...@apache.org>.
lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395840270
 
 

 ##########
 File path: sdks/go/pkg/beam/core/graph/fn.go
 ##########
 @@ -446,23 +444,16 @@ func validateMainInputs(fn *Fn, method *funcx.Fn, methodName string, numMainIn m
 		return err
 	}
 
-	// Check that the first numMainIn inputs are not side inputs (Iters or
-	// ReIters). We aren't able to catch singleton side inputs here since
-	// they're indistinguishable from main inputs.
-	mainInputs := method.Param[pos : pos+int(numMainIn)]
-	for i, p := range mainInputs {
-		if p.Kind != funcx.FnValue {
-			err := errors.Errorf("expected main input parameter but found "+
-				"side input parameter in position %v",
-				pos+i)
-			err = errors.SetTopLevelMsgf(err,
-				"Method %v in DoFn %v should have all main inputs before side inputs, "+
-					"but a side input (as Iter or ReIter) appears as parameter %v when a "+
-					"main input was expected.",
-				methodName, fn.Name(), pos+i)
-			err = errors.WithContextf(err, "method %v", methodName)
-			return err
-		}
+	// Check that the first input is not an Iter or ReIter (those aren't valid
+	// as the first main input).
+	first := method.Param[pos].Kind
+	if first != funcx.FnValue {
+		err := errors.New("first main input parameter must be value type")
 
 Review comment:
   ...must be a value type..

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] youngoli merged pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

Posted by GitBox <gi...@apache.org>.
youngoli merged pull request #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

Posted by GitBox <gi...@apache.org>.
lostluck commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395838378
 
 

 ##########
 File path: sdks/go/pkg/beam/pcollection.go
 ##########
 @@ -60,6 +60,12 @@ func (p PCollection) Type() FullType {
 	return p.n.Type()
 }
 
+// OutputsKV returns whether the output of this PCollection are single value
+// elements or KV pairs.
+func (p PCollection) OutputsKV() bool {
 
 Review comment:
   That's my usual guideline. If I use it once, keep it in place; twice, copy it; three times, helper function.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] youngoli commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.

Posted by GitBox <gi...@apache.org>.
youngoli commented on a change in pull request #11179: [BEAM-3301] Bugfix in DoFn validation.
URL: https://github.com/apache/beam/pull/11179#discussion_r395836071
 
 

 ##########
 File path: sdks/go/pkg/beam/pcollection.go
 ##########
 @@ -60,6 +60,12 @@ func (p PCollection) Type() FullType {
 	return p.n.Type()
 }
 
+// OutputsKV returns whether the output of this PCollection are single value
+// elements or KV pairs.
+func (p PCollection) OutputsKV() bool {
 
 Review comment:
   I was originally picturing this as a helper function for callers of NewDoFn. It seems easy for future callers to make a mistake and only check if the PCollection is a KV and forget to check for CoGBK, so I thought a helper method would be useful in the future.
   
   1. I missed that pardo.go is in the same package as pcollection.go. I'm also leaning to not expanding the user surface if it's not necessary.
   
   2 & 3. Yeah I was unsure about the name, since it's not technically checking for KVs, I just couldn't think of anything better. IsKeyed sounds good though.
   
   4. That's the other part I was debating. My goal was to make it easy to avoid the mistake in the future, but thinking about it... It seems unlikely that anyone else would even be using this code, and I would expect that if they were they were an advanced user doing something tricky.
   
   I think I'll go with just putting the conditional in pardo.go and adding a comment. We can always turn it into a helper later if it does get used in multiple places.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services