You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by willb <gi...@git.apache.org> on 2014/05/09 23:18:10 UTC
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
GitHub user willb opened a pull request:
https://github.com/apache/spark/pull/717
SPARK-571: forbid return statements in cleaned closures
This patch checks top-level closure arguments to `ClosureCleaner.clean` for `return` statements and raises an exception if it finds any. This is mainly a user-friendliness addition, since programs with return statements in closure arguments will currently fail upon RDD actions with a less-than-intuitive error message.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/willb/spark spark-571
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/717.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #717
----
commit b017c4779a0bfc54f215bc1135d9d23ee9761c20
Author: William Benton <wi...@redhat.com>
Date: 2014-05-09T20:05:49Z
Added a test for SPARK-571
commit 295b6a5960fd9a4509e64b225b0e4b479704978b
Author: William Benton <wi...@redhat.com>
Date: 2014-05-09T21:14:37Z
Forbid return statements in closure arguments.
This commit ensures that ClosureCleaner.clean will identify
toplevel return statements in closures and raise an exception
if they are encountered.
This is intended to fix SPARK-571.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42911882
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42913622
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/717#discussion_r12497542
--- Diff: core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite.scala ---
@@ -21,6 +21,7 @@ import org.scalatest.FunSuite
import org.apache.spark.LocalSparkContext._
import org.apache.spark.SparkContext
+import org.apache.spark.SparkException
--- End diff --
nit: group this with line above (as in the other file)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-43003979
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14946/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42715751
Merged build finished.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42911361
I left couple comments on style. Thanks for leading this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42999578
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/717#discussion_r12497536
--- Diff: core/src/test/scala/org/apache/spark/util/ClosureCleanerSuite.scala ---
@@ -108,6 +117,20 @@ class TestClassWithoutFieldAccess {
}
}
+object TestObjectWithBogusReturns {
+ def badClosureWithReturn(v: org.apache.spark.rdd.RDD[Int]): Int = {
--- End diff --
nit: This function is pretty simple. Mind moving it into `run()` directly?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42730477
@willb the re-partition issue is not related to your patch. There was an inadvertent check-in in master that caused it. It has been fixed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/717
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42911307
Ah never mind my quesiton. "return" exits the outer level function instead of the closure.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-43010073
BTW I am merging this only into master (not branch-1.0) since it is too late for changes into Spark core for 1.0 unless they are critical bug fixes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42719784
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14859/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42717201
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by willb <gi...@git.apache.org>.
Github user willb commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-43009698
Thanks, @rxin!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42715752
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14858/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42910436
Why do closures with return fail?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42719783
Merged build finished.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42760036
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42715653
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/717#discussion_r12564569
--- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
@@ -181,6 +184,26 @@ private[spark] object ClosureCleaner extends Logging {
}
private[spark]
+class ReturnStatementFinder extends ClassVisitor(ASM4) {
+ override def visitMethod(access: Int, name: String, desc: String,
+ sig: String, exceptions: Array[String]): MethodVisitor = {
+ if(name.contains("apply")) {
--- End diff --
add a space after if (also for the if below)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42717217
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by willb <gi...@git.apache.org>.
Github user willb commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42760025
Thanks for the clarification, @pwendell.
@andrewor14, I've grouped my imports and collapsed that function. Thanks again.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by willb <gi...@git.apache.org>.
Github user willb commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-43000971
@rxin, the latest commit adds such a test.
I'll note that the code I have only finds toplevel `return` statements, so it won't spuriously reject code like that example, but it also won't reject code like this, which is an invalid use of `return`:
```scala
nums.map {x =>
// this return is invalid since it will transfer control outside the closure
val foo = {y: Int => return 2; 1 }
foo(x)
}
```
I thought identifying toplevel `return` statements in closures represented an acceptable tradeoff between complexity and user-friendliness in order to provide better feedback in the common case (but not necessarily to exhaustively flag all bogus closures, no matter how pathological). If this isn't the right tradeoff, I could adapt the code to reject a higher percentage of closures with nonlocal returns.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-43003978
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42913623
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14932/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42760037
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42911723
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by willb <gi...@git.apache.org>.
Github user willb commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42730126
Thanks for the quick feedback, @andrewor14! I'll make those changes. I'll also look in to see if I can figure out why that repartition test is failing; oddly, it's working for me locally.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-43008847
I think the current approach is fine. I'm going to merge this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42999605
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42992463
@willb can you also add a test case where an inline function is using return statement in its closure to make sure that is allowed? e.g.
```scala
sc.parallelize(1 to 100).map { i =>
def foo(): Int = { return 1; 2 }
foo()
}
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/717#discussion_r12564572
--- Diff: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala ---
@@ -181,6 +184,26 @@ private[spark] object ClosureCleaner extends Logging {
}
private[spark]
+class ReturnStatementFinder extends ClassVisitor(ASM4) {
+ override def visitMethod(access: Int, name: String, desc: String,
+ sig: String, exceptions: Array[String]): MethodVisitor = {
+ if(name.contains("apply")) {
+ new MethodVisitor(ASM4) {
+ override def visitTypeInsn(op: Int, tp: String) {
+ if(op == NEW && tp.contains("scala/runtime/NonLocalReturnControl")) {
+ throw new SparkException("Return statements aren't allowed in Spark closures")
+ }
+ }
+ }
+ } else {
+ new MethodVisitor(ASM4) {}
+ }
+ }
+}
+
--- End diff --
just leave one line here for spacing
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by willb <gi...@git.apache.org>.
Github user willb commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42911366
@rxin, yes! (I was firing up a debugger in case you wanted to know exactly where it was returning to.)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/717#issuecomment-42715644
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---