You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by xuanyuanking <gi...@git.apache.org> on 2018/11/06 12:34:38 UTC
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/22955
[SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCondition
## What changes were proposed in this pull request?
As comment in https://github.com/apache/spark/pull/22326#issuecomment-424923967, we test the new added optimizer rule by end-to-end test in python side, need to add suites under `org.apache.spark.sql.catalyst.optimizer` like other optimizer rules.
## How was this patch tested?
new added UT
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xuanyuanking/spark SPARK-25949
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22955.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22955
----
commit 344b516a414107cf5494951ae74886b6f62f3e5a
Author: Yuanjian Li <xy...@...>
Date: 2018-11-06T12:29:26Z
Add test for PullOutPythonUDFInJoinCondition
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98698/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22955
**[Test build #98698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98698/testReport)** for PR 22955 at commit [`8d04b4c`](https://github.com/apache/spark/commit/8d04b4c4f084610e8ce8f11590ad4cd537c5952f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r232163738
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+ test("inner join condition with python udf only") {
+ val query = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = Some(pythonUDF))
+ val expected = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = None).where(pythonUDF).analyze
+
+ // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+ val exception = the [AnalysisException] thrownBy {
--- End diff --
Thanks, done in 38b1555.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98676/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22955
**[Test build #98676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98676/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98644/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r232163956
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+ test("inner join condition with python udf only") {
+ val query = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = Some(pythonUDF))
+ val expected = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = None).where(pythonUDF).analyze
+
+ // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+ val exception = the [AnalysisException] thrownBy {
+ Optimize.execute(query.analyze)
+ }
+ assert(exception.message.startsWith("Detected implicit cartesian product"))
+
+ // pull out the python udf while set spark.sql.crossJoin.enabled=true
+ withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
+ val optimized = Optimize.execute(query.analyze)
+ comparePlans(optimized, expected)
+ }
+ }
+
+ test("left semi join condition with python udf only") {
+ val query = testRelationLeft.join(
+ testRelationRight,
+ joinType = LeftSemi,
+ condition = Some(pythonUDF))
+ val expected = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = None).where(pythonUDF).select('a, 'b).analyze
+
+ // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+ val exception = the [AnalysisException] thrownBy {
+ Optimize.execute(query.analyze)
+ }
+ assert(exception.message.startsWith("Detected implicit cartesian product"))
+
+ // pull out the python udf while set spark.sql.crossJoin.enabled=true
+ withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
+ val optimized = Optimize.execute(query.analyze)
+ comparePlans(optimized, expected)
+ }
+ }
+
+ test("python udf with other common condition") {
--- End diff --
Thanks, add more cases in 38b1555.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22955
**[Test build #98515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98515/testReport)** for PR 22955 at commit [`344b516`](https://github.com/apache/spark/commit/344b516a414107cf5494951ae74886b6f62f3e5a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class PullOutPythonUDFInJoinConditionSuite extends PlanTest `
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4794/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22955
**[Test build #98515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98515/testReport)** for PR 22955 at commit [`344b516`](https://github.com/apache/spark/commit/344b516a414107cf5494951ae74886b6f62f3e5a).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4910/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r232163715
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
--- End diff --
Thanks, done in 38b1555.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/22955
Thanks for the reply, unnecessary end-to-end tests removed in https://github.com/apache/spark/pull/22326/commits/2b6977de4a3b3489b9c2172a6a8a39831bf1d048, others maybe should be kept? Cause mock python udf in scala side can't 100% same with python side.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22955
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r231838895
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
--- End diff --
nit: `unsupportedJoinTypes`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98515/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22955
**[Test build #98644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98644/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r232447827
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -50,20 +50,11 @@ class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
PythonEvalType.SQL_BATCHED_UDF,
udfDeterministic = true)
- val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
-
- test("inner join condition with python udf only") {
- val query = testRelationLeft.join(
- testRelationRight,
- joinType = Inner,
- condition = Some(pythonUDF))
- val expected = testRelationLeft.join(
- testRelationRight,
- joinType = Inner,
- condition = None).where(pythonUDF).analyze
+ val unsupportedJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+ private def comparePlansWithConf(query: LogicalPlan, expected: LogicalPlan): Unit = {
--- End diff --
better naming? what does it mean `WithConf`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4882/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22955
**[Test build #98676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98676/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r232488956
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -50,20 +50,11 @@ class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
PythonEvalType.SQL_BATCHED_UDF,
udfDeterministic = true)
- val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
-
- test("inner join condition with python udf only") {
- val query = testRelationLeft.join(
- testRelationRight,
- joinType = Inner,
- condition = Some(pythonUDF))
- val expected = testRelationLeft.join(
- testRelationRight,
- joinType = Inner,
- condition = None).where(pythonUDF).analyze
+ val unsupportedJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+ private def comparePlansWithConf(query: LogicalPlan, expected: LogicalPlan): Unit = {
--- End diff --
how about `comparePlanWithCrossJoinEnable`? Just afraid it's too long at first, any advise :) Thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r231839453
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+ test("inner join condition with python udf only") {
+ val query = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = Some(pythonUDF))
+ val expected = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = None).where(pythonUDF).analyze
+
+ // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+ val exception = the [AnalysisException] thrownBy {
--- End diff --
nit: `intercept` is more widespread in the codebase, we can maybe use that for consistency..
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22955
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4927/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r231840717
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+ test("inner join condition with python udf only") {
+ val query = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = Some(pythonUDF))
+ val expected = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = None).where(pythonUDF).analyze
+
+ // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+ val exception = the [AnalysisException] thrownBy {
+ Optimize.execute(query.analyze)
+ }
+ assert(exception.message.startsWith("Detected implicit cartesian product"))
+
+ // pull out the python udf while set spark.sql.crossJoin.enabled=true
+ withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
+ val optimized = Optimize.execute(query.analyze)
+ comparePlans(optimized, expected)
+ }
+ }
+
+ test("left semi join condition with python udf only") {
+ val query = testRelationLeft.join(
+ testRelationRight,
+ joinType = LeftSemi,
+ condition = Some(pythonUDF))
+ val expected = testRelationLeft.join(
+ testRelationRight,
+ joinType = Inner,
+ condition = None).where(pythonUDF).select('a, 'b).analyze
+
+ // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+ val exception = the [AnalysisException] thrownBy {
+ Optimize.execute(query.analyze)
+ }
+ assert(exception.message.startsWith("Detected implicit cartesian product"))
+
+ // pull out the python udf while set spark.sql.crossJoin.enabled=true
+ withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
+ val optimized = Optimize.execute(query.analyze)
+ comparePlans(optimized, expected)
+ }
+ }
+
+ test("python udf with other common condition") {
--- End diff --
shall we add more cases like this for `Or` instead of `And`? And with several UDF/other conditions? Thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22955
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r232447937
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val unsupportedJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+ private def comparePlansWithConf(query: LogicalPlan, expected: LogicalPlan): Unit = {
+ // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+ val exception = intercept[AnalysisException] {
+ Optimize.execute(query.analyze)
+ }
+ assert(exception.message.startsWith("Detected implicit cartesian product"))
+
+ // pull out the python udf while set spark.sql.crossJoin.enabled=true
+ withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
+ val optimized = Optimize.execute(query.analyze)
+ comparePlans(optimized, expected)
+ }
+ }
+
+ test("inner join condition with python udf only") {
--- End diff --
sorry, probably I was not clear enough in my previous comment. This UT and the following differ only for the join type. We can dedup them by doing something like:
```
Seq(Inner, LeftSemi).foreach { joinType =>
test(...) { ...}
}
```
PS nit: maybe we can also define a new `val supportedJoinTypes = Seq(Inner, LeftSemi)`...
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22955
shall we also remove the end-to-end tests which are now not needed anymore?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/22955
cc @cloud-fan @mgaido91
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/22955
Thanks @mgaido91 @cloud-fan
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22955
**[Test build #98644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98644/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r231840088
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+ test("inner join condition with python udf only") {
--- End diff --
shall we dedup the code of this and the next testcase? they differ only by the join type...
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r232489060
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val unsupportedJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+ private def comparePlansWithConf(query: LogicalPlan, expected: LogicalPlan): Unit = {
+ // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
+ val exception = intercept[AnalysisException] {
+ Optimize.execute(query.analyze)
+ }
+ assert(exception.message.startsWith("Detected implicit cartesian product"))
+
+ // pull out the python udf while set spark.sql.crossJoin.enabled=true
+ withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
+ val optimized = Optimize.execute(query.analyze)
+ comparePlans(optimized, expected)
+ }
+ }
+
+ test("inner join condition with python udf only") {
--- End diff --
I'm sorry for lacking of comments to your previous comment `they differ only by the join type...`, they differ not only the type, but also the expected plan.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22955
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22955
**[Test build #98698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98698/testReport)** for PR 22955 at commit [`8d04b4c`](https://github.com/apache/spark/commit/8d04b4c4f084610e8ce8f11590ad4cd537c5952f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22955#discussion_r232163787
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.scalatest.Matchers._
+
+import org.apache.spark.api.python.PythonEvalType
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.PythonUDF
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.internal.SQLConf._
+import org.apache.spark.sql.types.BooleanType
+
+class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
+
+ object Optimize extends RuleExecutor[LogicalPlan] {
+ val batches =
+ Batch("Extract PythonUDF From JoinCondition", Once,
+ PullOutPythonUDFInJoinCondition) ::
+ Batch("Check Cartesian Products", Once,
+ CheckCartesianProducts) :: Nil
+ }
+
+ val testRelationLeft = LocalRelation('a.int, 'b.int)
+ val testRelationRight = LocalRelation('c.int, 'd.int)
+
+ // Dummy python UDF for testing. Unable to execute.
+ val pythonUDF = PythonUDF("pythonUDF", null,
+ BooleanType,
+ Seq.empty,
+ PythonEvalType.SQL_BATCHED_UDF,
+ udfDeterministic = true)
+
+ val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
+
+ test("inner join condition with python udf only") {
--- End diff --
Sorry for this, done in 38b1555.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org