You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by xuanyuanking <gi...@git.apache.org> on 2018/11/06 12:34:38 UTC

[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

GitHub user xuanyuanking opened a pull request:

    https://github.com/apache/spark/pull/22955

    [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCondition

    ## What changes were proposed in this pull request?
    
    As comment in https://github.com/apache/spark/pull/22326#issuecomment-424923967, we test the new added optimizer rule by end-to-end test in python side, need to add suites under `org.apache.spark.sql.catalyst.optimizer` like other optimizer rules.
    
    ## How was this patch tested?
    new added UT


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-25949

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22955.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22955
    
----
commit 344b516a414107cf5494951ae74886b6f62f3e5a
Author: Yuanjian Li <xy...@...>
Date:   2018-11-06T12:29:26Z

    Add test for PullOutPythonUDFInJoinCondition

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98698/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    **[Test build #98698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98698/testReport)** for PR 22955 at commit [`8d04b4c`](https://github.com/apache/spark/commit/8d04b4c4f084610e8ce8f11590ad4cd537c5952f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r232163738
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    +
    +  test("inner join condition with python udf only") {
    +    val query = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = Some(pythonUDF))
    +    val expected = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = None).where(pythonUDF).analyze
    +
    +    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
    +    val exception = the [AnalysisException] thrownBy {
    --- End diff --
    
    Thanks, done in 38b1555.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98676/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    **[Test build #98676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98676/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98644/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r232163956
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    +
    +  test("inner join condition with python udf only") {
    +    val query = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = Some(pythonUDF))
    +    val expected = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = None).where(pythonUDF).analyze
    +
    +    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
    +    val exception = the [AnalysisException] thrownBy {
    +      Optimize.execute(query.analyze)
    +    }
    +    assert(exception.message.startsWith("Detected implicit cartesian product"))
    +
    +    // pull out the python udf while set spark.sql.crossJoin.enabled=true
    +    withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
    +      val optimized = Optimize.execute(query.analyze)
    +      comparePlans(optimized, expected)
    +    }
    +  }
    +
    +  test("left semi join condition with python udf only") {
    +    val query = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = LeftSemi,
    +      condition = Some(pythonUDF))
    +    val expected = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = None).where(pythonUDF).select('a, 'b).analyze
    +
    +    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
    +    val exception = the [AnalysisException] thrownBy {
    +      Optimize.execute(query.analyze)
    +    }
    +    assert(exception.message.startsWith("Detected implicit cartesian product"))
    +
    +    // pull out the python udf while set spark.sql.crossJoin.enabled=true
    +    withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
    +      val optimized = Optimize.execute(query.analyze)
    +      comparePlans(optimized, expected)
    +    }
    +  }
    +
    +  test("python udf with other common condition") {
    --- End diff --
    
    Thanks, add more cases in 38b1555.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    **[Test build #98515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98515/testReport)** for PR 22955 at commit [`344b516`](https://github.com/apache/spark/commit/344b516a414107cf5494951ae74886b6f62f3e5a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class PullOutPythonUDFInJoinConditionSuite extends PlanTest `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4794/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    **[Test build #98515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98515/testReport)** for PR 22955 at commit [`344b516`](https://github.com/apache/spark/commit/344b516a414107cf5494951ae74886b6f62f3e5a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4910/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r232163715
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    --- End diff --
    
    Thanks, done in 38b1555.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Thanks for the reply, unnecessary end-to-end tests removed in https://github.com/apache/spark/pull/22326/commits/2b6977de4a3b3489b9c2172a6a8a39831bf1d048, others maybe should be kept? Cause mock python udf in scala side can't 100% same with python side.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r231838895
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    --- End diff --
    
    nit: `unsupportedJoinTypes`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98515/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    **[Test build #98644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98644/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r232447827
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -50,20 +50,11 @@ class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
         PythonEvalType.SQL_BATCHED_UDF,
         udfDeterministic = true)
     
    -  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    -
    -  test("inner join condition with python udf only") {
    -    val query = testRelationLeft.join(
    -      testRelationRight,
    -      joinType = Inner,
    -      condition = Some(pythonUDF))
    -    val expected = testRelationLeft.join(
    -      testRelationRight,
    -      joinType = Inner,
    -      condition = None).where(pythonUDF).analyze
    +  val unsupportedJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
     
    +  private def comparePlansWithConf(query: LogicalPlan, expected: LogicalPlan): Unit = {
    --- End diff --
    
    better naming? what does it mean `WithConf`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4882/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    **[Test build #98676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98676/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r232488956
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -50,20 +50,11 @@ class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
         PythonEvalType.SQL_BATCHED_UDF,
         udfDeterministic = true)
     
    -  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    -
    -  test("inner join condition with python udf only") {
    -    val query = testRelationLeft.join(
    -      testRelationRight,
    -      joinType = Inner,
    -      condition = Some(pythonUDF))
    -    val expected = testRelationLeft.join(
    -      testRelationRight,
    -      joinType = Inner,
    -      condition = None).where(pythonUDF).analyze
    +  val unsupportedJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
     
    +  private def comparePlansWithConf(query: LogicalPlan, expected: LogicalPlan): Unit = {
    --- End diff --
    
    how about `comparePlanWithCrossJoinEnable`? Just afraid it's too long at first, any advise :) Thanks. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r231839453
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    +
    +  test("inner join condition with python udf only") {
    +    val query = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = Some(pythonUDF))
    +    val expected = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = None).where(pythonUDF).analyze
    +
    +    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
    +    val exception = the [AnalysisException] thrownBy {
    --- End diff --
    
    nit: `intercept` is more widespread in the codebase, we can maybe use that for consistency..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4927/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r231840717
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    +
    +  test("inner join condition with python udf only") {
    +    val query = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = Some(pythonUDF))
    +    val expected = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = None).where(pythonUDF).analyze
    +
    +    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
    +    val exception = the [AnalysisException] thrownBy {
    +      Optimize.execute(query.analyze)
    +    }
    +    assert(exception.message.startsWith("Detected implicit cartesian product"))
    +
    +    // pull out the python udf while set spark.sql.crossJoin.enabled=true
    +    withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
    +      val optimized = Optimize.execute(query.analyze)
    +      comparePlans(optimized, expected)
    +    }
    +  }
    +
    +  test("left semi join condition with python udf only") {
    +    val query = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = LeftSemi,
    +      condition = Some(pythonUDF))
    +    val expected = testRelationLeft.join(
    +      testRelationRight,
    +      joinType = Inner,
    +      condition = None).where(pythonUDF).select('a, 'b).analyze
    +
    +    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
    +    val exception = the [AnalysisException] thrownBy {
    +      Optimize.execute(query.analyze)
    +    }
    +    assert(exception.message.startsWith("Detected implicit cartesian product"))
    +
    +    // pull out the python udf while set spark.sql.crossJoin.enabled=true
    +    withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
    +      val optimized = Optimize.execute(query.analyze)
    +      comparePlans(optimized, expected)
    +    }
    +  }
    +
    +  test("python udf with other common condition") {
    --- End diff --
    
    shall we add more cases like this for `Or` instead of `And`? And with several UDF/other conditions? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22955


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r232447937
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,171 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val unsupportedJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    +
    +  private def comparePlansWithConf(query: LogicalPlan, expected: LogicalPlan): Unit = {
    +    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
    +    val exception = intercept[AnalysisException] {
    +      Optimize.execute(query.analyze)
    +    }
    +    assert(exception.message.startsWith("Detected implicit cartesian product"))
    +
    +    // pull out the python udf while set spark.sql.crossJoin.enabled=true
    +    withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
    +      val optimized = Optimize.execute(query.analyze)
    +      comparePlans(optimized, expected)
    +    }
    +  }
    +
    +  test("inner join condition with python udf only") {
    --- End diff --
    
    sorry, probably I was not clear enough in my previous comment. This UT and the following differ only for the join type. We can dedup them by doing something like:
    
    ```
    Seq(Inner, LeftSemi).foreach { joinType =>
      test(...) { ...}
    }
    ```
    
    PS nit: maybe we can also define a new `val supportedJoinTypes = Seq(Inner, LeftSemi)`...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    shall we also remove the end-to-end tests which are now not needed anymore?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    cc @cloud-fan @mgaido91 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Thanks @mgaido91 @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    **[Test build #98644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98644/testReport)** for PR 22955 at commit [`38b1555`](https://github.com/apache/spark/commit/38b15552995355d5e00186fb2b332928a83d248a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r231840088
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    +
    +  test("inner join condition with python udf only") {
    --- End diff --
    
    shall we dedup the code of this and the next testcase? they differ only by the join type...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r232489060
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,171 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val unsupportedJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    +
    +  private def comparePlansWithConf(query: LogicalPlan, expected: LogicalPlan): Unit = {
    +    // AnalysisException thrown by CheckCartesianProducts while spark.sql.crossJoin.enabled=false
    +    val exception = intercept[AnalysisException] {
    +      Optimize.execute(query.analyze)
    +    }
    +    assert(exception.message.startsWith("Detected implicit cartesian product"))
    +
    +    // pull out the python udf while set spark.sql.crossJoin.enabled=true
    +    withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {
    +      val optimized = Optimize.execute(query.analyze)
    +      comparePlans(optimized, expected)
    +    }
    +  }
    +
    +  test("inner join condition with python udf only") {
    --- End diff --
    
    I'm sorry for lacking of comments to your previous comment `they differ only by the join type...`, they differ not only the type, but also the expected plan.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFInJoinCo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22955
  
    **[Test build #98698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98698/testReport)** for PR 22955 at commit [`8d04b4c`](https://github.com/apache/spark/commit/8d04b4c4f084610e8ce8f11590ad4cd537c5952f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22955: [SPARK-25949][SQL] Add test for PullOutPythonUDFI...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22955#discussion_r232163787
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullOutPythonUDFInJoinConditionSuite.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.scalatest.Matchers._
    +
    +import org.apache.spark.api.python.PythonEvalType
    +import org.apache.spark.sql.AnalysisException
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.PythonUDF
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.internal.SQLConf._
    +import org.apache.spark.sql.types.BooleanType
    +
    +class PullOutPythonUDFInJoinConditionSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Extract PythonUDF From JoinCondition", Once,
    +        PullOutPythonUDFInJoinCondition) ::
    +      Batch("Check Cartesian Products", Once,
    +        CheckCartesianProducts) :: Nil
    +  }
    +
    +  val testRelationLeft = LocalRelation('a.int, 'b.int)
    +  val testRelationRight = LocalRelation('c.int, 'd.int)
    +
    +  // Dummy python UDF for testing. Unable to execute.
    +  val pythonUDF = PythonUDF("pythonUDF", null,
    +    BooleanType,
    +    Seq.empty,
    +    PythonEvalType.SQL_BATCHED_UDF,
    +    udfDeterministic = true)
    +
    +  val notSupportJoinTypes = Seq(LeftOuter, RightOuter, FullOuter, LeftAnti)
    +
    +  test("inner join condition with python udf only") {
    --- End diff --
    
    Sorry for this, done in 38b1555.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org