You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/03 04:54:03 UTC

[GitHub] [spark] cxzl25 opened a new pull request, #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

cxzl25 opened a new pull request, #38489:
URL: https://github.com/apache/spark/pull/38489

   ### What changes were proposed in this pull request?
   BHJ LeftAnti does not update numOutputRows when codegen is disabled
   
   
   ### Why are the changes needed?
   PR #29104 Only update numOutputRows when codegen is enabled, but there is no numOutputRows when codegen is disabled, and numOutputRows is equal to 0.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   local test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled
URL: https://github.com/apache/spark/pull/38489


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38489:
URL: https://github.com/apache/spark/pull/38489#discussion_r1032059252


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala:
##########
@@ -842,6 +842,29 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
       assert(createTableAsSelect.metrics("numOutputRows").value == 1)
     }
   }
+
+  test("SPARK-41003: BHJ LeftAnti does not update numOutputRows when codegen is disabled") {
+    Seq(true, false).foreach(enableWholeStage => {

Review Comment:
   super nit:
   ```
   Seq(...).foreach { enableWholeStage =>
   
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on PR #38489:
URL: https://github.com/apache/spark/pull/38489#issuecomment-1326113970

   Help review. Thanks. @HyukjinKwon @yaooqinn 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on PR #38489:
URL: https://github.com/apache/spark/pull/38489#issuecomment-1305078019

   Help review. Thanks.
   @leanken-zz  @cloud-fan  @wangyum 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38489:
URL: https://github.com/apache/spark/pull/38489#discussion_r1032059432


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala:
##########
@@ -842,6 +842,29 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
       assert(createTableAsSelect.metrics("numOutputRows").value == 1)
     }
   }
+
+  test("SPARK-41003: BHJ LeftAnti does not update numOutputRows when codegen is disabled") {
+    Seq(true, false).foreach(enableWholeStage => {
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> enableWholeStage.toString) {
+        withSQLConf(SQLConf.OPTIMIZE_NULL_AWARE_ANTI_JOIN.key -> "true") {
+          withTable("t1", "t2") {
+            spark.range(4).write.saveAsTable("t1")
+            spark.range(2).write.saveAsTable("t2")
+            val df = sql("SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2)")
+            df.collect()
+            val plan = df.queryExecution.executedPlan
+
+            val exchanges = plan.collect {

Review Comment:
   exchanges -> joins?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38489:
URL: https://github.com/apache/spark/pull/38489#issuecomment-1302866545

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38489:
URL: https://github.com/apache/spark/pull/38489#issuecomment-1326121090

   Can we add a test in `SQLMetricsSuite`? If it's hard we can skip the test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38489:
URL: https://github.com/apache/spark/pull/38489#issuecomment-1328725410

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38489:
URL: https://github.com/apache/spark/pull/38489#discussion_r1031191650


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala:
##########
@@ -162,6 +162,9 @@ case class BroadcastHashJoinExec(
               // Anti Join: Drop the row on the streamed side if it is a match on the build
               hashed.get(lookupKey) == null
             }
+          }).map(row => {
+            numOutputRows += 1

Review Comment:
   shall we do it inside `.filter` instead of adding `.map`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on PR #38489:
URL: https://github.com/apache/spark/pull/38489#issuecomment-1326175282

   Add UT.
   
   ## Current
   ```
   [info] - SPARK-41003: BHJ LeftAnti does not update numOutputRows when codegen is disabled *** FAILED *** (10 seconds, 124 milliseconds)
   [info]   0 did not equal 2 The query plan metric numOutputRows did not match, expected:2, actual:0 (SQLMetricsTestUtils.scala:270)
   [info]   org.scalatest.exceptions.TestFailedException:
   ```
   ## Fix
   ```
   [info] - SPARK-41003: BHJ LeftAnti does not update numOutputRows when codegen is disabled (9 seconds, 123 milliseconds)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
yaooqinn commented on PR #38489:
URL: https://github.com/apache/spark/pull/38489#issuecomment-1326164612

   > Can we add a test in `SQLMetricsSuite`? If it's hard we can skip the test.
   
   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #38489: [SPARK-41003][SQL] BHJ LeftAnti does not update numOutputRows when codegen is disabled

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on PR #38489:
URL: https://github.com/apache/spark/pull/38489#issuecomment-1301639710

   ## Current
   ### enable codegen
   <img width="547" alt="image" src="https://user-images.githubusercontent.com/3898450/199650431-d6443f45-03f9-489c-b1de-72c619acb37e.png">
   
   ### disable codegen
   <img width="539" alt="image" src="https://user-images.githubusercontent.com/3898450/199650250-da96353a-fb40-4485-bc07-4161c68c79f8.png">
   
   
   ## Fix
   <img width="543" alt="image" src="https://user-images.githubusercontent.com/3898450/199650273-c371e4af-bfb1-463d-9964-f619d3ded7bf.png">
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org