Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/01/06 13:28:06 UTC

[GitHub] [spark] wangyum opened a new pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

wangyum opened a new pull request #31068:
URL: https://github.com/apache/spark/pull/31068


   ### What changes were proposed in this pull request?
   
   This PR adds `rowCount` to the statistics of the `Union` operator when CBO is enabled, for example:
   ```scala
   spark.sql("CREATE TABLE t1 USING parquet AS SELECT id FROM RANGE(10)")
   spark.sql("CREATE TABLE t2 USING parquet AS SELECT id FROM RANGE(10)")
   spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
   spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
   spark.sql("set spark.sql.cbo.enabled=true")
   spark.sql("SELECT * FROM t1 UNION ALL SELECT * FROM t2").explain("cost")
   ```
   
   Before this PR:
   ```
   == Optimized Logical Plan ==
   Union false, false, Statistics(sizeInBytes=320.0 B)
   :- Relation[id#5880L] parquet, Statistics(sizeInBytes=160.0 B, rowCount=10)
   +- Relation[id#5881L] parquet, Statistics(sizeInBytes=160.0 B, rowCount=10)
   ```
   
   After this PR:
   ```
   == Optimized Logical Plan ==
   Union false, false, Statistics(sizeInBytes=320.0 B, rowCount=20)
   :- Relation[id#2138L] parquet, Statistics(sizeInBytes=160.0 B, rowCount=10)
   +- Relation[id#2139L] parquet, Statistics(sizeInBytes=160.0 B, rowCount=10)
   ```
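   With CBO enabled, the estimate in the "After" plan is plain addition over the children: 160 B + 160 B = 320 B and 10 + 10 = 20 rows. The sketch below reproduces that arithmetic under stated assumptions: `Stats` is a hypothetical, simplified stand-in for Catalyst's `Statistics` (the real class has more fields), and the rule follows the `visitUnion` change in this PR: sizes are always summed, while a row count is reported only if every child has one.

   ```scala
   // Hypothetical, simplified stand-in for Catalyst's Statistics:
   // only the two fields that appear in the plans above.
   final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt])

   object UnionEstimateExample {
     // Sizes are always summed; a row count is reported only when
     // every child has one, otherwise the union's row count is unknown.
     def unionStats(children: Seq[Stats]): Stats = {
       val rowCount =
         if (children.exists(_.rowCount.isEmpty)) None
         else Some(children.flatMap(_.rowCount).sum)
       Stats(children.map(_.sizeInBytes).sum, rowCount)
     }

     def main(args: Array[String]): Unit = {
       // Two children shaped like t1 and t2: 160 B and 10 rows each.
       val child = Stats(BigInt(160), Some(BigInt(10)))
       println(unionStats(Seq(child, child))) // prints Stats(320,Some(20))
       // If one child has no row count, only the size is estimated.
       println(unionStats(Seq(child, Stats(BigInt(160), None)))) // prints Stats(320,None)
     }
   }
   ```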
   
   ### Why are the changes needed?
   
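   To improve statistics estimation. Previously, with CBO enabled, `Union` reported only `sizeInBytes` and dropped `rowCount`, even when every child had a row count.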
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test, added to `BasicStatsEstimationSuite`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755775198


   Looks fine except for the @viirya comment.


[GitHub] [spark] HyukjinKwon commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755897628


   Merged to master.


[GitHub] [spark] SparkQA commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755895095


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38358/
   


[GitHub] [spark] SparkQA commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755871061


   **[Test build #133770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133770/testReport)** for PR 31068 at commit [`3c5af90`](https://github.com/apache/spark/commit/3c5af902a3b33c6f2894d4bbb49073f77cc29f79).


[GitHub] [spark] SparkQA commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755963351


   **[Test build #133770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133770/testReport)** for PR 31068 at commit [`3c5af90`](https://github.com/apache/spark/commit/3c5af902a3b33c6f2894d4bbb49073f77cc29f79).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755353405


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38334/
   


[GitHub] [spark] AmplabJenkins commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755465228


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133746/
   


[GitHub] [spark] HyukjinKwon commented on a change in pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31068:
URL: https://github.com/apache/spark/pull/31068#discussion_r553117889



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala
##########
@@ -141,6 +141,16 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       expectedStatsCboOff = Statistics(sizeInBytes = 120))
   }
 
+  test("SPARK-34031: Union operator missing rowCount when enable CBO") {
+    val union = Union(plan :: plan :: plan :: Nil)
+    val childrenSize = union.children.size
+    val sizeInBytes = plan.size.get * childrenSize
+    val rowCount = Some(plan.rowCount * childrenSize)
+    checkStats(union,
+      expectedStatsCboOn = Statistics(sizeInBytes = sizeInBytes, rowCount = rowCount),

Review comment:
       ```suggestion
       checkStats(
         union,
         expectedStatsCboOn = Statistics(sizeInBytes = sizeInBytes, rowCount = rowCount),
   ```




[GitHub] [spark] viirya commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755464250


   It seems the query plan files need to be updated.


[GitHub] [spark] HyukjinKwon commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755894728


   @wangyum, sorry but can you push an empty commit to retrigger the GA build? I would like to keep the result of the test failure because it looks like a flaky test.


[GitHub] [spark] SparkQA commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755437909


   **[Test build #133746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133746/testReport)** for PR 31068 at commit [`c0dbbe4`](https://github.com/apache/spark/commit/c0dbbe4ec5e56bdbe5281a98f6756471f1870025).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755986591


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133770/
   


[GitHub] [spark] AmplabJenkins commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755908233


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38358/
   


[GitHub] [spark] SparkQA commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755319463


   **[Test build #133746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133746/testReport)** for PR 31068 at commit [`c0dbbe4`](https://github.com/apache/spark/commit/c0dbbe4ec5e56bdbe5281a98f6756471f1870025).


[GitHub] [spark] SparkQA commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755349850


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38334/
   


[GitHub] [spark] SparkQA commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755885708


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38358/
   


[GitHub] [spark] HyukjinKwon commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755896387


   The flaky tests are known and are being fixed in `https://github.com/apache/spark/pull/31076`. Let me just merge this in.


[GitHub] [spark] SparkQA commented on pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31068:
URL: https://github.com/apache/spark/pull/31068#issuecomment-755353372


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38334/
   


[GitHub] [spark] wangyum commented on a change in pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31068:
URL: https://github.com/apache/spark/pull/31068#discussion_r552650903



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/BasicStatsPlanVisitor.scala
##########
@@ -79,7 +79,15 @@ object BasicStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
 
   override def visitScriptTransform(p: ScriptTransformation): Statistics = default(p)
 
-  override def visitUnion(p: Union): Statistics = fallback(p)
+  override def visitUnion(p: Union): Statistics = {
+    val stats = p.children.map(_.stats)
+    val rowCount = if (stats.exists(_.rowCount.isEmpty)) {
+      None
+    } else {
+      Some(stats.map(_.rowCount.get).sum)
+    }
+    Statistics(sizeInBytes = stats.map(_.sizeInBytes).sum, rowCount = rowCount)
+  }

Review comment:
       Same logic as `SizeInBytesOnlyStatsPlanVisitor`, just with the row count added:
   https://github.com/apache/spark/blob/6c5ba8169ae64fdcefd8530c2b38326178f5fa92/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L148-L150
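
   The `exists(_.rowCount.isEmpty)` check followed by `rowCount.get` is an all-or-nothing sum. As a hedged alternative (not what the patch uses), the same semantics can be written as a fold over `Option[BigInt]` that becomes `None` as soon as one child lacks a row count. `sumRowCounts` is a hypothetical helper, and plain `Option[BigInt]` values stand in for each child's `Statistics.rowCount`:

   ```scala
   object SumRowCountsExample {
     // Hypothetical helper: Some(total) only if every child reports a row count;
     // the accumulator turns into None at the first missing value and stays None.
     def sumRowCounts(rowCounts: Seq[Option[BigInt]]): Option[BigInt] =
       rowCounts.foldLeft(Option(BigInt(0))) { (acc, rc) =>
         for (a <- acc; r <- rc) yield a + r
       }

     def main(args: Array[String]): Unit = {
       val ten = Option(BigInt(10))
       println(sumRowCounts(Seq(ten, ten, ten)))  // prints Some(30)
       println(sumRowCounts(Seq(ten, None, ten))) // prints None
     }
   }
   ```

   Both forms agree on every input, including an empty child list, where each yields `Some(0)`.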




[GitHub] [spark] HyukjinKwon closed pull request #31068: [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #31068:
URL: https://github.com/apache/spark/pull/31068


   

