You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/22 11:44:53 UTC
[GitHub] [spark] zhengruifeng opened a new pull request #35609: [SPARK-38286][SQL] check Union's maxRows and maxRowsPerPartition
zhengruifeng opened a new pull request #35609:
URL: https://github.com/apache/spark/pull/35609
### What changes were proposed in this pull request?
check Union's maxRows and maxRowsPerPartition
### Why are the changes needed?
Union's maxRows and maxRowsPerPartition may overflow:
```
scala> val df1 = spark.range(0, Long.MaxValue, 1, 1)
df1: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> val df2 = spark.range(0, 100, 1, 10)
df2: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> val union = df1.union(df2)
union: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> union.queryExecution.logical.maxRowsPerPartition
res19: Option[Long] = Some(-9223372036854775799)
scala> union.queryExecution.logical.maxRows
res20: Option[Long] = Some(-9223372036854775709)
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
added testsuite
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhengruifeng commented on a change in pull request #35609: [SPARK-38286][SQL] Union's maxRows and maxRowsPerPartition may overflow
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #35609:
URL: https://github.com/apache/spark/pull/35609#discussion_r813463402
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -277,11 +277,18 @@ case class Union(
assert(!allowMissingCol || byName, "`allowMissingCol` can be true only if `byName` is true.")
override def maxRows: Option[Long] = {
- if (children.exists(_.maxRows.isEmpty)) {
Review comment:
since 2.0, https://github.com/apache/spark/blob/branch-2.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L190-L197
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #35609: [SPARK-38286][SQL] check Union's maxRows and maxRowsPerPartition
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #35609:
URL: https://github.com/apache/spark/pull/35609#issuecomment-1047713774
cc @cloud-fan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #35609: [SPARK-38286][SQL] Union's maxRows and maxRowsPerPartition may overflow
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35609:
URL: https://github.com/apache/spark/pull/35609#discussion_r812966259
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -277,11 +277,18 @@ case class Union(
assert(!allowMissingCol || byName, "`allowMissingCol` can be true only if `byName` is true.")
override def maxRows: Option[Long] = {
- if (children.exists(_.maxRows.isEmpty)) {
Review comment:
do you know which Spark version we start to have `Union.maxRows`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #35609: [SPARK-38286][SQL] Union's maxRows and maxRowsPerPartition may overflow
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35609:
URL: https://github.com/apache/spark/pull/35609#issuecomment-1048857734
cc @sigmod
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #35609: [SPARK-38286][SQL] Union's maxRows and maxRowsPerPartition may overflow
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #35609:
URL: https://github.com/apache/spark/pull/35609
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #35609: [SPARK-38286][SQL] Union's maxRows and maxRowsPerPartition may overflow
Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #35609:
URL: https://github.com/apache/spark/pull/35609#issuecomment-1048504368
multi-table join may generate large `maxRows`, and union those joins may cause `maxRows` overflow, which may result in unexpected behavior in optimizer.
this also seems a correctness issue. @cloud-fan
the failed doc test also exist in other PRs, should be irrelative
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #35609: [SPARK-38286][SQL] Union's maxRows and maxRowsPerPartition may overflow
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35609:
URL: https://github.com/apache/spark/pull/35609#issuecomment-1049444309
thanks, merging to master/3.2/3.1/3.0
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org