Posted to reviews@spark.apache.org by "peter-toth (via GitHub)" <gi...@apache.org> on 2024/03/07 17:33:35 UTC
[PR] [SPARK-47319][SQL] Fix missingInput calculation [spark]
peter-toth opened a new pull request, #45424:
URL: https://github.com/apache/spark/pull/45424
### What changes were proposed in this pull request?
This PR speeds up the `QueryPlan.missingInput()` calculation.
### Why are the changes needed?
This seems to be the root cause of `DeduplicateRelations` slowness in some cases.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing UTs.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]
Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #45424:
URL: https://github.com/apache/spark/pull/45424#issuecomment-1985053630
Thanks, merged to master
Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]
Posted by "attilapiros (via GitHub)" <gi...@apache.org>.
attilapiros commented on PR #45424:
URL: https://github.com/apache/spark/pull/45424#issuecomment-1984150861
LGTM
I talked to @peter-toth offline: the improvement comes from not calculating the `inputSet` at all when `references` is empty.
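A minimal sketch of the idea described above, assuming a toy model: `Node`, `inputSet`, `missingInputEager`, `missingInputLazy` and the `inputSetBuilds` counter are hypothetical names, and Spark's `QueryPlan` and `AttributeSet` are not used. When `references` is empty, `references -- inputSet` is empty no matter what `inputSet` contains, so building `inputSet` can be skipped. The sketch assumes the input set is rebuilt on each call rather than cached.

```scala
// Hypothetical toy model of a plan node; not Spark's QueryPlan or AttributeSet.
object MissingInputSketch {
  // Counts how many times the input set is built, to make the saving visible.
  var inputSetBuilds = 0

  // `references`: attribute names this node's expressions use.
  // `childOutputs`: the attribute names each child produces.
  final case class Node(references: Set[String], childOutputs: Seq[Seq[String]]) {
    // Union of the children's outputs, rebuilt on every call (a `def`, not cached).
    def inputSet: Set[String] = {
      inputSetBuilds += 1
      childOutputs.flatten.toSet
    }

    // Straightforward version: always builds inputSet, even when it cannot matter.
    def missingInputEager: Set[String] = references -- inputSet

    // Short-circuit version: an empty `references` has nothing missing,
    // so inputSet is never built in that case.
    def missingInputLazy: Set[String] =
      if (references.isEmpty) Set.empty else references -- inputSet
  }
}
```

With `references` empty, `missingInputLazy` returns an empty set without touching `inputSet`; otherwise both versions compute the same difference.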
Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]
Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45424:
URL: https://github.com/apache/spark/pull/45424#discussion_r1517119767
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala:
##########
@@ -104,13 +104,19 @@ class AttributeSet private (private val baseSet: mutable.LinkedHashSet[Attribute
* in `other`.
*/
def --(other: Iterable[NamedExpression]): AttributeSet = {
Review Comment:
This can be more efficient, but it looks weird in a standard collection API such as `def --`.
Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]
Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn closed pull request #45424: [SPARK-47319][SQL] Improve missingInput calculation
URL: https://github.com/apache/spark/pull/45424
Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]
Posted by "attilapiros (via GitHub)" <gi...@apache.org>.
attilapiros commented on code in PR #45424:
URL: https://github.com/apache/spark/pull/45424#discussion_r1516669562
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala:
##########
@@ -104,13 +104,19 @@ class AttributeSet private (private val baseSet: mutable.LinkedHashSet[Attribute
* in `other`.
*/
def --(other: Iterable[NamedExpression]): AttributeSet = {
Review Comment:
and then we can save here what your previous commit saved in `missingInput()` (the calculation of the `inputSet`).
Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]
Posted by "peter-toth (via GitHub)" <gi...@apache.org>.
peter-toth commented on PR #45424:
URL: https://github.com/apache/spark/pull/45424#issuecomment-1985292442
Thanks for the review @attilapiros, @cloud-fan, @yaooqinn!
Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]
Posted by "attilapiros (via GitHub)" <gi...@apache.org>.
attilapiros commented on code in PR #45424:
URL: https://github.com/apache/spark/pull/45424#discussion_r1516651884
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala:
##########
@@ -104,13 +104,19 @@ class AttributeSet private (private val baseSet: mutable.LinkedHashSet[Attribute
* in `other`.
*/
def --(other: Iterable[NamedExpression]): AttributeSet = {
Review Comment:
@peter-toth What about changing the `other` to a call-by-name parameter?
```suggestion
def --(other: => Iterable[NamedExpression]): AttributeSet = {
```
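A minimal sketch of what a call-by-name `--` buys, assuming a hypothetical `NameSet` type that stands in for `AttributeSet` (the names `NameSet`, `expensiveInput` and `evaluations` are illustrative only, not Spark code). Because `other` is declared as `=> Iterable[String]`, the argument expression is evaluated only where `other` is used, so an empty receiver never evaluates it.

```scala
// Hypothetical minimal set type; not Spark's AttributeSet.
object ByNameSketch {
  final class NameSet(val names: Set[String]) {
    def isEmpty: Boolean = names.isEmpty

    // `other` is call-by-name: the caller's argument expression is not
    // evaluated on entry, only where `other` is referenced below.
    // An empty receiver returns early, so the argument is never evaluated.
    def --(other: => Iterable[String]): NameSet =
      if (isEmpty) this else new NameSet(names -- other)
  }

  // Counts how often the "expensive" argument is actually evaluated.
  var evaluations = 0

  // Stand-in for a costly argument, such as a freshly built input set.
  def expensiveInput(): Iterable[String] = {
    evaluations += 1
    Seq("a", "b")
  }
}
```

In this sketch, call sites keep the ordinary infix form (e.g. `references -- inputSet`), so the saving needs no caller changes. The trade-off, which cloud-fan raises on the same line, is that this `--` no longer behaves like `--` on standard Scala collections, where the argument is always evaluated before the call.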
Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]
Posted by "peter-toth (via GitHub)" <gi...@apache.org>.
peter-toth commented on PR #45424:
URL: https://github.com/apache/spark/pull/45424#issuecomment-1984153122
@cloud-fan can you please take a look?