Posted to reviews@spark.apache.org by "peter-toth (via GitHub)" <gi...@apache.org> on 2024/03/07 17:33:35 UTC

[PR] [SPARK-47319][SQL] Fix missingInput calculation [spark]

peter-toth opened a new pull request, #45424:
URL: https://github.com/apache/spark/pull/45424

   ### What changes were proposed in this pull request?
   This PR speeds up the `QueryPlan.missingInput()` calculation.
   
   
   ### Why are the changes needed?
   A slow `missingInput()` calculation seems to be the root cause of `DeduplicateRelations` slowness in some cases.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Existing UTs.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #45424:
URL: https://github.com/apache/spark/pull/45424#issuecomment-1985053630

   Thanks, merged to master


Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

Posted by "attilapiros (via GitHub)" <gi...@apache.org>.
attilapiros commented on PR #45424:
URL: https://github.com/apache/spark/pull/45424#issuecomment-1984150861

   LGTM
   
   I talked to @peter-toth offline: the improvement comes from not calculating `inputSet` at all when `references` is empty.
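   
   A minimal sketch of that short-circuit, using plain Scala `Set[String]` in place of Spark's `AttributeSet`. `ToyNode`, `computeInputSet`, and `inputSetEvaluations` are hypothetical names made up for illustration; this is not the code from the patch.
   
   ```scala
   // Toy stand-in for a plan node (assumption: NOT Spark's QueryPlan).
   final class ToyNode(
       val references: Set[String],
       computeInputSet: () => Set[String]) {
   
     // Counts how often the (potentially expensive) input set is built.
     var inputSetEvaluations: Int = 0
   
     def inputSet: Set[String] = {
       inputSetEvaluations += 1
       computeInputSet()
     }
   
     // Nothing can be missing when nothing is referenced,
     // so inputSet is only built when references is non-empty.
     def missingInput: Set[String] = {
       if (references.isEmpty) Set.empty[String] else references -- inputSet
     }
   }
   ```
   
   With this shape, a node that references no attributes never pays for building its input set, while other nodes behave as before.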


Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45424:
URL: https://github.com/apache/spark/pull/45424#discussion_r1517119767


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala:
##########
@@ -104,13 +104,19 @@ class AttributeSet private (private val baseSet: mutable.LinkedHashSet[Attribute
    * in `other`.
    */
   def --(other: Iterable[NamedExpression]): AttributeSet = {

Review Comment:
   This can be more efficient, but it looks weird in a standard collection API such as `def --`.



Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn closed pull request #45424: [SPARK-47319][SQL] Improve missingInput calculation
URL: https://github.com/apache/spark/pull/45424


Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

Posted by "attilapiros (via GitHub)" <gi...@apache.org>.
attilapiros commented on code in PR #45424:
URL: https://github.com/apache/spark/pull/45424#discussion_r1516669562


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala:
##########
@@ -104,13 +104,19 @@ class AttributeSet private (private val baseSet: mutable.LinkedHashSet[Attribute
    * in `other`.
    */
   def --(other: Iterable[NamedExpression]): AttributeSet = {

Review Comment:
   Then we can save here what your previous commit saved in `missingInput()`: the calculation of `inputSet`.



Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

Posted by "peter-toth (via GitHub)" <gi...@apache.org>.
peter-toth commented on PR #45424:
URL: https://github.com/apache/spark/pull/45424#issuecomment-1985292442

   Thanks for the review @attilapiros, @cloud-fan, @yaooqinn!


Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

Posted by "attilapiros (via GitHub)" <gi...@apache.org>.
attilapiros commented on code in PR #45424:
URL: https://github.com/apache/spark/pull/45424#discussion_r1516651884


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala:
##########
@@ -104,13 +104,19 @@ class AttributeSet private (private val baseSet: mutable.LinkedHashSet[Attribute
    * in `other`.
    */
   def --(other: Iterable[NamedExpression]): AttributeSet = {

Review Comment:
   @peter-toth What about changing `other` to a call-by-name parameter?
   ```suggestion
     def --(other: => Iterable[NamedExpression]): AttributeSet = {
   ```
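   
   A rough sketch of how a by-name `other` would behave, assuming a toy set type rather than the real `AttributeSet`. `ToySet`, `ByNameDemo`, and `expensiveInput` are hypothetical names introduced only for this example.
   
   ```scala
   // Toy set wrapper (assumption: NOT Spark's AttributeSet).
   final class ToySet(private val elems: Set[String]) {
     // `other` is by-name: the argument expression is evaluated only
     // when `other` is actually referenced inside the method body.
     def --(other: => Iterable[String]): ToySet = {
       if (elems.isEmpty) this else new ToySet(elems -- other)
     }
   
     def toSet: Set[String] = elems
   }
   
   object ByNameDemo {
     // Counts how often the argument expression is evaluated.
     var evaluations: Int = 0
   
     def expensiveInput(): Set[String] = {
       evaluations += 1
       Set("a")
     }
   }
   ```
   
   Calling `(new ToySet(Set.empty[String])) -- ByNameDemo.expensiveInput()` leaves `ByNameDemo.evaluations` at 0, because the empty receiver returns before touching `other`. A non-empty receiver evaluates the argument once.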



Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

Posted by "peter-toth (via GitHub)" <gi...@apache.org>.
peter-toth commented on PR #45424:
URL: https://github.com/apache/spark/pull/45424#issuecomment-1984153122

   @cloud-fan can you please take a look?

