You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/04/04 12:00:14 UTC

[GitHub] [spark] beliefer opened a new pull request, #40661: [SPARK-43025][SQL] Eliminate Union if filters have the same child plan

beliefer opened a new pull request, #40661:
URL: https://github.com/apache/spark/pull/40661

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   New feature and just update the inner implementation.
   
   
   ### How was this patch tested?
   New test cases.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #40661: [SPARK-43025][SQL] Eliminate Union if filters have the same child plan

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #40661:
URL: https://github.com/apache/spark/pull/40661#issuecomment-1534274671

   I think @peter-toth did something similar before, can you share some ideas @peter-toth ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43025][SQL] Eliminate Union if filters have the same child plan [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40661: [SPARK-43025][SQL] Eliminate Union if filters have the same child plan
URL: https://github.com/apache/spark/pull/40661


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40661: [SPARK-43025][SQL] Eliminate Union if filters have the same child plan

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40661:
URL: https://github.com/apache/spark/pull/40661#issuecomment-1534538671

   > I think @peter-toth did something similar before, can you share some ideas @peter-toth ?
   
   I guess @peter-toth did the similar thing for scalar subquery, but this one try to fix non-scalar subquery.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40661: [SPARK-43025][SQL] Eliminate Union if filters have the same child plan

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40661:
URL: https://github.com/apache/spark/pull/40661#issuecomment-1536060234

   @peter-toth Thank you for your first look.
   In fact, this PR references some functions as you mentioned above (e.g. `checkIdenticalPlans`). If we can share these functions, that be good.
   
   The partitioning/bucketing column filter seems doesn't improve anything. I will optimize it in further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] beliefer commented on pull request #40661: [SPARK-43025][SQL] Eliminate Union if filters have the same child plan

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #40661:
URL: https://github.com/apache/spark/pull/40661#issuecomment-1534242428

   ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #40661: [SPARK-43025][SQL] Eliminate Union if filters have the same child plan

Posted by "peter-toth (via GitHub)" <gi...@apache.org>.
peter-toth commented on PR #40661:
URL: https://github.com/apache/spark/pull/40661#issuecomment-1536035613

   > > I think @peter-toth did something similar before, can you share some ideas @peter-toth ?
   > 
   > I guess @peter-toth did the similar thing for scalar subquery, but this one try to fix non-scalar subquery.
   
   Sorry, I haven't got time to fully review the PR (maybe next week) but at first sight it seems to copy some fuctions (e.g. `checkIdenticalPlans()`, `mergeNamedExpressions()`) from `MergeScalarSubqueries` so there seems to be some room for improvement and we could share the common functions. Also, some names (e.g. `subqueryIndex`) might need some changes here.
   
   This PR combines UNION ALL legs if they return disjoint set of rows from the same source node. I think this makes sense in those cases when there are overlaping scans in the legs (despite the disjoint filters), and by "overlapping" I mean that the scans use some common set of files.
   So seems like the only case when this change doesn't bring improvement is when the filter is a partitioning/bucketing column and the scans in union legs doesn't overlap. But even in that case I'm not sure if this PR has any disadvantage, just doesn't improve anything...
   
   BTW, `MergeScalarSubqueries` (https://github.com/apache/spark/pull/32298) does very similar merging, but we run that only once because merging can be costly when there are many candidates. Do we need `EliminateUnions` in `operatorOptimizationBatch`?
   Sidenote: `MergeScalarSubqueries` doesn't work with different filters currently. This is because merging filters in subqueries is more comlicated as we need to propogate the filters up to an aggregate, and because it can cause performance degradation when we have non-overlapping scans. (See this WIP PR: https://github.com/apache/spark/pull/37630).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43025][SQL] Eliminate Union if filters have the same child plan [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40661:
URL: https://github.com/apache/spark/pull/40661#issuecomment-1807297079

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-43025][SQL] Eliminate Union if filters have the same child plan [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40661:
URL: https://github.com/apache/spark/pull/40661#issuecomment-1958435617

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org