You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "copperybean (via GitHub)" <gi...@apache.org> on 2023/08/15 07:22:16 UTC

[GitHub] [spark] copperybean opened a new pull request, #42495: [SPARK-44812][SQL] Push filters through join generated from intersect

copperybean opened a new pull request, #42495:
URL: https://github.com/apache/spark/pull/42495

   Currently, filters can not be pushed through join generated from intersect for two types of SQLs
   1. `select a from (select a from tl intersect select x from tr) where a > 123 `
   The physical plan is
   ```
   AdaptiveSparkPlan isFinalPlan=false
   +- HashAggregate(keys=[a#8L], functions=[])
      +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=133]
         +- HashAggregate(keys=[a#8L], functions=[])
            +- BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
               :- Filter (isnotnull(a#8L) AND (a#8L > 123))
               :  +- FileScan json [a#8L] ...
               +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), isnull(input[0, bigint, true])),false), [plan_id=129]
                  +- FileScan json [x#24L] ... 
   ```
   We can find the filter `a > 123` is not pushed to right table.
   
   2. `select a from (select a from tl intersect select x from tr) join trr on a = y`
   The physical plan is 
   ```
   *(3) Project [a#8L]
   +- *(3) BroadcastHashJoin [a#8L], [y#114L], Inner, BuildRight, false
      :- *(3) HashAggregate(keys=[a#8L], functions=[])
      :  +- Exchange hashpartitioning(a#8L, 200), ENSURE_REQUIREMENTS, [plan_id=506]
      :     +- *(1) HashAggregate(keys=[a#8L], functions=[])
      :        +- *(1) BroadcastHashJoin [coalesce(a#8L, 0), isnull(a#8L)], [coalesce(x#24L, 0), isnull(x#24L)], LeftSemi, BuildRight, false
      :           :- *(1) Filter isnotnull(a#8L)
      :           :  +- FileScan json [a#8L] ...
      :           +- BroadcastExchange HashedRelationBroadcastMode(List(coalesce(input[0, bigint, true], 0), isnull(input[0, bigint, true])),false), [plan_id=490]
      :              +- FileScan json [x#24L] ...
      +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [plan_id=512]
         +- *(2) Filter isnotnull(y#114L)
            +- FileScan json [y#114L] ...
   ```
   There should be a filter `isnotnull( x )` for table tr, while it's not pushed down.
   
   ### What changes were proposed in this pull request?
   For 1st SQL, the filter can not be pushed down through `EqualNullSafe` in optimizer rule `InferFiltersFromConstraints`. This can be fixed in method `ConstraintHelper.inferAdditionalConstraints`.
   For 2nd SQL, after rules `InferFiltersFromConstraints` and `PushDownPredicates`, the filter can only be pushed to left side of join, so another `InferFiltersFromConstraints` is needed to push the filter to right side of join.
   
   ### Why are the changes needed?
   Huge amount of data may be filtered before expensive join operation.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Unit test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44812][SQL] Push all predicates according to EqualTo and EqualNullSafe [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #42495: [SPARK-44812][SQL] Push all predicates according to EqualTo and EqualNullSafe
URL: https://github.com/apache/spark/pull/42495


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] copperybean commented on pull request #42495: [SPARK-44812][SQL] Push all predicates according to EqualTo and EqualNullSafe

Posted by "copperybean (via GitHub)" <gi...@apache.org>.
copperybean commented on PR #42495:
URL: https://github.com/apache/spark/pull/42495#issuecomment-1724694181

   @cloud-fan @wangyum  Could you review this PR, please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44812][SQL] Push all predicates according to EqualTo and EqualNullSafe [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #42495:
URL: https://github.com/apache/spark/pull/42495#issuecomment-1871639166

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org