Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/06 18:20:14 UTC

[GitHub] [spark] allisonwang-db opened a new pull request, #38135: [SPARK-36114][SQL] Support subqueries with correlated non-equality predicates

allisonwang-db opened a new pull request, #38135:
URL: https://github.com/apache/spark/pull/38135

### What changes were proposed in this pull request?

This PR supports correlated non-equality predicates in subqueries. It leverages the DecorrelateInnerQuery framework to decorrelate subqueries with non-equality predicates. DecorrelateInnerQuery inserts domain joins in the query plan and the rule RewriteCorrelatedScalarSubquery rewrites the domain joins into actual joins with the outer query.

Note that correlated non-equality predicates can produce query plans with non-equality join conditions, which may be planned as a broadcast nested loop join or a Cartesian product.
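
As a rough illustration of the idea (not the plan Spark actually produces), the sketch below uses Python's built-in `sqlite3` module as a stand-in engine to check that a correlated scalar subquery with a non-equality predicate returns the same rows as a hand-written rewrite. The rewrite joins the inner table against the distinct outer values (a stand-in for the domain join) and then joins the aggregated result back to the outer table. The table contents and the names `outer_domain`, `agg`, and `outer_c1` are assumptions made for this example only.

```python
import sqlite3

# SQLite is only a stand-in here to compare query semantics; it says nothing
# about how Spark plans or executes either form.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INTEGER);
    CREATE TABLE t2 (c1 INTEGER, c2 INTEGER);
    INSERT INTO t1 VALUES (0), (1), (2), (2);
    INSERT INTO t2 VALUES (0, 10), (1, 5), (1, 7);
""")

# Correlated scalar subquery with a non-equality predicate.
correlated = """
    SELECT t1.c1, (SELECT min(c2) FROM t2 WHERE t1.c1 > t2.c1) AS m
    FROM t1
    ORDER BY t1.c1
"""

# Hand-written decorrelated form (hypothetical names, for illustration):
#   1. outer_domain: the distinct outer values the subquery depends on.
#   2. agg: the subquery evaluated once per domain value, using a join with
#      a non-equality condition instead of a correlated reference.
#   3. A left outer join back to t1, so outer rows with no match get NULL.
decorrelated = """
    WITH outer_domain AS (SELECT DISTINCT c1 FROM t1),
    agg AS (
        SELECT outer_domain.c1 AS outer_c1, min(t2.c2) AS m
        FROM outer_domain JOIN t2 ON outer_domain.c1 > t2.c1
        GROUP BY outer_domain.c1
    )
    SELECT t1.c1, agg.m
    FROM t1 LEFT JOIN agg ON t1.c1 = agg.outer_c1
    ORDER BY t1.c1
"""

correlated_rows = conn.execute(correlated).fetchall()
decorrelated_rows = conn.execute(decorrelated).fetchall()

# Both forms should agree row for row on this data.
print(correlated_rows == decorrelated_rows)
```

The join condition inside `agg` is `outer_domain.c1 > t2.c1`, which has no equality key. This is the kind of condition the note above refers to when it mentions nested loop joins and Cartesian products.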

### Why are the changes needed?

To improve subquery support in Spark.

### Does this PR introduce _any_ user-facing change?

Yes. Before this PR, Spark did not allow correlated non-equality predicates in subqueries.
For example:
```sql
SELECT (SELECT min(c2) FROM t2 WHERE t1.c1 > t2.c1) FROM t1
```
Before this PR, this query fails with the exception `Correlated column is not allowed in a non-equality predicate`.

After this PR, this query can run successfully.

### How was this patch tested?

Unit tests and SQL query tests.
