
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/06 00:32:37 UTC

[GitHub] [spark] sunchao opened a new pull request, #38924: [SPARK-41398][SQL] Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

sunchao opened a new pull request, #38924:
URL: https://github.com/apache/spark/pull/38924

### What changes were proposed in this pull request?

This PR relaxes the current constraint of Storage-Partitioned Join, which requires the partition keys after runtime filtering to be exactly the same as the partition keys before the filtering.

### Why are the changes needed?

At the moment, when Storage-Partitioned Join is used together with runtime filtering, Spark requires the partition keys before and after the filtering to match exactly. If they do not, a `SparkException` is thrown.

However, this is not strictly necessary when the partition keys after the filtering are a subset of the original keys. In this scenario, we can use empty partitions for the keys that are missing after the filtering.
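To make the relaxation concrete, here is a minimal Python sketch of the alignment step. It assumes each scan is described by an ordered list of partition keys reported before runtime filtering, plus a mapping from each surviving key to its input splits. The function names, the `Split` alias, and the use of `ValueError` in place of `SparkException` are hypothetical; this is not Spark's Scala implementation, only an illustration of the idea described above.

```python
from typing import Hashable, List, Mapping, Sequence, Tuple

# Hypothetical stand-in for one input split (e.g. a file) of a partition.
Split = str
Group = Tuple[Hashable, List[Split]]


def align_strict(original_keys: Sequence[Hashable],
                 filtered: Mapping[Hashable, List[Split]]) -> List[Group]:
    """Old behaviour: key sets before and after filtering must be equal."""
    if set(filtered) != set(original_keys):
        # ValueError stands in for the SparkException thrown by Spark.
        raise ValueError("partition keys after runtime filtering do not match")
    return [(k, list(filtered[k])) for k in original_keys]


def align_relaxed(original_keys: Sequence[Hashable],
                  filtered: Mapping[Hashable, List[Split]]) -> List[Group]:
    """Relaxed behaviour: surviving keys only need to be a subset."""
    unexpected = set(filtered) - set(original_keys)
    if unexpected:
        # A key that was never reported before filtering has no counterpart
        # on the other side of the join, so the sketch still rejects it.
        raise ValueError(f"unexpected partition keys after filtering: {unexpected}")
    # One group per original key, in the original order; keys pruned away
    # by the runtime filter become empty partitions.
    return [(k, list(filtered.get(k, []))) for k in original_keys]


if __name__ == "__main__":
    original = [1, 2, 3]
    # The runtime filter removed every split for key 2.
    filtered = {1: ["f1"], 3: ["f3a", "f3b"]}
    print(align_relaxed(original, filtered))
    # prints [(1, ['f1']), (2, []), (3, ['f3a', 'f3b'])]
```

With `align_strict`, the same input would raise, mirroring the current behaviour. With `align_relaxed`, the scan still yields one partition per original key, so it can be paired position by position with the other side's partitions; the empty partition for key `2` simply contributes no rows to the join.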

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Modified an existing test case to match the change.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org