Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/17 17:06:45 UTC

[GitHub] [spark] cloud-fan opened a new pull request, #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

cloud-fan opened a new pull request, #38692:
URL: https://github.com/apache/spark/pull/38692

### What changes were proposed in this pull request?

Today, Spark is very conservative and uses the analyzed plan, rather than the optimized plan, as the cache key. As a result, queries that are semantically equivalent but have different analyzed plans miss the cache.

This PR updates `SparkSessionExtensions` to let users inject custom plan normalization rules. Users can pick some safe optimizer rules, or implement new rules based on their business needs, to normalize plans before the cache lookup and increase the cache hit rate.
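
As a rough illustration of how such an injection might look, here is a minimal sketch. It assumes the new hook is exposed as `injectPlanNormalizationRule` (the method name is not stated in this description), and it assumes that the existing optimizer rule `CombineFilters` is a safe choice for normalization. The object name `PlanNormalizationSketch` and the sample queries are hypothetical.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.optimizer.CombineFilters
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

object PlanNormalizationSketch {
  // Assumption: CombineFilters, an existing optimizer rule that merges
  // adjacent Filter nodes, is safe to reuse because it never changes results.
  val normalizationRule: Rule[LogicalPlan] = CombineFilters

  // Assumption: the hook added by this PR is named injectPlanNormalizationRule
  // and takes a SparkSession => Rule[LogicalPlan] builder.
  def register(extensions: SparkSessionExtensions): Unit =
    extensions.injectPlanNormalizationRule(_ => normalizationRule)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("plan-normalization-sketch")
      .withExtensions(ext => register(ext))
      .getOrCreate()
    try {
      val df = spark.range(100).toDF("id")
      // Cache a query written as two stacked filters.
      df.filter("id > 10").filter("id < 50").cache()
      // Print the plan of the same query written with one combined filter,
      // to inspect whether it reads from the cached data.
      df.filter("id > 10 AND id < 50").explain()
    } finally {
      spark.stop()
    }
  }
}
```

Without a normalization rule, the two queries have different analyzed plans, so the second one would not be expected to match the cache entry. Whether a given optimizer rule is safe to use this way is for the user to judge: a normalization rule must not change the result of the plan.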

### Why are the changes needed?

To allow advanced users to get more cache hits by normalizing plans before they are used as cache keys.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org