You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/12/03 18:07:12 UTC

[GitHub] [spark] aokolnychyi opened a new pull request #26751: [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources

aokolnychyi opened a new pull request #26751: [SPARK-30107][SQL] Expose nested schema pruning to all V2 sources
URL: https://github.com/apache/spark/pull/26751

### What changes were proposed in this pull request?

This PR exposes the existing logic for nested schema pruning to all sources, which is in line with the description of `SupportsPushDownRequiredColumns` .

Right now, `SchemaPruning` (rule, not helper utility) is applied in the optimizer directly on certain instances of `Table` ignoring `SupportsPushDownRequiredColumns` that is part of `ScanBuilder`. I think it would be cleaner to perform schema pruning and filter push-down in one place. Therefore, this PR moves all the logic into `V2ScanRelationPushDown`.

### Why are the changes needed?

This change allows all V2 data sources to benefit from nested column pruning (if they support it).

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This PR mostly relies on existing tests. On top, it adds one test to verify that top-level schema pruning works as well as one test for predicates with subqueries.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org