You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Shuo Chen <ch...@gmail.com> on 2019/10/08 10:58:46 UTC
question about spark sql join pruning
Hi!
I have a question about spark left join pruning. For example, I have 2
tables:
table A:
create table A (
user_id int,
gender string,
email string,
phone string,
)
table B:
create table B (
user_id int,
jobs string,
graduate_schools string
)
If I select columns of A from sub-query with A left join B. Can spark
optimize the plan to just scan table A?
Query like this:
select user_id, gender, email, phone from (
select A.user_id as user_id,
A.gender as gender,
A.email as email,
A. phone as phone
from A left join B on A.user_id = B.user_id
)
More general question like this,
If I have a view with table A1 left join table A2 left join... An
Columns of Ai, Aj, Ak are selected. i, j k are subset of 1...n
Can spark prune the unnecessary table during the table joining?
--
*Shuo Chen*
chenatu2006@gmail.com