You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Johannes Mayer (JIRA)" <ji...@apache.org> on 2018/07/19 09:46:00 UTC

[jira] [Created] (SPARK-24859) Predicates pushdown on outer joins

Johannes Mayer created SPARK-24859:
--------------------------------------

             Summary: Predicates pushdown on outer joins
                 Key: SPARK-24859
                 URL: https://issues.apache.org/jira/browse/SPARK-24859
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 2.2.0
         Environment: Cloudera CDH 5.13.1
            Reporter: Johannes Mayer


I have two AVRO tables in Hive called FAct and DIm. Both are partitioned by a common column called part_col. Now I want to join both tables on their id but only for some of partitions.

If I use an inner join, everything works well:

 
{code:java}
select *
from FA f
join DI d
on(f.id = d.id and f.part_col = d.part_col)
where f.part_col = 'xyz'
{code}
 

In the sql explain plan i can see, that the predicate part_col = 'xyz' is also used in the DIm HiveTableScan.

 

When I execute the same query using a left join the full dim table is scanned. There are some workarounds for this issue, but i wanted to report this as a bug, since it works on an inner join, and i think the behaviour should be the same for an outer join

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org