Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/07/20 03:05:00 UTC
[jira] [Comment Edited] (SPARK-24859) Predicates pushdown on outer joins
[ https://issues.apache.org/jira/browse/SPARK-24859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550159#comment-16550159 ]
Hyukjin Kwon edited comment on SPARK-24859 at 7/20/18 3:04 AM:
---------------------------------------------------------------
Does the same thing happen on Apache Spark's master branch? And if you are not working on this, do you mind if I ask for a self-contained reproducer and an explanation of the results?
was (Author: hyukjin.kwon):
Does the same thing happen on Apache Spark's master branch? And if you are not working on this, do you mind if I ask for a reproducer?
> Predicates pushdown on outer joins
> ----------------------------------
>
> Key: SPARK-24859
> URL: https://issues.apache.org/jira/browse/SPARK-24859
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.2.0
> Environment: Cloudera CDH 5.13.1
> Reporter: Johannes Mayer
> Priority: Major
>
> I have two Avro tables in Hive called FAct and DIm. Both are partitioned by a common column called part_col. Now I want to join both tables on their id, but only for some of the partitions.
> If I use an inner join, everything works well:
>
> {code:java}
> select *
> from FAct f
> join DIm d
> on(f.id = d.id and f.part_col = d.part_col)
> where f.part_col = 'xyz'
> {code}
>
> In the SQL explain plan I can see that the predicate part_col = 'xyz' is also applied in the DIm HiveTableScan.
>
> When I execute the same query using a left join, the full DIm table is scanned. There are some workarounds for this issue, but I wanted to report it as a bug: the predicate is pushed down for an inner join, and I think the behaviour should be the same for an outer join.
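> For illustration, the left-join variant described above would look like the following (a sketch based on the inner-join query; the extra d.part_col = 'xyz' term in the on() clause is one possible workaround, on the assumption that a restriction on the right-hand table inside the ON clause of a left outer join can be pushed to the DIm scan without changing the join semantics):
>
> {code:java}
> select *
> from FAct f
> left join DIm d
> -- possible workaround: repeat the partition literal on the dimension
> -- side inside the join condition so its scan can be pruned
> on(f.id = d.id and f.part_col = d.part_col and d.part_col = 'xyz')
> where f.part_col = 'xyz'
> {code}
>
> Note that moving d.part_col = 'xyz' into the WHERE clause instead would effectively turn the left join back into an inner join, since it filters out the null-extended rows.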
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org