Posted to dev@iceberg.apache.org by Dave Sugden <da...@shopify.com.INVALID> on 2019/09/20 20:24:23 UTC

Scan planning + joins

Hi,

I'd just like to confirm: when using the Iceberg Spark datasource to read an
Iceberg table that is identity-partitioned on column x, and then performing
an equi-join on that column, does Iceberg's scan planning work as if a
predicate on x had been supplied? I.e., no full table scan?

Can this be observed with dataframe.explain?

Thanks!

Re: Scan planning + joins

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

Hi Dave,

This depends on Spark. Iceberg supports Spark's predicate push-down, in which
Spark forwards its filters to the Iceberg scan, but which filters are pushed
down is up to Spark.

Spark can make some inferences with join predicates. For example, if you
have a filter after a join, it can push the predicate through the join and
down to the source. But I don't think it is able to make all of the
inferences you or I would. For example, if you have an inner join on t1.id
= t2.id and a filter below the join on t1, it won't push that filter across
to the other side of the join. This is a limitation of Spark's optimizer,
not of Iceberg.
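
When the optimizer does not carry a filter across the join, one possible workaround is to hand Spark an explicit predicate on the join column. The sketch below is only an illustration for inner joins, assuming one side of the join is small enough to collect on the driver. The helper name `filter_by_join_keys` and the `max_keys` limit are hypothetical, not part of Spark or Iceberg, and whether the resulting filter actually reaches the Iceberg scan still depends on the Spark and Iceberg versions in use.

```python
def filter_by_join_keys(big_df, small_df, key, max_keys=1000):
    """Restrict big_df to the join-key values that occur in small_df.

    Both arguments are expected to be Spark DataFrames. Intended for inner
    joins only. max_keys is a hypothetical safety limit for this sketch,
    not a Spark or Iceberg setting.
    """
    # Collect the distinct join keys from the small (already filtered) side.
    rows = small_df.select(key).distinct().collect()
    # NULL keys never match in an equi-join, so they are dropped here.
    keys = [row[0] for row in rows if row[0] is not None]
    if len(keys) > max_keys:
        # Too many values to inline as a literal IN list; leave big_df as-is.
        return big_df
    # An explicit IN filter is an ordinary predicate on the source, which
    # Spark can offer to the data source for push-down.
    return big_df.filter(big_df[key].isin(keys))
```

With real DataFrames, `filter_by_join_keys(t2, t1_filtered, "x").join(t1_filtered, "x").explain()` would print the physical plan, where the scan node for the Iceberg table should list any filters Spark pushed down. The exact wording of that node varies by Spark version.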

One last thing to note is that Iceberg doesn't just evaluate these
predicates on partition data. It can also use stats for data files to prune
out unnecessary splits.
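
To make the two kinds of pruning concrete, here is a toy model in plain Python. It is not Iceberg's implementation or API. The `DataFile` class, its field names, and `plan_scan` are made up for illustration, and only equality predicates are modeled. Each file carries its identity-partition value for x plus lower and upper bounds for another column, id.

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    partition_x: int  # identity-partition value of column x for this file
    min_id: int       # lower bound of column id within this file
    max_id: int       # upper bound of column id within this file

def may_contain(f, x_equals=None, id_equals=None):
    """Return False only when the file cannot hold a matching row."""
    # Partition pruning: a different partition value rules the file out.
    if x_equals is not None and f.partition_x != x_equals:
        return False
    # Stats pruning: a value outside [min_id, max_id] rules the file out.
    if id_equals is not None and not (f.min_id <= id_equals <= f.max_id):
        return False
    return True

def plan_scan(files, **predicate):
    """Return the paths of files that still need to be read."""
    return [f.path for f in files if may_contain(f, **predicate)]

files = [
    DataFile("x=1/a.parquet", partition_x=1, min_id=0, max_id=99),
    DataFile("x=1/b.parquet", partition_x=1, min_id=100, max_id=199),
    DataFile("x=2/c.parquet", partition_x=2, min_id=0, max_id=199),
]

print(plan_scan(files))                           # no predicate: every file
print(plan_scan(files, x_equals=1))               # partition pruning only
print(plan_scan(files, x_equals=1, id_equals=150))  # partition + stats
```

Without a pushed-down predicate, the model keeps every file, which corresponds to the full table scan asked about in the original question.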

-- 
Ryan Blue
Software Engineer
Netflix