Posted to issues@spark.apache.org by "Russell Alexander Spitzer (JIRA)" <ji...@apache.org> on 2015/12/31 04:44:49 UTC

[jira] [Comment Edited] (SPARK-11661) We should still pushdown filters returned by a data source's unhandledFilters

    [ https://issues.apache.org/jira/browse/SPARK-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075690#comment-15075690 ] 

Russell Alexander Spitzer edited comment on SPARK-11661 at 12/31/15 3:44 AM:
-----------------------------------------------------------------------------

This seems to have a slightly unintended consequence in the explain output.

It makes it look as if a source is always pushing down all of the filters, even those it cannot handle.

This can be confusing (I kept checking my code to see where I had broken something :D ).

{code:title=Query plan for source where nothing is handled by C* Source}
Filter ((((a#71 = 1) && (b#72 = 2)) && (c#73 = 1)) && (e#75 = 1))
+- Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@4b9cf75c[a#71,b#72,c#73,d#74,e#75,f#76,g#77,h#78] PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
{code}
Although the telltale "Filter" step is present, my first instinct is to read this as the underlying source relation using all of those filters.

{code:title=Query plan for source where *everything* is handled by C* Source}
Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@55d4456c[a#79,b#80,c#81,d#82,e#83,f#84,g#85,h#86] PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
{code}


I think this would be much clearer if we changed the metadata key to "HandledFilters" and listed only the filters that are fully handled by the underlying source.
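
To make the suggestion concrete, here is a rough sketch in plain Python (not Spark code) of how the listing could be derived. The `EqualTo` class, `handled_filters`, and `scan_metadata` are hypothetical stand-ins invented for illustration; the only assumption carried over from the issue is that the source reports back the filters it does not fully handle.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class EqualTo:
    """Illustrative stand-in for a source filter such as EqualTo(a,1)."""
    attribute: str
    value: object

    def __str__(self) -> str:
        return f"EqualTo({self.attribute},{self.value})"


def handled_filters(pushed: Sequence[EqualTo],
                    unhandled: Sequence[EqualTo]) -> List[EqualTo]:
    """Pushed filters minus those the source reports as unhandled."""
    unhandled_set = set(unhandled)
    return [f for f in pushed if f not in unhandled_set]


def scan_metadata(pushed: Sequence[EqualTo],
                  unhandled: Sequence[EqualTo]) -> str:
    """Render the proposed (hypothetical) "HandledFilters" metadata entry."""
    handled = handled_filters(pushed, unhandled)
    return "HandledFilters: [" + ", ".join(str(f) for f in handled) + "]"


filters = [EqualTo("a", 1), EqualTo("b", 2), EqualTo("c", 1), EqualTo("e", 1)]

# Source that handles nothing: every pushed filter comes back as unhandled.
print(scan_metadata(filters, unhandled=filters))
# Source that handles everything: nothing comes back as unhandled.
print(scan_metadata(filters, unhandled=[]))
```

With something like this, the first plan above would show an empty list next to the Scan, and the second would show all four filters, so the two cases would no longer look identical.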

wdyt?



> We should still pushdown filters returned by a data source's unhandledFilters
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-11661
>                 URL: https://issues.apache.org/jira/browse/SPARK-11661
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> The unhandledFilters interface was added in SPARK-10978. It gives a data source a chance to tell Spark SQL that the returned filters may not be applied to every row, so Spark SQL should use a Filter operator to evaluate them. However, even if a filter is part of the returned unhandledFilters, we should still push it down. For example, our internal data sources do not override this method, so if we do not push down those filters, we are effectively turning off the filter pushdown feature.


