Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2014/08/29 23:53:54 UTC

[jira] [Commented] (SPARK-3109) Sql query with OR condition should be handled above PhysicalOperation layer

    [ https://issues.apache.org/jira/browse/SPARK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115904#comment-14115904 ] 

Michael Armbrust commented on SPARK-3109:
-----------------------------------------

I don't believe this optimization is valid unless you know that there are no duplicate values for (d,e) in test.
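To make the concern concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Spark SQL. It assumes the proposed rewrite would combine the two sub-queries with a duplicate-eliminating UNION; the ticket does not say how the results would be merged, so that is an assumption, as are the sample rows.

```python
import sqlite3

# In-memory stand-in for the "test" table from the ticket (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INT, b INT, c INT, d INT, e INT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 30, 7),   # matches the first branch (d > 20)
        (1, 1, 1, 30, 7),   # same (d, e) again: a duplicate
        (0, 0, 0, -5, 9),   # matches the second branch (d < 0)
        (0, 0, 0, 10, 1),   # matches neither branch
    ],
)

# The query as written in the ticket; AND binds tighter than OR.
original = conn.execute(
    "SELECT d, e FROM test "
    "WHERE a = 1 AND b = 1 AND c = 1 AND d > 20 OR d < 0"
).fetchall()

# Assumed rewrite: two separate scans merged with UNION, which removes duplicates.
split_union = conn.execute(
    "SELECT d, e FROM test WHERE a = 1 AND b = 1 AND c = 1 AND d > 20 "
    "UNION "
    "SELECT d, e FROM test WHERE d < 0"
).fetchall()

# Same split merged with UNION ALL, which keeps duplicates.
split_union_all = conn.execute(
    "SELECT d, e FROM test WHERE a = 1 AND b = 1 AND c = 1 AND d > 20 "
    "UNION ALL "
    "SELECT d, e FROM test WHERE d < 0"
).fetchall()

print(len(original), len(split_union), len(split_union_all))
```

With these rows the original query returns the duplicated (d, e) pair twice, while the UNION version collapses it to one row. UNION ALL happens to agree with the original here only because the two branches (d > 20 and d < 0) cannot both match the same row; with overlapping branches it would instead return such rows twice.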

> Sql query with OR condition should be handled above PhysicalOperation layer
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-3109
>                 URL: https://issues.apache.org/jira/browse/SPARK-3109
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.0.2
>            Reporter: Alex Liu
>
> For a query like
> {code}
> select d, e from test where a = 1 and b = 1 and c = 1 and d > 20 or d < 0
> {code}
> Spark SQL pushes the whole query to PhysicalOperation. I haven't checked how Spark SQL's internal query plan works, but I think the "OR" condition in the above query should be handled above the physical operation. The physical operation should receive the following queries:
> {code}
> select d, e from test where a = 1 and b = 1 and c = 1 and d > 20
> {code}
> OR
> {code}
> select d, e from test where d < 0
> {code}
