Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2015/10/13 22:50:05 UTC

[jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source

    [ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955646#comment-14955646 ] 

Davies Liu commented on SPARK-9182:
-----------------------------------

For JDBC, I think we could push more expressions (for example, a + b > 3) down to the remote database, including casts. This is more useful for JDBC than for file-based data sources, so we may want to spend more effort on it.
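
As a rough illustration of what pushing an expression such as a + b > 3 could look like, here is a minimal sketch that renders a small expression tree as a SQL predicate. All of the names (Expr, Col, Lit, Add, Cast, Gt, Udf, PushDownSketch) are hypothetical and made up for this example; they are not Spark's Catalyst classes or its data source filter API, and the quoting of column names is an assumption that would vary by database.

```scala
// Hypothetical mini expression tree, for illustration only.
// These are NOT Spark's Catalyst expressions or its sources.Filter classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Cast(child: Expr, sqlType: String) extends Expr
case class Gt(left: Expr, right: Expr) extends Expr
// Stands in for anything the remote database cannot evaluate.
case class Udf(name: String, child: Expr) extends Expr

object PushDownSketch {
  // Returns Some(sql) when the whole expression can be rendered as SQL,
  // or None so that the caller can fall back to evaluating it in Spark.
  def toSql(e: Expr): Option[String] = e match {
    // Double-quoted identifiers are an assumption (PostgreSQL style).
    case Col(n)     => Some("\"" + n + "\"")
    case Lit(v)     => Some(v.toString)
    case Add(l, r)  => for (a <- toSql(l); b <- toSql(r)) yield s"($a + $b)"
    case Cast(c, t) => toSql(c).map(s => s"CAST($s AS $t)")
    case Gt(l, r)   => for (a <- toSql(l); b <- toSql(r)) yield s"($a > $b)"
    case _: Udf     => None
  }
}
```

With these definitions, PushDownSketch.toSql(Gt(Add(Col("a"), Col("b")), Lit(3))) gives Some(("a" + "b") > 3) wrapped in one more pair of parentheses, while any tree containing a Udf node gives None and would have to be filtered on the Spark side.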

> filter and groupBy on DataFrames are not passed through to jdbc source
> ----------------------------------------------------------------------
>
>                 Key: SPARK-9182
>                 URL: https://issues.apache.org/jira/browse/SPARK-9182
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>            Reporter: Greg Rahn
>            Assignee: Yijie Shen
>            Priority: Critical
>
> Of all the API calls below, only the ones that use an equality predicate pass the filter through to the backend JDBC source. All of the filters in these commands should be passed through to the JDBC database source.
> {code}
> val url="jdbc:postgresql:grahn"
> val prop = new java.util.Properties
> val emp = sqlContext.read.jdbc(url, "emp", prop)
> emp.filter(emp("sal") === 5000).show()
> emp.filter(emp("sal") < 5000).show()
> emp.filter("sal = 3000").show()
> emp.filter("sal > 2500").show()
> emp.filter("sal >= 2500").show()
> emp.filter("sal < 2500").show()
> emp.filter("sal <= 2500").show()
> emp.filter("sal != 3000").show()
> emp.filter("sal between 3000 and 5000").show()
> emp.filter("ename in ('SCOTT','BLAKE')").show()
> {code}
> The PostgreSQL query log shows the following statements being run; only the equality predicates are passed through.
> {code}
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp WHERE sal = 5000
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp WHERE sal = 3000
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> {code}
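
Until more predicates are pushed down, one possible workaround is to hand the JDBC source a parenthesized subquery in place of the table name, so that the database applies the filter itself. The sketch below only builds that string; the object and method names (JdbcSubquerySketch, filteredTable) and the alias are hypothetical, and the predicate is assumed to be trusted SQL, since nothing is escaped.

```scala
object JdbcSubquerySketch {
  // Hypothetical helper: wraps a table and a predicate into a derived table
  // that can be passed as the `table` argument of read.jdbc.
  // The predicate is inserted as-is, so it must come from a trusted source.
  def filteredTable(table: String, predicate: String, alias: String): String =
    s"(SELECT * FROM $table WHERE $predicate) AS $alias"
}

// Usage in the spark-shell session from the report (needs a live database,
// so it is shown as a comment rather than executed here):
//   val lowPaid = sqlContext.read.jdbc(
//     url, JdbcSubquerySketch.filteredTable("emp", "sal < 5000", "emp_lt_5000"), prop)
```

This moves the filter into the query text by hand, so it does not replace proper pushdown of DataFrame filters; it only avoids pulling the whole table across for the cases listed above.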


