Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2016/11/27 10:46:58 UTC

[jira] [Updated] (SPARK-18593) JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL

     [ https://issues.apache.org/jira/browse/SPARK-18593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-18593:
----------------------------------
    Summary: JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL  (was: JDBCRDD returns incorrect results for a query with filters on CHAR type column of PostgreSQL)

> JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL
> -------------------------------------------------------------------
>
>                 Key: SPARK-18593
>                 URL: https://issues.apache.org/jira/browse/SPARK-18593
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 1.6.3
>            Reporter: Durga Prasad Gunturu
>            Priority: Minor
>              Labels: correctness
>
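> In Apache Spark 1.6.x, JDBCRDD returns incorrect results for queries that filter on a PostgreSQL CHAR column. The root cause is that PostgreSQL returns CHAR values as space-padded strings. The pushed-down predicate `PushedFilters: [EqualTo(a,A)]` is evaluated correctly in the database, because PostgreSQL ignores trailing spaces when comparing CHAR values, but Spark then re-applies the post-processing filter `Filter (a#0 = A)` to the padded result, which evaluates to false and drops the row. Spark 2.0.0 removes the post-processing filter because the predicate is already handled in the database by the pushed filter.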
> {code}
> scala> val t_char = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new java.util.Properties())
> t_char: org.apache.spark.sql.DataFrame = [a: string]
> scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new java.util.Properties())
> t_varchar: org.apache.spark.sql.DataFrame = [a: string]
> scala> t_char.show
> +----------+
> |         a|
> +----------+
> |A         |
> |AA        |
> |AAA       |
> +----------+
> scala> t_varchar.show
> +---+
> |  a|
> +---+
> |  A|
> | AA|
> |AAA|
> +---+
> scala> t_char.filter(t_char("a")==="A").show
> +---+
> |  a|
> +---+
> +---+
> scala> t_char.filter(t_char("a")==="A         ").show
> +----------+
> |         a|
> +----------+
> |A         |
> +----------+
> scala> t_varchar.filter(t_varchar("a")==="A").show
> +---+
> |  a|
> +---+
> |  A|
> +---+
> scala> t_char.filter(t_char("a")==="A").explain
> == Physical Plan ==
> Filter (a#0 = A)
> +- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres, password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
> {code}
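
The mismatch described above can be reproduced without Spark or PostgreSQL. The following is a minimal sketch in plain Scala, not Spark's actual code path: `CharPaddingDemo`, `padChar`, and `rtrim` are hypothetical names introduced for illustration, and the width of 10 is assumed from the `t_char.show` output above.

```scala
object CharPaddingDemo {
  // Mimic how a CHAR(width) value comes back from the database:
  // right-padded with spaces up to the declared width.
  def padChar(value: String, width: Int): String = value.padTo(width, ' ')

  // Strip trailing spaces only, leaving leading and inner spaces intact.
  def rtrim(value: String): String = value.replaceAll(" +$", "")

  // The three rows of t_char as the JDBC scan would return them (assumed CHAR(10)).
  val rows: Seq[String] = Seq("A", "AA", "AAA").map(padChar(_, 10))

  // Strict string equality, as the post-scan `Filter (a#0 = A)` applies it.
  val strictMatches: Seq[String] = rows.filter(_ == "A")

  // Comparison after trimming trailing padding.
  val trimmedMatches: Seq[String] = rows.filter(row => rtrim(row) == "A")

  def main(args: Array[String]): Unit = {
    println(s"strict equality matched ${strictMatches.size} row(s)")
    println(s"trim-then-compare matched ${trimmedMatches.size} row(s)")
  }
}
```

Strict equality matches nothing because every value carries trailing padding, while trimming first recovers the expected row. On Spark 1.6.x a comparable workaround might be `t_char.filter(rtrim(t_char("a")) === "A")` with `org.apache.spark.sql.functions.rtrim`; that is an untested assumption here, not something this report verifies.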


