Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2016/11/27 10:07:58 UTC
[jira] [Updated] (SPARK-18593) JDBCRDD returns incorrect results
for a query with filters on CHAR type column of PostgreSQL
[ https://issues.apache.org/jira/browse/SPARK-18593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-18593:
----------------------------------
Summary: JDBCRDD returns incorrect results for a query with filters on CHAR type column of PostgreSQL (was: JDBCRDD fails to filter CHAR type column for PostgreSQL)
> JDBCRDD returns incorrect results for a query with filters on CHAR type column of PostgreSQL
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-18593
> URL: https://issues.apache.org/jira/browse/SPARK-18593
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2, 1.6.3
> Reporter: Durga Prasad Gunturu
> Priority: Minor
> Labels: correctness
>
> In Apache Spark 1.6.x, JDBCRDD returns incorrect results for a query that filters on a PostgreSQL CHAR column. The root cause is that PostgreSQL returns CHAR values as space-padded strings, so the post-processing filter `Filter (a#0 = A)`, which Spark applies to the rows it gets back, evaluates to false. Spark 2.0.0 removes this post-processing filter because the predicate is already handled in the database by `PushedFilters: [EqualTo(a,A)]`.
> {code}
> scala> val t_char = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new java.util.Properties())
> t_char: org.apache.spark.sql.DataFrame = [a: string]
> scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new java.util.Properties())
> t_varchar: org.apache.spark.sql.DataFrame = [a: string]
> scala> t_char.show
> +----------+
> |         a|
> +----------+
> |A         |
> |AA        |
> |AAA       |
> +----------+
> scala> t_varchar.show
> +---+
> |  a|
> +---+
> |  A|
> | AA|
> |AAA|
> +---+
> scala> t_char.filter(t_char("a")==="A").show
> +---+
> |  a|
> +---+
> +---+
> scala> t_char.filter(t_char("a")==="A         ").show
> +----------+
> |         a|
> +----------+
> |A         |
> +----------+
> scala> t_varchar.filter(t_varchar("a")==="A").show
> +---+
> |  a|
> +---+
> |  A|
> +---+
> scala> t_char.filter(t_char("a")==="A").explain
> == Physical Plan ==
> Filter (a#0 = A)
> +- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres, password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
> {code}
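The padding mismatch described above can be reproduced without Spark or PostgreSQL. The following is a minimal sketch, not the Spark implementation: it assumes the column is CHAR(10) (inferred from the 10-character-wide output of `t_char.show`), and `CharPaddingSketch`, `padChar`, and `rtrim` are hypothetical names introduced only for illustration.

```scala
// Spark-free sketch of why an equality filter on space-padded CHAR values fails.
// Assumption: the column is CHAR(10), inferred from the width of the show() output.
object CharPaddingSketch {
  // Hypothetical helper: pad a value with trailing spaces, as a CHAR(n) column returns it.
  def padChar(value: String, width: Int): String = value.padTo(width, ' ')

  // Hypothetical helper: strip trailing spaces only, similar in spirit to SQL RTRIM.
  def rtrim(value: String): String = value.replaceAll(" +$", "")

  def main(args: Array[String]): Unit = {
    // Rows as they would come back from the database for a CHAR(10) column.
    val returned = Seq("A", "AA", "AAA").map(padChar(_, 10))

    // Strict equality against the unpadded literal, like the post-processing filter.
    val strict = returned.filter(_ == "A")

    // Comparing after trimming trailing spaces finds the intended row.
    val trimmed = returned.filter(v => rtrim(v) == "A")

    println(s"strict matches: ${strict.size}")   // prints "strict matches: 0"
    println(s"trimmed matches: ${trimmed.size}") // prints "trimmed matches: 1"
  }
}
```

On Spark 1.6.x, an analogous workaround might be to apply `rtrim` from `org.apache.spark.sql.functions` to the column before comparing, e.g. `t_char.filter(rtrim(t_char("a")) === "A")`. This has not been verified against the setup in this report, and it would likely evaluate the filter in Spark instead of pushing it down to the database.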
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)