Posted to issues@spark.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2018/09/13 07:55:00 UTC

[jira] [Updated] (SPARK-25207) Case-insensitive field resolution for filter pushdown when reading Parquet

     [ https://issues.apache.org/jira/browse/SPARK-25207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-25207:
--------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: SPARK-25419

> Case-insensitive field resolution for filter pushdown when reading Parquet
> --------------------------------------------------------------------------
>
>                 Key: SPARK-25207
>                 URL: https://issues.apache.org/jira/browse/SPARK-25207
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: yucai
>            Assignee: yucai
>            Priority: Major
>              Labels: Parquet
>             Fix For: 2.4.0
>
>         Attachments: image.png
>
>
> Currently, filter pushdown does not work correctly when the Parquet schema and the Hive metastore schema use different letter cases, even if spark.sql.caseSensitive is false.
> For example:
> {code:java}
> spark.range(10).write.parquet("/tmp/data")
> sql("DROP TABLE IF EXISTS t")
> sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
> sql("select * from t where id > 0").show{code}
> -No filter will be pushed down.-
> {code}
> scala> sql("select * from t where id > 0").explain   // Filters are pushed with `ID`
> == Physical Plan ==
> *(1) Project [ID#90L]
> +- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
>    +- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: struct<ID:bigint>
> scala> sql("select * from t").show    // Parquet returns NULL for `ID` because it has `id`.
> +----+
> |  ID|
> +----+
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> +----+
> scala> sql("select * from t where id > 0").show   // `NULL > 0` is `false`.
> +---+
> | ID|
> +---+
> +---+
> {code}
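
The case-insensitive resolution named in the issue title can be illustrated with a minimal sketch. This is not Spark's actual implementation: the object `CaseInsensitiveFieldResolution` and its `resolve` method are hypothetical names, and returning `None` for an ambiguous match (so that the caller skips pushdown for that filter) is an assumption made here for illustration.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object CaseInsensitiveFieldResolution {

  /** Maps a column name from the catalog schema (e.g. "ID") to the
    * physical field name stored in the Parquet file (e.g. "id").
    * Returns None when no field matches, or when the match is ambiguous.
    */
  def resolve(
      parquetFields: Seq[String],
      name: String,
      caseSensitive: Boolean): Option[String] = {
    if (caseSensitive) {
      // Exact match only.
      parquetFields.find(_ == name)
    } else {
      // Ignore case, but refuse to guess if several fields differ only by case.
      parquetFields.filter(_.equalsIgnoreCase(name)).toList match {
        case single :: Nil => Some(single)
        case _             => None
      }
    }
  }
}

object Demo extends App {
  import CaseInsensitiveFieldResolution.resolve

  // The file has `id`, the table declares `ID`.
  println(resolve(Seq("id"), "ID", caseSensitive = false))       // Some(id)
  println(resolve(Seq("id"), "ID", caseSensitive = true))        // None
  // Two physical fields differ only by case: ambiguous.
  println(resolve(Seq("id", "ID"), "Id", caseSensitive = false)) // None
}
```

With a mapping like this, a filter such as `GreaterThan(ID,0)` could be rewritten against the physical field `id` before being handed to the Parquet reader, rather than being pushed down with a name the file does not contain.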


