You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Eric Liang (JIRA)" <ji...@apache.org> on 2016/10/18 01:53:59 UTC

[jira] [Updated] (SPARK-17983) Can't filter over mixed case parquet columns of converted Hive tables

     [ https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Liang updated SPARK-17983:
-------------------------------
    Description: 
We should probably revive https://github.com/apache/spark/pull/14750 in order to fix this issue and related classes of issues.

The only other alternatives are (1) reconciling on-disk schemas with metastore schema at planning time, which seems pretty messy, and (2) fixing all the datasources to support case-insensitive matching, which also has issues.

Reproduction:
{code}
  private def setupPartitionedTable(tableName: String, dir: File): Unit = {
    spark.range(5).selectExpr("id as normalCol", "id as partCol1", "id as partCol2").write
      .partitionBy("partCol1", "partCol2")
      .mode("overwrite")
      .parquet(dir.getAbsolutePath)

    spark.sql(s"""
      |create external table $tableName (normalCol long)
      |partitioned by (partCol1 int, partCol2 int)
      |stored as parquet
      |location "${dir.getAbsolutePath}"""".stripMargin)
    spark.sql(s"msck repair table $tableName")
  }

  test("filter by mixed case col") {
    withTable("test") {
      withTempDir { dir =>
        setupPartitionedTable("test", dir)
        val df = spark.sql("select * from test where normalCol = 3")
        assert(df.count() == 1)
      }
    }
  }
{code}
cc [~cloud_fan]

  was:
We should probably revive https://github.com/apache/spark/pull/14750 in order to fix this issue and related classes of issues.

The only other alternatives are (1) reconciling on-disk schemas with metastore schema at planning time, which seems pretty messy, and (2) fixing all the datasources to support case-insensitive matching, which also has issues.

cc [~cloud_fan]


> Can't filter over mixed case parquet columns of converted Hive tables
> ---------------------------------------------------------------------
>
>                 Key: SPARK-17983
>                 URL: https://issues.apache.org/jira/browse/SPARK-17983
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Eric Liang
>            Priority: Critical
>
> We should probably revive https://github.com/apache/spark/pull/14750 in order to fix this issue and related classes of issues.
> The only other alternatives are (1) reconciling on-disk schemas with metastore schema at planning time, which seems pretty messy, and (2) fixing all the datasources to support case-insensitive matching, which also has issues.
> Reproduction:
> {code}
>   private def setupPartitionedTable(tableName: String, dir: File): Unit = {
>     spark.range(5).selectExpr("id as normalCol", "id as partCol1", "id as partCol2").write
>       .partitionBy("partCol1", "partCol2")
>       .mode("overwrite")
>       .parquet(dir.getAbsolutePath)
>     spark.sql(s"""
>       |create external table $tableName (normalCol long)
>       |partitioned by (partCol1 int, partCol2 int)
>       |stored as parquet
>       |location "${dir.getAbsolutePath}"""".stripMargin)
>     spark.sql(s"msck repair table $tableName")
>   }
>   test("filter by mixed case col") {
>     withTable("test") {
>       withTempDir { dir =>
>         setupPartitionedTable("test", dir)
>         val df = spark.sql("select * from test where normalCol = 3")
>         assert(df.count() == 1)
>       }
>     }
>   }
> {code}
> cc [~cloud_fan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org