Posted to issues@spark.apache.org by "Kent Yao (Jira)" <ji...@apache.org> on 2021/01/25 07:33:00 UTC
[jira] [Updated] (SPARK-34192) Move char padding to write side

     [ https://issues.apache.org/jira/browse/SPARK-34192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-34192:
-----------------------------
    Description: 
Performing the char length check and padding on the read side causes problems for CBO (cost-based optimization), PPD (predicate pushdown), and other parts of Catalyst.

It is more reasonable to do this on the write side, as Spark does not take full control of the storage layer.
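The write-side handling this issue proposes could look roughly like the sketch below. It is not Spark's implementation. The object and function names (CharPaddingSketch, padCharOnWrite) are hypothetical, and it assumes these CHAR(n) semantics: trailing spaces past n may be dropped, any other overflow is rejected, and the stored value is right-padded with spaces to exactly n characters. Length is counted in UTF-16 chars, which is a simplification.

```scala
// Hypothetical sketch of write-side CHAR(n) handling; not Spark's actual API.
object CharPaddingSketch {
  /** Fits `value` into CHAR(`length`) before it is written:
    * trailing spaces beyond `length` are dropped, any other overflow is rejected,
    * and the result is right-padded with spaces to exactly `length` chars.
    * Assumption: length is measured in UTF-16 chars, not code points. */
  def padCharOnWrite(value: String, length: Int): String = {
    require(length > 0, "CHAR length must be positive")
    if (value == null) null // NULLs are stored as-is
    else {
      val fitted =
        if (value.length <= length) value
        else {
          val (head, rest) = value.splitAt(length)
          // Only trailing spaces may be discarded to fit the declared length.
          if (rest.forall(_ == ' ')) head
          else throw new IllegalArgumentException(
            s"Exceeds char type length limitation: $length")
        }
      fitted.padTo(length, ' ')
    }
  }
}
```

Because the padded value is what gets stored, readers see the data exactly as written. Under this assumption, no extra padding expression has to be injected into query plans on the read side.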


  test("SPARK-34192: Known issue of Hive with trailing spaces") {
    // https://issues.apache.org/jira/browse/HIVE-13618
    // Trailing spaces in partition column values are treated differently by
    // different metastore backends: MySQL and Derby (used in tests) consider
    // 'a' = 'a ', whereas others (e.g. Postgres, Oracle) do not exhibit this problem.
    // `format` is assumed to be defined by the enclosing test suite.
    Seq("char(5)", "string", "VARCHAR(5)").foreach { typ =>
      withTable("t") {
        sql(s"CREATE TABLE t(i STRING, c $typ) USING $format PARTITIONED BY (c)")
        sql("INSERT INTO t VALUES ('1', 'a ')")
        val e = intercept[AnalysisException](sql("INSERT INTO t VALUES ('1', 'a  ')"))
        assert(e.getMessage.contains("Expecting a partition with name c=a  ,"))
      }
    }
  }
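The test comment contrasts two ways a metastore backend may compare partition values. The sketch below shows that contrast. Both function names are hypothetical, and the mapping of each to specific databases only restates the comment in the test.

```scala
// Hypothetical illustration of two string-comparison behaviours for partition values.
object TrailingSpaceSketch {
  // Exact comparison: the behaviour the test comment attributes to Postgres/Oracle.
  def exactEquals(a: String, b: String): Boolean = a == b

  // Comparison that ignores trailing spaces: the behaviour attributed to MySQL/Derby.
  def padSpaceEquals(a: String, b: String): Boolean = {
    def stripTrailing(s: String): String = s.reverse.dropWhile(_ == ' ').reverse
    stripTrailing(a) == stripTrailing(b)
  }
}
```

Under trailing-space-insensitive comparison, 'a ' and 'a  ' match. One plausible reading of the failure is that the metastore returns the existing partition c=a  when asked for c=a  (two spaces). That mismatch would surface as the "Expecting a partition with name c=a  ," error the test asserts on.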

  was:
Performing the char length check and padding on the read side causes problems for CBO (cost-based optimization), PPD (predicate pushdown), and other parts of Catalyst.

It is more reasonable to do this on the write side, as Spark does not take full control of the storage layer.


> Move char padding to write side
> -------------------------------
>
>                 Key: SPARK-34192
>                 URL: https://issues.apache.org/jira/browse/SPARK-34192
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Kent Yao
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
