Posted to issues@spark.apache.org by "Ala Luszczak (Jira)" <ji...@apache.org> on 2020/12/21 11:10:00 UTC

[jira] [Commented] (SPARK-33594) Forbid binary type as partition column

    [ https://issues.apache.org/jira/browse/SPARK-33594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252775#comment-17252775 ] 

Ala Luszczak commented on SPARK-33594:
--------------------------------------

Big :+1: here. Using a binary column as a partition-by column is a terrible idea.
I've seen at least two really bad scenarios result from this.

(1) When reading the data with the vectorized reader, I've seen segmentation faults.
(2) When reading the same data with the non-vectorized (parquet-mr) reader, the segmentation faults disappear, but instead incorrect values are returned for the binary columns.

I would like to point out that just covering the CREATE TABLE statement might not be enough. I think we should bail in the read path as well. After all, the user can just do spark.read.parquet("my/path") without creating a table first.
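
To illustrate the point about covering both paths, here is a minimal sketch in plain Python, not Spark code. Every name in it (check_partition_columns, create_table, read_path, UnsupportedPartitionTypeError) is hypothetical, and the schema is assumed to be a simple mapping from column name to a lowercase type name. It only shows one shared check being called from both the table-creation path and the direct-read path, so that skipping CREATE TABLE does not skip the validation.

```python
# Hypothetical sketch: none of these names come from Spark itself.
from typing import Mapping, Sequence


class UnsupportedPartitionTypeError(ValueError):
    """Raised when a partition column has a type we refuse to partition by."""


# Assumption: type names are plain strings such as "binary" or "int".
FORBIDDEN_PARTITION_TYPES = frozenset({"binary"})


def check_partition_columns(schema: Mapping[str, str],
                            partition_columns: Sequence[str]) -> None:
    """Reject any partition column whose declared type is forbidden."""
    for name in partition_columns:
        if name not in schema:
            raise KeyError(f"partition column {name!r} is not in the schema")
        type_name = schema[name].lower()
        if type_name in FORBIDDEN_PARTITION_TYPES:
            raise UnsupportedPartitionTypeError(
                f"cannot partition by column {name!r} of type {type_name}")


def create_table(schema: Mapping[str, str],
                 partition_columns: Sequence[str]) -> str:
    # Stand-in for the CREATE TABLE path.
    check_partition_columns(schema, partition_columns)
    return "created"


def read_path(schema: Mapping[str, str],
              partition_columns: Sequence[str]) -> str:
    # Stand-in for a direct read that never went through CREATE TABLE.
    check_partition_columns(schema, partition_columns)
    return "read"
```

With this shape, a schema like {"id": "int", "payload": "binary"} partitioned by "payload" is rejected on both paths, while partitioning by "id" goes through on both.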

> Forbid binary type as partition column
> --------------------------------------
>
>                 Key: SPARK-33594
>                 URL: https://issues.apache.org/jira/browse/SPARK-33594
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: angerszhu
>            Priority: Major
>
> Forbid binary type as partition column


