Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2018/12/07 03:14:00 UTC

[jira] [Resolved] (SPARK-26263) Throw exception when Partition column value can't be converted to user specified type

     [ https://issues.apache.org/jira/browse/SPARK-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-26263.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 23215
[https://github.com/apache/spark/pull/23215]

> Throw exception when Partition column value can't be converted to user specified type
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-26263
>                 URL: https://issues.apache.org/jira/browse/SPARK-26263
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, if the user provides a data schema, partition column values are converted according to it. But if the conversion fails, e.g. when converting a string to an int, the column value becomes null.
> For the following directory:
> /tmp/testDir
> ├── p=bar
> └── p=foo
> If we run:
> ```
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> val schema = StructType(Seq(StructField("p", IntegerType, false)))
> spark.read.schema(schema).csv("/tmp/testDir/").show()
> ```
> We will get:
> +----+
> |   p|
> +----+
> |null|
> |null|
> +----+
> This PR proposes throwing an exception in such cases, instead of silently converting the value to null:
> 1. These null partition column values don't make sense to users in most cases. It is better to surface the conversion failure so that users can adjust the schema, ETL jobs, etc. to fix it.
> 2. Conversion failures on non-partition data columns always raise exceptions. Partition columns should behave the same way.
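
The difference between the two behaviors described above can be illustrated with a minimal, Spark-free sketch. It is not the code from the pull request: the object name `PartitionCastSketch`, its helpers, and the error message text are all hypothetical, and a plain `String.toInt` stands in for Spark's own cast to the user-specified type.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helpers for illustration only; not Spark's implementation.
object PartitionCastSketch {
  // Split a directory name such as "p=foo" into (column, rawValue).
  def parseDir(dir: String): (String, String) = {
    val idx = dir.indexOf('=')
    require(idx > 0, s"not a partition directory: $dir")
    (dir.substring(0, idx), dir.substring(idx + 1))
  }

  // Lenient behavior: a failed conversion is silently dropped
  // (None here plays the role of the null column value).
  def castLenient(raw: String): Option[Int] = Try(raw.toInt).toOption

  // Strict behavior: a failed conversion raises an error that names
  // the column and the offending value (message text is an assumption).
  def castStrict(column: String, raw: String): Int =
    Try(raw.toInt) match {
      case Success(v) => v
      case Failure(_) =>
        throw new IllegalArgumentException(
          s"Failed to cast value `$raw` to IntegerType for partition column `$column`")
    }
}
```

With the directories `p=bar` and `p=foo` from the example, `castLenient` yields `None` for both values, mirroring the table of nulls, while `castStrict` fails on the first directory it sees, which is the point at which a user would learn that the schema or the upstream job needs fixing.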



