Posted to issues@spark.apache.org by "Gengliang Wang (JIRA)" <ji...@apache.org> on 2019/04/29 05:43:01 UTC
[jira] [Updated] (SPARK-27269) File source v2 should validate data schema only
[ https://issues.apache.org/jira/browse/SPARK-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gengliang Wang updated SPARK-27269:
-----------------------------------
Issue Type: Sub-task (was: Bug)
Parent: SPARK-27589
> File source v2 should validate data schema only
> -----------------------------------------------
>
> Key: SPARK-27269
> URL: https://issues.apache.org/jira/browse/SPARK-27269
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Fix For: 3.0.0
>
>
> Currently, File source v2 allows each data source to specify the supported data types by implementing the method `supportsDataType` in `FileScan` and `FileWriteBuilder`.
> However, in the read path, the validation checks every data type in `readSchema`, which may include partition columns. This is a regression. For example, the Text data source supports only the String data type, but its partition columns can still be of Integer type, because partition columns are processed by Spark rather than by the data source.
> This PR:
> 1. Refactors schema validation so that it checks the data schema only.
> 2. Filters the partition columns out of the data schema when a user-specified schema is provided.
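
The two steps above can be illustrated with a minimal, Spark-free sketch in Python. It is not the code from the PR: the names `Field`, `data_schema_of`, `validate_data_schema`, and `text_supports_data_type` are hypothetical, data types are modeled as plain strings, and column names are matched exactly (real Spark name resolution may differ). It only shows the idea of dropping partition columns from a user-specified schema before checking types against what the source supports.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: a column name and a type name."""
    name: str
    data_type: str


def data_schema_of(schema: List[Field], partition_columns: Set[str]) -> List[Field]:
    """Step 2: drop partition columns from a (user-specified) schema.

    Assumption: names are compared exactly, with no case folding.
    """
    return [f for f in schema if f.name not in partition_columns]


def validate_data_schema(
    data_schema: List[Field],
    supports_data_type: Callable[[str], bool],
    format_name: str,
) -> None:
    """Step 1: check only the data schema against the source's supported types."""
    for f in data_schema:
        if not supports_data_type(f.data_type):
            raise ValueError(
                f"{format_name} data source does not support "
                f"{f.data_type} data type (column {f.name})."
            )


def text_supports_data_type(data_type: str) -> bool:
    """Models a Text-like source that supports only strings."""
    return data_type == "string"


# A user-specified schema that includes an integer partition column.
user_schema = [Field("value", "string"), Field("year", "int")]
partition_columns = {"year"}

# Validating the filtered data schema succeeds, because the integer
# partition column is no longer part of what the source must read.
data_schema = data_schema_of(user_schema, partition_columns)
validate_data_schema(data_schema, text_supports_data_type, "Text")
```

Validating `user_schema` directly would instead raise an error on the `year` column, which corresponds to the regression described in the issue.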
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org