Posted to issues@spark.apache.org by "Gengliang Wang (JIRA)" <ji...@apache.org> on 2019/04/29 05:43:01 UTC

[jira] [Updated] (SPARK-27269) File source v2 should validate data schema only

     [ https://issues.apache.org/jira/browse/SPARK-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-27269:
-----------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: SPARK-27589

> File source v2 should validate data schema only
> -----------------------------------------------
>
>                 Key: SPARK-27269
>                 URL: https://issues.apache.org/jira/browse/SPARK-27269
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, File source v2 allows each data source to specify the supported data types by implementing the method `supportsDataType` in `FileScan` and `FileWriteBuilder`.
> However, in the read path, the validation checks all the data types in `readSchema`, which might contain partition columns. This is a regression. For example, the Text data source supports only the String data type, but its partition columns can still be of Integer type, because partition columns are processed by Spark rather than by the data source.
> This PR will:
> 1. Refactor schema validation so that it checks the data schema only.
> 2. Filter partition columns out of the data schema when a user-specified schema is provided.
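
The intended behavior described above can be sketched in plain Python, independent of Spark. This is only an illustration under stated assumptions, not the code from the PR: `Field`, `data_schema_only`, `validate_data_schema`, and `text_supports` are hypothetical stand-ins, type names are plain strings, and partition columns are matched by exact name. Only the idea of a per-source `supportsDataType` check comes from the description.

```python
from typing import Callable, List, NamedTuple


class Field(NamedTuple):
    """Simplified stand-in for a schema field: a column name and a type name."""
    name: str
    data_type: str


def data_schema_only(schema: List[Field], partition_columns: List[str]) -> List[Field]:
    """Drop partition columns from a user-specified schema (item 2).

    Assumption: columns are matched by exact name.
    """
    partition_names = set(partition_columns)
    return [field for field in schema if field.name not in partition_names]


def validate_data_schema(
    data_schema: List[Field],
    supports_data_type: Callable[[str], bool],
    format_name: str,
) -> None:
    """Check only the data columns against the source's supported types (item 1)."""
    for field in data_schema:
        if not supports_data_type(field.data_type):
            raise ValueError(
                f"{format_name} data source does not support "
                f"{field.data_type} data type (column '{field.name}')."
            )


def text_supports(data_type: str) -> bool:
    """Hypothetical Text-like source: only string columns are supported."""
    return data_type == "string"


# A user-specified schema with one data column and one integer partition column.
user_schema = [Field("value", "string"), Field("year", "int")]

# Remove the partition column first, then validate what is left.
data_schema = data_schema_only(user_schema, partition_columns=["year"])
validate_data_schema(data_schema, text_supports, "Text")  # no error raised
```

Validating `user_schema` directly, without first removing `year`, raises `ValueError` in this sketch, which mirrors the regression the issue describes.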



