Posted to issues@spark.apache.org by "Gengliang Wang (JIRA)" <ji...@apache.org> on 2019/02/01 03:28:00 UTC

[jira] [Updated] (SPARK-26744) Support schema validation in File Source V2

     [ https://issues.apache.org/jira/browse/SPARK-26744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-26744:
-----------------------------------
    Description: 
The method supportDataType in FileFormat validates the input/output schema before execution starts, so that invalid data source IO is avoided and users see clean error messages.

This PR implements the same method in the FileDataSourceV2 framework. Compared to FileFormat, FileDataSourceV2 has multiple layers, so the API is added in two places:

1. FileWriteBuilder: this is where we can get the actual write schema.
2. FileScan: this is where we can get the actual read schema.
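
The idea of checking the schema at both points can be illustrated with a small, self-contained model. The sketch below is not Spark code: it is plain Python, and every name in it (Field, verify_schema, toy_csv_supports, ToyFileScan, ToyFileWriteBuilder, the "ToyCSV" format and the error text) is a hypothetical stand-in chosen for illustration. It only assumes what the description states: a per-format predicate decides whether a data type is supported, and the check runs where the actual read or write schema is known, before any IO happens.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Field:
    """One column of a schema: a name plus a simplified type name."""
    name: str
    data_type: str


class UnsupportedSchemaError(Exception):
    """Raised before any IO when a schema contains an unsupported type."""


def verify_schema(format_name: str,
                  schema: List[Field],
                  supports_data_type: Callable[[str], bool],
                  action: str) -> None:
    """Fail fast with a readable message on the first unsupported field."""
    for field in schema:
        if not supports_data_type(field.data_type):
            raise UnsupportedSchemaError(
                f"{format_name} does not support data type "
                f"'{field.data_type}' (column '{field.name}', on {action})."
            )


def toy_csv_supports(data_type: str) -> bool:
    """Hypothetical rule: this toy format handles only flat atomic types."""
    return data_type in {"int", "long", "double", "string", "boolean"}


class ToyFileScan:
    """Read side: the actual read schema is known here, so validate here."""

    def __init__(self, read_schema: List[Field]) -> None:
        verify_schema("ToyCSV", read_schema, toy_csv_supports, "read")
        self.read_schema = read_schema


class ToyFileWriteBuilder:
    """Write side: the actual write schema is known when the write is built."""

    def __init__(self, write_schema: List[Field]) -> None:
        self.write_schema = write_schema

    def build(self) -> str:
        # Validate before producing anything that would touch storage.
        verify_schema("ToyCSV", self.write_schema, toy_csv_supports, "write")
        return "write job for " + ", ".join(f.name for f in self.write_schema)
```

In this model, ToyFileWriteBuilder([Field("tags", "map")]).build() raises UnsupportedSchemaError before any file is touched, which mirrors the goal of avoiding invalid IO and surfacing a clean error message.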

> Support schema validation in File Source V2
> -------------------------------------------
>
>                 Key: SPARK-26744
>                 URL: https://issues.apache.org/jira/browse/SPARK-26744
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major


