Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/05/31 14:45:00 UTC
[jira] [Assigned] (SPARK-27873) Csv reader, adding a corrupt record column causes error if enforceSchema=false
[ https://issues.apache.org/jira/browse/SPARK-27873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-27873:
------------------------------------
Assignee: Apache Spark
> Csv reader, adding a corrupt record column causes error if enforceSchema=false
> ------------------------------------------------------------------------------
>
> Key: SPARK-27873
> URL: https://issues.apache.org/jira/browse/SPARK-27873
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.3
> Reporter: Marcin Mejran
> Assignee: Apache Spark
> Priority: Major
>
> In the Spark CSV reader, if you're using PERMISSIVE mode with a column for storing corrupt records, then you need to add a column to the schema whose name matches columnNameOfCorruptRecord.
> However, if the file has a header row and enforceSchema=false, the schema vs. header validation fails, because the schema contains an extra column (the one named by columnNameOfCorruptRecord) that does not appear in the header.
> Since FAILFAST mode doesn't print informative error messages about which rows failed to parse, there is no other way to track down broken rows than setting a corrupt record column.
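The mismatch described above can be illustrated with a small, self-contained Python sketch. It does not call Spark: header_mismatch is a hypothetical stand-in for the header vs. schema check that runs when enforceSchema=false, and _corrupt_record is only an example column name. Dropping the corrupt record column before comparing is one possible way to avoid the spurious failure, not a description of how Spark itself handles it.

```python
from typing import List, Optional


def header_mismatch(header: List[str], schema_fields: List[str],
                    corrupt_column: Optional[str] = None) -> Optional[str]:
    """Hypothetical stand-in for CSV header validation with enforceSchema=false.

    Compares CSV header names with schema field names position by position.
    Returns None when they agree, otherwise a message describing the first
    mismatch. If corrupt_column is given, that field is dropped from the
    schema first, since the CSV file itself never contains such a column.
    """
    # With corrupt_column=None, every (string) field name is kept.
    expected = [f for f in schema_fields if f != corrupt_column]
    if len(header) != len(expected):
        return (f"header has {len(header)} columns but schema has "
                f"{len(expected)} fields")
    for pos, (name, field) in enumerate(zip(header, expected)):
        if name != field:
            return (f"header name {name!r} does not match schema field "
                    f"{field!r} at position {pos}")
    return None


if __name__ == "__main__":
    header = ["id", "name"]                     # first row of the CSV file
    schema = ["id", "name", "_corrupt_record"]  # user schema incl. corrupt column

    # Naive check: the extra corrupt record column causes a spurious mismatch.
    print(header_mismatch(header, schema))
    # -> header has 2 columns but schema has 3 fields

    # Excluding the corrupt record column first lets the header validate.
    print(header_mismatch(header, schema, corrupt_column="_corrupt_record"))
    # -> None
```

Filtering out only the corrupt record column by name keeps the check strict for every real data column: a renamed or reordered header field is still reported.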
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)