Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2017/06/08 16:03:18 UTC
[jira] [Commented] (SPARK-21024) CSV parse mode handles Univocity parser exceptions
[ https://issues.apache.org/jira/browse/SPARK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042925#comment-16042925 ]
Takeshi Yamamuro commented on SPARK-21024:
------------------------------------------
Is it worth fixing this? (I feel it is a kind of corner case..) cc: [~smilegator] [~hyukjin.kwon]
> CSV parse mode handles Univocity parser exceptions
> --------------------------------------------------
>
> Key: SPARK-21024
> URL: https://issues.apache.org/jira/browse/SPARK-21024
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.1
> Reporter: Takeshi Yamamuro
> Priority: Minor
>
> The current master cannot skip illegal records that make the Univocity parser throw an exception:
> This comes from the spark-user mailing list:
> https://www.mail-archive.com/user@spark.apache.org/msg63985.html
> {code}
> scala> Seq("0,1", "0,1,2,3").toDF().write.text("/Users/maropu/Desktop/data")
> scala> val df = spark.read.format("csv").schema("a int, b int").option("maxColumns", "3").load("/Users/maropu/Desktop/data")
> scala> df.show
> com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 3
> Hint: Number of columns processed may have exceeded limit of 3 columns. Use settings.setMaxColumns(int) to define the maximum number of columns your input can have
> Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
> Parser Configuration: CsvParserSettings:
> Auto configuration enabled=true
> Autodetect column delimiter=false
> Autodetect quotes=false
> Column reordering enabled=true
> Empty value=null
> Escape unquoted values=false
> ...
> at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:339)
> at com.univocity.parsers.common.AbstractParser.handleEOF(AbstractParser.java:195)
> at com.univocity.parsers.common.AbstractParser.parseLine(AbstractParser.java:544)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:191)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
> at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:60)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> ...
> {code}
> We could fix this easily, for example: https://github.com/apache/spark/compare/master...maropu:HandleExceptionInParser
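For illustration, the shape of such a fix is to catch the parser's exception around the parse call and dispatch on the CSV parse mode (PERMISSIVE / DROPMALFORMED / FAILFAST). A minimal self-contained sketch, where the parser, record type, and exception names are hypothetical stand-ins rather than Spark's or Univocity's actual classes:

```scala
// Sketch of mode-based handling of parser exceptions.
// The mode names mirror Spark's CSV "mode" option; everything else
// here is a hypothetical stand-in, not the actual Spark code.
sealed trait ParseMode
case object Permissive    extends ParseMode // keep a placeholder record
case object DropMalformed extends ParseMode // silently skip the record
case object FailFast      extends ParseMode // rethrow immediately

final case class BadRecordException(line: String, cause: Throwable)
  extends Exception(cause)

def safeParse(line: String, mode: ParseMode)
             (parse: String => Array[Int]): Option[Array[Int]] = {
  try {
    Some(parse(line))
  } catch {
    case e: Exception => mode match {
      case Permissive    => Some(Array.empty[Int]) // placeholder "null row"
      case DropMalformed => None                   // record is dropped
      case FailFast      => throw BadRecordException(line, e)
    }
  }
}

// A toy parser that, like Univocity with maxColumns=3, rejects wide rows.
def toyParser(line: String): Array[Int] = {
  val cols = line.split(",")
  require(cols.length <= 3, s"too many columns: ${cols.length}")
  cols.map(_.trim.toInt)
}

// The two records from the JIRA description: the 4-column row is
// dropped instead of aborting the whole scan.
val lines = Seq("0,1", "0,1,2,3")
val kept  = lines.flatMap(safeParse(_, DropMalformed)(toyParser))
```

With DROPMALFORMED, only the well-formed 2-column record survives; with FAILFAST the wrapper rethrows, which matches today's behavior of the unguarded parse call.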
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)