Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/11/04 13:59:58 UTC

[jira] [Assigned] (SPARK-18269) NumberFormatException when reading csv for a nullable column

     [ https://issues.apache.org/jira/browse/SPARK-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-18269:
------------------------------------

    Assignee:     (was: Apache Spark)

> NumberFormatException when reading csv for a nullable column
> ------------------------------------------------------------
>
>                 Key: SPARK-18269
>                 URL: https://issues.apache.org/jira/browse/SPARK-18269
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Jork Zijlstra
>
> Reading a csv with a schema that has a nullable column throws a java.lang.NumberFormatException: null when a line in the csv contains neither the delimiter nor the data for that column.
> Specifying the schema:
> import org.apache.spark.sql.types._
> val sourceSchema = StructType(Array(
>   StructField("id", IntegerType, nullable = false),
>   StructField("underlyingId", IntegerType, nullable = true)
> ))
> Data (without a trailing delimiter to specify the second column):
> 1
> Read the data:
> sparkSession.read
>     .schema(sourceSchema)
>     .option("header", "false")
>     .option("delimiter", """\t""")
>     .csv(files(dates): _*)
>     .rdd
> Actual Result: 
> java.lang.NumberFormatException: null
> 	at java.lang.Integer.parseInt(Integer.java:542)
> 	at java.lang.Integer.parseInt(Integer.java:615)
> 	at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
> 	at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
> 	at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:244)
> Reason:
> The csv line is parsed into a Map (indexSafeTokens), which is one value short. So indexSafeTokens(index) yields null when reading the optional value, which isn't in the Map.
> That null is then passed to CSVTypeCast.castTo(datum: String, .....) as the datum value.
> The subsequent NumberFormatException is thrown because null cannot be parsed as an Integer (the top frames of the trace are Integer.parseInt, and the exception message is "null").
> Possible fix:
> - Use the provided schema to parse the line with the correct number of columns
> - Since the column is nullable, wrap the indexSafeTokens(index) lookup in CSVRelation.csvParser in a try/catch and return null for the missing value
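
The two ideas above can be sketched in plain Scala, without Spark, to show both the failure and one way around it. This is only an illustration under stated assumptions: `padTokens` and `castToInt` are hypothetical helpers written for this example, not Spark's internal API, and the actual fix in Spark may look different.

```scala
import scala.util.Try

object NullSafeCsvCast {
  // Hypothetical helper: pad a short row with nulls so it has one token per schema column.
  def padTokens(tokens: Array[String], width: Int): Array[String] =
    if (tokens.length >= width) tokens
    else tokens ++ Array.fill[String](width - tokens.length)(null)

  // Hypothetical stand-in for an integer cast: a missing or empty value in a
  // nullable column becomes None instead of being handed to toInt.
  def castToInt(datum: String, nullable: Boolean): Option[Int] =
    if (datum == null || datum.isEmpty) {
      if (nullable) None
      else throw new IllegalArgumentException("missing value for a non-nullable column")
    } else Some(datum.toInt)

  def main(args: Array[String]): Unit = {
    // The line "1" has no tab, so splitting yields a single token for a two-column schema.
    val tokens = padTokens("1".split("\t"), 2)

    // Unguarded cast of the missing token: toInt on null throws NumberFormatException.
    val unguarded = Try(tokens(1).toInt)
    println(s"unguarded cast failed: ${unguarded.isFailure}")

    // Guarded casts: the required column parses, the nullable column is empty.
    println(castToInt(tokens(0), nullable = false))
    println(castToInt(tokens(1), nullable = true))
  }
}
```

Checking for null before the cast, rather than catching the exception afterwards, keeps genuinely malformed numbers (for example "abc") failing loudly while still allowing absent values in nullable columns.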


