Posted to user@pig.apache.org by Divya Gehlot <di...@gmail.com> on 2016/02/24 07:38:11 UTC

Fwd: [Vote] : Spark-csv 1.3 + Spark 1.5.2 - Error parsing null values except String data type

Hi,

Please vote if you have faced this issue.
I am getting an error when parsing null values with spark-csv.
Data file:
name age
alice 35
bob null
peter 24
Code:
spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 --master yarn-client -i /TestDivya/Spark/Testnull.scala

Testnull.scala

> import org.apache.spark.sql.types.{StructType, StructField, NullType,
> DateType, IntegerType, LongType, DoubleType, FloatType, StringType}
> import java.util.Properties
> import org.apache.spark._
> import org.apache.spark.sql._
>
> // Explicit schema: "name" is a non-nullable string, "age" a nullable integer
> val testnullSchema = StructType(List(
>   StructField("name", StringType, false),
>   StructField("age", IntegerType, true)))
>
> // Trailing dots keep the chain open when the file is loaded line by line with -i
> val dfreadnull = sqlContext.read.format("com.databricks.spark.csv").
>   option("header", "true").                   // first line holds column names
>   option("nullValue", "").                    // token to read as null (empty string)
>   option("treatEmptyValuesAsNulls", "true").   // treat empty fields as null
>   schema(testnullSchema).
>   load("hdfs://xxx.xxx.xxx.xxx:8020/TestDivya/Spark/nulltest1.csv")
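To make the expected behaviour concrete, here is a minimal plain-Scala sketch (no Spark needed) of how a field in a nullable integer column could be parsed when certain tokens should mean null. It is not spark-csv's implementation. The object name NullAwareParse, the null-token set ("" and "null"), and the delimiter parameter are hypothetical, and it assumes the sample rows are space-separated as they appear in this mail.

```scala
import java.util.regex.Pattern

// Hypothetical sketch, not spark-csv code: parse rows where an empty field or
// the literal "null" should become a missing value in a nullable Int column.
object NullAwareParse {
  // Assumed null tokens: "" (like treatEmptyValuesAsNulls) and "null" (as in the sample file)
  val nullTokens: Set[String] = Set("", "null")

  // Convert one raw field into Option[Int]; None stands for SQL null.
  def parseIntField(raw: String, nullable: Boolean): Option[Int] = {
    val token = raw.trim
    if (nullTokens.contains(token)) {
      if (nullable) None
      else throw new IllegalArgumentException(s"null token '$token' in a non-nullable column")
    } else {
      Some(token.toInt) // throws NumberFormatException for any other non-numeric text
    }
  }

  // Skip the header line, split each row on the delimiter (keeping empty fields),
  // and return (name, age) pairs with age parsed null-aware.
  def parseRows(lines: Seq[String], delimiter: String): Seq[(String, Option[Int])] =
    lines.drop(1).map { line =>
      val fields = line.split(Pattern.quote(delimiter), -1)
      (fields(0).trim, parseIntField(fields(1), nullable = true))
    }

  def main(args: Array[String]): Unit = {
    val data = Seq("name age", "alice 35", "bob null", "peter 24")
    parseRows(data, " ").foreach(println)
  }
}
```

Running main prints (alice,Some(35)), (bob,None) and (peter,Some(24)), one per line. Returning Option[Int] instead of failing on the null token is what lets the age column keep an integer type while still holding a missing value for bob.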



Has anybody faced a similar issue reading a CSV file that has null values in
fields of types other than String?

*P.S. I googled it and found that this is an open issue in the spark-csv GitHub repo:
<https://github.com/databricks/spark-csv/issues/192>*

Thanks,
Divya