Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/08/05 00:44:20 UTC
[jira] [Commented] (SPARK-16903) nullValue in first field is not respected by CSV source when read
[ https://issues.apache.org/jira/browse/SPARK-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408704#comment-15408704 ]
Hyukjin Kwon commented on SPARK-16903:
--------------------------------------
Hi [~falaki], is this about SPARK-16462, SPARK-16460 and SPARK-15144? The discussion in https://github.com/apache/spark/pull/14118 may be related.
I guess we have not been reading {{null}}s for {{StringType}},
meaning this might be related not to the order of the columns but to their type.
For example,
with the data below:
{code}
-,a
10,-
{code}
with the code below:
{code}
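// Assumes a spark-shell session, where `spark` is predefined.
import org.apache.spark.sql.types._

// Decimal column first, string column second.
val schema = StructType(
  StructField("value", DecimalType.SYSTEM_DEFAULT, true) ::
  StructField("key", StringType, true) :: Nil)

val cars = spark.read.format("csv")
  .schema(schema)
  .option("header", "false")
  .option("nullValue", "-") // "-" should be read as null
  .load("/tmp/null.csv")
cars.show()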
{code}
prints the results below:
{code}
+--------------------+---+
|               value|key|
+--------------------+---+
|                null|  a|
|10.00000000000000...|  -|
+--------------------+---+
{code}
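The output above is what one would see if the null marker were checked only for non-string types. Below is a minimal sketch of that guess in plain Scala. It is not Spark's actual code: {{NullValueSketch}}, {{ColType}}, {{castSuspected}} and {{castExpected}} are hypothetical names made up for illustration, and {{None}} stands in for a SQL null.

```scala
// Hypothetical model of the suspected behaviour; NOT Spark's implementation.
object NullValueSketch {
  sealed trait ColType
  case object StringCol extends ColType
  case object DecimalCol extends ColType

  // Suspected behaviour: the null marker is only consulted for non-string types.
  def castSuspected(datum: String, tpe: ColType, nullValue: String): Option[Any] =
    tpe match {
      case StringCol  => Some(datum) // null marker is never checked here
      case DecimalCol => if (datum == nullValue) None else Some(BigDecimal(datum))
    }

  // Expected behaviour: the null marker is checked first, for every type.
  def castExpected(datum: String, tpe: ColType, nullValue: String): Option[Any] =
    if (datum == nullValue) None
    else tpe match {
      case StringCol  => Some(datum)
      case DecimalCol => Some(BigDecimal(datum))
    }
}
```

Under this model, {{castSuspected("-", StringCol, "-")}} keeps the literal dash while {{castSuspected("-", DecimalCol, "-")}} yields a null, whichever position the column has in the row.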
cc [~proflin], who I believe took a look at this as well.
> nullValue in first field is not respected by CSV source when read
> -----------------------------------------------------------------
>
> Key: SPARK-16903
> URL: https://issues.apache.org/jira/browse/SPARK-16903
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Hossein Falaki
>
> file:
> {code}
> a,-
> -,10
> {code}
> Query:
> {code}
> create temporary table test(key string, val decimal)
> using com.databricks.spark.csv
> options (path "/tmp/hossein2/null.csv", header "false", delimiter ",", nullValue "-");
> {code}
> Result:
> {code}
> select count(*) from test where key is null
> 0
> {code}
> But
> {code}
> select count(*) from test where val is null
> 1
> {code}