Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/08/05 00:44:20 UTC

[jira] [Commented] (SPARK-16903) nullValue in first field is not respected by CSV source when read

    [ https://issues.apache.org/jira/browse/SPARK-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408704#comment-15408704 ] 

Hyukjin Kwon commented on SPARK-16903:
--------------------------------------

Hi [~falaki], is this about SPARK-16462, SPARK-16460 and SPARK-15144? The discussion in https://github.com/apache/spark/pull/14118 may be related.

I guess we have not been reading {{null}}s for {{StringType}}.

That is, this might be related not to the order of the columns but to the column type.

For example, with the data below (saved as {{/tmp/null.csv}}):

{code}
-,a
10,-
{code}

with the code below:

{code}
import org.apache.spark.sql.types._

// "value" is a decimal column and "key" is a string column; both are nullable.
val schema = StructType(
  StructField("value", DecimalType.SYSTEM_DEFAULT, true) ::
  StructField("key", StringType, true) :: Nil)
val cars = spark.read.format("csv")
  .schema(schema)
  .option("header", "false")
  .option("nullValue", "-")
  .load("/tmp/null.csv")

cars.show()
{code}

prints the results below:

{code}
+--------------------+---+
|               value|key|
+--------------------+---+
|                null|  a|
|10.00000000000000...|  -|
+--------------------+---+
{code}
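
To make the guess above concrete, here is a toy model in plain Scala with no Spark dependency. It only illustrates the hypothesis that the {{nullValue}} check depends on the column type. Every name in it ({{NullValueSketch}}, {{ColType}}, {{castTo}}) is hypothetical, and it is not the CSV source's actual code.

```scala
// Toy model only: all names here are hypothetical, not Spark's CSV implementation.
object NullValueSketch {
  sealed trait ColType
  case object DecimalCol extends ColType
  case object StringCol extends ColType

  // Assumed behavior: nullValue is checked for the decimal column
  // but never consulted for the string column.
  def castTo(datum: String, colType: ColType, nullValue: String): Any =
    colType match {
      case DecimalCol => if (datum == nullValue) null else BigDecimal(datum)
      case StringCol  => datum
    }

  val schema: Seq[ColType] = Seq(DecimalCol, StringCol) // value, key
  val rows: Seq[Seq[String]] = Seq(Seq("-", "a"), Seq("10", "-"))

  // Convert each field according to the type of its column.
  val parsed: Seq[Seq[Any]] =
    rows.map(_.zip(schema).map { case (d, t) => castTo(d, t, "-") })

  def main(args: Array[String]): Unit = parsed.foreach(println)
}
```

In this model the {{-}} in the decimal column becomes {{null}} while the {{-}} in the string column stays as it is, whichever position the column has in the schema. That matches the table above.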

cc [~proflin], who I believe took a look at this as well.

> nullValue in first field is not respected by CSV source when read
> -----------------------------------------------------------------
>
>                 Key: SPARK-16903
>                 URL: https://issues.apache.org/jira/browse/SPARK-16903
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> file:
> {code}
> a,-
> -,10
> {code}
> Query:
> {code}
> create temporary table test(key string, val decimal) 
> using com.databricks.spark.csv 
> options (path "/tmp/hossein2/null.csv", header "false", delimiter ",", nullValue "-");
> {code}
> Result:
> {code}
> select count(*) from test where key is null
> 0
> {code}
> But
> {code}
> select count(*) from test where val is null
> 1
> {code}


