Posted to issues@spark.apache.org by "Kuba Tyszko (JIRA)" <ji...@apache.org> on 2016/12/16 21:35:58 UTC

[jira] [Created] (SPARK-18906) CSV parser should return null for empty (or with "") numeric columns.

Kuba Tyszko created SPARK-18906:
-----------------------------------

             Summary: CSV parser should return null for empty (or with "") numeric columns.
                 Key: SPARK-18906
                 URL: https://issues.apache.org/jira/browse/SPARK-18906
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.1
            Reporter: Kuba Tyszko
            Priority: Minor


Spark allows the user to set a nullValue option, a string that the CSV reader translates to null; the string "NA" is a typical choice.
Data sources that use such a nullValue but also have other columns that may contain empty values may not be parsed correctly.
The change resolves this by assuming that:
when a column is inferred as numeric,
its field will be set to null when parsing fails, for example on an empty value or an empty string ("").
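
The proposed rule can be sketched in a few lines of plain Python. This is only an illustration of the intended behaviour, not Spark's implementation: the function name parse_int_field and the constant NULL_VALUE are hypothetical, and the field is assumed to have been unquoted already, so that "" arrives as an empty string.

```python
from typing import Optional

# Hypothetical stand-in for the user-configured nullValue option.
NULL_VALUE = "NA"


def parse_int_field(raw: Optional[str], null_value: str = NULL_VALUE) -> Optional[int]:
    """Return an int, or None when the field is missing, equals null_value,
    or cannot be parsed as an integer."""
    if raw is None or raw == null_value:
        return None
    try:
        return int(raw)
    except ValueError:
        # Covers an empty field and an unquoted empty string ("").
        return None
```

With this rule an empty field and the configured nullValue both map to null, while a well-formed number is still parsed as usual.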

Example:

---------------
|char|int1|int2|
---------------
|a|1|2|
---------------
|a||0|
---------------
|NA|""|""|
---------------

This example illustrates that column "char" may contain an empty value indicated as "NA", and that column int1 has a "true null" (empty) value in the second data row, while in the last row both int1 and int2 hold an empty string ("").
In such a situation parsing will fail.
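
To make the failure concrete, the following sketch reads the sample rows with Python's standard csv module and lists the fields that a strict integer cast rejects. It is an analogy for the behaviour described in this issue, not Spark code. It assumes that the outer pipes in the example are table borders rather than data and that "|" is the delimiter; the names SAMPLE, read_rows and strict_failures are hypothetical.

```python
import csv
import io

# Assumption: outer pipes in the example are table borders, so each row
# has exactly three pipe-delimited fields.
SAMPLE = 'char|int1|int2\na|1|2\na||0\nNA|""|""\n'


def read_rows(text: str = SAMPLE):
    """Parse the pipe-delimited sample into a list of dicts keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|", quotechar='"')
    return list(reader)


def strict_failures(rows, numeric_columns=("int1", "int2"), null_value="NA"):
    """Return (row_index, column, raw) for every numeric field that is not
    the nullValue and that a strict int() cast rejects."""
    failures = []
    for i, row in enumerate(rows):
        for col in numeric_columns:
            raw = row[col]
            if raw == null_value:
                continue  # already handled by the nullValue setting
            try:
                int(raw)
            except ValueError:
                failures.append((i, col, raw))
    return failures
```

Calling strict_failures(read_rows()) reports the empty int1 field of the second data row and both quoted-empty fields of the last row. These are the fields that the proposed change would turn into null.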





