Posted to issues@spark.apache.org by "Stephane Maarek (JIRA)" <ji...@apache.org> on 2016/04/13 03:51:25 UTC

[jira] [Created] (SPARK-14586) SparkSQL doesn't parse decimal like Hive

Stephane Maarek created SPARK-14586:
---------------------------------------

             Summary: SparkSQL doesn't parse decimal like Hive
                 Key: SPARK-14586
                 URL: https://issues.apache.org/jira/browse/SPARK-14586
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.1
            Reporter: Stephane Maarek


Create a test_data.csv with the following content:
{code:none}
a, 2.0
,3.0
{code}

(the leading space before the 2 is intentional)

Copy test_data.csv to hdfs:///spark_testing_2, for example as sketched below.
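One way to stage the file is from the spark-shell (a sketch, assuming the cluster's default filesystem is HDFS and test_data.csv sits in the current working directory; paths are illustrative):

{code:java}
// Sketch: stage the test file in HDFS using the Hadoop FileSystem API.
// Assumes sc is the spark-shell's SparkContext and the default filesystem is HDFS.
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
fs.mkdirs(new Path("/spark_testing_2"))
fs.copyFromLocalFile(new Path("test_data.csv"), new Path("/spark_testing_2/test_data.csv"))
{code}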

Go into Hive and run the following statements:

{code:sql}
CREATE SCHEMA IF NOT EXISTS spark_testing;
DROP TABLE IF EXISTS spark_testing.test_csv_2;
CREATE EXTERNAL TABLE `spark_testing.test_csv_2`(
  column_1 varchar(10),
  column_2 decimal(4,2))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION '/spark_testing_2'
TBLPROPERTIES('serialization.null.format'='');
select * from spark_testing.test_csv_2;
OK
a       2
NULL    3

{code}

As you can see, in Hive the value " 2" gets parsed correctly to 2.

Now on to the spark-shell:

{code:java}

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select * from spark_testing.test_csv_2").show()

+--------+--------+
|column_1|column_2|
+--------+--------+
|       a|    null|
|    null|    3.00|
+--------+--------+

{code}

As you can see, the " 2" got parsed to null in Spark. Hive and Spark therefore do not have the same parsing behavior for decimals. I wouldn't say it is a bug per se, but it looks like a necessary improvement for the two engines to converge.
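Until the behaviors converge, a possible workaround (a sketch, not part of the report above; the string-typed table spark_testing.test_csv_2_str is hypothetical) is to declare column_2 as a string in the Hive DDL and trim it before casting in Spark SQL:

{code:java}
// Hypothetical workaround: read column_2 as a string and normalise it in
// Spark SQL, so the leading space no longer turns the value into null.
// Assumes a table spark_testing.test_csv_2_str identical to test_csv_2
// except that column_2 is declared as string.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql(
  "SELECT column_1, CAST(TRIM(column_2) AS DECIMAL(4,2)) AS column_2 " +
  "FROM spark_testing.test_csv_2_str"
).show()
{code}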



