Posted to issues@spark.apache.org by "patrickliu (JIRA)" <ji...@apache.org> on 2014/11/05 16:10:33 UTC
[jira] [Created] (SPARK-4252) SparkSQL behaves differently from Hive when encountering illegal record
patrickliu created SPARK-4252:
---------------------------------
Summary: SparkSQL behaves differently from Hive when encountering illegal record
Key: SPARK-4252
URL: https://issues.apache.org/jira/browse/SPARK-4252
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.1.0
Reporter: patrickliu
Hive ignores an illegal record (a field whose text does not match the declared column type), while SparkSQL tries to convert it.
Assume I have a text file user.txt with 2 records (userName, age):
Alice,12.4
Bob,13
Then I create a Hive table to query the data:
CREATE TABLE user(
name string,
age int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;
LOAD DATA LOCAL INPATH 'user.txt' INTO TABLE user;
Then I use Hive and SparkSQL to query the 'user' table:
SQL: select * from user;
Result by Hive:
Alice NULL (Hive ignores Alice's age because it is a floating-point number)
Bob 13
Result by SparkSQL:
Alice 12 (SparkSQL converts Alice's age from float to int)
Bob 13
So if I run "select sum(age) from user;", the two engines return different results: 13 from Hive (the NULL is skipped) and 25 from SparkSQL (12 + 13).
Maybe SparkSQL should be compatible with Hive in this scenario.
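The effect on sum(age) can be reproduced with a small simulation. This is only a sketch in plain Python, not Hive or Spark code: the helpers parse_int_strict and parse_int_truncating are hypothetical names that mimic the two results reported above, and they make no claim about how either engine is implemented.

```python
from typing import Callable, List, Optional, Tuple

# The two records of user.txt from the report.
ROWS = ["Alice,12.4", "Bob,13"]

Row = Tuple[str, Optional[int]]


def parse_int_strict(text: str) -> Optional[int]:
    """Mimic the Hive result above: text that is not an integer becomes NULL (None)."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_int_truncating(text: str) -> Optional[int]:
    """Mimic the SparkSQL result above: parse as a number, then drop the fraction."""
    try:
        return int(float(text))
    except (ValueError, OverflowError):
        return None


def load(rows: List[str], parse_age: Callable[[str], Optional[int]]) -> List[Row]:
    """Split each comma-delimited line into (name, age) using the given age parser."""
    table = []
    for line in rows:
        name, age = line.split(",")
        table.append((name, parse_age(age)))
    return table


def sum_age(table: List[Row]) -> Optional[int]:
    """SQL-style SUM: NULLs are skipped; the result is NULL if every value is NULL."""
    ages = [age for _, age in table if age is not None]
    return sum(ages) if ages else None


hive_like = load(ROWS, parse_int_strict)
spark_like = load(ROWS, parse_int_truncating)

print(hive_like, sum_age(hive_like))
print(spark_like, sum_age(spark_like))
```

With the two sample records, the strict parser gives a sum of 13 and the truncating parser gives 25, which matches the discrepancy described in this report.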