Posted to issues@spark.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2014/11/05 17:11:34 UTC
[jira] [Closed] (SPARK-4252) SparkSQL behaves differently from Hive
when encountering illegal record
[ https://issues.apache.org/jira/browse/SPARK-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian closed SPARK-4252.
-----------------------------
Resolution: Duplicate
After an offline discussion with [~patrickliu], it turned out that the Hive version he was using is 0.10.0. This issue is therefore a duplicate of SPARK-4217. Please refer to [this comment|https://issues.apache.org/jira/browse/SPARK-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198347#comment-14198347] for details.
> SparkSQL behaves differently from Hive when encountering illegal record
> -----------------------------------------------------------------------
>
> Key: SPARK-4252
> URL: https://issues.apache.org/jira/browse/SPARK-4252
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0
> Reporter: patrickliu
>
> Hive ignores an illegal record, while SparkSQL tries to convert it.
> Assume I have a text file user.txt with 2 records (userName, age):
> Alice,12.4
> Bob,13
> Then I create a Hive table to query the data:
> CREATE TABLE user(
> name string,
> age int -- Pay attention! The field is int
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;
> LOAD DATA LOCAL INPATH 'user.txt' INTO TABLE user;
> Then I use Hive and SparkSQL to query the 'user' table:
> SQL: select * from user;
> Result by Hive:
> Alice NULL (Hive ignores Alice's age because it is a floating-point number)
> Bob 13
> Result by SparkSQL:
> Alice 12 (SparkSQL converts Alice's age from float to int)
> Bob 13
> So if I run "select sum(age) from user;",
> I will get different results from the two engines.
> Maybe SparkSQL should be compatible with Hive in this scenario.
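
The difference reported above can be modelled with a small Python sketch. It is not the Hive or Spark implementation: the helper names (hive_style_int, spark_style_int, sql_sum) are hypothetical and only mimic the two results shown in the report, with a strict integer parse yielding NULL on the one hand and a float-then-truncate conversion on the other. SUM is assumed to skip NULLs, as in standard SQL.

```python
from typing import List, Optional


def hive_style_int(field: str) -> Optional[int]:
    """Strict parse (assumed model of the reported Hive result):
    text that is not an integer literal becomes NULL (None)."""
    try:
        return int(field.strip())
    except ValueError:
        return None


def spark_style_int(field: str) -> Optional[int]:
    """Lenient parse (assumed model of the reported SparkSQL result):
    numeric text is read as a float, then truncated toward zero."""
    try:
        return int(float(field.strip()))
    except (ValueError, OverflowError):
        return None


def sql_sum(values: List[Optional[int]]) -> Optional[int]:
    """SUM() as in standard SQL: NULLs are skipped; all-NULL input gives NULL."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


# The two records from user.txt in the report.
rows = [line.split(",") for line in ["Alice,12.4", "Bob,13"]]

hive_ages = [hive_style_int(age) for _, age in rows]
spark_ages = [spark_style_int(age) for _, age in rows]

print("Hive-style ages: ", hive_ages, "sum =", sql_sum(hive_ages))
print("Spark-style ages:", spark_ages, "sum =", sql_sum(spark_ages))
```

Under these assumptions the strict parse gives NULL and 13 (sum 13), while the lenient parse gives 12 and 13 (sum 25), which is the mismatch in "select sum(age) from user;" that the report describes.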