Posted to issues@spark.apache.org by "peter.zhang (JIRA)" <ji...@apache.org> on 2014/11/05 02:17:33 UTC

[jira] [Comment Edited] (SPARK-4217) Result of SparkSQL is incorrect after a table join and group by operation

    [ https://issues.apache.org/jira/browse/SPARK-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197281#comment-14197281 ] 

peter.zhang edited comment on SPARK-4217 at 11/5/14 1:17 AM:
-------------------------------------------------------------

Hi Venkata,
Thanks for looking into this. How did you run the Hive SQL?
I mean with Hive itself rather than through Spark's HiveContext.
That seems impossible: I tested this SQL in both my dev environment (Hive 0.11) and our company's production environment, and the results from Hive were the same.
Please tell me which Hive version you are using.


was (Author: peter_zhang921):
That seems impossible: I tested this SQL in both my dev environment (Hive 0.11) and our company's production environment, and the results from Hive were the same.
Please tell me which Hive version you are using.

> Result of SparkSQL is incorrect after a table join and group by operation
> -------------------------------------------------------------------------
>
>                 Key: SPARK-4217
>                 URL: https://issues.apache.org/jira/browse/SPARK-4217
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>         Environment: Hadoop 2.2.0
> Spark 1.1
>            Reporter: peter.zhang
>            Priority: Critical
>         Attachments: TestScript.sql, saledata.zip
>
>
> I ran a test using the same SQL script in the SparkSQL, Shark, and Hive environments, as shown below:
> ---------------------------------------------------------------
> select c.theyear, sum(b.amount)
> from tblstock a
> join tblStockDetail b on a.ordernumber = b.ordernumber
> join tbldate c on a.dateid = c.dateid
> group by c.theyear;
> Result of Hive/Shark:
> theyear	_c1
> 2004	1403018
> 2005	5557850
> 2006	7203061
> 2007	11300432
> 2008	12109328
> 2009	5365447
> 2010	188944
> Result of SparkSQL:
> 2010	210924
> 2004	3265696
> 2005	13247234
> 2006	13670416
> 2007	16711974
> 2008	14670698
> 2009	6322137
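
To make the discrepancy easier to see, the two reported result sets can be lined up year by year. The following is only a minimal sketch: the figures are copied from the report above, while the names `hive`, `spark_sql` and `compare` are illustrative and not part of the attached test script.

```python
# Totals per year as reported in the issue description.
hive = {
    2004: 1403018, 2005: 5557850, 2006: 7203061, 2007: 11300432,
    2008: 12109328, 2009: 5365447, 2010: 188944,
}
spark_sql = {
    2010: 210924, 2004: 3265696, 2005: 13247234, 2006: 13670416,
    2007: 16711974, 2008: 14670698, 2009: 6322137,
}


def compare(expected, actual):
    """Return (year, expected, actual, difference, ratio) rows sorted by year."""
    rows = []
    for year in sorted(expected):
        diff = actual[year] - expected[year]
        ratio = actual[year] / expected[year]
        rows.append((year, expected[year], actual[year], diff, ratio))
    return rows


rows = compare(hive, spark_sql)
for year, exp, act, diff, ratio in rows:
    print(f"{year}\t{exp}\t{act}\t{diff:+d}\t{ratio:.2f}")
```

With these figures, the SparkSQL total is higher than the Hive/Shark total for every year, and the ratio varies from year to year rather than being constant.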


