Posted to issues@spark.apache.org by "XiaodongCui (JIRA)" <ji...@apache.org> on 2017/01/06 09:38:58 UTC

[jira] [Reopened] (SPARK-19102) Accuracy error of Spark SQL results

     [ https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiaodongCui reopened SPARK-19102:
---------------------------------

The data under the path hdfs://cdh01:8020/sandboxdata_A/test/a is in the attached file.

> Accuracy error of Spark SQL results
> -----------------------------------
>
>                 Key: SPARK-19102
>                 URL: https://issues.apache.org/jira/browse/SPARK-19102
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.0, 1.6.1
>         Environment: Spark 1.6.0, Hadoop 2.6.0, JDK 1.8, CentOS 6.6
>            Reporter: XiaodongCui
>         Attachments: a.zip
>
>
> The problem is that cube6's second column, sumprice, is 10000 times bigger than cube5's second column, also named sumprice, but they should be equal. The bug only reproduces with queries of the form SUM(a * b), COUNT(DISTINCT c):
> 	DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
> 	df1.registerTempTable("hd_salesflat");
> 	DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
> 	DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
> 	cube5.show(50);
> 	cube6.show(50);
> My data:
> transno  | quantity | unitprice | areacode1
> 76317828 | 1.0000   | 25.0000   | HDCN
> Data schema:
>  |-- areacode1: string (nullable = true)
>  |-- quantity: decimal(20,4) (nullable = true)
>  |-- unitprice: decimal(20,4) (nullable = true)
>  |-- transno: string (nullable = true)
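
The factor of 10000 matches 10^4, the scale of the decimal(20,4) columns, which is consistent with the distinct-aggregate rewrite dropping one application of the decimal scale when summing the product. For the single sample row above, cube5 returns sumprice = 25.0000 while the buggy cube6 would return 250000.0000. A minimal sketch of a possible workaround on 1.6.x, assuming the inflation really does come from DECIMAL scale handling, is to cast the product out of DECIMAL before aggregating; casting to DOUBLE trades exact decimal arithmetic for a correct magnitude and is an assumption here, not a confirmed fix. The path and table name are taken from the report.

	// Same setup as in the report: load the parquet data and register a temp table.
	DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
	df1.registerTempTable("hd_salesflat");

	// Workaround sketch (assumption): cast quantity*unitprice to DOUBLE so the
	// SUM no longer depends on DECIMAL(20,4) scale handling alongside the
	// COUNT(DISTINCT ...) rewrite.
	DataFrame cube6Fixed = sqlContext.sql(
	    "SELECT areacode1, "
	    + "SUM(CAST(quantity * unitprice AS DOUBLE)) AS sumprice, "
	    + "COUNT(DISTINCT transno) AS transcount "
	    + "FROM hd_salesflat GROUP BY areacode1");
	cube6Fixed.show(50);

With the cast in place, sumprice from the modified cube6 query should agree with cube5 (25.0 for the sample row), at the cost of floating-point rather than exact decimal results.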


