Posted to dev@spark.apache.org by FangFang Chen <lu...@163.com> on 2016/04/20 12:45:28 UTC

Re: Spark SQL and Hive give different results with the same SQL

The output is:
Spark SQL: 6828127
Hive: 6980574.1269
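
For reference, a minimal sketch of what I am running (table and column names here are placeholders, not the real ones):

CREATE TABLE t (c DECIMAL(38,18));
SELECT SUM(c) FROM t;   -- this is the sum that differs between Hive and Spark SQL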


Sent from NetEase Mail Master
On Apr 20, 2016 at 18:06, FangFang Chen wrote:
Hi all,
Please give some suggestions. Thanks


With the same SQL below, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column:
Select sum(column) from table;
The column is defined as decimal(38,18).


Spark version:1.5.3
Hive version:2.0.0


Sent from NetEase Mail Master



Re: Re: Re: Spark SQL and Hive give different results with the same SQL

Posted by FangFang Chen <lu...@163.com>.
Maybe I found the root cause in the Spark docs:
"Unlimited precision decimal columns are no longer supported, instead Spark SQL enforces a maximum precision of 38. When inferring schema from BigDecimal objects, a precision of (38, 18) is now used. When no precision is specified in DDL then the default remains Decimal(10, 0)."
I get decimal(38,18) when I describe this table, but only plain decimal (no precision) when I show the create statement for this table. It seems Spark is taking the schema information from the create-table side. Correct?
Is there any workaround for this problem, besides altering the Hive table column type from decimal to decimal with explicit precision?
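A sketch of the checks and of that workaround (my_table and col are placeholder names):

DESCRIBE my_table;            -- here the column shows up as decimal(38,18)
SHOW CREATE TABLE my_table;   -- here it shows up as plain decimal, without precision
-- the only workaround I see so far: redeclare the column with explicit precision
ALTER TABLE my_table CHANGE col col DECIMAL(38,18);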


Thanks






Sent from NetEase Mail Master
On Apr 20, 2016 at 20:47, Ted Yu wrote:
Do you mind trying out a build from the master branch?


1.5.3 is a bit old.



On Wed, Apr 20, 2016 at 5:25 AM, FangFang Chen <lu...@163.com> wrote:

I found Spark SQL loses precision and handles the data as integers according to some rule. Below is the data obtained via the Hive shell and via Spark SQL, with the same SQL against the same Hive table:
Hive:
0.4
0.5
1.8
0.4
0.49
1.5
Spark SQL:
1
2
2
The rule seems to be: when the fractional part is < 0.5 it becomes 0, and when it is >= 0.5 it becomes 1, i.e. values are rounded to the nearest integer.


Is this a bug or some configuration thing? Please give some suggestions. Thanks


Sent from NetEase Mail Master
On Apr 20, 2016 at 18:45, FangFang Chen wrote:
The output is:
Spark SQL: 6828127
Hive: 6980574.1269


Sent from NetEase Mail Master
On Apr 20, 2016 at 18:06, FangFang Chen wrote:
Hi all,
Please give some suggestions. Thanks


With the same SQL below, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column:
Select sum(column) from table;
The column is defined as decimal(38,18).


Spark version:1.5.3
Hive version:2.0.0


Sent from NetEase Mail Master











Re: Re: Spark SQL and Hive give different results with the same SQL

Posted by Ted Yu <yu...@gmail.com>.
Do you mind trying out a build from the master branch?

1.5.3 is a bit old.

On Wed, Apr 20, 2016 at 5:25 AM, FangFang Chen <lu...@163.com>
wrote:

> I found Spark SQL loses precision and handles the data as integers according to some rule.
> Below is the data obtained via the Hive shell and via Spark SQL, with the same SQL against
> the same Hive table:
> Hive:
> 0.4
> 0.5
> 1.8
> 0.4
> 0.49
> 1.5
> Spark SQL:
> 1
> 2
> 2
> The rule seems to be: when the fractional part is < 0.5 it becomes 0, and when it is >= 0.5
> it becomes 1, i.e. values are rounded to the nearest integer.
>
> Is this a bug or some configuration thing? Please give some suggestions.
> Thanks
>
> Sent from NetEase Mail Master <http://u.163.com/signature>
> On Apr 20, 2016 at 18:45, FangFang Chen <lu...@163.com> wrote:
>
> The output is:
> Spark SQL: 6828127
> Hive: 6980574.1269
>
> Sent from NetEase Mail Master <http://u.163.com/signature>
> On Apr 20, 2016 at 18:06, FangFang Chen <lu...@163.com> wrote:
>
> Hi all,
> Please give some suggestions. Thanks
>
> With the same SQL below, Spark SQL and Hive give different results. The SQL
> sums a decimal(38,18) column:
> Select sum(column) from table;
> The column is defined as decimal(38,18).
>
> Spark version:1.5.3
> Hive version:2.0.0
>
> Sent from NetEase Mail Master <http://u.163.com/signature>
>
>
>
>
>
>
>

Re: Re: Spark SQL and Hive give different results with the same SQL

Posted by FangFang Chen <lu...@163.com>.
I found Spark SQL loses precision and handles the data as integers according to some rule. Below is the data obtained via the Hive shell and via Spark SQL, with the same SQL against the same Hive table:
Hive:
0.4
0.5
1.8
0.4
0.49
1.5
Spark SQL:
1
2
2
The rule seems to be: when the fractional part is < 0.5 it becomes 0, and when it is >= 0.5 it becomes 1, i.e. values are rounded to the nearest integer.


Is this a bug or some configuration thing? Please give some suggestions. Thanks
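
One more thing I plan to try (just a sketch; my_table and col are placeholder names): casting explicitly to the expected precision inside the query, to see whether the result then matches Hive:

SELECT SUM(CAST(col AS DECIMAL(38,18))) FROM my_table;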


Sent from NetEase Mail Master
On Apr 20, 2016 at 18:45, FangFang Chen wrote:
The output is:
Spark SQL: 6828127
Hive: 6980574.1269


Sent from NetEase Mail Master
On Apr 20, 2016 at 18:06, FangFang Chen wrote:
Hi all,
Please give some suggestions. Thanks


With the same SQL below, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column:
Select sum(column) from table;
The column is defined as decimal(38,18).


Spark version:1.5.3
Hive version:2.0.0


Sent from NetEase Mail Master