Posted to dev@hive.apache.org by "Xin Hao (JIRA)" <ji...@apache.org> on 2016/03/16 06:44:33 UTC
[jira] [Created] (HIVE-13292) Different DOUBLE type precision issue between Spark and MR engine
Xin Hao created HIVE-13292:
------------------------------
Summary: Different DOUBLE type precision issue between Spark and MR engine
Key: HIVE-13292
URL: https://issues.apache.org/jira/browse/HIVE-13292
Project: Hive
Issue Type: Bug
Environment: Apache Hive 2.0.0
Apache Spark 1.6.0
Reporter: Xin Hao
For the same query, the Spark and MR engines return DOUBLE results that differ in their low-order digits.
Found when executing TPC-H query 5 at scale factor 2 (2 GB data size). Details follow.
(1) The MR engine output:
MOZAMBIQUE,1.0646195910990009E8
ETHIOPIA,1.0108856206629996E8
ALGERIA,9.987582690420012E7
MOROCCO,9.785484184850013E7
KENYA,9.412388077690017E7
(2) The Spark engine output:
MOZAMBIQUE,1.064619591099E8
ETHIOPIA,1.0108856206630005E8
ALGERIA,9.987582690419997E7
MOROCCO,9.785484184850003E7
KENYA,9.412388077690002E7
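To put a number on how far apart the two result sets are, the values above can be compared by relative difference. The following is a minimal Python sketch, not part of the original report; the helper name rel_diff and the 1e-14 tolerance are illustrative assumptions.

```python
import math

# Values copied verbatim from the MR and Spark outputs above.
mr = {
    "MOZAMBIQUE": 1.0646195910990009E8,
    "ETHIOPIA": 1.0108856206629996E8,
    "ALGERIA": 9.987582690420012E7,
    "MOROCCO": 9.785484184850013E7,
    "KENYA": 9.412388077690017E7,
}
spark = {
    "MOZAMBIQUE": 1.064619591099E8,
    "ETHIOPIA": 1.0108856206630005E8,
    "ALGERIA": 9.987582690419997E7,
    "MOROCCO": 9.785484184850003E7,
    "KENYA": 9.412388077690002E7,
}

def rel_diff(a, b):
    """Relative difference of two floats, scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))

# Print the per-nation relative difference between the two engines.
for nation in mr:
    print(f"{nation:12s} {rel_diff(mr[nation], spark[nation]):.2e}")

# Hypothetical tolerance: treat results as equal if they agree to ~14 digits.
all_close = all(math.isclose(mr[n], spark[n], rel_tol=1e-14) for n in mr)
print("all within 1e-14:", all_close)
```

With this tolerance every pair compares as close even though no pair is bit-for-bit identical, so the disagreement sits near the limit of DOUBLE precision.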
(3) Detailed SQL used:
drop table if exists ${env:RESULT_TABLE};
create table ${env:RESULT_TABLE} (
pid1 STRING,
pid2 DOUBLE
)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location '${env:RESULT_DIR}';
insert into table ${env:RESULT_TABLE}
select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AFRICA'
and o_orderdate >= '1993-01-01'
and o_orderdate < '1994-01-01'
group by
n_name
order by
revenue desc;
(4) A similar issue remains even after simplifying the original query to the one below:
drop table if exists ${env:RESULT_TABLE};
create table ${env:RESULT_TABLE} (
pid2 DOUBLE
)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location '${env:RESULT_DIR}';
insert into table ${env:RESULT_TABLE}
select
sum(l_extendedprice * (1 - l_discount)) as revenue
from
lineitem
group by
l_orderkey
order by
revenue;
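This report does not identify a root cause. One common source of such differences, offered here only as an assumption, is that floating-point addition is not associative: if two engines combine partial sums in a different order, the rounded totals can differ in the last digits. The Python sketch below illustrates that general effect with synthetic values (the data, the seed, and the naive_sum helper are hypothetical and unrelated to the TPC-H tables).

```python
import math
import random

# Grouping alone changes the rounded result of a floating-point sum.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

def naive_sum(values):
    """Left-to-right accumulation, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

# Synthetic "price * (1 - discount)" style terms; not real TPC-H data.
random.seed(0)
values = [
    random.uniform(1000.0, 100000.0) * (1 - random.choice([0.0, 0.05, 0.1]))
    for _ in range(100000)
]
shuffled = values[:]
random.shuffle(shuffled)

# The same terms in two different orders: totals may differ in low-order digits.
naive_a = naive_sum(values)
naive_b = naive_sum(shuffled)
print(naive_a, naive_b)

# math.fsum returns the correctly rounded sum, so it does not depend on order.
exact_a = math.fsum(values)
exact_b = math.fsum(shuffled)
print(exact_a == exact_b)  # True
```

If the summation order is indeed what differs between the engines, comparing results with a tolerance, or summing in an exact type, would be more robust than expecting bit-identical DOUBLE output.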