Posted to dev@hive.apache.org by "Xin Hao (JIRA)" <ji...@apache.org> on 2016/03/16 06:44:33 UTC

[jira] [Created] (HIVE-13292) Different DOUBLE type precision issue between Spark and MR engine

Xin Hao created HIVE-13292:
------------------------------

             Summary: Different DOUBLE type precision issue between Spark and MR engine
                 Key: HIVE-13292
                 URL: https://issues.apache.org/jira/browse/HIVE-13292
             Project: Hive
          Issue Type: Bug
         Environment: Apache Hive 2.0.0
Apache Spark 1.6.0
            Reporter: Xin Hao


The Spark and MR engines return DOUBLE results that differ in their low-order digits for the same query.
Found when executing TPC-H query 5 at scale factor 2 (2 GB data size). Details are below.


(1) The MR engine output:
MOZAMBIQUE,1.0646195910990009E8
ETHIOPIA,1.0108856206629996E8
ALGERIA,9.987582690420012E7
MOROCCO,9.785484184850013E7
KENYA,9.412388077690017E7

(2) The Spark engine output:
MOZAMBIQUE,1.064619591099E8
ETHIOPIA,1.0108856206630005E8
ALGERIA,9.987582690419997E7
MOROCCO,9.785484184850003E7
KENYA,9.412388077690002E7
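
To quantify how far apart the two result sets are, the following sketch parses the values listed in (1) and (2) and computes the relative difference for each nation. The variable and function names are illustrative and not part of the original report.

```python
import math

# Values copied verbatim from the MR and Spark outputs in (1) and (2).
mr = {
    "MOZAMBIQUE": 1.0646195910990009E8,
    "ETHIOPIA": 1.0108856206629996E8,
    "ALGERIA": 9.987582690420012E7,
    "MOROCCO": 9.785484184850013E7,
    "KENYA": 9.412388077690017E7,
}
spark = {
    "MOZAMBIQUE": 1.064619591099E8,
    "ETHIOPIA": 1.0108856206630005E8,
    "ALGERIA": 9.987582690419997E7,
    "MOROCCO": 9.785484184850003E7,
    "KENYA": 9.412388077690002E7,
}

def relative_diff(a, b):
    """Relative difference between two floats, scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))

# Per-nation relative difference between the two engines.
diffs = {name: relative_diff(mr[name], spark[name]) for name in mr}
for name, d in diffs.items():
    print(f"{name}: relative difference {d:.2e}")

# Tolerance-based comparison; the rel_tol value here is an arbitrary choice.
all_close = all(math.isclose(mr[n], spark[n], rel_tol=1e-12) for n in mr)
print("all within tolerance:", all_close)
```

With these numbers the pairs are not bit-identical, but a tolerance-based comparison treats the two result sets as equal. A result validator that compares the output files exactly will therefore flag a mismatch that a relative-tolerance check would not.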


(3) Detailed SQL used:
drop table if exists ${env:RESULT_TABLE};
create table ${env:RESULT_TABLE} (
  pid1 STRING,
  pid2 DOUBLE
)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location '${env:RESULT_DIR}';

insert into table ${env:RESULT_TABLE}

select
        n_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue
from
        customer,
        orders,
        lineitem,
        supplier,
        nation,
        region
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and l_suppkey = s_suppkey
        and c_nationkey = s_nationkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'AFRICA'
        and o_orderdate >= '1993-01-01'
        and o_orderdate < '1994-01-01'
group by
        n_name
order by
        revenue desc;

(4) A similar issue persists even after simplifying the original query to the one below:

drop table if exists ${env:RESULT_TABLE};
create table ${env:RESULT_TABLE} (
  pid2 DOUBLE
)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location '${env:RESULT_DIR}';

insert into table ${env:RESULT_TABLE}

select
        sum(l_extendedprice * (1 - l_discount)) as revenue
from
        lineitem
group by
        l_orderkey
order by
        revenue;
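
This report does not identify a cause. One general property that can produce this kind of discrepancy is that floating-point addition is not associative, so two engines that combine partial sums in different orders may round differently. The sketch below only illustrates that property with made-up values; it is an assumption that this applies here, not a confirmed diagnosis of HIVE-13292.

```python
import math

# Illustrative values only; they are not taken from the lineitem table.
values = [0.1, 0.2, 0.3]

# The same three doubles added with two different groupings.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# True when the grouping changes the rounded result.
order_dependent = left_to_right != right_to_left
print(repr(left_to_right), repr(right_to_left), order_dependent)

# math.fsum tracks exact partial sums, so its result does not depend on order.
stable_forward = math.fsum(values)
stable_reverse = math.fsum(reversed(values))
print(stable_forward == stable_reverse)
```

If engine-independent totals are required, one option worth evaluating is to aggregate over DECIMAL values rather than DOUBLE, at the cost of changing the result column type.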




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)