Posted to dev@drill.apache.org by "Venki Korukanti (JIRA)" <ji...@apache.org> on 2015/07/08 19:39:04 UTC

[jira] [Resolved] (DRILL-3271) Hive : Tpch 01.q fails with a verification issue for SF100 dataset

     [ https://issues.apache.org/jira/browse/DRILL-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-3271.
------------------------------------
    Resolution: Invalid

Just had a discussion with [~adeneche]. The floating-point differences between runs are due to truncation in arithmetic operations and the order in which data arrives at the aggregator. The differences here still seem to be within an acceptable range. We need to update the margin-of-error constant in the test framework.
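
To illustrate the point, here is a minimal sketch in Python (not Drill or test-framework code). It shows that the order of floating-point additions can change the last digits of a sum, and that two values copied from the report below (sum_charge and avg_disc for the A/F group) differ exactly but agree under a relative tolerance. The names REL_TOL and rows_match and the value 1e-9 are hypothetical; the framework's actual constant is not stated in this issue.

```python
import math

# Floating-point addition is not associative, so the order in which
# values reach the aggregator can change the last few digits of a sum.
forward = (0.1 + 0.2) + 0.3
backward = (0.3 + 0.2) + 0.1
order_matters = forward != backward

# sum_charge and avg_disc for the (A, F) group, copied from the report.
EXPECTED = [5.592847429515948E12, 0.05000224353079674]
ACTUAL = [5.592847429515874E12, 0.0500022435305286]

# Hypothetical margin; the framework's real constant is not given here.
REL_TOL = 1e-9


def rows_match(expected, actual, rel_tol=REL_TOL):
    """Return True if every pair of values agrees within rel_tol."""
    return all(
        math.isclose(e, a, rel_tol=rel_tol)
        for e, a in zip(expected, actual)
    )


exact_match = EXPECTED == ACTUAL               # strict equality fails
tolerant_match = rows_match(EXPECTED, ACTUAL)  # relative comparison passes
```

A relative rather than absolute tolerance is used in this sketch because the columns span very different magnitudes (about 1E12 for the sums, about 0.05 for avg_disc).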

> Hive : Tpch 01.q fails with a verification issue for SF100 dataset
> ------------------------------------------------------------------
>
>                 Key: DRILL-3271
>                 URL: https://issues.apache.org/jira/browse/DRILL-3271
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>            Reporter: Rahul Challapalli
>            Assignee: Venki Korukanti
>             Fix For: 1.2.0
>
>         Attachments: tpch100_hive.ddl
>
>
> git.commit.id.abbrev=5f26b8b
> Query :
> {code}
> select
>   l_returnflag,
>   l_linestatus,
>   sum(l_quantity) as sum_qty,
>   sum(l_extendedprice) as sum_base_price,
>   sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
>   sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
>   avg(l_quantity) as avg_qty,
>   avg(l_extendedprice) as avg_price,
>   avg(l_discount) as avg_disc,
>   count(*) as count_order
> from
>   lineitem
> where
>   l_shipdate <= date '1998-12-01' - interval '120' day (3)
> group by
>   l_returnflag,
>   l_linestatus
> order by
>   l_returnflag,
>   l_linestatus;
> {code}
> The 4th column appears to have some differences. Not sure if they are within an acceptable range.
> Expected :
> {code}
> A       F       3.775127758E9   5.660776097194428E12    5.377736398183942E12    5.592847429515948E12    25.499370423275426      38236.11698430475       0.05000224353079674     148047881
> N       O       7.269911583E9   1.0901214476134316E13   1.0356163586785008E13   1.077041889123738E13    25.499873337396807      38236.997134222445      0.04999763132401859     285095988
> R       F       3.77572497E9    5.661603032745362E12    5.378513563915394E12    5.593662252666902E12    25.50006628406532       38236.69725845312       0.05000130433952159     148067261
> N       F       9.8553062E7     1.4777109838597995E11   1.403849659650348E11    1.459997930327757E11    25.501556956882876      38237.19938880449       0.04998528433803118     3864590
> {code}
> Actual : 
> {code}
> A       F       3.775127758E9   5.660776097194352E12    5.37773639818398E12     5.592847429515874E12    25.499370423275426      38236.11698430423       0.0500022435305286      148047881
> N       O       7.269911583E9   1.0901214476134352E13   1.0356163586784926E13   1.0770418891237576E13   25.499873337396807      38236.99713422257       0.04999763132535226     285095988
> R       F       3.77572497E9    5.661603032745394E12    5.378513563915313E12    5.593662252666848E12    25.50006628406532       38236.69725845333       0.05000130433925318     148067261
> N       F       9.8553062E7     1.4777109838598022E11   1.4038496596503506E11   1.45999793032776E11     25.501556956882876      38237.19938880456       0.049985284338093884    3864590
> {code}
> The data is 100 GB, so I couldn't attach it here.
> I attached the Hive DDL. Let me know if you need anything else.


