Posted to issues@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2020/08/14 13:22:00 UTC

[jira] [Created] (ARROW-9741) [Rust] [DataFusion] Incorrect count in TPC-H query 1 result set

Andy Grove created ARROW-9741:
---------------------------------

             Summary: [Rust] [DataFusion] Incorrect count in TPC-H query 1 result set
                 Key: ARROW-9741
                 URL: https://issues.apache.org/jira/browse/ARROW-9741
             Project: Apache Arrow
          Issue Type: Bug
          Components: Rust, Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove
             Fix For: 2.0.0


I am testing with the 100 GB (scale factor 100) data set, in Parquet format. The results largely match between Spark and DataFusion, with the exception of one count, in the N/O group: Spark reports 291241911 and DataFusion reports 300058170, a difference of 8816259.

 

DataFusion query:
{code:sql}
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity),
    sum(l_extendedprice),
    sum(l_extendedprice * (1 - l_discount)),
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
    avg(l_quantity),
    avg(l_extendedprice),
    avg(l_discount),
    count(*)
from
    lineitem
where
    l_shipdate <= '1998-12-01'
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
{code}
DataFusion output:
{code:java}
+--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
| l_returnflag | l_linestatus | sum(l_quantity) | sum(l_extendedprice) | sum(l_extendedprice Multiply CAST(Int64(1) as Float64) Minus l_discount) | sum(l_extendedprice Multiply CAST(Int64(1) as Float64) Minus l_discount Multiply CAST(Int64(1) as Float64) Plus l_tax) | avg(l_quantity)    | avg(l_extendedprice) | avg(l_discount)      | count(UInt8(1)) |
+--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
| A            | F            | 3775127758      | 5660776097194.464    | 5377736398183.935                                                        | 5592847429515.93                                                                                                       | 25.49937060623502  | 38236.11838745711    | 0.05000224145223291  | 148047881       |
| N            | F            | 98553062        | 147771098385.98004   | 140384965965.03473                                                       | 145999793032.77594                                                                                                     | 25.501475096542002 | 38237.03209968505    | 0.0499850931498342   | 3864590         |
| N            | O            | 7651423419      | 11473321691083.244   | 10899667121317.215                                                       | 11335664103186.313                                                                                                     | 25.499799813085986 | 38236.99077003657    | 0.04999757591275955  | 300058170       |
| R            | F            | 3775724970      | 5661603032745.35     | 5378513563915.415                                                        | 5593662252666.921                                                                                                      | 25.500067651651772 | 38236.70005754084    | 0.050001305269911714 | 148067261       |
+--------------+--------------+-----------------+----------------------+--------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+----------------------+-----------------+
 {code}
Spark query:
{code:sql}
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate < '1998-09-01'
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
{code}
Spark output:
{code:java}
+------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+
|l_returnflag|l_linestatus|      sum_qty|      sum_base_price|      sum_disc_price|          sum_charge|           avg_qty|         avg_price|            avg_disc|count_order|
+------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+
|           A|           F|3.775127758E9|5.660776097194467E12|5.377736398183933E12|5.592847429515929E12|25.499370423275426| 38236.11698430501| 0.05000224353093977|  148047881|
|           N|           F|  9.8553062E7|1.477710983859800...|1.403849659650348E11|1.459997930327758...|25.501556956882876|38237.199388804525|0.049985284338051286|    3864590|
|           N|           O|7.426674812E9|1.113628734444901...|1.057947943676070...|1.100266737949706...| 25.50002088126664|38237.241701277715| 0.04999786229238074|  291241911|
|           R|           F| 3.77572497E9|5.661603032745349E12|5.378513563915412E12|5.593662252666918E12| 25.50006628406532| 38236.69725845302|0.050001304339664904|  148067261|
+------------+------------+-------------+--------------------+--------------------+--------------------+------------------+------------------+--------------------+-----------+
 {code}
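A per-group comparison makes it easier to see where the two result sets diverge. The sketch below copies the exact integer columns (quantity sum and row count) from the two tables above and reports every group whose values differ. The names `DATAFUSION`, `SPARK`, and `diff_groups` are made up for this illustration and are not part of either engine. Note also that the two queries as posted do not use the same filter (`l_shipdate <= '1998-12-01'` for DataFusion, `l_shipdate < '1998-09-01'` for Spark); that may account for a difference confined to a single group, but this is an observation from the posted text and not a confirmed diagnosis.

```python
# Exact integer columns copied from the two result tables in this report.
# Spark prints sum_qty in scientific notation; the values are expanded here.
DATAFUSION = {
    ("A", "F"): {"sum_qty": 3775127758, "count": 148047881},
    ("N", "F"): {"sum_qty": 98553062, "count": 3864590},
    ("N", "O"): {"sum_qty": 7651423419, "count": 300058170},
    ("R", "F"): {"sum_qty": 3775724970, "count": 148067261},
}
SPARK = {
    ("A", "F"): {"sum_qty": 3775127758, "count": 148047881},
    ("N", "F"): {"sum_qty": 98553062, "count": 3864590},
    ("N", "O"): {"sum_qty": 7426674812, "count": 291241911},
    ("R", "F"): {"sum_qty": 3775724970, "count": 148067261},
}


def diff_groups(left, right):
    """Return {group: {column: left - right}} for every column that differs."""
    mismatches = {}
    for group in sorted(left.keys() & right.keys()):
        deltas = {
            col: left[group][col] - right[group][col]
            for col in left[group]
            if left[group][col] != right[group][col]
        }
        if deltas:
            mismatches[group] = deltas
    return mismatches


mismatches = diff_groups(DATAFUSION, SPARK)
for group, deltas in mismatches.items():
    print(group, deltas)
```

With the numbers above, only the (N, O) group is reported, and it differs in both the row count and the quantity sum. The floating-point columns are left out because small rounding differences between the engines would need a tolerance-based comparison.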
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)