You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2017/01/16 13:02:26 UTC

[jira] [Created] (HIVE-15640) Hive/Druid integration: null handling for metrics

Jesus Camacho Rodriguez created HIVE-15640:
----------------------------------------------

             Summary: Hive/Druid integration: null handling for metrics
                 Key: HIVE-15640
                 URL: https://issues.apache.org/jira/browse/HIVE-15640
             Project: Hive
          Issue Type: Bug
          Components: Druid integration
    Affects Versions: 2.2.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez
            Priority: Critical


Null values for metrics in Druid and Hive are not handled the same way (_0.0_ vs _NULL_).

In Druid:
{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
6015006	1999-11-03 00:00:00	0.0	0.0
9011009	1999-11-05 00:00:00	0.0	0.0
8003009	1999-11-03 00:00:00	11.0	1.0299999713897705
10005014	1999-11-05 00:00:00	86.0	1.100000023841858
6008007	1999-11-09 00:00:00	81.0	1.3700000047683716
6003003	1999-11-08 00:00:00	45.0	1.600000023841858
8008009	1999-11-08 00:00:00	98.0	1.7100000381469727
8015003	1999-11-02 00:00:00	10.0	1.7400000095367432
8004008	1999-11-10 00:00:00	45.0	1.7599999904632568
8009009	1999-11-07 00:00:00	81.0	1.7699999809265137
{code}

In Hive:
{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset_hive
WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
6015006	1999-11-03 00:00:00	NULL	NULL
9011009	1999-11-05 00:00:00	NULL	NULL
8003009	1999-11-03 00:00:00	11	1.03
10005014	1999-11-05 00:00:00	86	1.1
6008007	1999-11-09 00:00:00	81	1.37
6003003	1999-11-08 00:00:00	45	1.6
8008009	1999-11-08 00:00:00	98	1.71
8015003	1999-11-02 00:00:00	10	1.74
8004008	1999-11-10 00:00:00	45	1.76
8009009	1999-11-07 00:00:00	81	1.77
{code}

However, for Druid dimensions, NULL values seem to be handled properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)