You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2017/01/16 10:39:26 UTC
[jira] [Created] (HIVE-15635) Hive/Druid integration: timeseries query shows all days, even if no data

Jesus Camacho Rodriguez created HIVE-15635:
----------------------------------------------

             Summary: Hive/Druid integration: timeseries query shows all days, even if no data
                 Key: HIVE-15635
                 URL: https://issues.apache.org/jira/browse/HIVE-15635
             Project: Hive
          Issue Type: Bug
          Components: Druid integration
    Affects Versions: 2.2.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez
            Priority: Critical


We should have consistent results on Druid vs Hive. However, following query is transformed into timeseries Druid query which yields different results in Druid, since it will show all values for the given time granularity, even if there is no data for the given _i\_brand\_id_.

In Druid:
{code:sql}
SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), sum(ss_wholesale_cost)
FROM store_sales_sold_time_subset
WHERE i_brand_id = 10001009
GROUP BY floor_day(`__time`)
ORDER BY `granularity`;
OK
1999-11-01 00:00:00	45	37.47
1999-11-02 00:00:00	-9223372036854775808	0.0
1999-11-03 00:00:00	-9223372036854775808	0.0
1999-11-04 00:00:00	39	61.52
1999-11-05 00:00:00	74	145.84
1999-11-06 00:00:00	62	14.5
1999-11-07 00:00:00	-9223372036854775808	0.0
1999-11-08 00:00:00	5	34.08
1999-11-09 00:00:00	-9223372036854775808	0.0
1999-11-10 00:00:00	-9223372036854775808	0.0
1999-11-11 00:00:00	-9223372036854775808	0.0
1999-11-12 00:00:00	66	67.22
1999-11-13 00:00:00	-9223372036854775808	0.0
1999-11-14 00:00:00	-9223372036854775808	0.0
1999-11-15 00:00:00	-9223372036854775808	0.0
1999-11-16 00:00:00	60	96.37
1999-11-17 00:00:00	50	79.11
1999-11-18 00:00:00	-9223372036854775808	0.0
1999-11-19 00:00:00	-9223372036854775808	0.0
1999-11-20 00:00:00	-9223372036854775808	0.0
1999-11-21 00:00:00	-9223372036854775808	0.0
1999-11-22 00:00:00	-9223372036854775808	0.0
1999-11-23 00:00:00	57	17.69
1999-11-24 00:00:00	-9223372036854775808	0.0
1999-11-25 00:00:00	-9223372036854775808	0.0
1999-11-26 00:00:00	-9223372036854775808	0.0
1999-11-27 00:00:00	86	91.59
1999-11-28 00:00:00	-9223372036854775808	0.0
1999-11-29 00:00:00	93	136.48
1999-11-30 00:00:00	-9223372036854775808	0.0
{code}

In Hive:
{code:sql}
SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), sum(ss_wholesale_cost)
FROM store_sales_sold_time_subset_hive
WHERE i_brand_id = 10001009
GROUP BY floor_day(`__time`)
ORDER BY `granularity`;
OK
1999-11-01 00:00:00	45	37.47
1999-11-04 00:00:00	39	61.52
1999-11-05 00:00:00	74	145.84
1999-11-06 00:00:00	62	14.5
1999-11-08 00:00:00	5	34.08
1999-11-12 00:00:00	66	67.22
1999-11-16 00:00:00	60	96.36999999999999
1999-11-17 00:00:00	50	79.11
1999-11-23 00:00:00	57	17.689999999999998
1999-11-27 00:00:00	86	91.59
1999-11-29 00:00:00	93	136.48
{code}

Probably we should handle this in the _timeseries_ record reader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)