You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2018/02/17 06:06:03 UTC

[jira] [Commented] (HIVE-15640) Hive/Druid integration: null handling for metrics

    [ https://issues.apache.org/jira/browse/HIVE-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368114#comment-16368114 ] 

slim bouguerra commented on HIVE-15640:
---------------------------------------

Null handling still un-merged yet... pending on reviews https://github.com/druid-io/druid/pull/5278

> Hive/Druid integration: null handling for metrics
> -------------------------------------------------
>
>                 Key: HIVE-15640
>                 URL: https://issues.apache.org/jira/browse/HIVE-15640
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>
> Null values for metrics in Druid and Hive are not handled the same way (_0.0_ vs _NULL_).
> In Druid:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> 6015006	1999-11-03 00:00:00	0.0	0.0
> 9011009	1999-11-05 00:00:00	0.0	0.0
> 8003009	1999-11-03 00:00:00	11.0	1.0299999713897705
> 10005014	1999-11-05 00:00:00	86.0	1.100000023841858
> 6008007	1999-11-09 00:00:00	81.0	1.3700000047683716
> 6003003	1999-11-08 00:00:00	45.0	1.600000023841858
> 8008009	1999-11-08 00:00:00	98.0	1.7100000381469727
> 8015003	1999-11-02 00:00:00	10.0	1.7400000095367432
> 8004008	1999-11-10 00:00:00	45.0	1.7599999904632568
> 8009009	1999-11-07 00:00:00	81.0	1.7699999809265137
> {code}
> In Hive:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset_hive
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> 6015006	1999-11-03 00:00:00	NULL	NULL
> 9011009	1999-11-05 00:00:00	NULL	NULL
> 8003009	1999-11-03 00:00:00	11	1.03
> 10005014	1999-11-05 00:00:00	86	1.1
> 6008007	1999-11-09 00:00:00	81	1.37
> 6003003	1999-11-08 00:00:00	45	1.6
> 8008009	1999-11-08 00:00:00	98	1.71
> 8015003	1999-11-02 00:00:00	10	1.74
> 8004008	1999-11-10 00:00:00	45	1.76
> 8009009	1999-11-07 00:00:00	81	1.77
> {code}
> However, for Druid dimensions, NULL values seem to be handled properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)