Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2018/02/17 06:06:03 UTC
[jira] [Commented] (HIVE-15640) Hive/Druid integration: null handling for metrics
[ https://issues.apache.org/jira/browse/HIVE-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368114#comment-16368114 ]
slim bouguerra commented on HIVE-15640:
---------------------------------------
Null handling is still unmerged on the Druid side; it is pending review at https://github.com/druid-io/druid/pull/5278
> Hive/Druid integration: null handling for metrics
> -------------------------------------------------
>
> Key: HIVE-15640
> URL: https://issues.apache.org/jira/browse/HIVE-15640
> Project: Hive
> Issue Type: Bug
> Components: Druid integration
> Affects Versions: 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Critical
>
> Null values for metrics in Druid and Hive are not handled the same way (_0.0_ vs _NULL_).
> In Druid:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> 6015006 1999-11-03 00:00:00 0.0 0.0
> 9011009 1999-11-05 00:00:00 0.0 0.0
> 8003009 1999-11-03 00:00:00 11.0 1.0299999713897705
> 10005014 1999-11-05 00:00:00 86.0 1.100000023841858
> 6008007 1999-11-09 00:00:00 81.0 1.3700000047683716
> 6003003 1999-11-08 00:00:00 45.0 1.600000023841858
> 8008009 1999-11-08 00:00:00 98.0 1.7100000381469727
> 8015003 1999-11-02 00:00:00 10.0 1.7400000095367432
> 8004008 1999-11-10 00:00:00 45.0 1.7599999904632568
> 8009009 1999-11-07 00:00:00 81.0 1.7699999809265137
> {code}
> In Hive:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset_hive
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> 6015006 1999-11-03 00:00:00 NULL NULL
> 9011009 1999-11-05 00:00:00 NULL NULL
> 8003009 1999-11-03 00:00:00 11 1.03
> 10005014 1999-11-05 00:00:00 86 1.1
> 6008007 1999-11-09 00:00:00 81 1.37
> 6003003 1999-11-08 00:00:00 45 1.6
> 8008009 1999-11-08 00:00:00 98 1.71
> 8015003 1999-11-02 00:00:00 10 1.74
> 8004008 1999-11-10 00:00:00 45 1.76
> 8009009 1999-11-07 00:00:00 81 1.77
> {code}
> However, for Druid dimensions, NULL values seem to be handled properly.
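
The discrepancy in the two listings above can be reproduced with a small model. This is only a sketch of the observable behaviour, not code from Hive or Druid: the function names are hypothetical, and it assumes the Druid-backed table reports a missing metric as 0.0 before aggregating, while Hive follows standard SQL semantics where SUM and MAX skip NULLs and return NULL for a group with no non-NULL values.

```python
from typing import Iterable, Optional


def sql_sum(values: Iterable[Optional[float]]) -> Optional[float]:
    """SQL-style SUM: NULLs are skipped; an all-NULL group yields NULL (None)."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def sql_max(values: Iterable[Optional[float]]) -> Optional[float]:
    """SQL-style MAX: NULLs are skipped; an all-NULL group yields NULL (None)."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def zero_default_sum(values: Iterable[Optional[float]]) -> float:
    """Assumed model of the Druid-side result: a missing metric counts as 0.0."""
    return sum(0.0 if v is None else v for v in values)


def zero_default_max(values: Iterable[Optional[float]]) -> float:
    """Assumed model of the Druid-side result: a missing metric counts as 0.0."""
    return max((0.0 if v is None else v for v in values), default=0.0)


# A group whose metric values are all missing, like the first two result rows.
group = [None, None]
print(sql_max(group), sql_sum(group))                    # Hive-like result
print(zero_default_max(group), zero_default_sum(group))  # Druid-like result
```

Under these assumptions the first call pair models the NULL NULL rows from Hive and the second models the 0.0 0.0 rows from Druid; groups that contain at least one real value give the same numbers in both models.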
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)