You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Carter Shanklin (JIRA)" <ji...@apache.org> on 2017/04/23 23:11:04 UTC

[jira] [Created] (HIVE-16513) width_bucket issues

Carter Shanklin created HIVE-16513:
--------------------------------------

             Summary: width_bucket issues
                 Key: HIVE-16513
                 URL: https://issues.apache.org/jira/browse/HIVE-16513
             Project: Hive
          Issue Type: Bug
            Reporter: Carter Shanklin


width_bucket was recently added with HIVE-15982. This ticket notes a few issues.

Usability issue:
Currently only accepts integral numeric types. Decimals, floats and doubles are not supported.

Runtime failures: This query will cause a runtime divide-by-zero in the reduce stage.
select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1;

The divide-by-zero seems to trigger any time I use a group-by. Here's another example (that actually requires the group-by):

select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1;

Advanced Usage Issues:

Suppose you have a table e011_01 as follows:
create table e011_01 (c1 integer, c2 smallint);
insert into e011_01 values (1, 1), (2, 2);

Compile-time problems:
You cannot use simple case expressions, searched case expressions or grouping sets. These queries fail:

select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) from e011_01;
select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) from e011_01;
select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as integer)) from e011_02 group by cube(c1, c2);

I'll admit the grouping one is pretty contrived but the case ones seem straightforward, valid, and it's strange that they don't work. Similar queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe [~ashutoshc] can lend some perspective on that?

Interestingly, you can use window functions in width_bucket, example:
select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01;
works just fine. Hopefully we can get to a place where people implementing functions like this don't need to think about value expression support but we don't seem to be there yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)