Posted to issues@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2016/09/01 00:07:21 UTC
[jira] [Commented] (HIVE-14568) Hive Decimal Returns NULL
[ https://issues.apache.org/jira/browse/HIVE-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453811#comment-15453811 ]
Xuefu Zhang commented on HIVE-14568:
------------------------------------
I think this is mostly by design. You have two columns: decimal(p1, s1) and decimal(p2, s2). We need to statically derive the type for the product of the two columns based on s = s1 + s2 and p = p1 + p2 + 1. Since s1 = 28 and s2 = 10 in your case, s = 38. Similarly, p1 + p2 + 1 = 77 is capped at p = 38 (which is the max). Thus, the result column has the type decimal(38, 38). This basically means that the result cannot have any integer part. On the other hand, if the result type were set to (38, 18), I could certainly construct example data showing that the product of the two columns loses the scale that I was expecting.
I understand that NULL may have been surprising to people. However, I wonder why a column defined as decimal(38,28) would be used to store data like 1.2, 1.44, etc. Would it be reasonable to use a smaller precision/scale?
This sounds like a data modeling issue. The metadata needs to closely define the data.
It's a good point that an ERROR here might be better, so that NULL doesn't slip in unnoticed. I believe MySQL has a strict mode which, when on, generates an error in this case. We don't have such a mode defined in Hive, but it may make sense to introduce one.
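To make the derivation concrete, here is a minimal Python sketch of the rule described above. It is not Hive's actual code: the function names are made up for illustration, and capping both precision and scale at 38 is an assumption based on this comment.

```python
from decimal import Decimal

MAX_PRECISION = 38  # maximum decimal precision, per the comment above


def product_type(p1, s1, p2, s2):
    """Derive (precision, scale) for decimal(p1,s1) * decimal(p2,s2).

    Hypothetical helper: follows s = s1 + s2 and p = p1 + p2 + 1,
    with both values capped at MAX_PRECISION (an assumption).
    """
    scale = min(s1 + s2, MAX_PRECISION)
    precision = min(p1 + p2 + 1, MAX_PRECISION)
    return precision, scale


def fits(value, precision, scale):
    """Return True if value needs no more than (precision - scale) integer digits."""
    integer_digits = precision - scale
    return abs(value) < Decimal(10) ** integer_digits


# The columns from the report: prc decimal(38,28), vol decimal(38,10)
p, s = product_type(38, 28, 38, 10)
print(p, s)

# 1.2 * 200 = 240 needs three integer digits, but decimal(38,38) has none
print(fits(Decimal("1.2") * Decimal("200"), p, s))
```

Under these assumptions the derived type is decimal(38, 38), which leaves zero digits for the integer part, so a product such as 240 cannot be represented.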
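As a rough illustration of what such a mode could mean, here is a small Python sketch. Everything in it is hypothetical: Hive has no such flag, the function name and the `strict` parameter are invented for this example, and rounding to the target scale is deliberately left out.

```python
from decimal import Decimal


def store_as_decimal(value, precision, scale, strict=False):
    """Hypothetical: try to fit value into decimal(precision, scale).

    In non-strict mode an overflow yields None (modeling the NULL seen
    in the report); in strict mode it raises instead.
    """
    limit = Decimal(10) ** (precision - scale)  # first magnitude that no longer fits
    if abs(value) >= limit:
        if strict:
            raise OverflowError(
                f"{value} does not fit decimal({precision},{scale})"
            )
        return None
    return value


product = Decimal("1.2") * Decimal("200")

# Non-strict: the overflow is silent
print(store_as_decimal(product, 38, 38))

# Strict: the same overflow becomes a visible error
try:
    store_as_decimal(product, 38, 38, strict=True)
except OverflowError as err:
    print("error:", err)
```

The point of the sketch is only the contrast between the two behaviors: the same overflow either passes silently as a missing value or stops the query with an error.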
> Hive Decimal Returns NULL
> -------------------------
>
> Key: HIVE-14568
> URL: https://issues.apache.org/jira/browse/HIVE-14568
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.0.0, 1.2.0
> Environment: Centos 6.7, Hadoop 2.7.2,hive 1.0.0,2.0
> Reporter: gurmukh singh
> Assignee: Xuefu Zhang
>
> Hi
> I was under the impression that the bug: https://issues.apache.org/jira/browse/HIVE-5022 got fixed. But, I see the same issue in Hive 1.0 and hive 1.2 as well.
> hive> desc mul_table;
> OK
> prc decimal(38,28)
> vol decimal(38,10)
> Time taken: 0.068 seconds, Fetched: 2 row(s)
> hive> select prc, vol, prc*vol as cost from mul_table;
> OK
> 1.2 200 NULL
> 1.44 200 NULL
> 2.14 100 NULL
> 3.004 50 NULL
> 1.2 200 NULL
> Time taken: 0.048 seconds, Fetched: 5 row(s)
> Rather than returning NULL, it should give an error or round off.
> I understand that I can use double instead of decimal, or cast the result, but returning NULL will still let many things go unnoticed.
> hive> desc mul_table2;
> OK
> prc double
> vol decimal(14,10)
> Time taken: 0.049 seconds, Fetched: 2 row(s)
> hive> select * from mul_table2;
> OK
> 1.4 200
> 1.34 200
> 7.34 100
> 7454533.354544 100
> Time taken: 0.028 seconds, Fetched: 4 row(s)
> hive> select prc, vol, prc*vol as cost from mul_table3;
> OK
> 7.34 100 734.0
> 7.34 1000 7340.0
> 1.0004 1000 1000.4
> 7454533.354544 100 7.454533354544E8 <----- Wrong result
> 7454533.354544 1000 7.454533354544E9 <----- Wrong result
> Time taken: 0.025 seconds, Fetched: 5 row(s)
> Casting:
> hive> select prc, vol, cast(prc*vol as decimal(38,38)) as cost from mul_table3;
> OK
> 7.34 100 NULL
> 7.34 1000 NULL
> 1.0004 1000 NULL
> 7454533.354544 100 NULL
> 7454533.354544 1000 NULL
> Time taken: 0.033 seconds, Fetched: 5 row(s)
> hive> select prc, vol, cast(prc*vol as decimal(38,10)) as cost from mul_table3;
> OK
> 7.34 100 734
> 7.34 1000 7340
> 1.0004 1000 1000.4
> 7454533.354544 100 745453335.4544
> 7454533.354544 1000 7454533354.544
> Time taken: 0.026 seconds, Fetched: 5 row(s)