Posted to dev@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2014/09/16 22:48:34 UTC

[jira] [Commented] (HIVE-8111) CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO

    [ https://issues.apache.org/jira/browse/HIVE-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136178#comment-14136178 ] 

Sergey Shelukhin commented on HIVE-8111:
----------------------------------------

[~ashutoshc] [~jpullokkaran] FYI. I tried options 1 and 2 and ran into problems; for now I'm exploring options 5 and 3. Let me know if you have any input.

The biggest problem is a decimal becoming NULL because of an incorrect type. For example, for
SELECT key * value FROM DECIMAL_UDF
the plan's "expressions: (key * value) (type: decimal(31,10))" becomes "expressions: (key * CAST( value AS decimal(31,10))) (type: decimal(38,20))", and 1524157875171467887.5019052100 becomes NULL because its integer part has 19 digits, more than the 18 that decimal(38,20) can hold.
Incorrect types can also change the result column types, which I assume can give insert/create queries undesirable results; I'm not sure about other possible effects.
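
To make the arithmetic in the decimal example concrete, here is a minimal Python sketch (not Hive code). It rests on assumptions that are not stated in this thread: that decimal multiplication derives its result type as precision = p1 + p2 + 1 and scale = s1 + s2, each capped at 38; that key is decimal(20,10); and that the int column value is treated as decimal(10,0). The names multiply_type and fits are made up for illustration. Under those assumptions the sketch reproduces both types quoted above and shows why the value no longer fits.

```python
from decimal import Decimal

MAX_PRECISION = 38  # assumed cap on decimal precision
MAX_SCALE = 38      # assumed cap on decimal scale


def multiply_type(p1, s1, p2, s2):
    """Result (precision, scale) of decimal(p1,s1) * decimal(p2,s2).

    Assumed rule: precision = p1 + p2 + 1, scale = s1 + s2, both capped.
    """
    return min(MAX_PRECISION, p1 + p2 + 1), min(MAX_SCALE, s1 + s2)


def fits(value, precision, scale):
    """True if the integer part of value fits in (precision - scale) digits."""
    int_part = abs(int(value))
    integer_digits = len(str(int_part)) if int_part else 0
    return integer_digits <= precision - scale


KEY = (20, 10)            # assumption: key is decimal(20,10)
VALUE_AS_DECIMAL = (10, 0)  # assumption: int value behaves like decimal(10,0)
CBO_CAST = (31, 10)       # the cast from the plan: CAST(value AS decimal(31,10))

plain_type = multiply_type(*KEY, *VALUE_AS_DECIMAL)  # key * value
cast_type = multiply_type(*KEY, *CBO_CAST)           # key * CAST(value ...)

product = Decimal("1524157875171467887.5019052100")

print(plain_type, fits(product, *plain_type))
print(cast_type, fits(product, *cast_type))
```

With the extra cast, the derived precision would be 52 and gets capped at 38 while the scale doubles to 20, so only 18 integer digits remain for a value that needs 19.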


> CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-8111
>                 URL: https://issues.apache.org/jira/browse/HIVE-8111
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Original test failure: the column type appears to change to a different decimal type in most cases. In one case this makes the integer part too big to fit, so the result seems to become null.
> What happens is that CBO adds casts to arithmetic expressions to make them type compatible; these casts become part of the new AST, and then Hive adds its own casts on top of them. The first part alone also causes lots of out file changes. So far it's not clear how best to fix this, in addition to the incorrect decimal width and the occasional nulls when the width is larger than Hive allows.
> Option one - don't add those for numeric ops - cannot be done if numeric op is a part of compare, for which CBO needs correct types.
> Option two - unwrap casts when determining type in Hive - hard or impossible to tell apart CBO-added casts and user casts. 
> Option three - don't change types in Hive if CBO has run - seems hacky and hard to ensure it's applied everywhere.
> Option four - map all expressions precisely between the two trees and remove the casts again after optimization - will be pretty difficult.
> Option five - somehow mark those casts. Not sure about how yet.


