Posted to issues@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2018/11/18 05:57:00 UTC

[jira] [Created] (IMPALA-7865) Repeated type widening of arithmetic expressions

Paul Rogers created IMPALA-7865:
-----------------------------------

             Summary: Repeated type widening of arithmetic expressions
                 Key: IMPALA-7865
                 URL: https://issues.apache.org/jira/browse/IMPALA-7865
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


An issue related to IMPALA-7855 occurs in {{ExprRewriterTest.TestToSql()}} in the CTAS test. (This test will be made into a separate method, {{TestCTASToSql()}}.) When run with the "integrated rewrite" feature enabled, the following sequence occurs:

 * Analyze the {{CreateTableAsSelect}} statement. Create a temporary copy of the associated {{SELECT}} statement.
 * Rewrite the {{SELECT}} statement from {{SELECT 1 + 1}} (both operands {{TINYINT}}, with {{SMALLINT}} as the type of the {{+}} operation) to {{SELECT 2}} (typed as {{TINYINT}}).
 * After constant folding, the rule checks the original type of the expression ({{SMALLINT}}) and casts the result ({{TINYINT}}) to the original type ({{SMALLINT}}) using an implicit cast.
 * Perform column substitutions, reset, and reanalyze. This process discards implicit casts. Because the value is 2, the literal takes the type {{TINYINT}}.
 * Create the base table expressions using the newly rewritten value ({{TINYINT}}), even though the result expression is still {{SMALLINT}}.
 * Use the base expressions from the previous step (typed as {{TINYINT}}) to declare the target table column.
 * Now, try to map the result expression ({{SMALLINT}}) into the newly created table column ({{TINYINT}}). This fails with an overflow error.
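The CTAS sequence above can be modeled in a few lines to show where the two types diverge. This is a minimal sketch under stated assumptions, not Impala's analyzer code: {{CtasTypeDriftDemo}}, {{Literal}}, {{smallestTypeFor}} and {{canAssign}} are hypothetical names. The literal rule is assumed to pick the narrowest integer type that holds the value, and the assignment check is assumed to reject any value type wider than the column type.

```java
/** Toy model of the CTAS type drift; hypothetical, not Impala's classes. */
final class CtasTypeDriftDemo {
    /** Integer types, ordered narrowest to widest. */
    enum IntType { TINYINT, SMALLINT, INT, BIGINT }

    /** Assumed literal rule: the narrowest integer type that can hold the value. */
    static IntType smallestTypeFor(long v) {
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return IntType.TINYINT;
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return IntType.SMALLINT;
        if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) return IntType.INT;
        return IntType.BIGINT;
    }

    /** A folded constant, optionally wrapped in an implicit cast (castTo != null). */
    static final class Literal {
        final long value;
        final IntType castTo;

        Literal(long value, IntType castTo) {
            this.value = value;
            this.castTo = castTo;
        }

        IntType type() {
            return castTo != null ? castTo : smallestTypeFor(value);
        }

        /** Reset discards the implicit cast, so the literal reverts to its natural type. */
        Literal reset() {
            return new Literal(value, null);
        }
    }

    /** Assumed check: a column accepts values of its own type or narrower. */
    static boolean canAssign(IntType from, IntType column) {
        return from.ordinal() <= column.ordinal();
    }

    public static void main(String[] args) {
        IntType resultExprType = IntType.SMALLINT;        // type of 1 + 1 before folding
        Literal folded = new Literal(2, resultExprType);  // folded to 2, cast back to SMALLINT
        Literal baseExpr = folded.reset();                // reset drops the cast: TINYINT
        IntType columnType = baseExpr.type();             // CTAS declares the column from this
        System.out.println(columnType + " column, " + resultExprType + " result: "
                + (canAssign(resultExprType, columnType) ? "ok" : "overflow error"));
        // prints "TINYINT column, SMALLINT result: overflow error"
    }
}
```

The key step is {{reset()}}: once the implicit cast is gone, nothing records that the folded value was supposed to be {{SMALLINT}}, so the column and the result expression disagree.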

While IMPALA-7855 describes how types are widened unnecessarily due to a single expression, the problem here occurs over time, due to repeated analysis of the same numeric expression:

* The analyzer implements a set of type propagation rules that give an arithmetic expression a result type wider than the types of its arguments. For example, in {{tinyint_col + 1}}, both {{tinyint_col}} and {{1}} are {{TINYINT}}, but the result of the expression is promoted to {{SMALLINT}}.
* The planner then sets the type of the constant (1 here) to {{SMALLINT}}.
* Repeat the process on the next cycle. {{tinyint_col}} is {{TINYINT}}, {{1}} is {{SMALLINT}}. Now the result of the expression is {{INT}} and {{1}} is retyped to be {{INT}}.
* Repeat again and the expression (and constant) are promoted to {{BIGINT}}.
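The feedback loop above can be reproduced with a toy model. This is a hypothetical sketch: {{WideningDemo}}, {{IntType}}, {{addResultType}} and {{analyzeRepeatedly}} are illustrative names, not Impala classes, and the "one step wider than the wider operand" rule is a simplification of the analyzer's real promotion rules. It models only integer types and assumes, as described above, that the constant is retyped to the result type after each pass.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the widening loop; hypothetical, not Impala's analyzer. */
final class WideningDemo {
    /** Integer types, ordered narrowest to widest. */
    enum IntType {
        TINYINT, SMALLINT, INT, BIGINT;

        /** Next wider type, saturating at BIGINT. */
        IntType widen() {
            return this == BIGINT ? BIGINT : values()[ordinal() + 1];
        }
    }

    /** Assumed rule: a + b gets the type one step wider than the wider operand. */
    static IntType addResultType(IntType left, IntType right) {
        IntType wider = left.ordinal() >= right.ordinal() ? left : right;
        return wider.widen();
    }

    /**
     * Runs `passes` analysis cycles over `col + literal`. After each cycle the
     * literal is retyped to the expression's result type, which feeds back
     * into the next cycle. Returns the result type seen on each cycle.
     */
    static List<IntType> analyzeRepeatedly(IntType colType, IntType literalType, int passes) {
        List<IntType> history = new ArrayList<>();
        IntType literal = literalType;
        for (int i = 0; i < passes; i++) {
            IntType result = addResultType(colType, literal);
            history.add(result);
            literal = result; // the constant takes on the widened type
        }
        return history;
    }

    public static void main(String[] args) {
        // tinyint_col + 1, analyzed three times
        System.out.println(analyzeRepeatedly(IntType.TINYINT, IntType.TINYINT, 3));
        // prints [SMALLINT, INT, BIGINT]
    }
}
```

Because the result type is fed back into the literal, each pass widens the expression by one step until it saturates at {{BIGINT}}, and a clone taken at any pass keeps that pass's types.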
    
Meanwhile, analysis has already taken a clone of the expression with the old types. As a result, the types of columns in the result list of a {{SELECT}} statement can differ from the types of the same columns recorded in the {{SELECT}} list.

 * After the above, the base table expression for a {{SELECT}} statement has one schema ({{TINYINT}}), while the result expression has another ({{SMALLINT}}).

While the type inconsistency may seem minor, it leads to analysis failures (such as the CTAS overflow error above) and needs to be addressed.

Perhaps two fixes are needed:

 * When rewriting a numeric literal in the constant folding rule, apply the rules from {{NumericLiteral}} to override the type guessed by the constant evaluation.
 * Modify the {{substituteImpl}} method to a) not reset numeric literals or, more generally, b) not reset expressions that did not change (and whose children did not change).
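Fix (b) can be sketched as a substitution pass that resets only the nodes whose subtree actually changed. {{SubstituteSketch}}, {{Node}} and {{substitute}} are hypothetical names; keying the substitution map by label and representing types as strings are simplifications, and none of this reflects the real signature of {{substituteImpl}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of fix (b); hypothetical types, not Impala's Expr or substitution map. */
final class SubstituteSketch {
    /** Minimal expression node. `type` holds the analyzed type, or null after a reset. */
    static final class Node {
        final String label;
        final List<Node> children;
        final String type;

        Node(String label, String type, List<Node> children) {
            this.label = label;
            this.type = type;
            this.children = children;
        }
    }

    /**
     * Replaces nodes whose label appears in smap (keying by label is a
     * simplification). A node is rebuilt with a cleared type only if one of
     * its children changed; untouched subtrees are returned as-is, so their
     * analyzed types (including literal types) survive.
     */
    static Node substitute(Node n, Map<String, Node> smap) {
        Node replacement = smap.get(n.label);
        if (replacement != null) return replacement;
        List<Node> newChildren = new ArrayList<>();
        boolean changed = false;
        for (Node child : n.children) {
            Node newChild = substitute(child, smap);
            changed |= (newChild != child);
            newChildren.add(newChild);
        }
        if (!changed) return n;                        // no reset: keep analyzed type
        return new Node(n.label, null, newChildren);   // subtree changed: needs reanalysis
    }

    public static void main(String[] args) {
        Node literal = new Node("1", "TINYINT", List.of());
        Node col = new Node("tinyint_col", "TINYINT", List.of());
        Node plus = new Node("+", "SMALLINT", List.of(col, literal));

        // Nothing to substitute: the same tree comes back, types intact.
        System.out.println(substitute(plus, Map.of()) == plus);          // true

        // Substitute the column: only the root is reset; the literal is reused.
        Node target = new Node("t.tinyint_col", "TINYINT", List.of());
        Node rewritten = substitute(plus, Map.of("tinyint_col", target));
        System.out.println(rewritten.type);                              // null
        System.out.println(rewritten.children.get(1) == literal);        // true
    }
}
```

With this approach, substituting into an unchanged expression is a no-op, so a literal such as {{1}} keeps the type it was first analyzed with instead of being widened again on each cycle.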

Longer term, the implicit cast mechanism is overly fragile: we add an implicit cast and then discard it on reset, resulting in subtle type inconsistencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)