You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/01/02 11:24:00 UTC

[jira] [Commented] (IMPALA-7865) Repeated type widening of arithmetic expressions

    [ https://issues.apache.org/jira/browse/IMPALA-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731969#comment-16731969 ] 

ASF subversion and git services commented on IMPALA-7865:
---------------------------------------------------------

Commit 27577dd652554dda5a03016e2d1e3ab66fe6b1f5 in impala's branch refs/heads/master from Paul Rogers
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=27577dd ]

IMPALA-7902: NumericLiteral fixes, refactoring

The work to clean up the rewriter logic must start with a stable AST,
which must start with sprucing up some issues with the leaf nodes. This
CR tackles the NumericLiteral used to hold numbers.

IMPALA-7896: Literals should not need explicit analyze step

Partial fix: removes the need to analyze a numeric literal: analyze() is
a no-op. This eliminates the need to do a "fake" analysis with a null
analyzer: numeric literals are now created analyzed. This is useful
because the catalog module creates numeric literals outside of a query
(and outside of an analyzer.)

A literal is immutable except for type. Modified the constructor to set
the type and cost, then mark the node as analyzed. A later call to
analyze() has nothing to do.

Code that created and dummy-analyzed numeric literals changed to use
static create() methods resulting in simpler literal creation, and
eliminates the special "analyzer == null" checks in analyze().

IMPALA-7886: NumericLiteral constructor fails to round values to
             Decimal type
IMPALA-7887: NumericLiteral fails to detect numeric overflow
IMPALA-7888: Incorrect NumericLiteral overflow checks for FLOAT,
             DOUBLE
IMPALA-7891: Analyzer does not detect numeric overflow in CAST
IMPALA-7894: Parser does not catch double overflow

These are all caused by the somewhat cluttered state of the numeric
range check code after years of incremental changes. This patch
centralizes all checks into a series of constants and methods for
uniformity.  All values are set in the constructor which now checks
that the value is legal for the type. Cast operations verify that the
cast is valid. Multiple semi-parallel versions of the same logic is
replaced by calls to a single implementation.

The numeric checks now follow the SQL standard which says that
implementations should fail if a cast would trucate the most significant
digits, but round when truncating the least significant.

IMPALA-7865: Repeated type widening of arithmetic expressions

Partial fix. Replaces the "is explicit cast" flag in the numeric literal
with the explicit type. This allows reseting an implicit type back to
the explciit type if an arithmetic expression is analyzed multiple
times. A later patch will feed this type information into the type
inference mechanism to complete the fix.

Finally, adds a set of new exceptions that begin to unify error
reporting.  These handle casts (SqlCastException), value validation
(InvalidValueException) and unsupported features
(UnsupportedFeatureException.) These all derive from AnalysisException
for backward compatibility. Tests use the new exceptions to check for
expected errors rather than parsing text strings (which tend to
change.)

Testing:

* Added unit tests just for numeric literals. Refactored code to
  simplify the tests.
* Added a test case for the obscure case in Decimal V1 of an implicit
  cast overflow.
* The depth-check tests needed one extra level of nesting to trigger
  failure.
* Ran all FE tests.

Change-Id: I484600747b2871d3a6fe9153751973af9a8534f2
Reviewed-on: http://gerrit.cloudera.org:8080/12001
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Repeated type widening of arithmetic expressions
> ------------------------------------------------
>
>                 Key: IMPALA-7865
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7865
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> An issue related to IMPALA-7855 occurs in {{ExprRewriterTest.TestToSql()}} in the CTAS test. (This test will be made into a separate method, {{TestCTASToSql()}}). When run with the "integrated rewrite" feature enabled, we get into this odd situation:
>  * Analyze the {{CreateTableAsSelect}} statement. Create a temporary copy of the associated {{SELECT}} statement.
>  * Rewrite the {{SELECT}} statement from {{SELECT 1 + 1}} (both {{TINYINT}}, with {{SMALLINT}} for the {{+}} operation) to {{SELECT 2}} (as type {{TINYINT}}.)
>  * After constant folding, the rule checks the original type of the expression ({{SMALLINT}}) and casts the result ({{TINYINT}}) to the original type ({{SMALLINT}}) using an implicit cast.
>  * Perform column substitutions, reset and reanalyze. This process discards implicit casts. Because the value is 2, it takes the type {{TINYINT}}.
>  * Create the base table expressions using the newly rewritten value ({{TINYINT}}) though the result expression is still {{SMALLINT}}.
>  * Use the base expressions from the above (type as {{TINYINT}}) to declare the target table column.
>  * Now, try to map the result expression {{SMALLINT}} into the newly created table column {{TINYINT}}. Fails with a overflow error.
> While IMPALA-7855 describes how types are widened unnecessarily due to a single expression, the problem here occurs over time, due to repeated analysis of the same numeric expression:
>  * The analyzer implements a set of type propagation rules that generates a resulting type for arithmetic expressions that is wider than the types of the arguments. For example for {{tinyint_col + 1}}, {{tinyint_col}} and {{1}} are {{TINYINT}}, but the result of the expression is promoted to {{SMALLINT}}.
>  * The planner then sets the type of the constant (1 here) to {{SMALLINT}}.
>  * Repeat the process on the next cycle. {{tinyint_col}} is {{TINYINT}}, {{1}} is {{SMALLINT}}. Now the result of the expression is {{INT}} and {{1}} is retyped to be {{INT}}.
>  * Repeat again and the expression (and constant) are promoted to {{BIGINT}}.
> Meanwhile, analysis has taken a clone of the expression with the old types. As a result, the types of columns in the result list for a SELECT statement can differ from the same columns recorded in the SELECT list.
>  * After the above, the base table expression for a {{SELECT}} statement has one schema ({{TINYINT}}), the result expression has another ({{SMALLINT}}).
> While the inconsistency in types may seem a minor issue, it does lead to analysis failures and does need to be addressed.
> Perhaps two fixes are needed:
>  * When rewriting a numeric literal in the constant folding rule, apply the rules from {{NumericLiteral}} to override the type guessed by the constant evaluation.
>  * Modify the {{substituteImpl}} method to a) don't reset numeric literals, or, more generally, b) don't reset expressions that did not change (or their children did not change.)
> Longer term, the implicit cast mechanism is overly fragile: we add it then discard it, resulting in subtle type inconsistencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org