You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Daniel Becker (Jira)" <ji...@apache.org> on 2023/04/27 14:24:00 UTC

[jira] [Comment Edited] (IMPALA-12035) impala-shell accepts very big numbers but fails to store them correctly

    [ https://issues.apache.org/jira/browse/IMPALA-12035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717113#comment-17717113 ] 

Daniel Becker edited comment on IMPALA-12035 at 4/27/23 2:23 PM:
-----------------------------------------------------------------

The problem seems to be this:
 * in the case when an error is returned, the literal (9999999999999999999999) can fit into a DECIMAL and therefore is interpreted as such (see https://github.com/apache/impala/blob/c8aa5796d93510723342055cc70cf8d00abae754/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java#L344)
 * this DECIMAL can't fit in a BIGINT, so the cast fails (see https://github.com/apache/impala/blob/0a42185d17164af0c855647de25f1bc87f33ee71/be/src/runtime/decimal-value.inline.h#L124)
 * in the second case with the very big number, the literal can't even fit in a DECIMAL, so it is interpreted as a DOUBLE instead; this is not completely unreasonable as it fits in the range of DOUBLE, although of course there is some loss of precision
 * the DOUBLE can be cast to BIGINT, although the value is incorrect and this invokes undefined behaviour (see Floating–integral conversions in [https://en.cppreference.com/w/cpp/language/implicit_conversion]); the conversion is done here: [https://github.com/apache/impala/blob/0a42185d17164af0c855647de25f1bc87f33ee71/be/src/exprs/cast-functions-ir.cc#L61]

This can be seen in the explain statements after {{{}set explain_level=2{}}}:

 
{code:java}
Analyzed query: SELECT CAST(CAST(9999999999999999999999 AS DECIMAL(22,0)) AS BIGINT)
VS
Analyzed query: SELECT CAST(CAST(999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 AS DOUBLE) AS BIGINT)
{code}
Possible solutions:
 # Adding checks to integer-floating point conversions in the BE (both directions).
 ** Pros: 
 *** Strictly speaking this would be necessary as these conversions may invoke undefined behaviour, see the link above. Conversions to and from DecimalValue do have checks.
 ** Cons:
 *** This could lead to performance regression because of the added checks.
 *** Existing use cases could break if we return an error, although in those cases the results were already incorrect.
 # Adding checks only for the integer-floating point conversion of literals/constant expressions in the FE.
 ## Performance impact is negligible.
 ## Undefined behaviour may be invoked in other cases, as before.

 


was (Author: daniel.becker):
The problem seems to be this:
 * in the case when an error is returned, the literal (9999999999999999999999) can fit into a DECIMAL and therefore is interpreted as such (see https://github.com/apache/impala/blob/c8aa5796d93510723342055cc70cf8d00abae754/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java#L344)
 * this DECIMAL can't fit in a BIGINT, so the cast fails (see https://github.com/apache/impala/blob/0a42185d17164af0c855647de25f1bc87f33ee71/be/src/runtime/decimal-value.inline.h#L124)
 * in the second case with the very big number, the literal can't even fit in a DECIMAL, so it is interpreted as a DOUBLE instead; this is not completely unreasonable as it fits in the range of DOUBLE, although of course there is some loss of precision
 * the DOUBLE can be cast to BIGINT, although the value is incorrect and this invokes undefined behaviour (see Floating–integral conversions in [https://en.cppreference.com/w/cpp/language/implicit_conversion);] the conversion is done here: [https://github.com/apache/impala/blob/0a42185d17164af0c855647de25f1bc87f33ee71/be/src/exprs/cast-functions-ir.cc#L61]

This can be seen in the explain statements after {{{}set explain_level=2{}}}:

 
{code:java}
Analyzed query: SELECT CAST(CAST(9999999999999999999999 AS DECIMAL(22,0)) AS BIGINT)
VS
Analyzed query: SELECT CAST(CAST(999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 AS DOUBLE) AS BIGINT)
{code}
Possible solutions:
 # Adding checks to integer-floating point conversions in the BE (both directions).
 ** Pros: 
 *** Strictly speaking this would be necessary as these conversions may invoke undefined behaviour, see the link above. Conversions to and from DecimalValue do have checks.
 ** Cons:
 *** This could lead to performance regression because of the added checks.
 *** Existing use cases could break if we return an error, although in those cases the results were already incorrect.
 # Adding checks only for the integer-floating point conversion of literals/constant expressions in the FE.
 ## Performance impact is negligible.
 ## Undefined behaviour may be invoked in other cases, as before.

 

> impala-shell accepts very big numbers but fails to store them correctly
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-12035
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12035
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Bakai Ádám
>            Assignee: Daniel Becker
>            Priority: Major
>
> I tried  to insert rows very big bigints, and it worked as expected with big integers (error message, no new row stored), but it didn’t work as expected with ridiculously big integers( no error message, stored incorrect value). Here are the commands used:
> {code:java}
> drop TABLE my_first_table2;
> CREATE TABLE my_first_table2
> (
>   id BIGINT,
>   name STRING,
>   PRIMARY KEY(id)
> );
> INSERT INTO my_first_table2 VALUES (cast(9 as BIGINT), "sarah");
> -- this works just fine as expected
> INSERT INTO my_first_table2 VALUES (cast(9999999999999999999999 as BIGINT), "sarah");
> -- ERROR: UDF ERROR: Decimal expression overflowed which is expected since it is over bigint max value (source: https://impala.apache.org/docs/build/plain-html/topics/impala_bigint.html#:~:text=Range%3A%20%2D9223372036854775808%20..,9223372036854775807. )
> INSERT INTO my_first_table2 VALUES (cast(999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 as BIGINT), "sarah");
> -- this succeeds and doesn't throw the same error as the previous command which is concerning
> select * from my_first_table2;
> -- there are two rows in the table, and the id is incorrect in one of them  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org