You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org> on 2021/03/10 01:56:28 UTC

[Impala-ASF-CR] WIP IMPALA-10564: Return error when inserting an overflowed decimal value

Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17168


Change subject: WIP IMPALA-10564: Return error when inserting an overflowed decimal value
......................................................................

WIP IMPALA-10564: Return error when inserting an overflowed decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added checking for the error status of ScalarExprEvaluator
in TableWriter when ScalarExprEvaluator return NULL for decimal column.
If there is an error, the query will be failed without inserting NULL
for decimal column.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed as
   expected.
 - TODO: add unit-test.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
2 files changed, 20 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-10564: Return error when inserting an overflowed decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: WIP IMPALA-10564: Return error when inserting an overflowed decimal value
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8326/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 02:13:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8399/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 02:12:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6994/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 16:58:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added checking for the query status of RuntimeState in
TableWriter when ScalarExprEvaluator return NULL for decimal column.
If there is an error, the query will be failed without inserting NULL
for decimal column.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed without
   inserting NULL as expected.
 - Manually ran queries with overflowed decimal values and decimal_v2
   set as false. The result is same as before - NULLs were inserted
   to table for invalid decimal values with warning message.
 - Added unit-tests for INSERT-SELECT and CTAS.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
5 files changed, 139 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8331/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:28:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10564: Return error when inserting an overflowed decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: WIP IMPALA-10564: Return error when inserting an overflowed decimal value
......................................................................


Patch Set 1: Code-Review+1

(2 comments)

Code change LGTM. Have a couple of comments. I can upgrade to +2 once you add a unit test and address comments. 
One other question is about non-HDFS tables. Is insert into Kudu tables covered through these ? You don't have to test it for this fix but something to check in the future.

http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG@7
PS1, Line 7: overflowed
See related comment below.. it can occur for other error conditions besides overflow so you can change the message to be something more generic..like 'inserting an invalid decimal value'


http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc@107
PS1, Line 107: overflowed
It may not always be overflow.  For instance I get the same NULL value inserted in the table for a divide-by-zero error:

insert into t1 select cast((a+b+c)/0 as decimal (28,10)) as x from (select cast(654964569154.9565 as decimal (28,7)) as a, cast(44658554984 as decimal (28,7)) as b, cast(2.111 as decimal (28,7)) as c) q;

I see these are the error conditions for decimal :

grep SetError exprs/decimal-operators-ir.cc

      ctx->SetError("Decimal expression overflowed"); \
      ctx->SetError("String to Decimal parse failed");
      ctx->SetError("String to Decimal cast overflowed");
          if (decimal_v2) ctx->SetError("Cannot divide decimal by zero"); \



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 03:16:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 16:57:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py
File tests/query_test/test_decimal_queries.py:

http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py@96
PS2, Line 96: @pytest.mark.execute_serially
flake8: E302 expected 2 blank lines, found 1



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:09:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8422/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 16:46:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8423/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 16:48:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added checking for the query status of RuntimeState in
TableWriter when ScalarExprEvaluator return NULL for decimal column.
If there is an error, the query will be failed without inserting NULL
for decimal column.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed without
   inserting NULL as expected.
 - Manually ran queries with overflowed decimal values and decimal_v2
   set as false. The result is same as before - NULLs were inserted
   to table for invalid decimal values with warning message.
 - Added unit-tests for INSERT-SELECT and CTAS.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
5 files changed, 140 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added a query option use_null_for_decimal_errors. When it
is disabled, backend checks the query status of RuntimeState in
Table Writer when ScalarExprEvaluator return NULL for decimal column.
If there is an invalid deciaml error, the query will be failed without
inserting NULL for decimal column. If use_null_for_decimal_errors is
enabled, NULL will be inserted into table for invalid decimal value.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed without
   inserting NULL as expected if use_null_for_decimal_errors was set
   as false, and NULLs were inserted into the table for overflowed
   decimal if use_null_for_decimal_errors was set as true.
 - Manually ran queries with overflowed decimal values and decimal_v2
   set as false. The result is same as before - NULLs were inserted
   to table for invalid decimal values with warning message.
 - Added unit-tests for INSERT-SELECT and CTAS.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/common/status.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/runtime-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/generate_error_codes.py
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
16 files changed, 283 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8395/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Mar 2021 22:17:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added a query option use_null_for_decimal_errors. When it
is disabled, backend checks the query status of RuntimeState in
Table Writer when ScalarExprEvaluator return NULL for decimal column.
If there is an invalid deciaml error, the query will be failed without
inserting NULL for decimal column. If use_null_for_decimal_errors is
enabled, NULL will be inserted into table for invalid decimal value.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed without
   inserting NULL as expected if use_null_for_decimal_errors was set
   as false, and NULLs were inserted into the table for overflowed
   decimal if use_null_for_decimal_errors was set as true.
 - Manually ran queries with overflowed decimal values and decimal_v2
   set as false. The result is same as before - NULLs were inserted
   to table for invalid decimal values with warning message.
 - Added unit-tests for INSERT-SELECT and CTAS.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/common/status.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/runtime-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/generate_error_codes.py
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
16 files changed, 285 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 4: Code-Review+1

(7 comments)

Thanks for making the changes. A few mostly nits and one comment about the test.

http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG@7
PS1, Line 7: lid decima
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc@107
PS1, Line 107: invalid de
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17168/4/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17168/4/be/src/exec/hdfs-text-table-writer.cc@107
PS4, Line 107:           // IMPALA-10564: For invalid decimal value, we should return an error
nit: the comment indicates this is done unconditionally. Could you update it similar to other places that it is done based on the query option.


http://gerrit.cloudera.org:8080/#/c/17168/4/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17168/4/be/src/exec/kudu-table-sink.cc@253
PS4, Line 253:         // IMPALA-10564: For invalid decimal value, we should return an error
nit: same comment as above.


http://gerrit.cloudera.org:8080/#/c/17168/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17168/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@698
PS4, Line 698:       // IMPALA-10564: For invalid decimal value, we should return an error
nit: same as above


http://gerrit.cloudera.org:8080/#/c/17168/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/17168/4/common/thrift/generate_error_codes.py@475
PS4, Line 475: type
'value' would be more accurate than 'type' to avoid confusion with decimal type precision and scale.


http://gerrit.cloudera.org:8080/#/c/17168/4/testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
File testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test:

http://gerrit.cloudera.org:8080/#/c/17168/4/testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test@43
PS4, Line 43: ---- RESULTS
This verifies if a row was inserted but Is there a way to check that the inserted value is NULL ? I was thinking of something like 'select count(*) from overflowed_decimal_tbl where <column> is null' .  Although, since you are using the same table for the insertions, the count will keep growing.



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Mar 2021 23:18:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] MPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: MPALA-10564: Return error when inserting an invalid decimal value
......................................................................

MPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants.

This patch fixed the issue by calling RuntimeState::CheckQueryState()
in the end of HdfsTableWriter::AppendRows() and KuduTableSink::Send().
If there is an invalid deciaml error, the query will be failed without
inserting NULL for decimal column.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Added unit-tests for INSERT-SELECT and CTAS statements with
   overflowed decimal values to be inserted into tables. The
   overflowed decimal values are expressed as a constant expression,
   or as an expression with aliases.
   Also added cases to verify behaviour of decimal_v1 is unchanged.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
5 files changed, 218 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

aggressive fix means calling state->CheckQueryState() in the end of HdfsTableSink::Send() function. It may affect other query behavior. I will not fix the issue in this way.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 23:11:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/udf/udf.h@29
PS7, Line 29: #include "gen-cpp/ErrorCodes_types.h" // for TErrorCode
> How about use "#ifndef IMPALA_UDF_SDK_BUILD" to include this file? Otherwis
Sure, I think that's fine


http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
> If we set the default value for query option as true, the default
 > behavior for the cases with constant decimal will be changed.

How? I thought the constant case was handled in the FE?



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Mar 2021 01:04:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 2:

Thanks for adding the check for Kudu tables and the unit tests.  When thinking about this a bit more, one concern I have is if a user was doing an INSERT-SELECT of a billion row (or relatively large) table, would just one decimal value overflowing error out the whole operation or will the rest of the inserts go through ? I suppose this depends on the ABORT_ON_ERROR setting ?

ETL operations normally expect invalid/dirty rows to be logged into a separate location while continuing with the insertion of other rows. We should make sure this new behavior does not regress that workflow.  Any thoughts on that ?


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:33:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added a query option use_null_for_decimal_errors. When it
is disabled, backend checks the query status of RuntimeState in
Table Writer when ScalarExprEvaluator return NULL for decimal column.
If there is an invalid deciaml error, the query will be failed without
inserting NULL for decimal column. If use_null_for_decimal_errors is
enabled, NULL will be inserted into table for invalid decimal value.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed without
   inserting NULL as expected if use_null_for_decimal_errors was set
   as false, and NULLs were inserted into the table for overflowed
   decimal if use_null_for_decimal_errors was set as true.
 - Manually ran queries with overflowed decimal values and decimal_v2
   set as false. The result is same as before - NULLs were inserted
   to table for invalid decimal values with warning message.
 - Added unit-tests for INSERT-SELECT and CTAS.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/common/status.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/runtime-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/generate_error_codes.py
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
16 files changed, 272 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants.

This patch fixed the issue by checking the query status of RuntimeState
in Table Writer when ScalarExprEvaluator return NULL for decimal column.
If there is an invalid deciaml error, the query will be failed without
inserting NULL for decimal column.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Added unit-tests for INSERT-SELECT and CTAS statements with
   overflowed decimal values to be inserted into tables. The
   overflowed decimal values are expressed as a constant expression,
   or as an expression with aliases.
   Also added cases to verify behaviour of decimal_v1 is unchanged.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/common/status.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/runtime-state.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/thrift/generate_error_codes.py
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
11 files changed, 288 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/8
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8397/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 00:33:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/udf/udf.h@29
PS7, Line 29: #include "gen-cpp/ErrorCodes_types.h" // for TErrorCode
> Sure, I think that's fine
Done


http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
> > If we set the default value for query option as true, the default
I debug the constant case. Both FE and BE evaluates the decimal expression. FE evaluates the constant decimal expression, and writes error message and calling stack to log file when it get decimal overflow issue. But it does not transfer the materialized decimal value to BE so BE has to evaluate the decimal expression again. FE generate different plan tree for expression with/without alias. For constant case, FragmentInstanceState::ExecInternal() call UnionNode::GetNext() before calling HdfsTableSink::Send(). UnionNode::GetNext() call UnionNode::GetNextConst(), then UnionNode::MaterializeExprs(), ... DecimalOperators::ScaleDecimalValue() to get expression value. If there is decimal overflow issue, the error is saved in RuntimeState object. When HdfsTableSink::Send() is called, it calls RuntimeState::CheckQueryState() and returns error if there is a recorded error. That's why the current code return error for decimal overflow with constant expression. For expression with alias, the expression is evaluated util the column values are going to be wrote to table in HdfsTextTableWriter::AppendRows(). So in this case, when RuntimeState::CheckQueryState() is called by HdfsTableSink::Send(), there is no error saved in RuntimeState object, it continue to call HdfsTableSink::WriteRowsToPartition(), then HdfsTextTableWriter::AppendRows() to write rows. But HdfsTextTableWriter::AppendRows() don't call RuntimeState::CheckQueryState() anymore, it inserts NULL values to the table even there is an error recorded in RuntimeState object. That's reason why the current behaviors for two cases are different. Both cases are handled by BE. The new query option should affect both cases. If I don't change the code in HdfsTableSink::Send(), the behaviors for the two cases will be different.

The alias is required to reproduce the customer issue. If using "create table as select cast(a*b*c as decimal(28,10)) from source" with big a, b, c values, Impala return error without inserting NULL since the calling path is different.



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Mar 2021 03:39:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants.

This patch fixed the issue by calling RuntimeState::CheckQueryState()
in the end of HdfsTableWriter::AppendRows() and KuduTableSink::Send().
If there is an invalid decimal error, the query will be failed without
inserting NULL for decimal column.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Added unit-tests for INSERT-SELECT and CTAS statements with
   overflowed decimal values to be inserted into tables. The
   overflowed decimal values are expressed as a constant expression,
   or as an expression with aliases.
   Also added cases to verify behaviour of decimal_v1 is unchanged.
 - Passed exhaustive tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Reviewed-on: http://gerrit.cloudera.org:8080/17168
Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
5 files changed, 218 insertions(+), 0 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
> I debug the constant case. Both FE and BE evaluates the decimal expression.
As I mentioned in the above comments, FE evaluates the constant decimal expression, but it does not transfer the materialized decimal value to BE so BE has to evaluate the decimal expression again in that case. FE generate different plan trees for expression with/without alias. The final result is determined by BE. I debug this issue and found there are following three calling paths to handle INSERT, INSERT-SELECT and CTAS statements.

Case 1: Insertion with constant decimal expression

Sample queries:
# create a table
create table t10 (id int, a decimal(28,10), b decimal(28,10), c decimal(28,10));
# queries with overlfowed decimal values:
insert into table t10 values(2, cast(754964565555555555559154.9565 as decimal (28,10)), cast(84658554984 as decimal (28,10)), cast(88856788.8877 as decimal (28,10)));
insert into table t10 select 3, cast(cast(654964569154.9565 as decimal (28,7))*44658554984*2.111 as decimal (28,10)), cast(84658554984 as decimal (28,10)), cast(88856788.8877 as decimal (28,10));
create table d1 stored as parquet as select cast(cast(654964569154.9565 as decimal (28,7))*44658554984*2.111 as decimal (28,10));

Result for above queries with current Impala (without my fixing): return error "Decimal expression overflowed" to client.

Case 2: Insertion by selecing from another source table

Sample queries:
# create source table, then insert some valid data
create table t10 (id int, a decimal(28,10), b decimal(28,10), c decimal(28,10));
insert into table t10 values(1, cast(654964569154.9565 as decimal (28,10)), cast(44658554984 as decimal (28,10)), cast(56788.8877 as decimal (28,10)));
 with overlfowed decimal values:
# queries with overlfowed decimal values:
create table t11 as select id, cast(a*b*c as decimal (28,10)) from t10;
insert into table t11 select id, cast(a*b*c as decimal (28,10)) from t10;

Result for above queries with current Impala: return error "Decimal expression overflowed" to client.

Case 3: Insertion one row with aliases

Sample queries:
# queries with overlfowed decimal values:
create table t12 as select 1 as id, cast(a*44658554984*2.111 as decimal (28,10)) as d_28 from (SELECT cast(654964569154.9565 as decimal (28,7)) as a) q;
insert into table t3 select 2, cast(a*44658554984*2.111 as decimal (28,10)) as d_28 from (SELECT cast(654964569154.9565 as decimal (28,7)) as a) q;

Result for above queries with current Impala: NULLs are inserted into table without error.

Case 3 is the only case we can re-produce the customer issue. But the first two cases are more common.


I traced down the BE code and found the causes of results for above three cases. 
impala::DecimalOperators::ScaleDecimalValue() is function to evaluate decimal expression. If it get invalid value like overflowed value, it will save the error in RuntimeState object. The error is not directly retuned to the caller. Later if RuntimeState::CheckQueryState() is called, it will return error if there is one saved error. There are 9 places in BE to call RuntimeState::CheckQueryState(), which make the query failed if RuntimeState::CheckQueryState() return error. In the above first two cases, RuntimeState::CheckQueryState() is called in functions HdfsTableSink::Send(), KuduTableSink::Send(), and ExecNode::QueryMaintenance(). This cause BE to make query failed and return error to client.

The following loop in function FragmentInstanceState::ExecInternal() is the major piece of code to execute a query fragment.

  do {
    Status status;
    row_batch_->Reset();
    {
      SCOPED_TIMER(plan_exec_timer);
      RETURN_IF_ERROR(
          exec_tree_->GetNext(runtime_state_, row_batch_.get(), &exec_tree_complete));
    }
    UpdateState(StateEvent::BATCH_PRODUCED);
    if (VLOG_ROW_IS_ON) row_batch_->VLogRows("FragmentInstanceState::ExecInternal()");
    COUNTER_ADD(rows_produced_counter_, row_batch_->num_rows());
    RETURN_IF_ERROR(sink_->Send(runtime_state_, row_batch_.get()));
    UpdateState(StateEvent::BATCH_SENT);
  } while (!exec_tree_complete);


For case 1, exec_tree_->GetNext() (UnionNode::GetNext() for constant expression) calls down to DecimalOperators::ScaleDecimalValue() to get expression value. If there is decimal overflow issue, the error is saved in RuntimeState object. When sink_->Send() (HdfsTableSink::Send() or KuduTableSink::Send()) is called, it calls RuntimeState::CheckQueryState() and returns error if there is a saved error. This cause BE to return error.

For case 2, the expression is evaluated util the column values are going to be wrote to table in HdfsTextTableWriter::AppendRows(). When RuntimeState::CheckQueryState() is called by sink_->Send() for one row_batch, and there is no error saved in RuntimeState before, it return OK so that HdfsTableSink::Send() continue to call HdfsTableSink::WriteRowsToPartition(), then HdfsTextTableWriter::AppendRows() to write rows. The NULL could be inserted to table for the invalid decimal values for the row_batch without returning error. But when exec_tree_->GetNext() is called for next row_batch, RuntimeState::CheckQueryState() is called in ExecNode::QueryMaintenance(). This cause BE to return error. Also if sink_->Send() is called for next row_batch, it returns error if there is error saved in RuntimeState when handling last row_batch.
In my sample queries, exec_tree_->GetNext() is returned for the first row_batch with exec_tree_complete as false even there is only one row in the source table. So exec_tree_->GetNext() is called again after wrirting first row patch, and hit RuntimeState::CheckQueryState(). 

For case 3, the calling path is same as case 2, but exec_tree_->GetNext() is returned for the first row_batch with exec_tree_complete as true so exec_tree_->GetNext() and sink_->Send() will not be called anymore so that BE does not hit RuntimeState::CheckQueryState() and return OK to the client even there is an error recorded in RuntimeState object. This is reason why the current behaviors for case 3 are different from first two cases.
It is not like the intended behavior by design. Instead, it's more like a bug that RuntimeState::CheckQueryState() is not called in a corner case.


Based on the analysis, I have two questions:
Q1: Is it necessary to introduce new query option "use_null_for_decimal_errors"? 
This is to keep the current behavior in case 3, which should no be used frequently. Customer asked us to fix it.
We should set the default value as false for query option "use_null_for_decimal_errors" since current Impala return error in most cases.

Q2: We have to support the query option in all cases. My current patch does not work for case 2 when "use_null_for_decimal_errors" is set as true.
I don't have a good solution to fix it. To handle next row patch after hitting an invalid decimal value, we have to clean the error saved in RuntimeState after inserting NULL to the table. But this change is risky. We only save the first error in RuntimeState, it's not safe to clean the error since we could get more than one errors for one row batch. Another way is not to save error in RuntimeState when getting invalid decimal values, but this change the behavior of other queries, for example "select cast(a*44658554984*2.111 as decimal (28,10)) as d_28 from (SELECT cast(654964569154.9565 as decimal (28,7)) as a) q;".



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 00:30:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

> Patch Set 7:
> 
> (1 comment)

> It is not like the intended behavior by design. Instead, it's more like a bug that RuntimeState::CheckQueryState() is not called in a corner case.
This does seem to be the underlying issue right ? Is this why when you set ABORT_ON_ERROR = true, the error was not propagated back to caller ? If we can fix this behavior, then yes I would agree we don't need to introduce a new query option.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 16:25:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants.

This patch fixed the issue by calling RuntimeState::CheckQueryState()
in the end of HdfsTableWriter::AppendRows() and KuduTableSink::Send().
If there is an invalid decimal error, the query will be failed without
inserting NULL for decimal column.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Added unit-tests for INSERT-SELECT and CTAS statements with
   overflowed decimal values to be inserted into tables. The
   overflowed decimal values are expressed as a constant expression,
   or as an expression with aliases.
   Also added cases to verify behaviour of decimal_v1 is unchanged.
 - Passed exhaustive tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
5 files changed, 218 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/10
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added a query option use_null_for_decimal_errors. When it
is disabled, backend checks the query status of RuntimeState in
Table Writer when ScalarExprEvaluator return NULL for decimal column.
If there is an invalid deciaml error, the query will be failed without
inserting NULL for decimal column. If use_null_for_decimal_errors is
enabled, NULL will be inserted into table for invalid decimal value.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed without
   inserting NULL as expected if use_null_for_decimal_errors was set
   as false, and NULLs were inserted into the table for overflowed
   decimal if use_null_for_decimal_errors was set as true.
 - Manually ran queries with overflowed decimal values and decimal_v2
   set as false. The result is same as before - NULLs were inserted
   to table for invalid decimal values with warning message.
 - Added unit-tests for INSERT-SELECT and CTAS.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/common/status.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/runtime/runtime-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/udf/udf.cc
M be/src/udf/udf.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/generate_error_codes.py
A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
16 files changed, 285 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/exec/hdfs-text-table-writer.cc@112
PS7, Line 112: state_->GetQueryStatus().IsInvalidDecimalError()
Probably want to move this check to after the check if the column has decimal type - this involves reading an atomic variable so its pretty expensive to be doing once per row, better to at least only have to do it for decimal rows.


http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/udf/udf.h@29
PS7, Line 29: #include "gen-cpp/ErrorCodes_types.h" // for TErrorCode
I'm not sure if you can do this. See the comment a few lines above this - this file is sometimes used in situations where we don't have a full Impala build available.

I think its probably fine to just rely on string matching for this, instead of using error codes.


http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
> For constant decimal (e.g the expression without alias), our current code r
I think this is a different situation than the constant decimal case, since for that case you fail the entire query before anything gets inserted so you're not messing up a users data, whereas in this case we're stopping after a partial insert, so its not always clear what the end state of the table will be, and I think preferring to just have nulls inserted makes a lot of sense from a user's perspective.

I guess the argument though is that for decimal_v2 we're supposed to return an error rather than null for overflow, not doing that in this situation was an oversight, so theoretically this is making the behavior more consistent with what its supposed to be anyways.

I still personally am reluctant to change the default behavior in situations like these where the previous behavior wasn't really incorrect (i.e. wasn't giving wrong results, causing crashes, etc.), cause you might be breaking users that rely on the behavior, but I'll leave it up to you.



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 23:44:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

> Patch Set 7:
> 
> (1 comment)

Thanks Wenzhe for the update. Taking another look at your comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 15:43:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
> I think this is a different situation than the constant decimal case, since
It is a fair point to keep the default behavior same as before even though it is not right..as users may get surprised if some of the big insert-select or CTAS jobs start failing.  Once we socialize this with the user base some more, we could decide to change the default.  The ideal solution of course is to do the row skipping as discussed in the jira but that is non-trivial.

BTW, Wenzhe,  my understanding is the alias is not required to repro the issue ..if the columns in the source table are defined with decimal(28,10).  e.g 3 columns a, b, c each of decimal(28,10).  Then if we do  'create table target as select cast(a*b*c as decimal(28,10)) from source' that could still overflow depending on the actual values of a, b, c.



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Mar 2021 00:28:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
Don't we want the default behavior to be consistent with how it used to work (i.e. true), so that we're not changing behavior for users that might rely on the nulls being inserted already?



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 22:13:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8332/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:37:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8398/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 02:03:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10564: Return error when inserting an overflowed decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: WIP IMPALA-10564: Return error when inserting an overflowed decimal value
......................................................................


Patch Set 1:

(2 comments)

Fixed the issue for HDFS and Kudu table. HBase don't have this issue since it does not insert NULL to the table.

http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG@7
PS1, Line 7: overflowed
> See related comment below.. it can occur for other error conditions besides
Done


http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc@107
PS1, Line 107: overflowed
> It may not always be overflow.  For instance I get the same NULL value inse
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 21:48:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8420/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 23:38:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 3:

> Patch Set 2:
> 
> Thanks for adding the check for Kudu tables and the unit tests.  When thinking about this a bit more, one concern I have is if a user was doing an INSERT-SELECT of a billion row (or relatively large) table, would just one decimal value overflowing error out the whole operation or will the rest of the inserts go through ? I suppose this depends on the ABORT_ON_ERROR setting ?
> 
> ETL operations normally expect invalid/dirty rows to be logged into a separate location while continuing with the insertion of other rows. We should make sure this new behavior does not regress that workflow.  Any thoughts on that ?

Posted a few comments in the JIRA to capture the decision points. I am leaning towards introducing a query option which would allow the CTAS to error out on the first decimal error. Let me know what you think.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 19:00:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
> Don't we want the default behavior to be consistent with how it used to wor
For constant decimal (e.g the expression without alias), our current code return error without NULL being inserted. We only change the default behavior for the cases with the alias in decimal expression. This keep the default behavior consistent.



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 23:02:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
> As I mentioned in the above comments, FE evaluates the constant decimal exp
According to document https://impala.apache.org/docs/build/html/topics/impala_decimal.html, the Overflow behavior for DECIMAL_V2 should be	"Aborts with an error". I lean to remove the new query option.



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 03:48:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

> Patch Set 7:
> 
> When decimal_v2 is set as false, Impala return a warning and insert NULL. So we don't need to introduce another query option. The bug is not related with ABORT_ON_ERROR. Even ABORT_ON_ERROR is set false, we still should abort the query with an error for overflowed decimal value. In first two cases, current code abort the query with an error for invalid decimal value when ABORT_ON_ERROR is set false. To fix the issue, we just return error in table writer if we found got NULL for decimal columns as my patch set 2. Or fix it more aggressively, by replacing "return Status:OK()" with "return state->CheckQueryState())" in HdfsTableSink::Send() and KuduTableSink::Send().

Since decimal_v2 is the default, we only need to focus on that. Based on your 3 examples it is clear that we are already returning the Decimal overflow error in the common cases of 1 and 2.  For the not common case 3, if you just ran the SELECT portion of the CTAS, it will also return the same error.  So, it makes sense to return error for the CTAS as well.  I agree that a new query option is not needed.
For case 3, when running the CTAS, the log file does show 'UDF ERROR: Decimal expression overflowed' in the stack trace for HdfsTableSink::Send() but the execution continues and the row is inserted. Is this what you were referring to for the aggressive fix ?


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 17:57:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 5: Code-Review+2

LGTM. Let me know if you want anyone else to review.  Otherwise, I can start a gerrit-verify run.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 00:56:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/exec/hdfs-text-table-writer.cc@112
PS7, Line 112: state_->GetQueryStatus().IsInvalidDecimalError()
> Probably want to move this check to after the check if the column has decim
Done


http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/udf/udf.h
File be/src/udf/udf.h:

http://gerrit.cloudera.org:8080/#/c/17168/7/be/src/udf/udf.h@29
PS7, Line 29: #include "gen-cpp/ErrorCodes_types.h" // for TErrorCode
> I'm not sure if you can do this. See the comment a few lines above this - t
How about use "#ifndef IMPALA_UDF_SDK_BUILD" to include this file? Otherwise, just pass the string. This is used in udf.cc.


http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17168/7/common/thrift/ImpalaService.thrift@652
PS7, Line 652: false by default
> I think this is a different situation than the constant decimal case, since
If we set the default value for query option as true, the default behavior for the cases with constant decimal will be changed.



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Mar 2021 00:11:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 7:

When decimal_v2 is set as false, Impala return a warning and insert NULL. So we don't need to introduce another query option. The bug is not related with ABORT_ON_ERROR. Even ABORT_ON_ERROR is set false, we still should abort the query with an error for overflowed decimal value. In first two cases, current code abort the query with an error for invalid decimal value when ABORT_ON_ERROR is set false. To fix the issue, we just return error in table writer if we found got NULL for decimal columns as my patch set 2. Or fix it more aggressively, by replacing "return Status:OK()" with "return state->CheckQueryState())" in HdfsTableSink::Send() and KuduTableSink::Send().


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 17:26:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 3:

I did more investigation for different table writer to support rewinding partial row and put the details in the comments of the Jira IMPALA-10564. We will not support skipping row. I will change code to   introduce a query option as Aman suggested.


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 23:20:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py
File tests/query_test/test_decimal_queries.py:

http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py@96
PS2, Line 96: @pytest.mark.execute_serially
> flake8: E302 expected 2 blank lines, found 1
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:15:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal value
......................................................................


Patch Set 10: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 22:52:37 +0000
Gerrit-HasComments: No