You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Daniel Becker (Code Review)" <ge...@cloudera.org> on 2022/09/19 08:36:34 UTC

[Impala-ASF-CR] IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) values are inserted

Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18999


Change subject: IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch fixes that by implicitly casting the values to the VARCHAR
type of the longest value if all values in a column are CHAR types AND
not all have the same length. This VARCHAR type will be the common type
of the column in the VALUES statement.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

TODO: If the destination type is also CHAR, there is no need to cast the
values but it can be tricky to detect in time. Can we do it?

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test
M tests/query_test/test_chars.py
5 files changed, 309 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13483/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Jul 2023 14:58:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

For example:
  CREATE TABLE impala_char_insert (s STRING);

  -- all values are CHAR(N) with different N, but all will use the
     biggest N
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  SELECT length(s) FROM impala_char_insert;
  3
  3
  3

  -- if inserted individually, the result is
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3

This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

Note that the behaviour in Hive is different from both behaviours in
Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from
CHAR values when they are cast to other types, which is also lossy.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and
ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both
are set to true and the query contains set operation(s), an error is
returned.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test
M tests/query_test/test_chars.py
13 files changed, 331 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/13
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18999/9/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/18999/9/be/src/service/query-options.h@53
PS9, Line 53:       TImpalaQueryOptions::VALUES_STMT_AVOID_LOSSY_CHAR_PADDING + 1);                       \
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/18999/9/be/src/service/query-options.h@298
PS9, Line 298:       VALUES_STMT_AVOID_LOSSY_CHAR_PADDING,  TQueryOptionLevel::REGULAR)                                                        \
line too long (129 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 11:14:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

For example:
  CREATE TABLE impala_char_insert (s STRING);

  -- all values are CHAR(N) with different N, but all will use the
     biggest N
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  SELECT length(s) FROM impala_char_insert;
  3
  3
  3

  -- if inserted individually, the result is
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3

This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

Note that the behaviour in Hive is different from both behaviours in
Impala: Hive implicitly removes trailing spaces from CHAR values when
they are cast to other types, which is also lossy.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test
M tests/query_test/test_chars.py
13 files changed, 339 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/9
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13475/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 11:36:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12127/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Jan 2023 17:15:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

For example:
  CREATE TABLE impala_char_insert (s STRING);

  -- all values are CHAR(N) with different N, but all will use the
     biggest N
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  SELECT length(s) FROM impala_char_insert;
  3
  3
  3

  -- if inserted individually, the result is
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3

This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

Note that the behaviour in Hive is different from both behaviours in
Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from
CHAR values when they are cast to other types, which is also lossy.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and
ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both
are set to true and the query contains set operation(s), an error is
returned.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test
M tests/query_test/test_chars.py
13 files changed, 331 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/15
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 15: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 11 Jul 2023 15:48:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11417/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Sep 2022 12:38:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13516/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 13
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 11 Jul 2023 11:48:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18999/10/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/18999/10/be/src/service/query-options.h@53
PS10, Line 53:       TImpalaQueryOptions::VALUES_STMT_AVOID_LOSSY_CHAR_PADDING + 1);                       \
line too long (93 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 12:39:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
M tests/query_test/test_chars.py
10 files changed, 374 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
M tests/query_test/test_chars.py
10 files changed, 382 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 7:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13431/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Jun 2023 12:00:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13449/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jun 2023 14:02:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 15: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 11 Jul 2023 11:28:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

For example:
  CREATE TABLE impala_char_insert (s STRING);

  -- all values are CHAR(N) with different N, but all will use the
     biggest N
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  SELECT length(s) FROM impala_char_insert;
  3
  3
  3

  -- if inserted individually, the result is
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3

This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

Note that the behaviour in Hive is different from both behaviours in
Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from
CHAR values when they are cast to other types, which is also lossy.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and
ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both
are set to true and the query contains set operation(s), an error is
returned.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Reviewed-on: http://gerrit.cloudera.org:8080/18999
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test
M tests/query_test/test_chars.py
13 files changed, 331 insertions(+), 10 deletions(-)

Approvals:
  Csaba Ringhofer: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13515/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 14
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 11 Jul 2023 11:49:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
M tests/query_test/test_chars.py
10 files changed, 382 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@8
PS8, Line 8: 
           : If, in a VALUES clause, for the same column all of the values are CHAR
           : types but not all are of the same length, the common type chosen is
           : CHAR(max(lengths)). This means that shorter values are padded with
           : spaces. If the destination column is not CHAR but VARCHAR or STRING,
           : this produces different results than if the values in the column are
           : inserted individually, in separate statements. This behaviour is
           : suboptimal because information is lost.
> An example could make this clearer.
Done


http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@17
PS8, Line 17: 
> I would prefer to add CHAR to the name somehow as it is only applied for th
Done


http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@26
PS8, Line 26: 
> Can you add a note about already different behavior in CHAR(N) padding comp
I mentioned Hive. If it is important I can find out what Postgres does.


http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift@646
PS8, Line 646:  
> nit: false
Done


http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@200
PS8, Line 200: 
> This is no longer true, right?
Right, removed this from the comment.


http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
File fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java:

http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java@146
PS8, Line 146: 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
> Calling this before analyze looks a bit convoluted.
Done


http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
File testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test:

http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test@32
PS8, Line 32: 
> Can you also add longer string that will be truncated?
If the value is longer than the destination field, it results in an error. I added a test case for that.



-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 11:13:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

For example:
  CREATE TABLE impala_char_insert (s STRING);

  -- all values are CHAR(N) with different N, but all will use the
     biggest N
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  SELECT length(s) FROM impala_char_insert;
  3
  3
  3

  -- if inserted individually, the result is
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3

This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

Note that the behaviour in Hive is different from both behaviours in
Impala: Hive implicitly removes trailing spaces from CHAR values when
they are cast to other types, which is also lossy.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test
M tests/query_test/test_chars.py
13 files changed, 340 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/10
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13477/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 13:02:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@8
PS8, Line 8: 
           : If, in a VALUES clause, for the same column all of the values are CHAR
           : types but not all are of the same length, the common type chosen is
           : CHAR(max(lengths)). This means that shorter values are padded with
           : spaces. If the destination column is not CHAR but VARCHAR or STRING,
           : this produces different results than if the values in the column are
           : inserted individually, in separate statements. This behaviour is
           : suboptimal because information is lost.
An example could make this clearer.


http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@17
PS8, Line 17: VALUES_STMT_NON_LOSSY_COMMON_TYPE
I would prefer to add CHAR to the name somehow as it is only applied for that type.
e.g. VALUES_STMT_AVOID_LOSSY_CHAR_PADDING


http://gerrit.cloudera.org:8080/#/c/18999/8//COMMIT_MSG@26
PS8, Line 26: We choose VARCHAR instead of STRING as the common type because VARCHAR
Can you add a note about already different behavior in CHAR(N) padding compared to Hive? Postgres could be also mentioned.


http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18999/8/common/thrift/Query.thrift@646
PS8, Line 646: 0
nit: false


http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@200
PS8, Line 200: his is only used
This is no longer true, right?


http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
File fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java:

http://gerrit.cloudera.org:8080/#/c/18999/8/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java@146
PS8, Line 146:     if (analyzer.getQueryCtx().client_request
             :         .query_options.values_stmt_non_lossy_common_type) {
             :       // We suppress any AnalysisException thrown by castIfAllChars() because it is the
             :       // responsibility of super.analyze() to validate the query and report errors. Any
             :       // error that results in castIfAllChars() throwing an exception should also trigger
             :       // an exception in super.analyze(), but if there are multiple errors, the exception
             :       // thrown by castIfAllChars() may not correspond to the error first encountered by
             :       // super.analyze(). To keep error reporting consistent, we always choose the
             :       // exception thrown by super.analyze() and suppress exceptions thrown by
             :       // castIfAllChars().
             :       try {
             :         castIfAllChars(analyzer);
             :       } catch (AnalysisException e) {
             :         ex = e;
             :       }
             :     }
             :     super.analyze(analyzer);
Calling this before analyze looks a bit convoluted.

It would be nicer to do this during the usual analyses, e.g. create function in SetOperationStmt like use EnforceNonLossyCommonType() which could return true for ValuesStmt with values_stmt_non_lossy_common_type. This bool could be passed to castToSetOpCompatibleTypes() which looks to best place for this logic for me.


http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
File testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test:

http://gerrit.cloudera.org:8080/#/c/18999/8/testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test@32
PS8, Line 32: (cast("12" as char(2)));
Can you also add longer string that will be truncated?
+ the same for varchars



-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Jul 2023 13:10:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
M tests/query_test/test_chars.py
10 files changed, 375 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11411/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 14:43:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
M tests/query_test/test_chars.py
10 files changed, 383 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/6
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13517/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 11 Jul 2023 11:51:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
M tests/query_test/test_chars.py
10 files changed, 382 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

For example:
  CREATE TABLE impala_char_insert (s STRING);

  -- all values are CHAR(N) with different N, but all will use the
     biggest N
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  SELECT length(s) FROM impala_char_insert;
  3
  3
  3

  -- if inserted individually, the result is
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3

This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

Note that the behaviour in Hive is different from both behaviours in
Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from
CHAR values when they are cast to other types, which is also lossy.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test
M tests/query_test/test_chars.py
13 files changed, 315 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/11
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: WIP - Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11376/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Sep 2022 08:58:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

This patch adds the query option VALUES_STMT_NON_LOSSY_COMMON_TYPE
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we therefore would run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-common-type.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-non-lossy-common-type.test
M tests/query_test/test_chars.py
10 files changed, 383 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11410/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Sep 2022 14:43:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13438/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Jun 2023 17:12:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................

IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

If, in a VALUES clause, for the same column all of the values are CHAR
types but not all are of the same length, the common type chosen is
CHAR(max(lengths)). This means that shorter values are padded with
spaces. If the destination column is not CHAR but VARCHAR or STRING,
this produces different results than if the values in the column are
inserted individually, in separate statements. This behaviour is
suboptimal because information is lost.

For example:
  CREATE TABLE impala_char_insert (s STRING);

  -- all values are CHAR(N) with different N, but all will use the
     biggest N
  INSERT OVERWRITE impala_char_insert VALUES
    (CAST("1" AS CHAR(1))),
    (CAST("12" AS CHAR(2))),
    (CAST("123" AS CHAR(3)));

  SELECT length(s) FROM impala_char_insert;
  3
  3
  3

  -- if inserted individually, the result is
  SELECT length(s) FROM impala_char_insert;
  1
  2
  3

This patch adds the query option VALUES_STMT_AVOID_LOSSY_CHAR_PADDING
which, when set to true, fixes the problem by implicitly casting the
values to the VARCHAR type of the longest value if all values in a
column are CHAR types AND not all have the same length. This VARCHAR
type will be the common type of the column in the VALUES statement.

The new behaviour is not turned on by default because it is a breaking
change.

Note that the behaviour in Hive is different from both behaviours in
Impala: Hive (and PostgreSQL) implicitly remove trailing spaces from
CHAR values when they are cast to other types, which is also lossy.

We choose VARCHAR instead of STRING as the common type because VARCHAR
can be converted to any VARCHAR type shorter or the same length and also
to STRING, while STRING cannot safely be converted to VARCHAR because
its length is not bounded - we would therefore run into problems if the
common type were STRING and the destination column were VARCHAR.

Note: although the VALUES statement is implemented as a special UNION
operation under the hood, this patch doesn't change the behaviour of
explicit UNION statements, it only applies to VALUES statements.

Note: the new VALUES_STMT_AVOID_LOSSY_CHAR_PADDING query option and
ALLOW_UNSAFE_CASTS are not allowed to be used at the same time: if both
are set to true and the query contains set operation(s), an error is
returned.

Testing:
 - Added tests verifying that unneeded padding doesn't occur and the
   queries succeed in various situations, e.g. different destination
   column types and multi-column inserts. See
   testdata/workloads/functional-query/queries/QueryTest/chars-values-clause.test

Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StatementBase.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-lossy-char-padding.test
A testdata/workloads/functional-query/queries/QueryTest/chars-values-stmt-no-lossy-char-padding.test
M tests/query_test/test_chars.py
13 files changed, 332 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/18999/14
-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 14
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9491/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 11 Jul 2023 11:29:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18999 )

Change subject: IMPALA-10753: Incorrect length when multiple CHAR(N) values are inserted
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12705/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e9e189cb3c2be0e741ca3d15a7f97ec3a1b1a86
Gerrit-Change-Number: 18999
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Mar 2023 08:39:52 +0000
Gerrit-HasComments: No