You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/08/13 23:26:00 UTC
[jira] [Commented] (IMPALA-1652) Fix CHAR datatype: Incorrect
results with basic predicate on CHAR typed column.
[ https://issues.apache.org/jira/browse/IMPALA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177374#comment-17177374 ]
Tim Armstrong commented on IMPALA-1652:
---------------------------------------
I looked at this again and I think the initial analysis about whether comparisons were whitespace sensitive may have made this seem harder to fix than it actually is.
There are two general approaches I can see to doing comparisons that ignore trailing whitespace from CHAR columns
# Custom comparison operators that are trailing-whitespace-insensitive.
# Stripping trailing whitespace from CHAR columns, then doing a trailing-whitespace-sensitive comparison.
Mostly the two produce the same results, with the exception if there is a non-CHAR column in the comparison with trailing spaces. In that case the first approach would ignore that trailing space and the second would preserve it.
I think we'd been assuming that we'd have to do the first, but I think the second is actually consistent with Hive's behaviour.
Postgres does do something similar to the first, at least for VARCHAR <-> CHAR comparisons, but Hive does the second:
{noformat}
postgres=# select cast('foo' as char(50)) = cast('foo ' as varchar);
?column?
----------
t
(1 row)
0: jdbc:hive2://localhost:11050> select cast('foo' as char(2)) = cast('foo ' as varchar(50));
+--------+
| _c0 |
+--------+
| false |
+--------+
{noformat}
Moreover, both systems strip whitespace when casting from CHAR to other types, e.g.
{noformat}
postgres=# select cast(cast('foo' as char(50)) as text) || '|';
?column?
----------
foo|
0: jdbc:hive2://localhost:11050> select cast(cast('foo' as char(50)) as string) || '|';
+-------+
| _c0 |
+-------+
| foo| |
+-------+
{noformat}
Impala is the clear outlier here:
{noformat}
[localhost.EXAMPLE.COM:21000] default> select cast(cast('foo' as char(50)) as string) || '|';
+------------------------------------------------------+
| concat(cast(cast('foo' as char(50)) as string), '|') |
+------------------------------------------------------+
| foo | |
+------------------------------------------------------+
{noformat}
Postgres seems to strip trailing whitespace when passing arguments to functions, but Hive does. Postgres weirdly behaves differently when using the || operator.
{noformat}
postgres=# select concat(cast('foo' as char(10)), '|');
concat
-------------
foo |
(1 row)
0: jdbc:hive2://localhost:11050> select concat(cast('foo' as char(10)), '|');
+-------+
| _c0 |
+-------+
| foo| |
+-------+
{noformat}
Where I'm going with this is that it seems like we'd get behaviour that was the same as Hive if casts from CHAR to STRING/VARCHAR stripped trailing whitespace. The downside of doing that is that we'd change some behaviour that is currently consistent with postgres and is fairly reasonable.
So I think roughly the pros and cons are:
Option 1:
* More consistent with postgres (but not fully consistent - would likely not implement all quirks)
* Slightly fewer changes from previous Impala behaviour
Option 2:
* Consistent with Hive
* Simpler to implement
* More internally consistent
So we should consider fixing this by making casts from char to string strip trailing whitespace. It would need a bit more investigation to see if that fully matches Hive behaviour, but I think it seems viable.
> Fix CHAR datatype: Incorrect results with basic predicate on CHAR typed column.
> -------------------------------------------------------------------------------
>
> Key: IMPALA-1652
> URL: https://issues.apache.org/jira/browse/IMPALA-1652
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.1, Impala 2.3.0
> Reporter: Alexander Behm
> Priority: Major
> Labels: correctness, downgraded, usability
>
> Repro:
> {code}
> create table foo(col1 char(10));
> insert into foo values (cast('test1' as char(10)));
> select * from foo where col1 = 'test1'; <-- returns an empty result set
> select * from foo where col1 = cast('test1' as char(10)); <-- correctly returns 1 row
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org