Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2023/03/06 13:04:00 UTC

[jira] [Commented] (IMPALA-11954) Partition an Iceberg table on a string col with '/' char gives incorrect results

    [ https://issues.apache.org/jira/browse/IMPALA-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696937#comment-17696937 ] 

Gabor Kaszab commented on IMPALA-11954:
---------------------------------------

Also, when the data is written by Impala, the string values themselves are wrong, not just the stats.
{code:java}
+------+-----------------+
| id   | date_string_col |
+------+-----------------+
| 7299 | 12%2F31%2F10    |
| 7298 | 12%2F31%2F10    |
| 7297 | 12%2F31%2F10    |
| 7296 | 12%2F31%2F10    |
| 7295 | 12%2F31%2F10    |
| 7294 | 12%2F31%2F10    |
| 7293 | 12%2F31%2F10    |
| 7292 | 12%2F31%2F10    |
| 7291 | 12%2F31%2F10    |
| 7290 | 12%2F31%2F10    |
| 7299 | 12/31/10        |
| 7298 | 12/31/10        |
| 7297 | 12/31/10        |
| 7296 | 12/31/10        |
| 7295 | 12/31/10        |
| 7294 | 12/31/10        |
| 7293 | 12/31/10        |
| 7292 | 12/31/10        |
| 7291 | 12/31/10        |
| 7290 | 12/31/10        |
+------+-----------------+ {code}
Here I ran the same insert from both Impala and Hive. Apparently Impala also escapes the value of the string partition column, and the escaped value (12%2F31%2F10) is visible in the query output too. This also seems incorrect to me.

The query I ran in both engines: 
{code:java}
insert into tmp_ice select id, date_string_col from functional_parquet.alltypes where date_string_col='12/31/10'; {code}
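
To illustrate the kind of behaviour that could produce both symptoms, here is a minimal Java sketch. It is not the Impala or Iceberg code: the class PartitionPathSketch and its methods are hypothetical, and the "column=value/file" path layout is an assumption. It shows that splitting a path on '/' truncates an unescaped value, while an escaped value survives the split but comes back as 12%2F31%2F10 unless it is decoded again.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical illustration only; not taken from IcebergUtil.java.
public class PartitionPathSketch {

  // Percent-encode a partition value so it fits into a single path segment.
  public static String encodeValue(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always supported
    }
  }

  // Reverse of encodeValue.
  public static String decodeValue(String encoded) {
    try {
      return URLDecoder.decode(encoded, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  // Assumed reader logic: split the path on '/' and return the text after
  // "column=" in the first matching segment, optionally decoding it.
  public static String valueFromPath(String path, String column, boolean decode) {
    String prefix = column + "=";
    for (String segment : path.split("/")) {
      if (segment.startsWith(prefix)) {
        String raw = segment.substring(prefix.length());
        return decode ? decodeValue(raw) : raw;
      }
    }
    return null; // column not present in the path
  }

  public static void main(String[] args) {
    String col = "date_string_col";
    String unescaped = col + "=12/31/10/data.parquet";
    String escaped = col + "=" + encodeValue("12/31/10") + "/data.parquet";

    // Unescaped value is cut at the first '/': prints 12
    System.out.println(valueFromPath(unescaped, col, false));
    // Escaped value without decoding: prints 12%2F31%2F10
    System.out.println(valueFromPath(escaped, col, false));
    // Escaped value with decoding: prints 12/31/10
    System.out.println(valueFromPath(escaped, col, true));
  }
}
```

If something along these lines is happening, a filter on '12/31/10' would match neither the truncated value nor the still-escaped one, which would be consistent with the empty result in the repro. Whether this is the actual cause still needs to be confirmed in the code.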

> Partition an Iceberg table on a string col with '/' char gives incorrect results
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-11954
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11954
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0.0
>            Reporter: Gabor Kaszab
>            Priority: Blocker
>              Labels: correctness, impala-iceberg
>
> Repro:
> {code:java}
> CREATE TABLE IF NOT EXISTS tmp_ice
> (id int, date_string_col string)
> PARTITIONED BY SPEC (date_string_col)
> STORED AS ICEBERG;
> insert into tmp_ice select id, date_string_col from functional_parquet.alltypes;
> select * from tmp_ice where date_string_col = "09/01/09";
> {code}
> This select returns zero rows.
> However, if I create the table partitioned by another column, e.g. 'id', then the very same select returns 10 rows as expected.
> The issue may be somewhere here where we split the path by '/' char:
> https://github.com/apache/impala/blob/47c71bbb32d34d4583856af227206934b6f15136/fe/src/main/java/org/apache/impala/util/IcebergUtil.java#L693


