Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/04/04 01:15:00 UTC

[jira] [Commented] (IMPALA-11954) Partition an Iceberg table on a string col with '/' char gives incorrect results

    [ https://issues.apache.org/jira/browse/IMPALA-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708183#comment-17708183 ] 

ASF subversion and git services commented on IMPALA-11954:
----------------------------------------------------------

Commit 826b113fd719369955079c96f968a3be4d0b9dab in impala's branch refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=826b113fd ]

IMPALA-11954: Fix for URL encoded partition columns for Iceberg tables

There is a bug when an Iceberg table has a string partition column and
Impala inserts special characters that need to be URL encoded into this
column. In this case the partition name is URL encoded so that the
value does not break the file path of that partition. E.g. 'b=1/2' is
converted to 'b=1%2F2'.
This is fine for path creation; however, for Iceberg tables the same
URL encoded partition name is also saved into the catalog as the
partition name and used for the Iceberg column stats. This leads to
incorrect results when querying the table: a SELECT * query returns the
URL encoded values instead of what the user inserted.
Additionally, when a filter is added to the query, Iceberg filters out
all the rows because it compares the non-encoded values to the URL
encoded ones.
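A minimal Java sketch of the mismatch described above. It assumes java.net.URLEncoder/URLDecoder as a stand-in for the escaping Impala actually applies (Hive-style path escaping differs in details, e.g. URLEncoder turns a space into '+'), and the helpers encodePartitionName and decodePartitionValue are hypothetical, not Impala APIs. It only illustrates the idea (keep the encoded form for paths, the decoded value for what is stored and compared), not the actual patch.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PartitionNameDemo {
    // Hypothetical helper: builds a "col=value" path segment, escaping the value
    // so that characters such as '/' are not mistaken for directory separators.
    // Assumption: URLEncoder stands in for Impala's real escaping.
    static String encodePartitionName(String column, String value) {
        return column + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: recovers the user-visible value from "col=value".
    static String decodePartitionValue(String partitionName) {
        String encoded = partitionName.substring(partitionName.indexOf('=') + 1);
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = encodePartitionName("b", "1/2");
        System.out.println(name); // b=1%2F2, safe to use as a directory name

        // Buggy behavior: reusing the encoded text as the stored value means a
        // filter on the original value "1/2" never matches "1%2F2".
        String buggyStoredValue = name.substring(name.indexOf('=') + 1);
        System.out.println(buggyStoredValue.equals("1/2")); // false

        // Idea behind the fix: store the decoded value, keep the encoded name for paths.
        System.out.println(decodePartitionValue(name).equals("1/2")); // true
    }
}
```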

Testing:
  - Added new tests to iceberg-partitioned-insert.test to cover this
    scenario.
  - Re-run the existing test suite.

Change-Id: I67edc3d04738306fed0d4ebc5312f3d8d4f14254
Reviewed-on: http://gerrit.cloudera.org:8080/19654
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Partition an Iceberg table on a string col with '/' char gives incorrect results
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-11954
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11954
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0.0
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Blocker
>              Labels: correctness, impala-iceberg
>
> Repro:
> {code:sql}
> CREATE TABLE IF NOT EXISTS tmp_ice
> (id int, date_string_col string)
> PARTITIONED BY SPEC (date_string_col)
> STORED AS ICEBERG;
> insert into tmp_ice select id, date_string_col from functional_parquet.alltypes;
> select * from tmp_ice where date_string_col = "09/01/09";
> {code}
> This select returns zero rows.
> However, if I create the table partitioned by another column, e.g. 'id', then the very same select returns 10 rows as expected.
> The issue may be somewhere here, where we split the path by the '/' char:
> https://github.com/apache/impala/blob/47c71bbb32d34d4583856af227206934b6f15136/fe/src/main/java/org/apache/impala/util/IcebergUtil.java#L693
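To illustrate why splitting on '/' is only safe on escaped paths, here is a hypothetical Java helper (parsePartitionPath is not the IcebergUtil code linked above) that splits a relative partition path into col=value segments and URL-decodes each value. It assumes java.net.URLDecoder matches the escaping used when the path was written.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathDemo {
    // Hypothetical helper: parses e.g. "date_string_col=09%2F01%2F09/id=3" into
    // column -> value pairs. Splitting on '/' works only because values are
    // escaped in the path; each value must be decoded before it is used as data.
    static Map<String, String> parsePartitionPath(String path) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Not a col=value segment: " + segment);
            }
            String column = segment.substring(0, eq);
            String value = URLDecoder.decode(segment.substring(eq + 1), StandardCharsets.UTF_8);
            values.put(column, value);
        }
        return values;
    }

    public static void main(String[] args) {
        // Escaped path: two segments, values decoded back to what was inserted.
        System.out.println(parsePartitionPath("date_string_col=09%2F01%2F09/id=3"));
        // prints {date_string_col=09/01/09, id=3}

        // Unescaped path: the '/' inside the value yields bogus segments.
        try {
            parsePartitionPath("date_string_col=09/01/09");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Not a col=value segment: 01
        }
    }
}
```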



