Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/09/06 23:16:00 UTC

[jira] [Commented] (IMPALA-11507) Impala cannot read Iceberg tables where DataFile is not under 'table location'

    [ https://issues.apache.org/jira/browse/IMPALA-11507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601043#comment-17601043 ] 

ASF subversion and git services commented on IMPALA-11507:
----------------------------------------------------------

Commit cc26f345a40d10cd5d0dc69f1dc3623fdddf16fd in impala's branch refs/heads/master from LPL
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cc26f345a ]

IMPALA-11507: Use absolute_path when Iceberg data files are outside of the table location

For Iceberg tables, if any of the following properties is set, the table
may have data files outside the table location directory:
- 'write.object-storage.enabled' is true
- 'write.data.path' is not empty
- 'write.location-provider.impl' is configured
- 'write.object-storage.path' (deprecated) is not empty
- 'write.folder-storage.path' (deprecated) is not empty
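
As a rough illustration of the check described by this list, the sketch
below tests a property map for these keys. The class and method names are
hypothetical and are not taken from the Impala source; only the property
keys come from the list above, and the exact conditions Impala applies may
differ.

```java
import java.util.Map;

// Hypothetical helper for illustration only, not the Impala implementation.
public class ExternalDataCheck {
  // Returns true if any of the listed properties suggests that data files
  // may live outside the table location directory.
  public static boolean mayHaveDataOutsideLocation(Map<String, String> props) {
    // parseBoolean(null) is false, so a missing key counts as "not enabled".
    if (Boolean.parseBoolean(props.get("write.object-storage.enabled"))) {
      return true;
    }
    String[] keys = {
      "write.data.path",
      "write.location-provider.impl",
      "write.object-storage.path",   // deprecated
      "write.folder-storage.path"    // deprecated
    };
    for (String key : keys) {
      String value = props.get(key);
      if (value != null && !value.isEmpty()) {
        return true;
      }
    }
    return false;
  }
}
```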

We should tolerate the case where the relative path of a data file
cannot be derived from the table location path, and use the absolute
path instead. For example, an ETL program may write a table so that
the Iceberg metadata is placed in
'hdfs://nameservice_meta/warehouse/hadoop_catalog/ice_tbl/metadata',
the recent data files in
'hdfs://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data', and the
data files from half a year ago in
's3a://nameservice_data/warehouse/hadoop_catalog/ice_tbl/data'. Impala
should still be able to query such a table normally.
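
The fallback described above can be sketched as follows. This is a minimal
illustration that assumes plain java.net.URI handling; the class and method
names are hypothetical, and the actual change in Impala may be structured
differently.

```java
import java.net.URI;

// Hypothetical sketch for illustration only, not the Impala implementation.
public class DataFilePathSketch {
  // Returns the path relative to the table location when the data file lies
  // under it; otherwise returns the absolute data file path unchanged.
  public static String relativeOrAbsolute(String tableLocation, String dataFile) {
    // Ensure the base ends with '/' so only whole directory names match.
    String base = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    URI rel = URI.create(base).relativize(URI.create(dataFile));
    // relativize() returns the given URI itself when the scheme, authority,
    // or path prefix does not match, so an absolute result means "outside".
    return rel.isAbsolute() ? dataFile : rel.getPath();
  }
}
```

With the table location 'hdfs://localhost:20500/ice_tbl_x', a file under
'hdfs://localhost:20500/ice_tbl_x/data/' yields a relative path, while a
file under 'hdfs://localhost:20500/ice_tbl_y/data/' is returned as its
absolute path.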

Testing:
 - added e2e tests

Change-Id: I666bed21d20d5895f4332e92eb30a94fa24250be
Reviewed-on: http://gerrit.cloudera.org:8080/18894
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Impala cannot read Iceberg tables where DataFile is not under 'table location'
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-11507
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11507
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: LiPenglin
>            Assignee: LiPenglin
>            Priority: Major
>              Labels: impala-iceberg
>
> step1:
> Create an Iceberg table 'ice_tbl' whose location is 'hdfs://localhost:20500/ice_tbl_x'.
> step2:
> Use 'org.apache.iceberg.AppendFiles' to commit the data file
> 'hdfs://localhost:20500/ice_tbl_y/data/00001-1-486e37ac-0adc-4f32-b209-4ade2574d3c0-00004.parquet' to 'ice_tbl'.
> step3:
> {code:java}
> create external table ice_tbl
> stored as iceberg
> location 'hdfs://localhost:20500/ice_tbl_x'
> tblproperties('iceberg.catalog' = 'hadoop.tables'); {code}
> step4:
> Querying 'ice_tbl' throws an exception at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java#L513
