Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/01/27 12:37:00 UTC

[jira] [Commented] (IMPALA-11864) LOAD DATA should not try to load hidden files for Iceberg tables

    [ https://issues.apache.org/jira/browse/IMPALA-11864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681305#comment-17681305 ] 

ASF subversion and git services commented on IMPALA-11864:
----------------------------------------------------------

Commit 8292e4afdd4b6f5fcfbf2291f97c988c07e1a421 in impala's branch refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8292e4afd ]

IMPALA-11864: Iceberg LOAD DATA should not load S3 hidden files

Loading data from S3 did not skip hidden files because the
FileSystemUtil.listFiles() call returned a RemoteIterator, which,
unlike RecursingIterator, does not filter out hidden files. This
would make a load fail because the hidden files are likely to have an
invalid magic string.
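
For illustration only, the magic-string failure can be sketched as
follows. This is not Impala's code: the class and method names are
hypothetical, and it assumes the check looks at the first four bytes of
the file. The only fact relied on is that Parquet files start with the
4-byte magic "PAR1"; a hidden text file beginning with "This ..." would
therefore report 'This', as in the error quoted in the issue below.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Impala.
class MagicStringCheck {
    // Parquet files begin with the 4-byte magic "PAR1".
    static final byte[] PARQUET_MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Returns the leading (up to) four bytes of the content as ASCII text.
    static String readMagic(byte[] content) {
        int n = Math.min(4, content.length);
        return new String(content, 0, n, StandardCharsets.US_ASCII);
    }

    // True only if the content starts with the Parquet magic.
    static boolean hasParquetMagic(byte[] content) {
        return content.length >= 4
            && Arrays.equals(Arrays.copyOfRange(content, 0, 4), PARQUET_MAGIC);
    }
}
```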

This commit adds an extra condition to skip hidden files when creating
the CREATE subquery.
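
A minimal sketch of the kind of filter described above, not the actual
patch. It assumes that a file counts as hidden when its base name starts
with '.' or '_'; the class and method names are hypothetical, and plain
strings stand in for Hadoop file statuses so the example is
self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Impala.
class HiddenFileFilter {
    // Assumption: names starting with '.' or '_' are treated as hidden.
    static boolean isHidden(String fileName) {
        return fileName.startsWith(".") || fileName.startsWith("_");
    }

    // Extracts the last path component, ignoring one trailing slash.
    static String baseName(String path) {
        String trimmed = path.endsWith("/")
            ? path.substring(0, path.length() - 1) : path;
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }

    // Keeps only the paths whose base name is not hidden.
    static List<String> visibleFiles(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String p : paths) {
            if (!isHidden(baseName(p))) {
                result.add(p);
            }
        }
        return result;
    }
}
```

With such a filter, a listing that contains both
.hiddenfileforloaddatatest and a *.parq file would yield only the *.parq
file.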

Testing:
 - Added E2E test
 - Ran E2E test on S3 build

Change-Id: Iffd179383c2bb2529f6f9b5f8bf5cba5f3553652
Reviewed-on: http://gerrit.cloudera.org:8080/19441
Reviewed-by: Daniel Becker <da...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <np...@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>


> LOAD DATA should not try to load hidden files for Iceberg tables
> ----------------------------------------------------------------
>
>                 Key: IMPALA-11864
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11864
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.3.0
>            Reporter: Riddhi jain
>            Assignee: Tamas Mate
>            Priority: Major
>             Fix For: Impala 4.3.0
>
>
> Steps to reproduce:
>  # Create an Iceberg table.
>  # Try to load data into the table from a folder that contains both hidden files and *.parq files.
> Example queries run:
> {code:java}
> CREATE TABLE iceberg_partitioned_table1 (id int,
> bool_col boolean,
> timestamp_col timestamp)
> stored as iceberg;{code}
> {code:java}
> LOAD DATA INPATH 's3a://dwx-testdata/impala/sql_test/tests/load_data_inpath/runtime_data/0690a6fa9bfb11ed920c164053429bec/load_data_test/A/impala_data/impala_alltypessmall_data/alltypessmall_parquet_iceberg/year=2009/month=1/' OVERWRITE INTO TABLE iceberg_partitioned_table1;{code}
> It tries to load the hidden file instead of ignoring it, and hence throws an error saying:
> {code:java}
> AnalysisException: INPATH contains unsupported LOAD format, file '
> s3a://dwx-testdata/impala/sql_test/tests/load_data_inpath/runtime_data/0690a6fa9bfb11ed920c164053429bec/load_data_test/A/impala_data/impala_alltypessmall_data/alltypessmall_parquet_iceberg/year=2009/month=1/.hiddenfileforloaddatatest
> ' has 'This' magic string. {code}
> However, when there are no hidden files in the folder, the load succeeds.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
