Posted to issues-all@impala.apache.org by "Li Penglin (Jira)" <ji...@apache.org> on 2023/01/16 11:15:00 UTC

[jira] [Updated] (IMPALA-11844) The Iceberg Position-Delete Table will not work if 'file_path' in DeleteFile is not Fully-qualifies

     [ https://issues.apache.org/jira/browse/IMPALA-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Penglin updated IMPALA-11844:
--------------------------------
    Description: 
{code:java}
Given:
Table 'test_tbl' has two files: a data file and a delete file.
data_file-00001.parquet:
| input__file__name                                   | file__position | col_int | col_long | col_string |
| hdfs://localhost:20500/data/data_file-00001.parquet | 0              | 1       | 10       | "1-10-a"   |
| hdfs://localhost:20500/data/data_file-00001.parquet | 1              | 2       | 20       | "2-20-a"   |
| hdfs://localhost:20500/data/data_file-00001.parquet | 2              | 3       | 30       | "3-30-a"   |
delete_file-00001.parquet:
| file_path                                           | pos |
| hdfs://localhost:20500/data/data_file-00001.parquet | 0   |
| /data/data_file-00001.parquet                       | 1   |

Expect:
select * from test_tbl;
| col_int | col_long | col_string |
| 3       | 30       | "3-30-a"   |

Actual:
| col_int | col_long | col_string |
| 2       | 20       | "2-20-a"   |
| 3       | 30       | "3-30-a"   |
{code}
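A 'file_path' in a DeleteFile that is not fully qualified should also work properly: the delete at position 1 above uses '/data/data_file-00001.parquet' and is currently not applied. Maybe we should look at how other engines, and the native Iceberg API, handle it.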
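
For illustration only, here is a minimal Python sketch of one possible matching rule: compare only the path component of each location, so that a fully qualified and a non-qualified 'file_path' refer to the same data file. This is not Impala's implementation and not necessarily what Iceberg or other engines do. The helper names (path_only, apply_position_deletes) are hypothetical, and the sketch assumes that all files of the table live on the same filesystem, so dropping the scheme and authority cannot confuse two different files.

```python
from urllib.parse import urlsplit


def path_only(file_path):
    """Return only the path component, dropping any scheme and authority.

    Assumption: every file of the table is on one filesystem, so the
    scheme/authority part carries no extra information.
    """
    return urlsplit(file_path).path


def apply_position_deletes(rows, deletes):
    """Drop rows whose (file, position) pair appears in the delete file.

    rows:    tuples of (input__file__name, file__position, col_int, col_long, col_string)
    deletes: tuples of (file_path, pos)
    """
    deleted = {(path_only(path), pos) for path, pos in deletes}
    return [row for row in rows if (path_only(row[0]), row[1]) not in deleted]


# Data taken from the example in the description.
DATA_FILE = "hdfs://localhost:20500/data/data_file-00001.parquet"
ROWS = [
    (DATA_FILE, 0, 1, 10, "1-10-a"),
    (DATA_FILE, 1, 2, 20, "2-20-a"),
    (DATA_FILE, 2, 3, 30, "3-30-a"),
]
DELETES = [
    (DATA_FILE, 0),                         # fully qualified
    ("/data/data_file-00001.parquet", 1),   # not fully qualified
]

remaining = apply_position_deletes(ROWS, DELETES)
print([row[2:] for row in remaining])  # prints [(3, 30, '3-30-a')]
```

With exact string comparison instead of path_only, the delete at position 1 would not match, which reproduces the 'Actual' result above. Note that urlsplit also splits off '?' and '#' parts, so file names containing those characters would need extra care.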

  was:
{code:java}
Given:
Table 'test_tbl' has two files, a datafile and a deletefile.
data_file-00001.parquet:
| input__file__name                                      | file__position | col_int | col_long | col_string |
| hdfs://localhost:20500/data/data_file-00001.parquet | 0                 | 1       | 10       | "1-10-a"   | 
| hdfs://localhost:20500/data/data_file-00001.parquet | 1                 | 2       | 20       | "2-20-a"   | 
| hdfs://localhost:20500/data/data_file-00001.parquet | 2                | 3       | 30       | "3-30-a"   | 
delete_file-00001.parquet:
| file_path                                             | pos |
| hdfs://localhost:20500/data/data_file-00001.parquet     | 0   |
| /data/data_file-00001.parquet                         | 1   |Expect:
select * from test_tbl;
| col_int | col_long | col_string |
| 3       | 30       | "3-30-a"   | Actual:
| col_int | col_long | col_string |
| 2       | 20       | "2-20-a"   | 
| 3       | 30       | "3-30-a"   |  {code}
'file_path' in DeleteFile is not Fully-qualifies should also work properly. Maybe we should look at other engines, including how the native Iceberg api handles it.


> The Iceberg Position-Delete Table will not work if 'file_path' in DeleteFile is not Fully-qualifies
> ---------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11844
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11844
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Li Penglin
>            Priority: Major
>              Labels: impala-iceberg


