You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2019/08/27 13:24:01 UTC
[jira] [Work stopped] (IMPALA-6119) Inconsistent file metadata
updates when multiple partitions point to the same path
[ https://issues.apache.org/jira/browse/IMPALA-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-6119 stopped by Gabor Kaszab.
--------------------------------------------
> Inconsistent file metadata updates when multiple partitions point to the same path
> ----------------------------------------------------------------------------------
>
> Key: IMPALA-6119
> URL: https://issues.apache.org/jira/browse/IMPALA-6119
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
> Reporter: bharath v
> Assignee: Gabor Kaszab
> Priority: Major
> Labels: correctness, ramp-up
>
> Following steps can give inconsistent results.
> {noformat}
> // Create a partitioned table
> create table test(a int) partitioned by (b int);
> // Create two partitions b=1/b=2 mapped to the same HDFS location.
> insert into test partition(b=1) values (1);
> alter table test add partition (b=2) location 'hdfs://localhost:20500/test-warehouse/test/b=1/'
> [localhost:21000] > show partitions test;
> Query: show partitions test
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | b | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> | 1 | -1 | 1 | 2B | NOT CACHED | NOT CACHED | TEXT | false | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | 2 | -1 | 1 | 2B | NOT CACHED | NOT CACHED | TEXT | false | hdfs://localhost:20500/test-warehouse/test/b=1 |
> | Total | -1 | 2 | 4B | 0B | | | | |
> +-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
> // Insert new data into one of the partitions
> insert into test partition(b=1) values (2);
> // Newly added file is reflected only in the added partition files.
> show files in test;
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B | b=1 |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B | b=1 |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B | b=2 |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> invalidate metadata test;
> show files in test;
> // After invalidation, the newly added file now shows up in both the partitions.
> Query: show files in test
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | Path | Size | Partition |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B | b=1 |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B | b=1 |
> | hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B | b=2 |
> | hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B | b=2 |
> +----------------------------------------------------------------------------------------------------+------+-----------+
> {noformat}
> So, depending whether the user invalidates the table, they can see different results. The bug is in the following code.
> {noformat}
> private FileMetadataLoadStats resetAndLoadFileMetadata(
> Path partDir, List<HdfsPartition> partitions) throws IOException {
> FileMetadataLoadStats loadStats = new FileMetadataLoadStats(partDir);
> ....
> ....
> ....
> for (HdfsPartition partition: partitions) partition.setFileDescriptors(newFileDescs); <======
> {noformat}
> We only update the added file metadata for the new partition (copy-on-write way). Instead we should update the source descriptors so that it is reflected in the other partitions too.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org