Posted to issues@impala.apache.org by "bharath v (JIRA)" <ji...@apache.org> on 2017/10/27 16:35:00 UTC

[jira] [Created] (IMPALA-6119) Inconsistent file metadata updates when multiple partitions point to the same path

bharath v created IMPALA-6119:
---------------------------------

             Summary: Inconsistent file metadata updates when multiple partitions point to the same path
                 Key: IMPALA-6119
                 URL: https://issues.apache.org/jira/browse/IMPALA-6119
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 2.10.0, Impala 2.9.0, Impala 2.8.0
            Reporter: bharath v
            Priority: Critical


The following steps can give inconsistent results.

{noformat}
// Create a partitioned table
create table test(a int) partitioned by (b int);
// Create two partitions b=1/b=2 mapped to the same HDFS location.
insert into test partition(b=1) values (1);
alter table test add partition (b=2) location 'hdfs://localhost:20500/test-warehouse/test/b=1/';
[localhost:21000] > show partitions test;
Query: show partitions test
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
| b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                       |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
| 1     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
| 2     | -1    | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test/b=1 |
| Total | -1    | 2      | 4B   | 0B           |                   |        |                   |                                                |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------+
// Insert new data into one of the partitions
insert into test partition(b=1) values (2);

// The newly added file is listed only under b=1, the partition that was inserted into; b=2 still shows one file.
show files in test;
Query: show files in test
+----------------------------------------------------------------------------------------------------+------+-----------+
| Path                                                                                               | Size | Partition |
+----------------------------------------------------------------------------------------------------+------+-----------+
| hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
| hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
| hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
+----------------------------------------------------------------------------------------------------+------+-----------+

// After invalidation, the newly added file shows up in both partitions.
invalidate metadata test;
show files in test;
Query: show files in test
+----------------------------------------------------------------------------------------------------+------+-----------+
| Path                                                                                               | Size | Partition |
+----------------------------------------------------------------------------------------------------+------+-----------+
| hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=1       |
| hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=1       |
| hdfs://localhost:20500/test-warehouse/test/b=1/2e44cd49e8c3d30d-572fc97800000000_627280230_data.0. | 2B   | b=2       |
| hdfs://localhost:20500/test-warehouse/test/b=1/e44245ad5c0ef020-a08716d00000000_1244237483_data.0. | 2B   | b=2       |
+----------------------------------------------------------------------------------------------------+------+-----------+
{noformat}

So, depending on whether the user invalidates the table metadata, they can see different results for the same data. The bug is in the following code.

{noformat}
private FileMetadataLoadStats resetAndLoadFileMetadata(
      Path partDir, List<HdfsPartition> partitions) throws IOException {
    FileMetadataLoadStats loadStats = new FileMetadataLoadStats(partDir);
....
....
....
 for (HdfsPartition partition: partitions) partition.setFileDescriptors(newFileDescs);  <======
{noformat}

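We update the file metadata only for the partition that received the new file (in a copy-on-write way), so other partitions pointing to the same path keep their stale file list. Instead, we should update the source descriptors so that the new file is reflected in the other partitions too.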
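
To illustrate the difference, here is a minimal, self-contained sketch. It is not Impala's catalog code: the Partition class, the method names addFileCopyOnWrite and refreshLocation, and the file names file_a and file_b are all hypothetical, and file descriptors are reduced to plain strings. It only shows why a per-partition copy-on-write update leaves a second partition on the same path stale, while an update keyed by location keeps them consistent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedPathSketch {
    // Hypothetical stand-in for a catalog partition; not Impala's HdfsPartition.
    public static class Partition {
        public final String name;
        public final String location;
        public List<String> fileNames = Collections.emptyList();

        public Partition(String name, String location) {
            this.name = name;
            this.location = location;
        }
    }

    // Copy-on-write update: only the partition that was written to gets the
    // new file list. Other partitions on the same location are not touched.
    public static void addFileCopyOnWrite(Partition written, String newFile) {
        List<String> copy = new ArrayList<>(written.fileNames);
        copy.add(newFile);
        written.fileNames = Collections.unmodifiableList(copy);
    }

    // Location-keyed update: every partition mapped to the location receives
    // the same fresh listing, so they cannot drift apart.
    public static void refreshLocation(Map<String, List<Partition>> byLocation,
                                       String location, List<String> listing) {
        List<String> shared = Collections.unmodifiableList(new ArrayList<>(listing));
        List<Partition> sharing = byLocation.get(location);
        if (sharing == null) {
            return; // no partition points at this location
        }
        for (Partition p : sharing) {
            p.fileNames = shared;
        }
    }

    public static void main(String[] args) {
        String loc = "/test-warehouse/test/b=1";
        Partition b1 = new Partition("b=1", loc);
        Partition b2 = new Partition("b=2", loc);

        Map<String, List<Partition>> byLocation = new HashMap<>();
        List<Partition> sharing = new ArrayList<>();
        sharing.add(b1);
        sharing.add(b2);
        byLocation.put(loc, sharing);

        // Initial load: both partitions see the single existing file.
        List<String> initial = new ArrayList<>();
        initial.add("file_a");
        refreshLocation(byLocation, loc, initial);

        // Insert into b=1 with a copy-on-write update: b=2 becomes stale.
        addFileCopyOnWrite(b1, "file_b");
        System.out.println(b1.name + " " + b1.fileNames);
        System.out.println(b2.name + " " + b2.fileNames);

        // Reloading by location (as a full metadata reload would) fixes it.
        List<String> listing = new ArrayList<>();
        listing.add("file_a");
        listing.add("file_b");
        refreshLocation(byLocation, loc, listing);
        System.out.println(b1.name + " " + b1.fileNames);
        System.out.println(b2.name + " " + b2.fileNames);
    }
}
```

In this sketch, b=2 still lists one file after the copy-on-write update, and both partitions list the same two files after the location-keyed refresh, which mirrors the before and after of the invalidate step in the report.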



