Posted to dev@hive.apache.org by "Dudu Markovitz (JIRA)" <ji...@apache.org> on 2017/03/26 20:17:41 UTC

[jira] [Created] (HIVE-16299) In case of partitioned table, MSCK REPAIR TABLE does not do a full validation of a FS paths and in result create false partitions and directories

Dudu Markovitz created HIVE-16299:
-------------------------------------

             Summary: In case of partitioned table, MSCK REPAIR TABLE does not do a full validation of a FS paths and in result create false partitions and directories
                 Key: HIVE-16299
                 URL: https://issues.apache.org/jira/browse/HIVE-16299
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: storage-2.2.0
            Reporter: Dudu Markovitz
            Priority: Minor


https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java

static String getPartitionName(Path tablePath, Path partitionPath, Set<String> partCols)

------------------------------------------------------------------------------------

MSCK REPAIR validates that every sub-directory has the form col=val and that the table has a partition column named "col".
However, it does not validate where that column appears in the path: its depth, its order relative to the other partition columns, or whether it repeats. As a result, false partitions are created, along with new directories that match those partitions.

e.g. 1 (repeated partition column and extra directory levels)

hive> dfs -mkdir -p /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5;
hive> create external table t (i int) partitioned by (a int,b int,c int) ;
OK
hive> msck repair table t;
OK
Partitions not in metastore:	t:a=1/a=2/a=3/b=4/c=5
Repair: Added partition to metastore t:a=1/a=2/a=3/b=4/c=5
Time taken: 0.563 seconds, Fetched: 2 row(s)
hive> show partitions t;
OK
a=3/b=4/c=5
hive> dfs -ls -R /user/hive/warehouse/t;
drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1
drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2
drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3
drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4
drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5
drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3
drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4
drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4/c=5

e.g. 2 (partition columns in reverse order)

hive> dfs -mkdir -p /user/hive/warehouse/t/c=3/b=2/a=1;
hive> create external table t (i int) partitioned by (a int,b int,c int);
OK
hive> msck repair table t;
OK
Partitions not in metastore:	t:c=3/b=2/a=1
Repair: Added partition to metastore t:c=3/b=2/a=1
Time taken: 0.512 seconds, Fetched: 2 row(s)
hive> show partitions t;
OK
a=1/b=2/c=3
hive> dfs -ls -R  /user/hive/warehouse/t;
drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1
drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2
drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2/c=3
drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3
drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2
drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2/a=1
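
For illustration only, here is a minimal sketch of the kind of positional check that appears to be missing. It is not the code in HiveMetaStoreChecker and not a proposed patch: the class name PartitionPathCheck and the method isValidPartitionPath are hypothetical, the path is handled as a plain string relative to the table directory rather than as a Hadoop Path, and column names are compared by exact match, which is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: checks that a path below the table
// directory has exactly one col=val directory per partition column, in the
// order in which the partition columns were declared.
public class PartitionPathCheck {

    public static boolean isValidPartitionPath(String relativePath, List<String> partCols) {
        String[] dirs = relativePath.split("/");
        if (dirs.length != partCols.size()) {
            return false; // too few or too many directory levels
        }
        for (int i = 0; i < dirs.length; i++) {
            int eq = dirs[i].indexOf('=');
            if (eq <= 0 || eq == dirs[i].length() - 1) {
                return false; // not of the form col=val
            }
            String col = dirs[i].substring(0, eq);
            if (!col.equals(partCols.get(i))) {
                return false; // wrong partition column at this depth
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Partition columns of table t, in declaration order.
        List<String> partCols = Arrays.asList("a", "b", "c");
        System.out.println(isValidPartitionPath("a=1/a=2/a=3/b=4/c=5", partCols)); // e.g. 1
        System.out.println(isValidPartitionPath("c=3/b=2/a=1", partCols));         // e.g. 2
        System.out.println(isValidPartitionPath("a=1/b=2/c=3", partCols));         // well-formed
    }
}
```

Under these assumptions, both paths from the examples above would be rejected (the first for its depth, the second for column order), and only a=1/b=2/c=3 would be accepted.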
