You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Venugopal Reddy K (Jira)" <ji...@apache.org> on 2022/12/15 10:18:00 UTC

[jira] [Updated] (HIVE-26860) Appropriate rows in HMS datastore tables(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted upon drop partition table with skewed columns

     [ https://issues.apache.org/jira/browse/HIVE-26860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venugopal Reddy K updated HIVE-26860:
-------------------------------------
    Description: 
*[Description]*

Appropriate rows in HMS backing datastore tables(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted upon drop partition table with skewed columns.

*[Steps to reproduce]*

1. Create stage table, load some data into stage table, create partition table with skewed columns and load data into that table from the stage table. partdata3 file is attached. It has 2 partitions.

 
{code:java}
create database mydb;
use mydb;
{code}
 
{code:java}
create table stage(num int, name string, category string) row format delimited fields terminated by ',' stored as textfile;
{code}
{code:java}
load data local inpath 'partdata3' into table stage;{code}
{code:java}
create table skewpart(num int, name string) partitioned by (category string) skewed by(num) on (3,4) stored as directories row format delimited fields terminated by ',' stored as textfile;{code}
{code:java}
insert into skewpart select * from stage;{code}
 

2. Verify warehouse directory table data is correct
{code:java}
kvenureddy@192 category=fruit % ls   
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=4
kvenureddy@192 category=fruit % pwd
/private/tmp/warehouse/external/mydb.db/skewpart/category=fruit
kvenureddy@192 category=fruit % cd num=4 
kvenureddy@192 num=4 % pwd
/private/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
kvenureddy@192 num=4 % cat 000000_0 
4,cherry
kvenureddy@192 num=4 % cd ../
kvenureddy@192 category=fruit % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
1,apple
2,banana
6,mango
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd../../
zsh: no such file or directory: cd../../
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../../
kvenureddy@192 skewpart % pwd
/private/tmp/warehouse/external/mydb.db/skewpart
kvenureddy@192 skewpart % ls
category=fruit        category=vegetable
kvenureddy@192 skewpart % cd category=vegetable 
kvenureddy@192 category=vegetable % pwd
/private/tmp/warehouse/external/mydb.db/skewpart/category=vegetable
kvenureddy@192 category=vegetable % ls
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=3
kvenureddy@192 category=vegetable % cd num=3 
kvenureddy@192 num=3 % cat 000000_0 
3,carrot
kvenureddy@192 num=3 % cd ../
kvenureddy@192 category=vegetable % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
5,potato
7,tomato
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % {code}
 

3. Verify HMS backing datastore tables after creating and loading into partition+skewed table.

Note: Tables having issue(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are shown below.

SD_ID=2 row is added during create table. SD_ID=3 and 4 rows were added because there are 2 partitions during data load.
||SDS||
||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID||
|1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1|
|{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart|file:///tmp/warehouse/external/mydb.db/skewpart]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}2{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}3{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}4{color}|
||SERDES||
||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE||
|1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0|
|{color:#4c9aff}2{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|
|{color:#4c9aff}3{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|
|{color:#4c9aff}4{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|
||SERDE_PARAMS||
||SERDE_ID||PARAM_KEY||PARAM_VALUE||
|1|field.delim|,|
|1|serialization.format|,|
|{color:#4c9aff}2{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}2{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
||SKEWED_COL_NAMES||
||SD_ID||SKEWED_COL_NAME||INTEGER_IDX||
|{color:#4c9aff}2{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
||SKEWED_COL_VALUE_LOC_MAP||
||SD_ID||STRING_LIST_ID_KID||LOCATION||
|{color:#4c9aff}3{color}|{color:#4c9aff}3{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}14{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}6{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}11{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
||SKEWED_VALUES||
||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX||
|{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}9{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}10{color}|{color:#4c9aff}1{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}12{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}13{color}|{color:#4c9aff}1{color}|

 

4. Drop *skewpart* table and verify HMS backing datastore tables again.

*Row with SD_ID=2 is deleted. But rows with SD_ID 3 and 4 are not deleted. They remain in the database forever.*
||SDS||
||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID||
|1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1|
|{color:#de350b}3{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable]{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}3{color}|
|{color:#de350b}4{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit]{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}4{color}|

*Row with SERDE_ID=2 is deleted. But rows with SERDE_ID 3 and 4 are not deleted since they are being reference from SDS table.*
||SERDES||
||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE||
|1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0|
|{color:#de350b}3{color}| |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#de350b}0{color}|
|{color:#de350b}4{color}| |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#de350b}0{color}|
||SERDE_PARAMS||
||SERDE_ID||PARAM_KEY||PARAM_VALUE||
|1|field.delim|,|
|1|serialization.format|,|
|{color:#de350b}3{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}|
|{color:#de350b}3{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}|
|{color:#de350b}4{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}|
|{color:#de350b}4{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}|

Rows in SKEWED tables which has reference to SD_ID=3 and 4 remains forever.
||SKEWED_COL_NAMES||
||SD_ID||SKEWED_COL_NAME||INTEGER_IDX||
|{color:#de350b}3{color}|{color:#de350b}num{color}|{color:#de350b}0{color}|
|{color:#de350b}4{color}|{color:#de350b}num{color}|{color:#de350b}0{color}|
||SKEWED_COL_VALUE_LOC_MAP||
||SD_ID||STRING_LIST_ID_KID||LOCATION||
|{color:#de350b}3{color}|{color:#de350b}3{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
|{color:#de350b}3{color}|{color:#de350b}14{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
|{color:#de350b}4{color}|{color:#de350b}6{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
|{color:#de350b}4{color}|{color:#de350b}11{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
||SKEWED_VALUES||
||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX||
|{color:#de350b}4{color}|{color:#de350b}9{color}|{color:#de350b}0{color}|
|{color:#de350b}4{color}|{color:#de350b}10{color}|{color:#de350b}1{color}|
|{color:#de350b}3{color}|{color:#de350b}12{color}|{color:#de350b}0{color}|
|{color:#de350b}3{color}|{color:#de350b}13{color}|{color:#de350b}1{color}|

  was:
*[Description]*

Appropriate rows in HMS backing datastore tables(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted upon drop partition table with skewed columns.

*[Steps to reproduce]*

1. Create stage table, load some data into stage table, create partition table with skewed columns and load data into that table from the stage table. partdata3 file is attached. It has 2 partitions.

 
{code:java}
create table stage(num int, name string, category string) row format delimited fields terminated by ',' stored as textfile;
{code}
 

 
{code:java}
load data local inpath 'partdata3' into table stage;{code}
 

 
{code:java}
create table skewpart(num int, name string) partitioned by (category string) skewed by(num) on (3,4) stored as directories row format delimited fields terminated by ',' stored as textfile;{code}
 

 
{code:java}
insert into skewpart select * from stage;{code}
 

 

2. Verify warehouse directory table data is correct

 
{code:java}
kvenureddy@192 category=fruit % ls   
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=4
kvenureddy@192 category=fruit % pwd
/private/tmp/warehouse/external/mydb.db/skewpart/category=fruit
kvenureddy@192 category=fruit % cd num=4 
kvenureddy@192 num=4 % pwd
/private/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
kvenureddy@192 num=4 % cat 000000_0 
4,cherry
kvenureddy@192 num=4 % cd ../
kvenureddy@192 category=fruit % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
1,apple
2,banana
6,mango
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd../../
zsh: no such file or directory: cd../../
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../../
kvenureddy@192 skewpart % pwd
/private/tmp/warehouse/external/mydb.db/skewpart
kvenureddy@192 skewpart % ls
category=fruit        category=vegetable
kvenureddy@192 skewpart % cd category=vegetable 
kvenureddy@192 category=vegetable % pwd
/private/tmp/warehouse/external/mydb.db/skewpart/category=vegetable
kvenureddy@192 category=vegetable % ls
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=3
kvenureddy@192 category=vegetable % cd num=3 
kvenureddy@192 num=3 % cat 000000_0 
3,carrot
kvenureddy@192 num=3 % cd ../
kvenureddy@192 category=vegetable % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
5,potato
7,tomato
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % {code}
 

 

3. Verify HMS backing datastore tables after creating and loading into partition+skewed table.

Note: Tables having issue(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are shown below.

SD_ID=2 row is added during create table. SD_ID=3 and 4 rows were added [^partdata3] there are 2 partitions during data load.
||SDS||
||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID||
|1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1|
|{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}file:/tmp/warehouse/external/mydb.db/skewpart{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}2{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}3{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}4{color}|

||SERDES||
||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE||
|1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0|
|{color:#4c9aff}2{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|
|{color:#4c9aff}3{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|
|{color:#4c9aff}4{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|

||SERDE_PARAMS||
||SERDE_ID||PARAM_KEY||PARAM_VALUE||
|1|field.delim|,|
|1|serialization.format|,|
|{color:#4c9aff}2{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}2{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|

||SKEWED_COL_NAMES||
||SD_ID||SKEWED_COL_NAME||INTEGER_IDX||
|{color:#4c9aff}2{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|

||SKEWED_COL_VALUE_LOC_MAP||
||SD_ID||STRING_LIST_ID_KID||LOCATION||
|{color:#4c9aff}3{color}|{color:#4c9aff}3{color}|{color:#4c9aff}file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}14{color}|{color:#4c9aff}file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}6{color}|{color:#4c9aff}file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}11{color}|{color:#4c9aff}file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4{color}|

||SKEWED_VALUES||
||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX||
|{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}9{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}4{color}|{color:#4c9aff}10{color}|{color:#4c9aff}1{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}12{color}|{color:#4c9aff}0{color}|
|{color:#4c9aff}3{color}|{color:#4c9aff}13{color}|{color:#4c9aff}1{color}|

 

4. Drop *skewpart* table and verify HMS backing datastore tables again.

*Row with SD_ID=2 is deleted. But rows with SD_ID 3 and 4 are not deleted. They remain the database forever.*
||SDS||
||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID||
|1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1|
|{color:#de350b}3{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}3{color}|
|{color:#de350b}4{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}4{color}|

*Row with SERDE_ID=2 is deleted. But rows with SERDE_ID 3 and 4 are not deleted since they are being reference from SD table.*
||SERDES||
||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE||
|1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0|
|{color:#de350b}3{color}| |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#de350b}0{color}|
|{color:#de350b}4{color}| |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#de350b}0{color}|

||SERDE_PARAMS||
||SERDE_ID||PARAM_KEY||PARAM_VALUE||
|1|field.delim|,|
|1|serialization.format|,|
|{color:#de350b}3{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}|
|{color:#de350b}3{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}|
|{color:#de350b}4{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}|
|{color:#de350b}4{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}|

Rows in SKEWED tables which has reference to SD_ID=3 and 4 remains forever.
||SKEWED_COL_NAMES||
||SD_ID||SKEWED_COL_NAME||INTEGER_IDX||
|{color:#de350b}3{color}|{color:#de350b}num{color}|{color:#de350b}0{color}|
|{color:#de350b}4{color}|{color:#de350b}num{color}|{color:#de350b}0{color}|

||SKEWED_COL_VALUE_LOC_MAP||
||SD_ID||STRING_LIST_ID_KID||LOCATION||
|{color:#de350b}3{color}|{color:#de350b}3{color}|{color:#de350b}file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3{color}|
|{color:#de350b}3{color}|{color:#de350b}14{color}|{color:#de350b}file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3{color}|
|{color:#de350b}4{color}|{color:#de350b}6{color}|{color:#de350b}file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4{color}|
|{color:#de350b}4{color}|{color:#de350b}11{color}|{color:#de350b}file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4{color}|

||SKEWED_VALUES||
||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX||
|{color:#de350b}4{color}|{color:#de350b}9{color}|{color:#de350b}0{color}|
|{color:#de350b}4{color}|{color:#de350b}10{color}|{color:#de350b}1{color}|
|{color:#de350b}3{color}|{color:#de350b}12{color}|{color:#de350b}0{color}|
|{color:#de350b}3{color}|{color:#de350b}13{color}|{color:#de350b}1{color}|


> Appropriate rows in HMS datastore tables(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted upon drop partition table with skewed columns
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26860
>                 URL: https://issues.apache.org/jira/browse/HIVE-26860
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore
>            Reporter: Venugopal Reddy K
>            Priority: Major
>         Attachments: image-2022-12-15-14-26-26-131.png, image-2022-12-15-14-31-23-854.png, image-2022-12-15-14-32-55-280.png, partdata3
>
>
> *[Description]*
> Appropriate rows in HMS backing datastore tables(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are not deleted upon drop partition table with skewed columns.
> *[Steps to reproduce]*
> 1. Create stage table, load some data into stage table, create partition table with skewed columns and load data into that table from the stage table. partdata3 file is attached. It has 2 partitions.
>  
> {code:java}
> create database mydb;
> use mydb;
> {code}
>  
> {code:java}
> create table stage(num int, name string, category string) row format delimited fields terminated by ',' stored as textfile;
> {code}
> {code:java}
> load data local inpath 'partdata3' into table stage;{code}
> {code:java}
> create table skewpart(num int, name string) partitioned by (category string) skewed by(num) on (3,4) stored as directories row format delimited fields terminated by ',' stored as textfile;{code}
> {code:java}
> insert into skewpart select * from stage;{code}
>  
> 2. Verify warehouse directory table data is correct
> {code:java}
> kvenureddy@192 category=fruit % ls   
> HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=4
> kvenureddy@192 category=fruit % pwd
> /private/tmp/warehouse/external/mydb.db/skewpart/category=fruit
> kvenureddy@192 category=fruit % cd num=4 
> kvenureddy@192 num=4 % pwd
> /private/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4
> kvenureddy@192 num=4 % cat 000000_0 
> 4,cherry
> kvenureddy@192 num=4 % cd ../
> kvenureddy@192 category=fruit % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
> 1,apple
> 2,banana
> 6,mango
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd../../
> zsh: no such file or directory: cd../../
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../../
> kvenureddy@192 skewpart % pwd
> /private/tmp/warehouse/external/mydb.db/skewpart
> kvenureddy@192 skewpart % ls
> category=fruit        category=vegetable
> kvenureddy@192 skewpart % cd category=vegetable 
> kvenureddy@192 category=vegetable % pwd
> /private/tmp/warehouse/external/mydb.db/skewpart/category=vegetable
> kvenureddy@192 category=vegetable % ls
> HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME    num=3
> kvenureddy@192 category=vegetable % cd num=3 
> kvenureddy@192 num=3 % cat 000000_0 
> 3,carrot
> kvenureddy@192 num=3 % cd ../
> kvenureddy@192 category=vegetable % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
> 5,potato
> 7,tomato
> kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % {code}
>  
> 3. Verify HMS backing datastore tables after creating and loading into partition+skewed table.
> Note: Tables having issue(SDS, SERDES, SERDE_PARAMS, SKEWED_COL_NAMES, SKEWED_COL_VALUE_LOC_MAP, SKEWED_VALUES) are shown below.
> SD_ID=2 row is added during create table. SD_ID=3 and 4 rows were added because there are 2 partitions during data load.
> ||SDS||
> ||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID||
> |1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1|
> |{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart|file:///tmp/warehouse/external/mydb.db/skewpart]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}2{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}3{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}2{color}|{color:#4c9aff}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#4c9aff}0{color}|{color:#4c9aff}1{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit]{color}|{color:#4c9aff}-1{color}|{color:#4c9aff}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#4c9aff}4{color}|
> ||SERDES||
> ||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE||
> |1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0|
> |{color:#4c9aff}2{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|
> |{color:#4c9aff}3{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|
> |{color:#4c9aff}4{color}| |{color:#4c9aff}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#4c9aff}0{color}|
> ||SERDE_PARAMS||
> ||SERDE_ID||PARAM_KEY||PARAM_VALUE||
> |1|field.delim|,|
> |1|serialization.format|,|
> |{color:#4c9aff}2{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}2{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}field.delim{color}|{color:#4c9aff},{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}serialization.format{color}|{color:#4c9aff},{color}|
> ||SKEWED_COL_NAMES||
> ||SD_ID||SKEWED_COL_NAME||INTEGER_IDX||
> |{color:#4c9aff}2{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}num{color}|{color:#4c9aff}0{color}|
> ||SKEWED_COL_VALUE_LOC_MAP||
> ||SD_ID||STRING_LIST_ID_KID||LOCATION||
> |{color:#4c9aff}3{color}|{color:#4c9aff}3{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}14{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}6{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}11{color}|{color:#4c9aff}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
> ||SKEWED_VALUES||
> ||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX||
> |{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}2{color}|{color:#4c9aff}2{color}|{color:#4c9aff}1{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}9{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}4{color}|{color:#4c9aff}10{color}|{color:#4c9aff}1{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}12{color}|{color:#4c9aff}0{color}|
> |{color:#4c9aff}3{color}|{color:#4c9aff}13{color}|{color:#4c9aff}1{color}|
>  
> 4. Drop *skewpart* table and verify HMS backing datastore tables again.
> *Row with SD_ID=2 is deleted. But rows with SD_ID 3 and 4 are not deleted. They remain in the database forever.*
> ||SDS||
> ||SD_ID||CD_ID||INPUT_FORMAT||IS_COMPRESSED||IS_STOREDASSUBDIRECTORIES||LOCATION||NUM_BUCKETS||OUTPUT_FORMAT||SERDE_ID||
> |1|1|org.apache.hadoop.mapred.TextInputFormat|0|0|file:/tmp/warehouse/external/mydb.db/stage|-1|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|1|
> |{color:#de350b}3{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable]{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}3{color}|
> |{color:#de350b}4{color}|{color:#de350b}2{color}|{color:#de350b}org.apache.hadoop.mapred.TextInputFormat{color}|{color:#de350b}0{color}|{color:#de350b}1{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit]{color}|{color:#de350b}-1{color}|{color:#de350b}org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat{color}|{color:#de350b}4{color}|
> *Row with SERDE_ID=2 is deleted. But rows with SERDE_ID 3 and 4 are not deleted since they are being reference from SDS table.*
> ||SERDES||
> ||SERDE_ID||NAME||SLIB||DESCRIPTION||SERIALIZER_CLASS||DESERIALIZER_CLASS||SERDE_TYPE||
> |1| |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe| | | |0|
> |{color:#de350b}3{color}| |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#de350b}0{color}|
> |{color:#de350b}4{color}| |{color:#de350b}org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe{color}| | | |{color:#de350b}0{color}|
> ||SERDE_PARAMS||
> ||SERDE_ID||PARAM_KEY||PARAM_VALUE||
> |1|field.delim|,|
> |1|serialization.format|,|
> |{color:#de350b}3{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}|
> |{color:#de350b}3{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}|
> |{color:#de350b}4{color}|{color:#de350b}field.delim{color}|{color:#de350b},{color}|
> |{color:#de350b}4{color}|{color:#de350b}serialization.format{color}|{color:#de350b},{color}|
> Rows in SKEWED tables which has reference to SD_ID=3 and 4 remains forever.
> ||SKEWED_COL_NAMES||
> ||SD_ID||SKEWED_COL_NAME||INTEGER_IDX||
> |{color:#de350b}3{color}|{color:#de350b}num{color}|{color:#de350b}0{color}|
> |{color:#de350b}4{color}|{color:#de350b}num{color}|{color:#de350b}0{color}|
> ||SKEWED_COL_VALUE_LOC_MAP||
> ||SD_ID||STRING_LIST_ID_KID||LOCATION||
> |{color:#de350b}3{color}|{color:#de350b}3{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
> |{color:#de350b}3{color}|{color:#de350b}14{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3|file:///tmp/warehouse/external/mydb.db/skewpart/category=vegetable/num=3]{color}|
> |{color:#de350b}4{color}|{color:#de350b}6{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
> |{color:#de350b}4{color}|{color:#de350b}11{color}|{color:#de350b}[file:/tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4|file:///tmp/warehouse/external/mydb.db/skewpart/category=fruit/num=4]{color}|
> ||SKEWED_VALUES||
> ||SD_ID_OID||STRING_LIST_ID_EID||INTEGER_IDX||
> |{color:#de350b}4{color}|{color:#de350b}9{color}|{color:#de350b}0{color}|
> |{color:#de350b}4{color}|{color:#de350b}10{color}|{color:#de350b}1{color}|
> |{color:#de350b}3{color}|{color:#de350b}12{color}|{color:#de350b}0{color}|
> |{color:#de350b}3{color}|{color:#de350b}13{color}|{color:#de350b}1{color}|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)