Posted to issues@hive.apache.org by "Venugopal Reddy K (Jira)" <ji...@apache.org> on 2022/12/15 11:25:00 UTC
[jira] [Updated] (HIVE-26861) Skewed column table load do not work as expected if the user data for skewed column is not in lowercase.
[ https://issues.apache.org/jira/browse/HIVE-26861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Venugopal Reddy K updated HIVE-26861:
-------------------------------------
Description:
*[Description]*
Loading data into a skewed table does not work as expected when the values in the skewed column are not in lower case. Skewed values are stored in lower case, and the user data is compared against them case-sensitively, so a row matches a skewed value only if its data is also in lower case. Otherwise it doesn't work.
*[Steps to reproduce]*
1. Create a stage table, load some data into it, create a table with a skewed column, and load data into that table from the stage table. The data file is attached.
{code:java}
0: jdbc:hive2://localhost:10000> create database mydb;
0: jdbc:hive2://localhost:10000> use mydb;
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table stage(num int, name string, category string) row format delimited fields terminated by ',' stored as textfile;{code}
{code:java}
0: jdbc:hive2://localhost:10000> load data local inpath 'data' into table stage;{code}
{code:java}
0: jdbc:hive2://localhost:10000> select * from stage;
+------------+-------------+-----------------+
| stage.num | stage.name | stage.category |
+------------+-------------+-----------------+
| 1 | apple | Fruit |
| 2 | banana | Fruit |
| 3 | carrot | vegetable |
| 4 | cherry | Fruit |
| 5 | potato | vegetable |
| 6 | mango | Fruit |
| 7 | tomato | vegetable |
+------------+-------------+-----------------+
7 rows selected (2.688 seconds)
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table skew(num int, name string, category string) skewed by(category) on ('Fruit','Vegetable') stored as directories row format delimited fields terminated by ',' stored as textfile;{code}
{code:java}
0: jdbc:hive2://localhost:10000> insert into skew select * from stage;{code}
2. Check the warehouse directory for the skew table data. The table was created with the *skewed by(category) on ('Fruit','Vegetable')* clause, but *no directory is created for category=fruit.* The rows for category Fruit are present in the HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME directory itself.
Internally, skewed values are stored in lower case, and the user data is expected to be in the same lower case (i.e., the comparison is case-sensitive). Thus, the directory for fruit is not created.
{code:java}
kvenureddy@192 mydb.db % cd skew
kvenureddy@192 skew % ls
kvenureddy@192 skew % ls
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME category=vegetable
kvenureddy@192 skew % pwd
/tmp/warehouse/external/mydb.db/skew
kvenureddy@192 skew % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % ls
000000_0
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0
1,apple,Fruit
2,banana,Fruit
4,cherry,Fruit
6,mango,Fruit
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../
kvenureddy@192 skew % cd category=vegetable
kvenureddy@192 category=vegetable % ls
000000_0
kvenureddy@192 category=vegetable % cat 000000_0
3,carrot,vegetable
5,potato,vegetable
7,tomato,vegetable
kvenureddy@192 category=vegetable %
{code}
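The listing above is consistent with a lookup of roughly the following shape. This is only a minimal sketch of the behavior described in this report, not Hive's actual code: the class and method names (SkewDirDemo, storeSkewedValues, directoryFor) are hypothetical. The only things taken from the report are that declared skewed values are stored in lower case, that row values are compared case-sensitively, and the directory names.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the reported behavior; not Hive's implementation.
class SkewDirDemo {
    // Directory name seen in the warehouse listing for non-matching rows.
    static final String DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME";

    // Assumption from the report: declared skewed values are stored lowercased.
    static Set<String> storeSkewedValues(String... declared) {
        Set<String> stored = new LinkedHashSet<>();
        for (String value : declared) {
            stored.add(value.toLowerCase(Locale.ROOT));
        }
        return stored;
    }

    // Assumption from the report: the row value is compared case-sensitively
    // against the stored (lowercased) values.
    static String directoryFor(Set<String> stored, String rowValue) {
        return stored.contains(rowValue) ? "category=" + rowValue : DEFAULT_DIR;
    }

    public static void main(String[] args) {
        // skewed by(category) on ('Fruit','Vegetable')
        Set<String> stored = storeSkewedValues("Fruit", "Vegetable");

        // "Fruit" does not equal the stored "fruit", so the row falls through.
        System.out.println(directoryFor(stored, "Fruit"));
        // prints HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME

        // "vegetable" equals the stored "vegetable", so it gets its own directory.
        System.out.println(directoryFor(stored, "vegetable"));
        // prints category=vegetable
    }
}
```

Under these assumptions the sketch reproduces the two directories shown in the listing: rows with category Fruit land in the default directory, and rows with category vegetable land in category=vegetable.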
> Skewed column table load do not work as expected if the user data for skewed column is not in lowercase.
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-26861
> URL: https://issues.apache.org/jira/browse/HIVE-26861
> Project: Hive
> Issue Type: Bug
> Reporter: Venugopal Reddy K
> Priority: Major
> Attachments: data
--
This message was sent by Atlassian Jira
(v8.20.10#820010)