Posted to issues@hive.apache.org by "Venugopal Reddy K (Jira)" <ji...@apache.org> on 2022/12/15 11:25:00 UTC

[jira] [Updated] (HIVE-26861) Skewed column table load do not work as expected if the user data for skewed column is not in lowercase.

     [ https://issues.apache.org/jira/browse/HIVE-26861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venugopal Reddy K updated HIVE-26861:
-------------------------------------
    Description: 
*[Description]*

Loading a skewed table does not work as expected when the data in the skewed column is not in lower case. Skewed values are stored in lower case, and they are compared against the user data case-sensitively, so a row matches a skewed value only if its data is also in lower case.

*[Steps to reproduce]* 

1. Create a stage table and load some data into it, then create a table with a skewed column and load it from the stage table. The data file is attached to this issue.
{code:java}
0: jdbc:hive2://localhost:10000> create database mydb;
0: jdbc:hive2://localhost:10000> use mydb;
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table stage(num int, name string, category string) row format delimited fields terminated by ',' stored as textfile;{code}
{code:java}
0: jdbc:hive2://localhost:10000> load data local inpath 'data' into table stage;{code}
{code:java}
0: jdbc:hive2://localhost:10000> select * from stage;
+------------+-------------+-----------------+
| stage.num  | stage.name  | stage.category  |
+------------+-------------+-----------------+
| 1          | apple       | Fruit           |
| 2          | banana      | Fruit           |
| 3          | carrot      | vegetable       |
| 4          | cherry      | Fruit           |
| 5          | potato      | vegetable       |
| 6          | mango       | Fruit           |
| 7          | tomato      | vegetable       |
+------------+-------------+-----------------+
7 rows selected (2.688 seconds)
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table skew(num int, name string, category string) skewed by(category) on ('Fruit','Vegetable') stored as directories row format delimited fields terminated by ',' stored as textfile;{code}
{code:java}
0: jdbc:hive2://localhost:10000> insert into skew select * from stage;{code}
 

2. Check the skew table's data in the warehouse directory. The table was created with the clause *skewed by(category) on ('Fruit','Vegetable')*. {color:#de350b}*But no directory is created for category=fruit.* The rows with category Fruit are placed in the HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME directory instead.{color}

Internally, skewed values are stored in lower case and compared against the user data case-sensitively. The value Fruit in the data therefore never matches the stored skewed value fruit, so no directory is created for it.
{code:java}
kvenureddy@192 mydb.db % cd skew 
kvenureddy@192 skew % ls
kvenureddy@192 skew % ls
HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME category=vegetable
kvenureddy@192 skew % pwd
/tmp/warehouse/external/mydb.db/skew
kvenureddy@192 skew % cd HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME 
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % ls
000000_0
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cat 000000_0 
1,apple,Fruit
2,banana,Fruit
4,cherry,Fruit
6,mango,Fruit
kvenureddy@192 HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME % cd ../
kvenureddy@192 skew % cd category=vegetable 
kvenureddy@192 category=vegetable % ls
000000_0
kvenureddy@192 category=vegetable % cat 000000_0 
3,carrot,vegetable
5,potato,vegetable
7,tomato,vegetable
kvenureddy@192 category=vegetable % 
{code}
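
The behaviour described above can be illustrated with a small standalone sketch. This is not Hive's code: the class and method names (SkewedDirSketch, storeSkewedValues, directoryFor, route) are hypothetical, and the only assumptions taken from this report are that declared skewed values are kept in lower case and that row values are matched against them case-sensitively.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SkewedDirSketch {
    // Directory name seen in the listing above for rows that match no skewed value.
    static final String DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME";

    // Assumption from the report: declared skewed values are kept in lower case.
    static List<String> storeSkewedValues(List<String> declared) {
        List<String> stored = new ArrayList<>();
        for (String v : declared) {
            stored.add(v.toLowerCase(Locale.ROOT));
        }
        return stored;
    }

    // Case-sensitive match of a row value against the stored skewed values.
    static String directoryFor(String column, String value, List<String> stored) {
        return stored.contains(value) ? column + "=" + value : DEFAULT_DIR;
    }

    // Groups row values by the directory they would land in.
    static Map<String, List<String>> route(String column, List<String> values, List<String> stored) {
        Map<String, List<String>> dirs = new LinkedHashMap<>();
        for (String v : values) {
            dirs.computeIfAbsent(directoryFor(column, v, stored), k -> new ArrayList<>()).add(v);
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Values declared in the "skewed by ... on (...)" clause of the skew table.
        List<String> stored = storeSkewedValues(List.of("Fruit", "Vegetable"));
        // The category column of the seven rows in the stage table.
        List<String> data = List.of("Fruit", "Fruit", "vegetable", "Fruit",
                                    "vegetable", "Fruit", "vegetable");
        System.out.println(route("category", data, stored));
    }
}
```

In this sketch the rows with category Fruit fall into the default directory because "Fruit" is not equal to the stored "fruit", while the rows with category vegetable get their own directory, which mirrors the listing above.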


> Skewed column table load do not work as expected if the user data for skewed column is not in lowercase.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26861
>                 URL: https://issues.apache.org/jira/browse/HIVE-26861
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Venugopal Reddy K
>            Priority: Major
>         Attachments: data
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)