Posted to dev@hive.apache.org by "Chaoyu Tang (JIRA)" <ji...@apache.org> on 2017/05/03 02:32:04 UTC

[jira] [Created] (HIVE-16572) Rename a partition should not drop its column stats

Chaoyu Tang created HIVE-16572:
----------------------------------

             Summary: Rename a partition should not drop its column stats
                 Key: HIVE-16572
                 URL: https://issues.apache.org/jira/browse/HIVE-16572
             Project: Hive
          Issue Type: Bug
          Components: Statistics
            Reporter: Chaoyu Tang
            Assignee: Chaoyu Tang


The column stats for partition (dummy=1) of table sample_pt are as follows:
{code}
hive> describe formatted sample_pt partition (dummy=1) code;
OK
# col_name            	data_type           	min                 	max                 	num_nulls           	distinct_count      	avg_col_len         	max_col_len         	num_trues           	num_falses          	comment             
	 	 	 	 	 	 	 	 	 	 
code                	string              	                    	                    	0                   	303                 	6.985               	7                   	                    	                    	from deserializer   
Time taken: 0.259 seconds, Fetched: 3 row(s)
{code}
But when this partition is renamed, for example with
alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
the COLUMN_STATS entries in the partition description still show true, even though the column stats themselves have all been deleted:
{code}
hive> describe formatted sample_pt partition (dummy=11);
OK
# col_name            	data_type           	comment             
	 	 
code                	string              	                    
description         	string              	                    
salary              	int                 	                    
total_emp           	int                 	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
dummy               	int                 	                    
	 	 
# Detailed Partition Information	 	 
Partition Value:    	[11]                	 
Database:           	default             	 
Table:              	sample_pt           	 
CreateTime:         	Thu Mar 30 23:03:59 EDT 2017	 
LastAccessTime:     	UNKNOWN             	 
Location:           	file:/user/hive/warehouse/apache/sample_pt/dummy=11	 
Partition Parameters:	 	 
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
	numFiles            	1                   
	numRows             	200                 
	rawDataSize         	10228               
	totalSize           	10428               
	transient_lastDdlTime	1490929439          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	serialization.format	1                   
Time taken: 6.783 seconds, Fetched: 37 row(s)

===
hive> describe formatted sample_pt partition (dummy=11) code;
OK
# col_name            	data_type           	comment             	 	 	 	 	 	 	 	 
	 	 	 	 	 	 	 	 	 	 
code                	string              	from deserializer   	 	 	 	 	 	 	 	 
Time taken: 9.429 seconds, Fetched: 3 row(s)
{code}
The column stats should not be dropped when a partition is renamed.
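
For reference, the scenario above can be sketched as a short HiveQL script. This is only a rough reproduction under stated assumptions: the table definition is inferred from the describe output in this report, the inserted row is a placeholder (the original data is not shown here), and the stats are assumed to have been gathered with analyze ... compute statistics for columns.
{code:sql}
-- Assumed schema, inferred from the describe output in this report
create table sample_pt (code string, description string, salary int, total_emp int)
  partitioned by (dummy int);

-- Placeholder row for illustration only; not the data used in the report
insert into table sample_pt partition (dummy=1)
  values ('00-0000', 'placeholder', 1, 1);

-- Assumed way of gathering the column stats for the partition
analyze table sample_pt partition (dummy=1) compute statistics for columns;

-- Column stats are expected to be listed here
describe formatted sample_pt partition (dummy=1) code;

-- Rename the partition, then check the column stats again
alter table sample_pt partition (dummy=1) rename to partition (dummy=11);

-- Per this report, the column stats are missing after the rename
describe formatted sample_pt partition (dummy=11) code;
{code}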



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)