Posted to dev@hive.apache.org by "Chaoyu Tang (JIRA)" <ji...@apache.org> on 2017/05/03 02:32:04 UTC
[jira] [Created] (HIVE-16572) Rename a partition should not drop its column stats
Chaoyu Tang created HIVE-16572:
----------------------------------
Summary: Rename a partition should not drop its column stats
Key: HIVE-16572
URL: https://issues.apache.org/jira/browse/HIVE-16572
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
The column stats for the table sample_pt partition (dummy=1) are as follows:
{code}
hive> describe formatted sample_pt partition (dummy=1) code;
OK
# col_name data_type min max num_nulls distinct_count avg_col_len max_col_len num_trues num_falses comment
code string 0 303 6.985 7 from deserializer
Time taken: 0.259 seconds, Fetched: 3 row(s)
{code}
But when this partition is renamed, for example with
alter table sample_pt partition (dummy=1) rename to partition (dummy=11);
the COLUMN_STATS entries in the partition description (COLUMN_STATS_ACCURATE) still say true, although the column stats have actually all been deleted:
{code}
hive> describe formatted sample_pt partition (dummy=11);
OK
# col_name data_type comment
code string
description string
salary int
total_emp int
# Partition Information
# col_name data_type comment
dummy int
# Detailed Partition Information
Partition Value: [11]
Database: default
Table: sample_pt
CreateTime: Thu Mar 30 23:03:59 EDT 2017
LastAccessTime: UNKNOWN
Location: file:/user/hive/warehouse/apache/sample_pt/dummy=11
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
numFiles 1
numRows 200
rawDataSize 10228
totalSize 10428
transient_lastDdlTime 1490929439
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 6.783 seconds, Fetched: 37 row(s)
===
hive> describe formatted sample_pt partition (dummy=11) code;
OK
# col_name data_type comment
code string from deserializer
Time taken: 9.429 seconds, Fetched: 3 row(s)
{code}
The column stats should not be dropped when a partition is renamed.
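The inconsistency described above (the COLUMN_STATS_ACCURATE flags say true while no column stats are returned) could be checked mechanically. The following is only a rough sketch and not part of Hive: the function names are hypothetical, and it assumes the parameter value is copied as printed by describe formatted (with backslash-escaped quotes) and that the set of columns that still have stats has been collected separately, for example by running describe formatted on each column.

```python
import json


def claimed_accurate_columns(raw_param):
    """Return the columns that COLUMN_STATS_ACCURATE flags as "true".

    raw_param is the parameter value as printed by describe formatted,
    i.e. JSON whose double quotes are backslash-escaped.
    """
    parsed = json.loads(raw_param.replace('\\"', '"'))
    column_flags = parsed.get("COLUMN_STATS", {})
    return {col for col, flag in column_flags.items() if flag == "true"}


def stale_flags(raw_param, columns_with_stats):
    """Columns flagged as accurate for which no column stats exist."""
    return claimed_accurate_columns(raw_param) - set(columns_with_stats)


# Parameter value taken from the renamed partition (dummy=11) in this report.
raw = (r'{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",'
       r'\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}')

# After the rename no column returns stats, so the observed set is empty.
print(sorted(stale_flags(raw, set())))
# prints ['code', 'description', 'salary', 'total_emp']
```

An empty result would mean the flags and the stored stats agree; for the renamed partition in this report, every flagged column comes back as stale.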