Posted to dev@hive.apache.org by "Yu-Wen Lai (Jira)" <ji...@apache.org> on 2020/07/02 00:47:00 UTC
[jira] [Created] (HIVE-23796) Multiple insert overwrite into a
partitioned table doesn't gather column statistics for all partitions
Yu-Wen Lai created HIVE-23796:
---------------------------------
Summary: Multiple insert overwrite into a partitioned table doesn't gather column statistics for all partitions
Key: HIVE-23796
URL: https://issues.apache.org/jira/browse/HIVE-23796
Project: Hive
Issue Type: Bug
Components: Statistics
Environment: Hive 3.1
Reporter: Yu-Wen Lai
Here I used a simplified sample to illustrate the issue.
When a statement contains multiple insert overwrite clauses, only the partitions written by the last clause get column statistics. In the sample below, only the partition (ws_sold_date_sk=__HIVE_DEFAULT_PARTITION__), which is written by the last insert clause, has column statistics.
Since "hive.stats.column.autogather" is true by default, we expect column statistics to be calculated for all partitions.
{code:sql}
create table web_sales
(
    ws_sold_time_sk bigint,
    ws_ship_date_sk bigint,
    ws_item_sk bigint
)
partitioned by (ws_sold_date_sk bigint)
stored as orc;

from anotherdb.web_sales ws
insert overwrite table web_sales partition (ws_sold_date_sk)
    select
        ws.ws_sold_time_sk,
        ws.ws_ship_date_sk,
        ws.ws_item_sk,
        ws.ws_sold_date_sk
    where ws.ws_sold_date_sk is not null
insert overwrite table web_sales partition (ws_sold_date_sk)
    select
        ws.ws_sold_time_sk,
        ws.ws_ship_date_sk,
        ws.ws_item_sk,
        ws.ws_sold_date_sk
    where ws.ws_sold_date_sk is null
    sort by ws.ws_sold_date_sk
;
{code}
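A minimal sketch of how the missing statistics might be checked and regathered, assuming the sample table above. The partition value 2451180 is hypothetical; substitute one returned by "show partitions". The analyze statement is only a possible manual workaround and has not been verified against this bug.
{code:sql}
-- List the partitions created by the multi-insert statement.
show partitions web_sales;

-- Inspect column statistics for one column of a single partition.
-- 2451180 is a hypothetical partition value used for illustration only.
describe formatted web_sales partition (ws_sold_date_sk=2451180) ws_item_sk;

-- Possible workaround (assumption): gather column statistics explicitly
-- for all partitions instead of relying on autogather.
analyze table web_sales partition (ws_sold_date_sk) compute statistics for columns;
{code}
If the issue reproduces, the describe output for a partition written by the first insert clause would be expected to show empty column statistics until the analyze statement has run.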