Posted to dev@hive.apache.org by "Yu-Wen Lai (Jira)" <ji...@apache.org> on 2020/07/02 00:47:00 UTC

[jira] [Created] (HIVE-23796) Multiple insert overwrite into a partitioned table doesn't gather column statistics for all partitions

Yu-Wen Lai created HIVE-23796:
---------------------------------

             Summary: Multiple insert overwrite into a partitioned table doesn't gather column statistics for all partitions
                 Key: HIVE-23796
                 URL: https://issues.apache.org/jira/browse/HIVE-23796
             Project: Hive
          Issue Type: Bug
          Components: Statistics
         Environment: Hive 3.1
            Reporter: Yu-Wen Lai


Below is a simplified example that illustrates the issue.
When a single query contains multiple INSERT OVERWRITE clauses, only the partitions written by the last clause get column statistics. In this example, only the partition (ws_sold_date_sk=__HIVE_DEFAULT_PARTITION__), which is written by the last insert clause, has column statistics.

Because "hive.stats.column.autogather" is true by default, we expect column statistics to be computed for every partition written by the query.
{code:sql}
create table web_sales
(
    ws_sold_time_sk           bigint,
    ws_ship_date_sk           bigint,
    ws_item_sk                bigint
)
partitioned by (ws_sold_date_sk           bigint)
stored as orc;
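
-- Every partition column in the inserts below is dynamic, which the default strict mode rejects.
set hive.exec.dynamic.partition.mode=nonstrict;

from anotherdb.web_sales ws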
insert overwrite table web_sales partition (ws_sold_date_sk)
select
        ws.ws_sold_time_sk,
        ws.ws_ship_date_sk,
        ws.ws_item_sk,
        ws.ws_sold_date_sk
where ws.ws_sold_date_sk is not null
insert overwrite table web_sales partition (ws_sold_date_sk)
select
        ws.ws_sold_time_sk,
        ws.ws_ship_date_sk,
        ws.ws_item_sk,
        ws.ws_sold_date_sk
where ws.ws_sold_date_sk is null
sort by ws.ws_sold_date_sk
;
{code}
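To see which partitions actually received column statistics after running the statements above, the per-partition metadata can be inspected. This is a sketch that is not part of the original report: the partition value 2451000 is hypothetical, and it assumes the per-column entries in the COLUMN_STATS_ACCURATE partition parameter are used as the indicator.

{code:sql}
-- List the partitions created by the multi-insert query.
show partitions web_sales;

-- Inspect a partition written by the FIRST insert clause.
-- 2451000 is a hypothetical value; substitute one returned by SHOW PARTITIONS.
-- If the bug is present, COLUMN_STATS_ACCURATE under "Partition Parameters"
-- is expected to lack per-column entries for this partition.
describe formatted web_sales partition (ws_sold_date_sk=2451000);
{code}

Running the same check against the partition written by the last clause (__HIVE_DEFAULT_PARTITION__) gives a point of comparison, since that is the one partition the report says does have column statistics.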