Posted to issues@hive.apache.org by "Denys Kuzmenko (Jira)" <ji...@apache.org> on 2023/09/13 14:26:00 UTC

[jira] [Resolved] (HIVE-27309) Large number of partitions and small files causes OOM in query coordinator

     [ https://issues.apache.org/jira/browse/HIVE-27309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denys Kuzmenko resolved HIVE-27309.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

> Large number of partitions and small files causes OOM in query coordinator
> --------------------------------------------------------------------------
>
>                 Key: HIVE-27309
>                 URL: https://issues.apache.org/jira/browse/HIVE-27309
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: Rajesh Balamohan
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
>  When a large number of nested partitions (with small files) is read, the AM fails with an OOM.
> {noformat}
> CREATE EXTERNAL TABLE `store_sales_delete_6`(
>   `ss_sold_time_sk` int,
>   `ss_item_sk` int,
>   `ss_customer_sk` int,
>   `ss_cdemo_sk` int,
>   `ss_hdemo_sk` int,
>   `ss_addr_sk` int,
>   `ss_store_sk` int,
>   `ss_promo_sk` int,
>   `ss_ticket_number` bigint,
>   `ss_quantity` int,
>   `ss_wholesale_cost` decimal(7,2),
>   `ss_list_price` decimal(7,2),
>   `ss_sales_price` decimal(7,2),
>   `ss_ext_discount_amt` decimal(7,2),
>   `ss_ext_sales_price` decimal(7,2),
>   `ss_ext_wholesale_cost` decimal(7,2),
>   `ss_ext_list_price` decimal(7,2),
>   `ss_ext_tax` decimal(7,2),
>   `ss_coupon_amt` decimal(7,2),
>   `ss_net_paid` decimal(7,2),
>   `ss_net_paid_inc_tax` decimal(7,2),
>   `ss_net_profit` decimal(7,2),
>   `ss_sold_date_sk` int)
> PARTITIONED BY SPEC (
> ss_store_sk, ss_promo_sk, ss_sold_date_sk) STORED by iceberg LOCATION 's3a://blah/blah/tablespace/external/hive/blah.db/store_sales_delete_6';
> alter table store_sales_delete_6 set tblproperties('format'='iceberg/parquet');
> alter table store_sales_delete_6 set tblproperties('format-version'='2');
> insert into store_sales_delete_6 select * from tpcds_1000_update.ssv limit 100000;
> select count(*) from store_sales_delete_6;
> {noformat}
> The select count(*) query now throws an OOM in the query AM. The query generates 100,000 splits, which are grouped together into 41 splits, but streaming these and sending them as events throws the OOM.
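
As a rough illustration of the numbers in the report (a sketch only, not how Hive or Tez actually groups splits): if the 100,000 original splits were spread evenly over the 41 grouped splits, each group would still reference about 2,439 of them. Grouping lowers the split count, but under this assumption the total number of entries to be carried stays the same. The even distribution and the helper name `files_per_group` are assumptions made for illustration.

```python
# Back-of-the-envelope sketch; the even distribution is an assumption,
# not the actual split-grouping logic.
TOTAL_SPLITS = 100_000   # splits generated by the query (from the report)
GROUPED_SPLITS = 41      # splits after grouping (from the report)


def files_per_group(total: int, groups: int) -> list[int]:
    """Distribute `total` original splits as evenly as possible over `groups`."""
    base, extra = divmod(total, groups)
    # The first `extra` groups take one additional split each.
    return [base + 1 if i < extra else base for i in range(groups)]


sizes = files_per_group(TOTAL_SPLITS, GROUPED_SPLITS)
print(f"groups: {len(sizes)}")
print(f"min per group: {min(sizes)}, max per group: {max(sizes)}")
print(f"total entries still carried: {sum(sizes)}")
```

With the reported figures, one group holds 2,440 entries and the other forty hold 2,439 each, which still adds up to 100,000.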



--
This message was sent by Atlassian Jira
(v8.20.10#820010)