Posted to user@hive.apache.org by zhangliyun <ke...@126.com> on 2019/08/23 23:02:24 UTC
Can you help with a dynamic partition insert problem?
Hi
When I use the Hive dynamic partition feature, I find it very easy to hit the "exceeds max created files count" exception (I have set hive.exec.max.created.files to 100K, but it still fails).
I have generated an unpartitioned table 'bsl12.email_edge_lyh_mth1' that contains 584M records, and I want to insert it into a partitioned table "bsl12.email_edge_lyh_partitioned2":
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions=500;
SET hive.exec.max.dynamic.partitions.pernode=500;
set hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.created.files=10000;
--select count(*) from bsl12.email_edge_lyh_mth1; --584652128
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION (link_crtd_date) SELECT * FROM bsl12.email_edge_lyh_mth1;
My guess is that during a dynamic partition insert, Hive first determines which partitions the target table will contain and writes temporary files; in the final step, it moves those temporary files to the specified partition locations. My problem seems to be that too many temporary files are generated, which triggers the "exceeds max created files count" exception. What rule does Hive use to decide how many temporary files to generate? Does it create a temporary file for every record, so that the number of temporary files equals the number of rows in the unpartitioned table? Can you give me some suggestions? I have tried both Hive on Tez and Hive on MapReduce, and both fail.
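If the guess above is roughly right, a simpler model than one-file-per-record is that each writer task opens one output file per dynamic partition value it receives, so the file count scales with tasks times partitions rather than with the row count. A minimal sketch of that arithmetic (the task count of 200 is a made-up illustrative number, not taken from the job above):

```python
# Back-of-envelope file-count arithmetic for a dynamic partition insert.
# Assumption (not verified against Hive internals): each writer task
# opens one output file per dynamic partition value it sees, so the
# worst case is tasks * partitions files, independent of the row count.

def worst_case_files(num_writer_tasks: int, num_partitions: int) -> int:
    """Upper bound: every writer task receives rows for every partition."""
    return num_writer_tasks * num_partitions

# Hypothetical: 200 writer tasks, 500 partitions (the configured
# hive.exec.max.dynamic.partitions value in the script above).
files = worst_case_files(200, 500)
print(files)             # 100000
print(files > 10000)     # True: over hive.exec.max.created.files=10000
```

Under that model the limit can be exceeded even with only 500 partitions, once enough tasks each write to many partitions.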
Kelly Zhang