Posted to user@hive.apache.org by zhangliyun <ke...@126.com> on 2019/08/23 23:02:24 UTC

Can you help with this dynamic partition insert problem?

  Hi 


When I use the Hive dynamic partition feature, I find it very easy to hit the "exceeds max created files count" exception (I have set hive.exec.max.created.files to 100K but the insert still fails).


I have generated an unpartitioned table 'bsl12.email_edge_lyh_mth1' which contains 584M records, and I want to insert it into a partitioned table "bsl12.email_edge_lyh_partitioned2":


 set hive.exec.dynamic.partition=true;
 set hive.exec.max.dynamic.partitions=500;
 set hive.exec.max.dynamic.partitions.pernode=500;
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.exec.max.created.files=10000;


--select count(*) from bsl12.email_edge_lyh_mth1; --584652128
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION (link_crtd_date) SELECT * FROM bsl12.email_edge_lyh_mth1;
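
One workaround I am going to try (a sketch based on general advice I have read, using my own table and partition column; please tell me if this is the right direction) is to add DISTRIBUTE BY on the partition column, so that all rows of one link_crtd_date go to the same reducer and each reducer only opens writers for the partitions routed to it:

-- route all rows of a given link_crtd_date to one reducer, so the
-- total number of output files stays close to the number of partitions
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION (link_crtd_date)
SELECT * FROM bsl12.email_edge_lyh_mth1
DISTRIBUTE BY link_crtd_date;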


I guess that during a dynamic partition insert, Hive first calculates which partitions the target table will contain and generates temporary files, and in the final step it moves the temporary files to the specified partition locations. My problem is that too many temporary files are generated, which causes the "exceeds max created files count" exception. What principle does Hive use to generate the temporary files? Does it generate a temporary file for every record, so that the number of temporary files equals the number of rows in the unpartitioned table? Can you give me some suggestions about this? I have tried both Hive on Tez and Hive on MapReduce, and both fail.
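
My current understanding (please correct me if it is wrong) is that Hive does not write one temporary file per record: each map or reduce task opens one writer per partition value it encounters, so the created-files count is roughly number_of_tasks * partitions_seen_per_task, and with 500 partitions that product passes 10000 very quickly. I also plan to try the sorted dynamic partition optimization (available since Hive 0.13, as far as I know), which I believe sorts rows by the partition key so each task keeps only one writer open at a time:

-- sort rows by the dynamic partition key before writing, so each task
-- writes its partitions one at a time instead of holding many writers open
set hive.optimize.sort.dynamic.partition=true;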


Kelly Zhang