Posted to user@hive.apache.org by Edward Capriolo <ed...@gmail.com> on 2010/07/26 16:49:00 UTC

Dynamic partitions possibly ignoring hive.exec.max.dynamic.partitions

We are working with a trunk build of Hive, hive-0.6.0-957988.

Dynamic partitions are working for us in testing; we tested with about
100 dynamic partitions.

For our production run we have about 1000 distinct offer_id values.

HQL="set hive.exec.dynamic.partition.mode=nonstrict;
     set hive.exec.max.dynamic.partition.pernode=200000;
     set hive.exec.max.dynamic.partitions=20000;
        insert overwrite table bco PARTITION (dt=20100721,offer)
                SELECT * FROM (
                   SELECT browser_id, country, dma_code,
                          offer_id from baseline_raw where
gen_date=20100721 and blt='bco' DISTRIBUTE BY offer_id
                   ) X;
"

2010-07-22 22:36:21,499 Stage-1 map = 100%,  reduce = 50%[Fatal Error]
Operator FS_6 (id=6): Number of dynamic partitions exceeded
hive.exec.max.dynamic.partitions.pernode.. Killing the job.

As you can see, we always die in the reducer with the same exception.
Is it possible that hive.exec.max.dynamic.partitions is not being used
in the reducer code?
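
A quick sanity check, by the way: the number of dynamic partitions
this insert creates is just the number of distinct offer_id values in
the filtered input, which can be counted up front with:

     SELECT COUNT(DISTINCT offer_id)
     FROM baseline_raw
     WHERE gen_date=20100721 AND blt='bco';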

Thanks,
Edward

Re: Dynamic partitions possibly ignoring hive.exec.max.dynamic.partitions

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Jul 26, 2010 at 12:08 PM, Ning Zhang <nz...@facebook.com> wrote:
> The fatal error was thrown because the # of dp exceeded hive.exec.max.dynamic.partitions.pernode (100). There is a typo (a missing 's' in 'partitions') in the tutorial (sorry about that). If you correct the typo in your query, it should work.
>
> One thing to watch out for is that if you increase the parameter to a large value, it could cause unexpected HDFS errors. The reason is that for each dynamic partition, we need to open at least 1 file. As long as a file is open, there will be one connection open to one of the HDFS data nodes. There is a limit on the max # of simultaneous connections to any data node (configurable, but the default is 256). So you might want to increase that HDFS parameter as well.
>
> Ning

Ning,

Thank you for the advice. You were right on both counts. First, we
had not pluralized the parameter names correctly. Second, once we did
get the names right, our datanodes began failing with "xceiverCount
258 exceeds the limit of concurrent xcievers 256".
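
For anyone following along, the corrected settings (note the 's' in
'partitions') look like this; the values are just the ones from our
job, so size them for your own partition counts:

     set hive.exec.dynamic.partition.mode=nonstrict;
     set hive.exec.max.dynamic.partitions.pernode=200000;
     set hive.exec.max.dynamic.partitions=20000;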

For people who end up following in my "large number of dynamic
partitions" footsteps, the property you need to set on all the
datanodes (in hdfs-site.xml) is:

     <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
     </property>


Thanks again, Ning!
Dynamic Partitions is a very very exciting feature!

Edward

Re: Dynamic partitions possibly ignoring hive.exec.max.dynamic.partitions

Posted by Ning Zhang <nz...@facebook.com>.
The fatal error was thrown because the # of dp exceeded hive.exec.max.dynamic.partitions.pernode (100). There is a typo (a missing 's' in 'partitions') in the tutorial (sorry about that). If you correct the typo in your query, it should work.

One thing to watch out for is that if you increase the parameter to a large value, it could cause unexpected HDFS errors. The reason is that for each dynamic partition, we need to open at least 1 file. As long as a file is open, there will be one connection open to one of the HDFS data nodes. There is a limit on the max # of simultaneous connections to any data node (configurable, but the default is 256). So you might want to increase that HDFS parameter as well.

Ning
