Posted to user@hive.apache.org by Suresh Krishnappa <su...@gmail.com> on 2013/03/07 15:31:19 UTC

HIVE issues when using a large number of partitions

Hi All,
I have a Hadoop cluster with data present in a large number of directories
(> 10,000).
To run Hive queries over this data I created an external partitioned table
and added each directory as a partition of the external table using the
'alter table add partition' command.
Is there a better way to create a Hive external table over a large number
of directories?
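
For context, what I am doing currently looks roughly like this (the table,
column and path names below are just placeholders, not my real schema):

  CREATE EXTERNAL TABLE logs (host STRING, bytes INT)
  PARTITIONED BY (dt STRING)
  LOCATION '/data/logs';

  -- repeated once per directory, around 10,000 times
  ALTER TABLE logs ADD PARTITION (dt='2013-03-01')
  LOCATION '/data/logs/2013-03-01';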

Also, I am facing the following issues due to the large number of partitions:
1) The DDL operations of creating the table and adding partitions to it take
a very long time; it takes about an hour to add around 10,000 partitions.
2) I get an 'out of memory' Java exception while adding more than 50,000
partitions.
3) I sometimes get an 'out of memory' Java exception for select queries over
more than 10,000 partitions.

What is the recommended limit on the number of partitions that we can
create for a Hive table?
Are there any configuration settings in Hive/Hadoop to support a large
number of partitions?

I am using Hive 0.10.0. I re-ran the tests with PostgreSQL instead of Derby
as the metastore and still faced similar issues.

I would appreciate any input on this.

Thanks
Suresh

Re: HIVE issues when using a large number of partitions

Posted by Edward Capriolo <ed...@gmail.com>.
> 2) I get an 'out of memory' Java exception while adding more than 50,000
> partitions.
> 3) I sometimes get an 'out of memory' Java exception for select queries
> over more than 10,000 partitions.

So Hive and Hadoop have to "plan" the job. Planning involves the client
building a list of all the partitions in memory. It also involves the Hadoop
jobtracker calculating all the split information. With too many partitions
you push the limits of your client and the jobtracker, neither of which can
be distributed. You can raise your client heap and jobtracker memory up to a
point, but this is more of an anti-pattern.

Do not plan on having much success with a task spanning 20K+ partitions.
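
As a rough illustration (the table and partition names here are made up), a
predicate on the partition column lets Hive prune partitions at compile
time, so planning only has to enumerate the partitions that survive the
filter:

  -- only the matching partition's metadata and splits are loaded
  SELECT count(*) FROM logs WHERE dt = '2013-03-09';

  -- no filter: the client pulls all 50,000+ partitions into memory
  -- while planning the job
  SELECT count(*) FROM logs;

Running with hive.mapred.mode=strict will even reject queries against a
partitioned table that do not include such a partition filter.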


Re: HIVE issues when using a large number of partitions

Posted by Ramki Palle <ra...@gmail.com>.
Check this for your first question:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
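
For what it's worth, the recover-partitions approach described on that page
comes down to a single statement (the table name below is a placeholder); it
scans the table's location in HDFS and adds any partitions that exist as
directories but are missing from the metastore:

  MSCK REPAIR TABLE your_table;

  -- the page also mentions the Amazon EMR equivalent:
  -- ALTER TABLE your_table RECOVER PARTITIONS;

I have not tried it with tens of thousands of directories, so I cannot say
how it behaves at that scale.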

Please post if you find any solution for your 2nd and 3rd questions.

Regards,
Ramki.

