Posted to user@impala.apache.org by Peter Horvath <pe...@gmail.com> on 2018/02/27 09:58:10 UTC

Impala State Store and Catalog Service resource requirements

Dear All,

I am in the process of setting up a Hadoop cluster including Impala v2.10.0.

I would like to configure Impala State Store and Catalog Service
appropriately (maybe even on a dedicated host). However, I cannot
find any documentation on the resource needs of these services or any other
best practices regarding the sizing of the host machine.

For example, I do not know how much memory or disk space I should reserve
for these services. Based on my understanding, Impala State Store and
Catalog Service should have a relatively small footprint compared to other
big data components, but I am not sure I would be able to make the right
estimate on my own.

Could someone please point me in the right direction?

Thank you,
Peter

Re: Impala State Store and Catalog Service resource requirements

Posted by Peter Horvath <pe...@gmail.com>.

Hi Laszlo,

Thank you for your input, this was indeed very helpful.

The formula for the Catalog Service is precisely what I have been looking
for:

Catalog memory usage
• Metadata cache heap memory usage can be calculated as:
    num of tables * 5KB
    + num of partitions * 2KB
    + num of files * 750B
    + num of file blocks * 300B
    + sum(incremental col stats per table)
• Incremental stats, for each table:
    num columns * num partitions * 400B
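
The formula above can be sketched as a small Python estimator. This is only an illustration under stated assumptions: the TableShape fields and the function names are hypothetical, "KB" is assumed to mean 1024 bytes, and incremental stats are assumed to count only for tables that actually have them.

```python
from dataclasses import dataclass

KB = 1024  # assumption: the quoted "KB" means 1024 bytes


@dataclass
class TableShape:
    """Hypothetical description of one table; field names are illustrative."""
    columns: int
    partitions: int
    files: int
    file_blocks: int
    incremental_stats: bool = False  # assumption: only some tables have them


def incremental_stats_bytes(table: TableShape) -> int:
    """Per-table incremental stats: num columns * num partitions * 400B."""
    if not table.incremental_stats:
        return 0
    return table.columns * table.partitions * 400


def catalog_metadata_bytes(tables: list[TableShape]) -> int:
    """Estimate the metadata cache heap usage using the quoted formula."""
    total = 0
    for t in tables:
        total += 5 * KB                     # per table
        total += t.partitions * 2 * KB      # per partition
        total += t.files * 750              # per file
        total += t.file_blocks * 300        # per file block
        total += incremental_stats_bytes(t)
    return total


if __name__ == "__main__":
    # Made-up example table, not a measurement from a real cluster.
    example = [TableShape(columns=10, partitions=100, files=1000,
                          file_blocks=2000, incremental_stats=True)]
    print(f"{catalog_metadata_bytes(example) / (KB * KB):.2f} MiB")
```

Feeding in the table, partition, file and block counts from the Hive metastore and HDFS would give a first-order estimate of the catalog heap; the constants are the ones quoted above and are not tuned to any particular Impala version.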

Have you by any chance seen recommendations regarding the hardware
requirements of the statestore?
Is my understanding correct that catalog service and statestore are almost
like shared in-memory caches, where the focus is on memory (and not CPU)?

I would like to get a good understanding of the relative weight of these
components compared to the core impalad daemon.

Once again, thank you very much for your help.

Cheers,
Peter

Re: Impala State Store and Catalog Service resource requirements

Posted by Laszlo Gaal <la...@cloudera.com>.

Hi Peter,

For starters I would recommend the following overviews:
1. The Apache Impala website has a pretty comprehensive Impala guide; the
2.10.0 version can be found at
http://impala.apache.org/docs/build/impala-2.10.pdf. Sizing considerations
start on page 20.
2. Putting on my Cloudera hat for a moment: A good summary slide deck is
the Impala Cookbook, created by Cloudera's Impala developers and field
engineers, available on SlideShare:
https://www.slideshare.net/cloudera/the-impala-cookbook-42530186

To answer your specific question: The statestore and the catalog are
usually recommended to run on their own dedicated hosts, separate from the
worker nodes. The catalog has significant memory requirements, as it has to
keep the complete metadata in memory (databases/tables/fields, the file
layout for the tables and the HDFS block layout of the files, and
optionally all security permissions from Sentry). You can find sizing
formulas both for the memory requirements and for storage sizing in the
above documents.

I'm sure the community would be able to offer more specific help given more
details about your setup and workload.

Hope this helps,

  - LaszloG