Posted to user@impala.apache.org by Boris Tyukin <bo...@boristyukin.com> on 2019/02/18 13:47:36 UTC

Impala and Kudu resource allocation

Hello,

We are trying to figure out how much memory and how many CPU cores to
allocate to Impala versus Kudu, since both run on the same cluster.

Let's say we have 200 GB of RAM and 48 cores, with 70% of the data stored
in Kudu and 30% in Impala/HDFS. How would we assign resources in this case,
since both are memory-intensive applications?

Thanks,
Boris

Re: Impala and Kudu resource allocation

Posted by Brock Noland <br...@phdata.io>.
Hi Boris,

I am not sure what the exact count is, but we have somewhere around 25
of our Managed Services customers using Kudu and several more
consulting customers.

In addition to IoT and generally large, continually updated fact tables,
we've also found the CDC use case to be desirable for customers. With CDC +
Kudu + Impala, an EDW offload use case is much, much easier than it was
several years ago.
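
To make the pattern concrete, here is a minimal, illustrative Python sketch
(not the actual pipeline discussed in this thread) of turning CDC events into
Impala statements against a Kudu-backed table. The table and column names
(dim_patient, id, name, updated_at) are hypothetical; the point is that Kudu
lets Impala apply row-level UPSERT and DELETE in place instead of rewriting
HDFS partitions.

def cdc_event_to_sql(event: dict) -> str:
    """Translate one CDC event (op + row values) into an Impala statement."""
    if event["op"] in ("insert", "update"):
        # UPSERT inserts the row, or overwrites it if the primary key exists.
        return (
            "UPSERT INTO dim_patient (id, name, updated_at) "
            f"VALUES ({event['id']}, '{event['name']}', '{event['updated_at']}')"
        )
    if event["op"] == "delete":
        return f"DELETE FROM dim_patient WHERE id = {event['id']}"
    raise ValueError(f"unknown op: {event['op']}")

# Example: a single update event as it might arrive via Kafka/NiFi.
print(cdc_event_to_sql({"op": "update", "id": 42, "name": "Jane Doe",
                        "updated_at": "2019-02-18 13:47:36"}))
# A real pipeline would execute these through an Impala client (e.g. impyla)
# with proper parameter binding rather than string formatting.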

Brock

On Mon, Feb 18, 2019 at 10:19 AM Boris Tyukin <bo...@boristyukin.com> wrote:
>
> Thanks Brock, I really like your idea of starting with Kudu and then dividing the rest between Impala and YARN.
>
> How is Kudu working for you guys? I see you've published a book recently - will check it out!!
>
> We were really impressed with Kudu after doing some benchmarks that I posted on my blog https://boristyukin.com/benchmarking-apache-kudu-vs-apache-impala. We will now be using it for real-time replication of data from the Cerner EHR, using GoldenGate, Kafka and NiFi. I know you are in the healthcare space too.
>
> Thanks,
> Boris
>
> On Mon, Feb 18, 2019 at 10:45 AM Brock Noland <br...@phdata.io> wrote:
>>
>> I would do this:
>>
>> - Give Kudu the memory it requires, as described here - https://www.cloudera.com/documentation/enterprise/latest/topics/kudu_scaling.html
>> - Give HDFS the minimum required memory
>> - Decide how you use YARN vs Impala and split up the remaining memory based on that usage. We tend to use Impala heavily on both Kudu and HDFS, so we give Impala lots of memory and YARN a smaller amount.
>>
>> Brock
>>
>> On Mon, Feb 18, 2019 at 7:49 AM Boris Tyukin <bo...@boristyukin.com> wrote:
>>>
>>> Hello,
>>>
>>> We are trying to figure out how much memory and how many CPU cores to allocate to Impala versus Kudu, since both run on the same cluster.
>>>
>>> Let's say we have 200 GB of RAM and 48 cores, with 70% of the data stored in Kudu and 30% in Impala/HDFS. How would we assign resources in this case, since both are memory-intensive applications?
>>>
>>> Thanks,
>>> Boris

Re: Impala and Kudu resource allocation

Posted by Boris Tyukin <bo...@boristyukin.com>.
Thanks Brock, I really like your idea of starting with Kudu and then
dividing the rest between Impala and YARN.

How is Kudu working for you guys? I see you've published a book recently -
will check it out!!

We were really impressed with Kudu after doing some benchmarks that I posted
on my blog https://boristyukin.com/benchmarking-apache-kudu-vs-apache-impala.
We will now be using it for real-time replication of data from the Cerner
EHR, using GoldenGate, Kafka and NiFi. I know you are in the healthcare
space too.

Thanks,
Boris

On Mon, Feb 18, 2019 at 10:45 AM Brock Noland <br...@phdata.io> wrote:

> I would do this:
>
> - Give Kudu the memory it requires, as described here -
> https://www.cloudera.com/documentation/enterprise/latest/topics/kudu_scaling.html
> - Give HDFS the minimum required memory
> - Decide how you use YARN vs Impala and split up the remaining memory based
> on that usage. We tend to use Impala heavily on both Kudu and HDFS, so we
> give Impala lots of memory and YARN a smaller amount.
>
> Brock
>
> On Mon, Feb 18, 2019 at 7:49 AM Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
>> Hello,
>>
>> We are trying to figure out how much memory and how many CPU cores to
>> allocate to Impala versus Kudu, since both run on the same cluster.
>>
>> Let's say we have 200 GB of RAM and 48 cores, with 70% of the data stored
>> in Kudu and 30% in Impala/HDFS. How would we assign resources in this case,
>> since both are memory-intensive applications?
>>
>> Thanks,
>> Boris
>>
>

Re: Impala and Kudu resource allocation

Posted by Brock Noland <br...@phdata.io>.
I would do this:

- Give Kudu the memory it requires, as described here -
https://www.cloudera.com/documentation/enterprise/latest/topics/kudu_scaling.html
- Give HDFS the minimum required memory
- Decide how you use YARN vs Impala and split up the remaining memory based
on that usage. We tend to use Impala heavily on both Kudu and HDFS, so we
give Impala lots of memory and YARN a smaller amount.
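
To make the split concrete with the numbers from this thread, here is a
minimal Python sketch, assuming for illustration that the 200 GB of RAM is
per worker node, that most data lives in Kudu, and that the workload is
Impala-heavy. The percentages are assumptions, not recommendations, and the
flag and property names (kudu-tserver's --memory_limit_hard_bytes, impalad's
--mem_limit, YARN's yarn.nodemanager.resource.memory-mb) should be verified
against your distribution's documentation:

GIB = 1024 ** 3
TOTAL_RAM_GIB = 200  # per worker node, for illustration

# Assumed per-node split (fractions of total RAM), illustrative only:
split = {
    "os_and_other": 0.10,   # OS, page cache, miscellaneous daemons
    "kudu_tserver": 0.25,   # sized first, per the Kudu scaling guide above
    "hdfs_datanode": 0.03,  # DataNode heap needs are comparatively modest
    "impala": 0.50,         # Impala-heavy workload on both Kudu and HDFS
    "yarn": 0.12,           # smaller share for Spark/MapReduce jobs
}
assert abs(sum(split.values()) - 1.0) < 1e-9

def gib(fraction: float) -> int:
    """Whole gibibytes of the node's RAM for a given fraction."""
    return int(TOTAL_RAM_GIB * fraction)

print(f"kudu-tserver --memory_limit_hard_bytes={gib(split['kudu_tserver']) * GIB}")
print(f"impalad --mem_limit={gib(split['impala'])}gb")
print(f"yarn.nodemanager.resource.memory-mb={gib(split['yarn']) * 1024}")
print(f"HDFS DataNode heap: ~{gib(split['hdfs_datanode'])} GiB")
print(f"left for the OS and everything else: ~{gib(split['os_and_other'])} GiB")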

Brock

On Mon, Feb 18, 2019 at 7:49 AM Boris Tyukin <bo...@boristyukin.com> wrote:

> Hello,
>
> We are trying to figure out how much memory and how many CPU cores to
> allocate to Impala versus Kudu, since both run on the same cluster.
>
> Let's say we have 200 GB of RAM and 48 cores, with 70% of the data stored
> in Kudu and 30% in Impala/HDFS. How would we assign resources in this case,
> since both are memory-intensive applications?
>
> Thanks,
> Boris
>