Posted to user@kudu.apache.org by Boris Tyukin <bo...@boristyukin.com> on 2018/10/10 21:02:54 UTC

clarification on Partitioning Guidelines and CPU cores

Hi all,

Can someone clarify whether the recommendation below refers to physical or
hyper-threaded CPU cores? It makes quite a big difference...
Thanks,
Boris

Partitioning Guidelines
(https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_rules_of_thumb)
- For large tables, such as fact tables, aim for as many tablets as you
have cores in the cluster.
- For small tables, such as dimension tables, aim for a large enough number
of tablets that each tablet is at least 1 GB in size.

In general, be mindful the number of tablets limits the parallelism of
reads, in the current implementation. Increasing the number of tablets
significantly beyond the number of cores is likely to have diminishing
returns.
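
One way to read the two rules of thumb above together is sketched below. This is only an illustration, not something taken from the Kudu documentation: the function name suggest_tablet_count and its parameters are hypothetical, and it assumes the "at least 1 GB per tablet" rule acts as a cap on the "one tablet per core" rule.

```python
import math


def suggest_tablet_count(total_cores: int, table_size_gb: float,
                         min_tablet_gb: float = 1.0) -> int:
    """Hypothetical helper combining the two rules of thumb quoted above."""
    # Small-table rule: do not create so many tablets that each one
    # ends up smaller than min_tablet_gb.
    max_by_size = max(1, math.floor(table_size_gb / min_tablet_gb))
    # Large-table rule: aim for one tablet per core in the cluster,
    # capped by the size-based limit computed above.
    return max(1, min(total_cores, max_by_size))
```

For a multi-terabyte fact table the core count is the binding limit, while for a 5 GB dimension table the size rule keeps the count at 5 tablets.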

Re: clarification on Partitioning Guidelines and CPU cores

Posted by Boris Tyukin <bo...@boristyukin.com>.
Interesting, I did not realize that. Thanks for the tip!

On Wed, Oct 17, 2018 at 9:05 PM Adar Lieber-Dembo <ad...@cloudera.com> wrote:

> The 60 tablets per table per node limit is just at table creation time.
> You can create a table that maxes out the number of tablets, then add more
> range partitions afterwards.

Re: clarification on Partitioning Guidelines and CPU cores

Posted by Adar Lieber-Dembo <ad...@cloudera.com>.
The 60 tablets per table per node limit is just at table creation time. You
can create a table that maxes out the number of tablets, then add more
range partitions afterwards.
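
As a rough sketch of the "add more range partitions afterwards" step, the helper below builds Impala ALTER TABLE ... ADD RANGE PARTITION statements for a Kudu table. The table name, the integer bounds, and the helper itself are made up for illustration; only the statement shape follows Impala's Kudu syntax, and the statements would still have to be run through impala-shell or a client of your choice.

```python
from typing import List, Tuple


def add_range_partition_sql(table: str,
                            bounds: List[Tuple[int, int]]) -> List[str]:
    """Build one ALTER TABLE statement per (lower, upper) bound pair.

    The lower bound is inclusive and the upper bound is exclusive.
    """
    return [
        f"ALTER TABLE {table} ADD RANGE PARTITION {lo} <= VALUES < {hi}"
        for lo, hi in bounds
    ]


# Hypothetical table partitioned by an integer year column.
for stmt in add_range_partition_sql("fact_sales", [(2019, 2020), (2020, 2021)]):
    print(stmt)
```

If the table also has hash partitioning, each added range partition should create one tablet per hash bucket, so the tablet count grows in steps of the bucket count.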

On Wed, Oct 17, 2018 at 6:00 PM Boris Tyukin <bo...@boristyukin.com> wrote:

> Thanks for replying, Adar. I did some math, and in our case we are hitting
> another Kudu limit: 60 tablets per node. We use high-density nodes with 2
> 24-core CPUs, so we have 88 hyperthreaded cores per node, or 88*24=2112
> cores in total across our 24 nodes. But I cannot create more than
> 60*24=1440 tablets per table. It looks like the tablets for my largest
> table will be around 8-10 GB in size. Should I be worried, since the
> recommendation is to keep tablets at about 1 GB in size?

Re: clarification on Partitioning Guidelines and CPU cores

Posted by Boris Tyukin <bo...@boristyukin.com>.
Thanks for replying, Adar. I did some math, and in our case we are hitting
another Kudu limit: 60 tablets per node. We use high-density nodes with 2
24-core CPUs, so we have 88 hyperthreaded cores per node, or 88*24=2112
cores in total across our 24 nodes. But I cannot create more than 60*24=1440
tablets per table. It looks like the tablets for my largest table will be
around 8-10 GB in size. Should I be worried, since the recommendation is to
keep tablets at about 1 GB in size?
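
The arithmetic above can be checked with a few lines. The figures are taken from this message as stated (88 hyperthreaded cores per node, a creation-time limit of 60 tablets per node); the node count of 24 is inferred from the multiplications, and the variable names are only illustrative.

```python
nodes = 24                    # inferred from "88*24" and "60*24"
threads_per_node = 88         # hyperthreaded cores per node, as stated
create_limit_per_node = 60    # tablets per table per node at creation time

total_threads = nodes * threads_per_node               # 2112
max_tablets_at_create = nodes * create_limit_per_node  # 1440

# How far the creation-time limit falls short of one tablet per thread.
shortfall = total_threads - max_tablets_at_create
coverage = max_tablets_at_create / total_threads

print(total_threads, max_tablets_at_create, shortfall, round(coverage, 2))
```

So at creation time the table can have roughly two thirds as many tablets as there are hyperthreaded cores in the cluster.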

On Wed, Oct 17, 2018 at 8:06 PM Adar Lieber-Dembo <ad...@cloudera.com> wrote:

> Hi Boris,
>
> > Also, when they say tablets - I assume this is before replication? so in
> reality, it is number of nodes x cpu cores / replication factor? If this is
> the case, it is not looking good...
>
> No, I think this is post-replication. The underlying assumption is
> that you want to maximize parallelism for large tables, and since
> Impala only uses one read thread per tablet, that means ensuring the
> number of tablets is close to or equal to the overall number of cores.
> However, during a scan Impala will choose one of the tablet's replicas
> to read from, so you don't need to "reserve" a core for the other
> replicas.
>
> >> can someone clarify if this recommendation below - does it mean
> physical or hyper-threaded CPU cores? quite a big difference...
>
> I think this refers to hyper-threaded CPU cores (i.e. a CPU unit
> capable of executing an OS thread). But I'd be curious to hear if your
> workload is substantially more or less performant either way.
>

Re: clarification on Partitioning Guidelines and CPU cores

Posted by Adar Lieber-Dembo <ad...@cloudera.com>.
Hi Boris,

> Also, when they say tablets - I assume this is before replication? so in reality, it is number of nodes x cpu cores / replication factor? If this is the case, it is not looking good...

No, I think this is post-replication. The underlying assumption is
that you want to maximize parallelism for large tables, and since
Impala only uses one read thread per tablet, that means ensuring the
number of tablets is close to or equal to the overall number of cores.
However, during a scan Impala will choose one of the tablet's replicas
to read from, so you don't need to "reserve" a core for the other
replicas.

>> can someone clarify if this recommendation below - does it mean physical or hyper-threaded CPU cores? quite a big difference...

I think this refers to hyper-threaded CPU cores (i.e. a CPU unit
capable of executing an OS thread). But I'd be curious to hear if your
workload is substantially more or less performant either way.
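
A deliberately simplified model of the reasoning above, assuming one read thread per tablet and one replica chosen per scan. The function names are hypothetical, and the default replication factor of 3 is an assumption, not something stated in this thread.

```python
def effective_scan_parallelism(num_tablets: int, total_cores: int) -> int:
    """One read thread per tablet, so parallelism is capped by both counts."""
    return min(num_tablets, total_cores)


def replicas_stored(num_tablets: int, replication_factor: int = 3) -> int:
    """Replicas kept on disk; a scan reads only one replica per tablet."""
    return num_tablets * replication_factor


# With 1440 tablets on a 2112-core cluster, at most 1440 read threads run,
# even though 1440 * 3 replicas exist across the tablet servers.
print(effective_scan_parallelism(1440, 2112), replicas_stored(1440))
```

In this model the tablet count is compared with the core count directly, without dividing by the replication factor, because the extra replicas are not read during a scan.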

Re: clarification on Partitioning Guidelines and CPU cores

Posted by Boris Tyukin <bo...@boristyukin.com>.
Also, when they say tablets, I assume this is before replication? So in
reality, it is number of nodes x CPU cores / replication factor? If this is
the case, it is not looking good...

On Wed, Oct 10, 2018 at 5:02 PM Boris Tyukin <bo...@boristyukin.com> wrote:

> Hi all,
>
> Can someone clarify whether the recommendation below refers to physical
> or hyper-threaded CPU cores? It makes quite a big difference...
> Thanks,
> Boris
>
> Partitioning Guidelines
> (https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_rules_of_thumb)
> - For large tables, such as fact tables, aim for as many tablets as you
> have cores in the cluster.
> - For small tables, such as dimension tables, aim for a large enough
> number of tablets that each tablet is at least 1 GB in size.
>
> In general, be mindful the number of tablets limits the parallelism of
> reads, in the current implementation. Increasing the number of tablets
> significantly beyond the number of cores is likely to have diminishing
> returns.
>
>