Posted to user@kudu.apache.org by Кравец Владимир Александрович <kr...@kamatech.ru> on 2018/03/20 09:15:48 UTC

Limitations on total amount of data stored in one kudu table

Hi, I'm new to Kudu and I'm trying to understand whether it fits our 
purposes. I came across the following article about Kudu limitations: 
https://www.cloudera.com/documentation/enterprise/latest/topics/kudu_limitations.html#concept_cws_n4n_5z. 
Do I understand correctly that this means that the maximum total amount 
of useful compressed data stored in one Kudu table is 8 TB? Here are my 
calculations:
1. Amount of stored data per tablet = Recommended maximum amount of 
stored data per tablet server / Recommended maximum number of tablets 
per tablet server = 8,000 GB / 2,000 = 4 GB per tablet
2. Maximum number of tablets per table for each tablet server, 
pre-replication = Maximum number of tablets per table for each tablet 
server, post-replication (60) / Number of replicas = 60 / 3 = 20 
tablets per table per tablet server
3. Total amount of stored data per table, pre-replication = Amount of 
stored data per tablet * Maximum number of tablets per table for each 
tablet server, pre-replication * Maximum number of tablet servers = 4 GB 
* 20 * 100 = 8,000 GB = 8 TB
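
For reference, the three steps above can be written as a short Python
sketch. The constants are the figures quoted in this message from the
limitations page; the constant and function names are made up for
illustration. It only reproduces the arithmetic and does not claim that
the assumptions behind it are right.

```python
# Figures as quoted in this message from the Kudu limitations page.
MAX_DATA_PER_TSERVER_GB = 8_000          # recommended max stored data per tablet server
MAX_TABLETS_PER_TSERVER = 2_000          # recommended max tablets per tablet server
MAX_TABLETS_PER_TABLE_PER_TSERVER = 60   # per table, per tablet server, post-replication
REPLICATION_FACTOR = 3
MAX_TSERVERS = 100


def estimate_table_capacity_gb():
    """Reproduce steps 1-3: per-table capacity in GB, pre-replication."""
    # Step 1: assumes every tablet is the same size and the tablet limit is reached.
    gb_per_tablet = MAX_DATA_PER_TSERVER_GB / MAX_TABLETS_PER_TSERVER
    # Step 2: convert the post-replication tablet limit to pre-replication.
    tablets_pre_replication = MAX_TABLETS_PER_TABLE_PER_TSERVER // REPLICATION_FACTOR
    # Step 3: scale by the number of tablet servers.
    return gb_per_tablet * tablets_pre_replication * MAX_TSERVERS


print(estimate_table_capacity_gb())  # 8000.0 GB, i.e. 8 TB
```
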
I would also like to understand how fundamental the limitation "Maximum 
number of tablets per table for each tablet server is 60, 
post-replication" is. Is it possible that this restriction will be 
removed?

Re: Limitations on total amount of data stored in one kudu table

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, Mar 20, 2018 at 2:15 AM, Кравец Владимир Александрович <
kravets@kamatech.ru> wrote:

> Hi, I'm new to Kudu and I'm trying to understand the applicability for our
> purposes. So I met the following article about the kudu limitations -
> https://www.cloudera.com/documentation/enterprise/latest/topics/kudu_limitations.html#concept_cws_n4n_5z.
> Do I understand correctly that this means that the maximum total amount of
> useful compressed stored data in one kudu-table is 8TB? Here my calcs:
>

I think there are a few mistakes below. Comments inline.


> 1. Amount of stored data per tablet = Recommended maximum amount of stored
> data / Recommended maximum number of tablets per tablet server = 8 000 / 2
> 000 = 4 GB per tablet
>

That assumes that every tablet is equally sized and that you have hit the
limit on number of tablets. Even though you _can_ have 2000 tablets per
server, you might want fewer. In addition, you don't need to have every
tablet be the same size -- some might be 10GB while others might be 1GB or
smaller.


> 2. Maximum number of tablets per table for each tablet server
> pre-replication = Maximum number of tablets per table for each tablet
> server is 60, post-replication / number of replicas = 60 / 3 = 20 tablets
> per table per tablet server
>

The key phrase that you didn't copy here is "at table-creation time". This
limitation has to do with avoiding some issues we have seen when trying to
create too many tablets at the same time on the cluster. With range
partitioning, you can always add more partitions later. For example, it's
very common to add a new partition for each day. So, a single table can,
after some days, have more than 20 tablets on a given server.
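
As a rough illustration of that growth, here is a minimal Python sketch.
It assumes replicas are spread evenly across tablet servers; the function
name and the example numbers (2,000 tablets at creation, 100 tablets added
per daily range partition) are hypothetical and not taken from any real
cluster.

```python
def replicas_per_server(days, tablets_at_creation, tablets_added_per_day,
                        replication_factor=3, tablet_servers=100):
    """Average number of replicas of one table hosted on each tablet server.

    Assumes an even spread of replicas, which is a simplification.
    """
    total_tablets = tablets_at_creation + days * tablets_added_per_day
    return total_tablets * replication_factor / tablet_servers


# At creation time the table sits exactly at the limit of 60 replicas per server.
print(replicas_per_server(0, 2000, 100))   # 60.0
# After 30 daily partitions it is well past that figure.
print(replicas_per_server(30, 2000, 100))  # 150.0
```

The 60-per-server figure only bounds the first call here; the later
growth comes from partitions added after the table was created.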


> 3. Total amount of stored data per table, pre-replication = Amount of
> stored data per tablet * Maximum number of tablets per table for each
> tablet server pre-replication *  Maximum number of tablet servers = 4 GB *
> 20 * 100 = 8TB
>

Per above, this isn't really the case. For example, on one cluster at
Cloudera which runs an internal workload, we have one table that is 82TB
and another which is 46TB. I've seen much larger tables in some user
installations as well.
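
For a sense of scale, a back-of-the-envelope sketch using only the figures
quoted earlier in this thread (100 tablet servers, 8 TB recommended per
server, 3 replicas). It assumes the 8 TB per-server figure is counted
post-replication; the function name is made up for illustration.

```python
def cluster_capacity_tb(tablet_servers=100, max_data_per_server_tb=8,
                        replication_factor=3):
    """Recommended cluster-wide capacity in TB, pre-replication.

    Assumes the per-server figure counts replicated data.
    """
    return tablet_servers * max_data_per_server_tb / replication_factor


print(round(cluster_capacity_tb(), 1))  # 266.7
```

Under those assumptions, tables of 82 TB and 46 TB fit within the
recommended per-server figures.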


> And I also would like to understand how fundamental the nature of the
> limitation "Maximum number of tablets per table for each tablet server is
> 60, post-replication"? Is it possible that this restriction will be removed?
>

See above.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera