Posted to user@accumulo.apache.org by Mario Pastorelli <ma...@teralytics.ch> on 2016/10/07 12:22:22 UTC

Re: Maximum and recommended number of tablets per tserver

Hey Josh,

Thanks for the answer! Could you give me some pointers to where in the
Accumulo code the tablet information is stored? I would like to read the
implementation to understand whether it will eventually give us problems.
We may have many tablets per server, but we have few servers, which
probably means that the number of tablets should not be a problem.

Thanks,
Mario

On Tue, Sep 27, 2016 at 7:34 PM, Josh Elser <jo...@gmail.com> wrote:

> Hi Mario,
>
> There is no perfect answer here. It's going to be dependent on your
> application latency requirements, data layout, and available hardware
> resources.
>
> In general, we recommend "hundreds of tablets" per tabletserver. This
> should be a decent starting point. As you develop more and are able to run
> some experiments with your application, you might find that more or
> fewer tablets are actually beneficial (for one reason or another).
>
> I'm not sure what the maximum number of tablets per tserver would be.
> Probably tens of thousands? It depends heavily (again) on the hardware
> resources available. The master is also a single entity managing the
> assignment of each of those tablets -- eventually, just iterating over
> tablets will take more and more time in the Master. Additionally, each
> Tablet represents some additional amount of memory in the TabletServer's
> JVM heap.
>
> Sorry this isn't a clear-cut answer -- maybe we can do a better job
> encapsulating the important metrics in the docs somewhere? You're
> certainly not the first to ask this question :)
>
> - Josh
>
> Mario Pastorelli wrote:
>
>> Hey,
>>
>> I would like to know the maximum and recommended number of tablets
>> that we should have per machine. Right now, some of our clusters have
>> thousands of tablets, and the number of tables will increase in the
>> future. Eventually, we will have to add new machines to our clusters, and
>> a recommended number could help us decide the size of the new cluster.
>>
>> Thanks
>> Mario
>>
>> --
>> Mario Pastorelli| TERALYTICS
>>
>> *software engineer*
>>
>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>> phone:+41794381682
>> email: mario.pastorelli@teralytics.ch
>> www.teralytics.net
>>
>> Company registration number: CH-020.3.037.709-7 | Trade register Canton
>> Zurich
>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
>> Yann de Vries
>>
>> This e-mail message contains confidential information which is for the
>> sole attention and use of the intended recipient. Please notify us at
>> once if you think that it may not be intended for you and delete it
>> immediately.
>>
>>
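Turning the "hundreds of tablets per tserver" rule of thumb into a cluster size comes down to a ceiling division of the total tablet count by a per-server target, plus a rough heap estimate for the per-tablet memory Josh mentions. This is only an illustrative sketch: the class and method names are hypothetical, the target of 500 is one arbitrary point within "hundreds", and the 64 KiB per-tablet overhead is a placeholder you would have to replace with a figure measured on your own tservers.

```java
/**
 * Back-of-the-envelope tserver sizing. Hypothetical helper, not part of Accumulo.
 * targetTabletsPerServer and bytesPerTabletOverhead are assumptions you supply.
 */
public final class TabletSizing {
    private TabletSizing() {}

    /** Fewest tservers that keeps each one at or below the target (ceiling division). */
    static int serversNeeded(long totalTablets, int targetTabletsPerServer) {
        if (totalTablets < 0 || targetTabletsPerServer <= 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        return (int) ((totalTablets + targetTabletsPerServer - 1) / targetTabletsPerServer);
    }

    /** Average tablets per tserver, assuming tablets are spread evenly. */
    static double tabletsPerServer(long totalTablets, int servers) {
        if (servers <= 0) {
            throw new IllegalArgumentException("servers must be positive");
        }
        return (double) totalTablets / servers;
    }

    /** Rough heap used by tablet bookkeeping on one tserver. */
    static long tabletHeapBytes(long tabletsOnServer, long bytesPerTabletOverhead) {
        return tabletsOnServer * bytesPerTabletOverhead;
    }

    public static void main(String[] args) {
        long totalTablets = 5_000;        // "thousands of tablets"
        int target = 500;                 // one arbitrary point within "hundreds"
        long perTabletBytes = 64 * 1024;  // PLACEHOLDER: measure this on your own tservers
        int servers = serversNeeded(totalTablets, target);
        double avg = tabletsPerServer(totalTablets, servers);
        System.out.println("servers needed: " + servers);
        System.out.println("tablets per server: " + avg);
        System.out.println("tablet heap per server (bytes): "
                + tabletHeapBytes((long) Math.ceil(avg), perTabletBytes));
    }
}
```

With 5,000 tablets and a target of 500 per tserver this gives 10 servers averaging 500.0 tablets each. Raising the target lowers the server count but increases heap use per tserver; the Master's iteration cost is driven by the total tablet count either way.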


Re: Maximum and recommended number of tablets per tserver

Posted by Josh Elser <jo...@gmail.com>.
Hrm, two places I can think of:

One is in TabletServer.java itself. Look for the SortedMap members
onlineTablets, unopenedTablets, and openingTablets. The onlineTablets map
holds essentially all of the Tablets that the TServer is hosting. You can
use it as a starting point to see where inside the TServer these tablets
are referenced.
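One way to picture those three maps is as stages a tablet moves through on the tserver. The following is a hypothetical model, not Accumulo's code: only the field names onlineTablets, unopenedTablets and openingTablets come from this thread, while the String key standing in for a tablet's extent, the TabletState record and the assign/beginOpen/finishOpen transitions are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical model of a tserver tracking tablets in three sorted maps.
 * Field names mirror the thread; key type, value type and transitions are assumed.
 */
public final class HostedTablets {
    /** Placeholder per-tablet state; every entry occupies JVM heap while hosted. */
    record TabletState(String extent, long assignedAtMillis) {}

    private final SortedMap<String, TabletState> unopenedTablets = new TreeMap<>();
    private final SortedMap<String, TabletState> openingTablets = new TreeMap<>();
    private final SortedMap<String, TabletState> onlineTablets = new TreeMap<>();

    /** An assignment arrived: queue the tablet unless it is already tracked. */
    synchronized void assign(String extent) {
        if (!unopenedTablets.containsKey(extent)
                && !openingTablets.containsKey(extent)
                && !onlineTablets.containsKey(extent)) {
            unopenedTablets.put(extent, new TabletState(extent, System.currentTimeMillis()));
        }
    }

    /** Move a queued tablet into the "opening" stage; false if it was not queued. */
    synchronized boolean beginOpen(String extent) {
        TabletState t = unopenedTablets.remove(extent);
        if (t == null) {
            return false;
        }
        openingTablets.put(extent, t);
        return true;
    }

    /** Loading finished: the tablet is now online; false if it was not opening. */
    synchronized boolean finishOpen(String extent) {
        TabletState t = openingTablets.remove(extent);
        if (t == null) {
            return false;
        }
        onlineTablets.put(extent, t);
        return true;
    }

    synchronized int onlineCount() {
        return onlineTablets.size();
    }

    synchronized int pendingCount() {
        return unopenedTablets.size() + openingTablets.size();
    }

    /** Read-only snapshot of the online tablets, ordered by extent key. */
    synchronized SortedMap<String, TabletState> onlineView() {
        return Collections.unmodifiableSortedMap(new TreeMap<>(onlineTablets));
    }
}
```

In this model every map entry stays on the heap for as long as the tablet is tracked, which mirrors the point that each hosted Tablet adds to the tserver's memory footprint.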

For the assignment side, look for TabletGroupWatcher.java. This is 
essentially the mechanism running inside the Accumulo Master which makes 
sure that Tablets are online.
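The earlier remark that "just iterating over tablets will take more and more time in the Master" can be illustrated with one pass of a watcher loop. This is a hypothetical sketch, not TabletGroupWatcher's actual logic: the TabletLocation record, the notion of a tablet sitting on a dead server, and the least-loaded assignment policy are all assumptions. What it does show is that each pass touches every tablet at least once, so its cost grows at least linearly with the total tablet count no matter how many tservers share them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical single pass of a master-side watcher that keeps tablets online.
 * Not Accumulo's TabletGroupWatcher; layout and assignment policy are assumed.
 */
public final class WatcherPass {
    /** Where a tablet is hosted; server == null means unassigned. */
    record TabletLocation(String extent, String server) {}

    /** Returns new assignments for tablets that are unassigned or on a dead server. */
    static Map<String, String> scan(List<TabletLocation> tablets, Set<String> liveServers) {
        if (liveServers.isEmpty()) {
            throw new IllegalStateException("no live tservers");
        }
        // TreeMap keeps tie-breaking deterministic (lowest server name wins).
        Map<String, Integer> load = new TreeMap<>();
        for (String s : liveServers) {
            load.put(s, 0);
        }
        // Full scan: O(n) in the number of tablets, every pass.
        List<String> needsHome = new ArrayList<>();
        for (TabletLocation t : tablets) {
            if (t.server() != null && liveServers.contains(t.server())) {
                load.merge(t.server(), 1, Integer::sum);
            } else {
                needsHome.add(t.extent());
            }
        }
        // Give each orphaned tablet to the currently least-loaded live server.
        Map<String, String> assignments = new HashMap<>();
        for (String extent : needsHome) {
            String target = null;
            for (Map.Entry<String, Integer> e : load.entrySet()) {
                if (target == null || e.getValue() < load.get(target)) {
                    target = e.getKey();
                }
            }
            assignments.put(extent, target);
            load.merge(target, 1, Integer::sum);
        }
        return assignments;
    }
}
```

A real master also has to persist assignments and react to failures; the sketch only isolates the per-pass scan that grows with the number of tablets.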
