Posted to user@hbase.apache.org by Adrien Mogenet <ad...@gmail.com> on 2012/07/13 09:04:40 UTC

Maximum number of tables ?

Hi there,

I have read some good practices about the number of columns and column
families, but nothing about the number of tables.
What if I need to spread my data across hundreds or thousands of (big) tables?
What should I watch out for? I guess I should keep the number of
storeFiles per RegionServer low?

-- 
Adrien Mogenet
http://www.mogenet.me

Re: Maximum number of tables ?

Posted by Michael Segel <mi...@hotmail.com>.
Currently there is a hardcoded limit on the number of regions that a region server can manage.
It's 1500.
Note that once you get to around 1000 regions per region server, you end up with a performance hit. (YMMV)

So if you have 1 region per table, there's a real limit of 1500 tables * number of RS nodes. 

Note: You will probably die well before hitting this limit, again YMMV.
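
To make the arithmetic above concrete, here is a minimal Java sketch. The class and method names are hypothetical, and the per-server region budget is just an input: the 1500 and 1000 figures are the ones quoted in this message, not constants read from HBase.

```java
// Rough upper bound on the number of tables under a per-RegionServer region budget.
// All inputs are illustrative assumptions, not values read from HBase.
class TableBudget {

    /** Tables that fit if every table needs regionsPerTable regions across the cluster. */
    static long maxTables(long regionsPerServer, long regionServers, long regionsPerTable) {
        if (regionsPerTable <= 0) {
            throw new IllegalArgumentException("regionsPerTable must be positive");
        }
        return (regionsPerServer * regionServers) / regionsPerTable;
    }

    public static void main(String[] args) {
        // 1 region per table, 10 region servers, 1500 regions per server
        System.out.println(maxTables(1500, 10, 1));   // 15000
        // 20 regions per (big) table, staying at 1000 regions per server
        System.out.println(maxTables(1000, 10, 20));  // 500
    }
}
```

With big tables that each split into many regions, the table budget shrinks quickly, which is why the region count matters more than the table count.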


On Jul 13, 2012, at 3:14 AM, N Keywal wrote:

> Hi,
> 
> There is no real limits as far as I know. As you will have one region
> per table (at least :-), the number of region will be something to
> monitor carefully  if you need thousands of table. See
> http://hbase.apache.org/book.html#arch.regions.size.
> 
> Don't forget that you can add as many column as you want, and that an
> empty cell cost nothing. For example, a class hierarchy is often
> mapped to multiple tables in a RDBMS, while in HBase having a single
> table for the same hierarchy makes much more sense. Moreover, there is
> no transaction between tables, so sometimes a 'uml composition' will
> go to a single table. And so on.
> 
> N.
> 
> On Fri, Jul 13, 2012 at 9:04 AM, Adrien Mogenet
> <ad...@gmail.com> wrote:
>> Hi there,
>> 
>> I read some good practices about number of columns / column families, but
>> nothing about the number of tables.
>> What if I need to spread my data among hundred or thousand (big) tables ?
>> What should I care about ? I guess I should keep a tight number of
>> storeFiles per RegionServer ?
>> 
>> --
>> Adrien Mogenet
>> http://www.mogenet.me
> 


Re: Maximum number of tables ?

Posted by N Keywal <nk...@gmail.com>.
Hi,

There is no real limit as far as I know. As you will have at least one
region per table :-), the number of regions will be something to
monitor carefully if you need thousands of tables. See
http://hbase.apache.org/book.html#arch.regions.size.

Don't forget that you can add as many columns as you want, and that an
empty cell costs nothing. For example, a class hierarchy is often
mapped to multiple tables in an RDBMS, while in HBase having a single
table for the same hierarchy makes much more sense. Moreover, there are
no transactions across tables, so sometimes a 'UML composition' will
go into a single table. And so on.

N.
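
The "empty cell costs nothing" point can be illustrated with a toy model. This is plain Java, not the HBase client API, and every name in it (the class, the row keys, the qualifiers) is made up for the example. It only shows that a sparse row stores the cells that were actually written, so a base type and a subtype can share one table without padding.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of sparse, wide rows: only cells that are set are stored.
// Hypothetical names throughout; this does not use any HBase class.
class SparseRows {
    // row key -> (column qualifier -> value), both kept sorted
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    /** Number of cells stored for a row; columns never written take no space. */
    int storedCells(String rowKey) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? 0 : row.size();
    }

    /** Two kinds of "vehicle" from one class hierarchy, kept in a single table. */
    static SparseRows example() {
        SparseRows table = new SparseRows();
        table.put("vehicle#1", "type", "Car");
        table.put("vehicle#1", "wheels", "4");
        table.put("vehicle#1", "doors", "5");      // subtype-only column
        table.put("vehicle#2", "type", "Bicycle");
        table.put("vehicle#2", "wheels", "2");     // no "doors" cell at all
        return table;
    }

    public static void main(String[] args) {
        SparseRows table = example();
        System.out.println(table.storedCells("vehicle#1")); // 3
        System.out.println(table.storedCells("vehicle#2")); // 2
    }
}
```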

On Fri, Jul 13, 2012 at 9:04 AM, Adrien Mogenet
<ad...@gmail.com> wrote:
> Hi there,
>
> I read some good practices about number of columns / column families, but
> nothing about the number of tables.
> What if I need to spread my data among hundred or thousand (big) tables ?
> What should I care about ? I guess I should keep a tight number of
> storeFiles per RegionServer ?
>
> --
> Adrien Mogenet
> http://www.mogenet.me

Re: Maximum number of tables ?

Posted by Adrien Mogenet <ad...@gmail.com>.
Thanks for these answers; it was a theoretical question. Actually, a
common pattern in other solutions for batch deletion is to organize data
into, for instance, one table per day and drop the oldest table each day.
That is more efficient than finding old rows and then deleting them (due to
locking strategy, fragmentation, blocking compaction, etc.). Not sure it's
relevant for HBase!
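
For reference, the table-per-day bookkeeping can be sketched in a few lines of Java. The naming scheme (`prefix_yyyyMMdd`), the retention window and the class name are assumptions made for the example; the code only computes which table names would be due for removal and does not talk to any datastore.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Sketch of "one table per day" naming and retention; names are hypothetical.
class DailyTables {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 20120713
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    static String tableFor(String prefix, LocalDate day) {
        return prefix + "_" + day.format(DAY);
    }

    /** Names of daily tables that fall outside the retention window, oldest first. */
    static List<String> expired(String prefix, LocalDate oldestExisting,
                                LocalDate today, int retentionDays) {
        List<String> out = new ArrayList<>();
        // With a 7-day retention, today and the 6 days before it are kept.
        LocalDate firstKept = today.minusDays(retentionDays - 1L);
        for (LocalDate d = oldestExisting; d.isBefore(firstKept); d = d.plusDays(1)) {
            out.add(tableFor(prefix, d));
        }
        return out;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2012, 7, 13);
        System.out.println(tableFor("events", today)); // events_20120713
        System.out.println(expired("events", LocalDate.of(2012, 7, 1), today, 7).size()); // 6
    }
}
```

Each daily table still costs at least one region, so the retention window directly feeds the region count discussed in this thread.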

On Fri, Jul 13, 2012 at 7:47 PM, Lars George <la...@gmail.com> wrote:

> It is basically unset:
>
>     this.regionSplitLimit = conf.getInt("hbase.regionserver.regionSplitLimit",
>         Integer.MAX_VALUE);
>
> (from CompactSplitThread.java).
>
> The number of regions is OK until you dilute the available heap share too
> much. So you can have >1000 regions (given the block index, file handles
> etc. keep up) but only a few them can be active most of the time.
>
> Lars
>
> On Jul 13, 2012, at 7:40 PM, Michael Segel wrote:
>
> > I'm going from memory. There was a hardcoded number. I'd have to go back and try to find it.
> >
> > From a practical standpoint, going over 1000 regions per RS will put you on thin ice.
> >
> > Too many regions can kill your system.
> >
> > On Jul 13, 2012, at 12:36 PM, Kevin O'dell wrote:
> >
> >> Mike,
> >>
> >> I just saw a system with 2500 Regions per RS(crazy I know, we are fixing
> >> that).  I did not think there was a hard coded limit...
> >>
> >> On Fri, Jul 13, 2012 at 11:50 AM, Amandeep Khurana <am...@gmail.com> wrote:
> >>
> >>> I have come across clusters with 100s of tables but that typically is
> >>> due to a sub optimal table design.
> >>>
> >>> The question here is - why do you need to distribute your data over
> >>> lots of tables? What's your access pattern and what kind of data are
> >>> you putting in? Or is this just a theoretical question?
> >>>
> >>> On Jul 13, 2012, at 12:05 AM, Adrien Mogenet <adrien.mogenet@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi there,
> >>>>
> >>>> I read some good practices about number of columns / column families, but
> >>>> nothing about the number of tables.
> >>>> What if I need to spread my data among hundred or thousand (big) tables ?
> >>>> What should I care about ? I guess I should keep a tight number of
> >>>> storeFiles per RegionServer ?
> >>>>
> >>>> --
> >>>> Adrien Mogenet
> >>>> http://www.mogenet.me
> >>>
> >>
> >>
> >>
> >> --
> >> Kevin O'Dell
> >> Customer Operations Engineer, Cloudera
> >
>
>


-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me

Re: Maximum number of tables ?

Posted by Lars George <la...@gmail.com>.
It is basically unset:

    this.regionSplitLimit = conf.getInt("hbase.regionserver.regionSplitLimit",
        Integer.MAX_VALUE);

(from CompactSplitThread.java).

The number of regions is OK until you dilute the available heap share too much. So you can have >1000 regions (provided the block index, file handles, etc. keep up), but only a few of them can be active at a time.

Lars
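
A back-of-the-envelope version of the heap dilution argument, as a minimal Java sketch. The numbers are assumptions chosen for illustration (8 GiB heap, 40% of it reserved for write buffers, 128 MiB buffered per region before a flush); check the actual settings of your own version and cluster rather than relying on these.

```java
// Sketch of how a fixed write-buffer share of the heap is diluted across regions.
// Heap size, buffer fraction and flush size are illustrative assumptions only.
class MemstoreShare {

    /** Write-buffer bytes available per region if all regions take writes evenly. */
    static long bytesPerRegion(long heapBytes, double bufferFraction, int regions) {
        return (long) (heapBytes * bufferFraction) / regions;
    }

    /** How many regions could each hold a full flush-sized buffer at the same time. */
    static long fullyActiveRegions(long heapBytes, double bufferFraction, long flushSizeBytes) {
        return (long) (heapBytes * bufferFraction) / flushSizeBytes;
    }

    public static void main(String[] args) {
        long heap = 8L << 30;        // 8 GiB, assumed
        long flushSize = 128L << 20; // 128 MiB, assumed

        // Roughly 3.4 MB of write buffer per region with 1000 evenly written regions
        System.out.println(bytesPerRegion(heap, 0.4, 1000));
        // Only about 25 regions can buffer a full 128 MiB at once
        System.out.println(fullyActiveRegions(heap, 0.4, flushSize)); // 25
    }
}
```

Under these assumptions a server can carry 1000 regions, but only a couple of dozen can absorb heavy writes at once before the buffers are forced to flush early.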

On Jul 13, 2012, at 7:40 PM, Michael Segel wrote:

> I'm going from memory. There was a hardcoded number. I'd have to go back and try to find it. 
> 
> From a practical standpoint, going over 1000 regions per RS will put you on thin ice. 
> 
> Too many regions can kill your system.
> 
> On Jul 13, 2012, at 12:36 PM, Kevin O'dell wrote:
> 
>> Mike,
>> 
>> I just saw a system with 2500 Regions per RS(crazy I know, we are fixing
>> that).  I did not think there was a hard coded limit...
>> 
>> On Fri, Jul 13, 2012 at 11:50 AM, Amandeep Khurana <am...@gmail.com> wrote:
>> 
>>> I have come across clusters with 100s of tables but that typically is
>>> due to a sub optimal table design.
>>> 
>>> The question here is - why do you need to distribute your data over
>>> lots of tables? What's your access pattern and what kind of data are
>>> you putting in? Or is this just a theoretical question?
>>> 
>>> On Jul 13, 2012, at 12:05 AM, Adrien Mogenet <ad...@gmail.com>
>>> wrote:
>>> 
>>>> Hi there,
>>>> 
>>>> I read some good practices about number of columns / column families, but
>>>> nothing about the number of tables.
>>>> What if I need to spread my data among hundred or thousand (big) tables ?
>>>> What should I care about ? I guess I should keep a tight number of
>>>> storeFiles per RegionServer ?
>>>> 
>>>> --
>>>> Adrien Mogenet
>>>> http://www.mogenet.me
>>> 
>> 
>> 
>> 
>> -- 
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
> 


Re: Maximum number of tables ?

Posted by Michael Segel <mi...@hotmail.com>.
I'm going from memory. There was a hardcoded number. I'd have to go back and try to find it. 

From a practical standpoint, going over 1000 regions per RS will put you on thin ice. 

Too many regions can kill your system.

On Jul 13, 2012, at 12:36 PM, Kevin O'dell wrote:

> Mike,
> 
>  I just saw a system with 2500 Regions per RS(crazy I know, we are fixing
> that).  I did not think there was a hard coded limit...
> 
> On Fri, Jul 13, 2012 at 11:50 AM, Amandeep Khurana <am...@gmail.com> wrote:
> 
>> I have come across clusters with 100s of tables but that typically is
>> due to a sub optimal table design.
>> 
>> The question here is - why do you need to distribute your data over
>> lots of tables? What's your access pattern and what kind of data are
>> you putting in? Or is this just a theoretical question?
>> 
>> On Jul 13, 2012, at 12:05 AM, Adrien Mogenet <ad...@gmail.com>
>> wrote:
>> 
>>> Hi there,
>>> 
>>> I read some good practices about number of columns / column families, but
>>> nothing about the number of tables.
>>> What if I need to spread my data among hundred or thousand (big) tables ?
>>> What should I care about ? I guess I should keep a tight number of
>>> storeFiles per RegionServer ?
>>> 
>>> --
>>> Adrien Mogenet
>>> http://www.mogenet.me
>> 
> 
> 
> 
> -- 
> Kevin O'Dell
> Customer Operations Engineer, Cloudera


Re: Maximum number of tables ?

Posted by Kevin O'dell <ke...@cloudera.com>.
Mike,

  I just saw a system with 2500 regions per RS (crazy, I know; we are fixing
that). I did not think there was a hard-coded limit...

On Fri, Jul 13, 2012 at 11:50 AM, Amandeep Khurana <am...@gmail.com> wrote:

> I have come across clusters with 100s of tables but that typically is
> due to a sub optimal table design.
>
> The question here is - why do you need to distribute your data over
> lots of tables? What's your access pattern and what kind of data are
> you putting in? Or is this just a theoretical question?
>
> On Jul 13, 2012, at 12:05 AM, Adrien Mogenet <ad...@gmail.com>
> wrote:
>
> > Hi there,
> >
> > I read some good practices about number of columns / column families, but
> > nothing about the number of tables.
> > What if I need to spread my data among hundred or thousand (big) tables ?
> > What should I care about ? I guess I should keep a tight number of
> > storeFiles per RegionServer ?
> >
> > --
> > Adrien Mogenet
> > http://www.mogenet.me
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Maximum number of tables ?

Posted by Amandeep Khurana <am...@gmail.com>.
I have come across clusters with hundreds of tables, but that is typically
due to suboptimal table design.

The question here is - why do you need to distribute your data over
lots of tables? What's your access pattern and what kind of data are
you putting in? Or is this just a theoretical question?

On Jul 13, 2012, at 12:05 AM, Adrien Mogenet <ad...@gmail.com> wrote:

> Hi there,
>
> I read some good practices about number of columns / column families, but
> nothing about the number of tables.
> What if I need to spread my data among hundred or thousand (big) tables ?
> What should I care about ? I guess I should keep a tight number of
> storeFiles per RegionServer ?
>
> --
> Adrien Mogenet
> http://www.mogenet.me