Posted to user@hbase.apache.org by Cheng Su <sc...@gmail.com> on 2012/11/01 06:22:54 UTC

Does hbase.hregion.max.filesize have a limit?

Hi, all.

I have a simple question: does hbase.hregion.max.filesize have a limit?
May I specify a very large value for it, like 40G or more? (Leaving
performance aside.)
I didn't find any description of this on the official site or via Google.

Thanks.

-- 

Regards,
Cheng Su

Re: Does hbase.hregion.max.filesize have a limit?

Posted by Kevin O'dell <ke...@cloudera.com>.
There are two schools of thought here.  The first is manually splitting your
own regions.  In this case you would not want your regions over 20GB for
HFile v2 or 4GB for HFile v1, but you would set hbase.hregion.max.filesize to
something like 100GB so that you can split when you want to and the system
won't automatically do it for you.  The second is letting HBase handle this
for you.  In that case you still would not want your max filesize over 20GB
for HFile v2 or 4GB for HFile v1, and HBase would then handle your splits
(sorry if this seems redundant).
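
For reference, hbase.hregion.max.filesize takes its value in bytes, so a
figure like 100GB has to be converted before it goes into hbase-site.xml.
Below is a minimal Python sketch of that conversion. The helper names
to_bytes and render_property are made up for illustration and are not part
of HBase.

```python
from xml.sax.saxutils import escape

# Binary multipliers; the suffixes accepted here are an assumption of this sketch.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size: str) -> int:
    """Convert a string such as '100G' or '20GB' to a byte count."""
    size = size.strip().upper().rstrip("B")
    if size and size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(size)


def render_property(name: str, value: int) -> str:
    """Render one <property> element in the hbase-site.xml layout."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# A large ceiling for the manual-splitting case described above.
print(render_property("hbase.hregion.max.filesize", to_bytes("100G")))
```

The printed element would be pasted into hbase-site.xml by hand. Whether
100G is a sensible ceiling depends on the cluster.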


-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Does hbase.hregion.max.filesize have a limit?

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

re:  "The max file size the whole cluster can store for one CF is 60G,
right?"

No, the max file-size for a region, in your example, is 60GB.  When the
data exceeds that, the region will split, and then you'll have 2 regions,
each with a 60GB limit.

Check out this section of the RefGuide:

http://hbase.apache.org/book.html#regions.arch

That section explains how regions are the unit by which data is distributed
across your cluster.

The trick is that you don't want regions too small, but you also don't want
them too big, because you'll wind up with what the ref guide describes in
this chapter...


9.7.1. Region Size

HBase scales by having regions across many servers. Thus if you have 2
regions for 16GB data, on a 20 node machine your data will be concentrated
on just a few machines - nearly the entire cluster will be idle.  This
really can't be stressed enough, since a common problem is loading 200MB
data into HBase then wondering why your awesome 10 node cluster isn't doing
anything.
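
To put rough numbers on the RefGuide's point: a region is served by one
region server at a time, so the number of servers doing work cannot exceed
the number of regions. Here is a small Python sketch. busy_fraction is a
hypothetical helper, and it assumes regions are spread as evenly as
possible.

```python
def busy_fraction(regions: int, nodes: int) -> float:
    """Upper bound on the share of nodes that can be serving data."""
    if regions < 1 or nodes < 1:
        raise ValueError("regions and nodes must be positive")
    return min(regions, nodes) / nodes


# The RefGuide example: 2 regions on a 20 node cluster.
print(busy_fraction(2, 20))   # at most 10% of the nodes hold any data
# With at least as many regions as nodes, every node can take part.
print(busy_fraction(40, 20))
```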






Re: Does hbase.hregion.max.filesize have a limit?

Posted by Cheng Su <sc...@gmail.com>.
Thank you for your answer.
The max file size the whole cluster can store for one CF is 60G, right?
Maybe the only way is to split the large table into small tables...

-- 

Regards,
Cheng Su

Re: Does hbase.hregion.max.filesize have a limit?

Posted by ramkrishna vasudevan <ra...@gmail.com>.
> Can multiple region servers runs on one real machine?
> (I guess not though)
No. Every RS runs on a different physical machine.

hbase.hregion.max.filesize actually applies per region.  Suppose you create
a table and then insert 20G of data; the region will get split into further
regions.  Yes, all 60G of data can be stored on one physical machine, but
that means the data is logically served by 3 regions.
Does this help you?
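
Here is a toy Python simulation of what that looks like as data is loaded.
It assumes writes are spread evenly over the existing regions and that a
region splits into two equal halves once it exceeds the limit. Real HBase
picks split points from the store files, so actual region counts and sizes
will differ. simulate_load is a made-up name.

```python
def simulate_load(total_gb: float, max_region_gb: float, step_gb: float = 1.0):
    """Return the region sizes (in GB) after loading total_gb of data."""
    regions = [0.0]
    written = 0.0
    while written < total_gb:
        chunk = min(step_gb, total_gb - written)
        share = chunk / len(regions)          # even write distribution (assumption)
        regions = [size + share for size in regions]
        written += chunk

        # Split every region that has grown past the limit into two halves.
        after_split = []
        for size in regions:
            if size > max_region_gb:
                after_split.extend([size / 2, size / 2])
            else:
                after_split.append(size)
        regions = after_split
    return regions


sizes = simulate_load(total_gb=60, max_region_gb=20)
print(len(sizes), "regions holding", sum(sizes), "GB in total")
```

The 3 regions mentioned above are the minimum (60G / 20G). Because splits
halve a region, the sketch ends up with more, smaller regions, but either
way the table as a whole is not capped by the max filesize.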

Regards
Ram


Re: Does hbase.hregion.max.filesize have a limit?

Posted by Cheng Su <sc...@gmail.com>.
Thank you all.

I found out that I had confused the "size of a region" with the "size of a
region server".
I found this property:
  <property>
    <name>hbase.regionserver.regionSplitLimit</name>
    <value>2147483647</value>
    <description>Limit for the number of regions after which no more region
    splitting should take place. This is not a hard limit for the number of
    regions but acts as a guideline for the regionserver to stop splitting after
    a certain limit. Default is set to MAX_INT; i.e. do not block splitting.
    </description>
  </property>

So in practice, a region server can handle plenty of regions, so I don't
need to worry about the store size.
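
As a back-of-envelope check of that conclusion, here is a Python sketch
estimating how many regions each server would carry. The 200G table size is
a hypothetical number, not one from this thread, and the sketch assumes
regions are balanced evenly across servers.

```python
import math


def min_regions(table_gb: float, max_region_gb: float) -> int:
    """Fewest regions that can hold the table if none exceeds the limit."""
    return max(1, math.ceil(table_gb / max_region_gb))


def regions_per_server(regions: int, servers: int) -> int:
    """Regions on the busiest server under even balancing (assumption)."""
    return math.ceil(regions / servers)


regions = min_regions(table_gb=200, max_region_gb=20)
print(regions, "regions,", regions_per_server(regions, servers=3), "per server at most")
```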

Thank you all again.

-- 

Regards,
Cheng Su

Re: Does hbase.hregion.max.filesize have a limit?

Posted by Jeremy Carroll <ph...@gmail.com>.
To speak to 'if it's possible', yes it is. We have some tables over here at
Klout during testing where we set the max region size to 100GB, and
actually had tables of that size during a MR job that created HFileV2s for
us to import. So I can say that I have seen 100GB regions that still work.

As to whether this is a good idea, it's probably not. As a capacity planning
exercise we added additional nodes to the cluster, and split these regions
down to 10-20GB in size.
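
To get a feel for how much splitting that involves, here is a Python sketch
that counts the rounds of halving needed to bring a region under a target
size. It assumes every split cuts a region into two equal halves, which
real splits only approximate. halvings_needed is a made-up helper name.

```python
def halvings_needed(region_gb: float, target_max_gb: float):
    """Return (rounds of splitting, resulting size of each piece in GB)."""
    if region_gb <= 0 or target_max_gb <= 0:
        raise ValueError("sizes must be positive")
    rounds = 0
    while region_gb > target_max_gb:
        region_gb /= 2
        rounds += 1
    return rounds, region_gb


rounds, piece_gb = halvings_needed(100, 20)
# Each round doubles the number of regions that came from the original one.
print(rounds, "rounds,", 2 ** rounds, "regions of", piece_gb, "GB each")
```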


Re: Does hbase.hregion.max.filesize have a limit?

Posted by Cheng Su <sc...@gmail.com>.
Does that mean the max file size of one CF is 20G? If I have 3 region
servers, is it then 60G in total?
I have a very large table; the size of one CF (which contains only one
column) may exceed 60G.
Is there any way to store the data without adding machines?

Can multiple region servers run on one physical machine?
(I guess not, though.)

-- 

Regards,
Cheng Su

Re: Does hbase.hregion.max.filesize have a limit?

Posted by lars hofhansl <lh...@yahoo.com>.
The tribal knowledge would say about 20G is the max.
The fellas from Facebook will have a more definite answer.

-- Lars


