Posted to user@hbase.apache.org by William Kang <we...@gmail.com> on 2010/09/06 22:56:45 UTC

Limits on HBase

Hi folks,
I know this question may have been asked many times, but I am wondering if
there is any update on the optimized cell size (in megabytes) and row size
(in megabytes)? Many thanks.


William

Re: how to remove dangling region and table?

Posted by Stack <st...@duboce.net>.
hbase> scan '.META.'
# Look at output for rows w/ your table mentioned
hbase> delete '.META.', 'ROW_FROM_META_WITH_YOUR_TABLE'

Or you could dump scan to a file

echo "scan '.META.'"|./bin/hbase shell &> /tmp/meta.txt

... figure list of rows to delete

Then write script to delete rows.
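A minimal sketch of that script, assuming the dropped table was named
'mytable' and that its region names (the .META. row keys, which start with
the table name) contain no spaces:

# pull the unique row keys for the dead table out of the scan dump
awk '/^ *mytable,/ {print $1}' /tmp/meta.txt | sort -u > /tmp/rows.txt

# feed one deleteall per row back into the shell
# (deleteall removes the whole row, so no column is needed)
while read row; do
  echo "deleteall '.META.', '$row'"
done < /tmp/rows.txt | ./bin/hbase shell

Keep /tmp/meta.txt around as a backup before deleting anything; a master
restart afterwards, as elsewhere in this thread, may be needed before the
old regions disappear.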

St.Ack

On Wed, Sep 8, 2010 at 9:29 PM, Jinsong Hu <ji...@hotmail.com> wrote:
> how do I delete the entries from .META.? Is there a command or utility
> that I can use?
>
> the table doesn't even appear in the "list" command any more.
> if I try to run the disable command, the shell says the table doesn't exist.
> if I try to create a table with that name, the shell says the table already
> exists.
>
> Jimmy.
>
> --------------------------------------------------
> From: "Stack" <st...@duboce.net>
> Sent: Wednesday, September 08, 2010 8:54 PM
> To: <us...@hbase.apache.org>
> Subject: Re: how to remove dangling region and table?
>
>> On Wed, Sep 8, 2010 at 6:05 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>>>
>>> Hi,
>>>  I created a table and inserted 1.1 TB of data, then I tried to drop the
>>> table and data. I understand that current hbase has a bug,
>>
>>
>> Yeah, enable/disable is kinda flakey in 0.20.x hbase.
>>
>>> so I renamed the table to a temp name and then dropped the temp table.
>>>  However, I found that some of the regions were not renamed to the temp
>>> table and are still there with the original table name.
>>> I tried to physically remove the HDFS dir and restart the master and
>>> regionserver, delete it from HDFS again, and restart the
>>> master/regionserver again. The regions just continue to hang there, and
>>> I can't create a table with the original name.
>>
>> Yes.  The table is still mentioned in the .META. table.  You'll need to
>> delete the entries there as well as remove the table dir from HDFS.
>>
>> Or, just keep trying to disable.  It usually succeeds eventually.
>> Then do drop table.
>>
>>>  Is there any way I can remove the dangling regions so that I can create
>>> a brand new table with the original name?
>>>
>>
>> Go through .META. and delete each row which has the table name as a prefix.
>>
>> Or shut down hbase, remove the hbase.rootdir, and restart.
>>
>> St.Ack
>>
>

Re: how to remove dangling region and table?

Posted by Jinsong Hu <ji...@hotmail.com>.
how do I delete the entries from .META.? Is there a command or utility
that I can use?

the table doesn't even appear in the "list" command any more.
if I try to run the disable command, the shell says the table doesn't exist.
if I try to create a table with that name, the shell says the table already
exists.

Jimmy.

--------------------------------------------------
From: "Stack" <st...@duboce.net>
Sent: Wednesday, September 08, 2010 8:54 PM
To: <us...@hbase.apache.org>
Subject: Re: how to remove dangling region and table?

> On Wed, Sep 8, 2010 at 6:05 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>> Hi,
>>  I created a table and inserted 1.1 TB of data, then I tried to drop the
>> table and data. I understand that current hbase has a bug,
>
>
> Yeah, enable/disable is kinda flakey in 0.20.x hbase.
>
>> so I renamed the table to a temp name and then dropped the temp table.
>>  However, I found that some of the regions were not renamed to the temp
>> table and are still there with the original table name.
>> I tried to physically remove the HDFS dir and restart the master and
>> regionserver, delete it from HDFS again, and restart the
>> master/regionserver again. The regions just continue to hang there, and
>> I can't create a table with the original name.
>
> Yes.  The table is still mentioned in the .META. table.  You'll need to
> delete the entries there as well as remove the table dir from HDFS.
>
> Or, just keep trying to disable.  It usually succeeds eventually.
> Then do drop table.
>
>>  Is there any way I can remove the dangling regions so that I can create
>> a brand new table with the original name?
>>
>
> Go through .META. and delete each row which has the table name as a prefix.
>
> Or shut down hbase, remove the hbase.rootdir, and restart.
>
> St.Ack
> 

Re: how to remove dangling region and table?

Posted by Stack <st...@duboce.net>.
On Wed, Sep 8, 2010 at 6:05 PM, Jinsong Hu <ji...@hotmail.com> wrote:
> Hi,
>  I created a table and inserted 1.1 TB of data, then I tried to drop the
> table and data. I understand that current hbase has a bug,


Yeah, enable/disable is kinda flakey in 0.20.x hbase.

> so I renamed the table to a temp name and then dropped the temp table.
>  However, I found that some of the regions were not renamed to the temp
> table and are still there with the original table name.
> I tried to physically remove the HDFS dir and restart the master and
> regionserver, delete it from HDFS again, and restart the
> master/regionserver again. The regions just continue to hang there, and
> I can't create a table with the original name.

Yes.  The table is still mentioned in the .META. table.  You'll need to
delete the entries there as well as remove the table dir from HDFS.

Or, just keep trying to disable.  It usually succeeds eventually.
Then do drop table.
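In shell form, with 'mytable' standing in for the real table name:

hbase> disable 'mytable'
hbase> drop 'mytable'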

>  Is there any way I can remove the dangling regions so that I can create a
> brand new table with the original name?
>

Go through .META. and delete each row which has the table name as a prefix.

Or shut down hbase, remove the hbase.rootdir, and restart.

St.Ack

how to remove dangling region and table?

Posted by Jinsong Hu <ji...@hotmail.com>.
Hi,
  I created a table and inserted 1.1 TB of data, then I tried to drop the
table and data. I understand that current hbase has a bug,
so I renamed the table to a temp name and then dropped the temp table.
  However, I found that some of the regions were not renamed to the temp
table and are still there with the original table name.
I tried to physically remove the HDFS dir and restart the master and
regionserver, delete it from HDFS again, and restart the master/regionserver
again. The regions just continue to hang there, and I can't create a table
with the original name.
  Is there any way I can remove the dangling regions so that I can create a
brand new table with the original name?

Jimmy. 


Re: Limits on HBase

Posted by William Kang <we...@gmail.com>.
So, basically, there is no limit on row size as long as a single row does not
go beyond the region server's storage capacity?
And why should the cell size not be larger than 20M? Does the data block in an
HFile store cells or whole rows?
If the data block stores the cells (qualifiers and values), where does the key
point to the row in the HFile file structure, or is an HFile just for a
row? I cannot seem to find a direct answer to these questions in
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.

On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray <jg...@facebook.com> wrote:

> You can go way beyond the max region split / split size.  HBase will never
> split the region once it is a single row, even if beyond the split size.
>
> Also, if you're using large values, you should have region sizes much
> larger than the default.  It's common to run with 1-2GB regions in many
> cases.
>
> What you may have seen are recommendations that if your cell values are
> approaching the default block size on HDFS (64MB), you should consider
> putting the data directly into HDFS rather than HBase.
>
> JG
>
> > -----Original Message-----
> > From: William Kang [mailto:weliam.cloud@gmail.com]
> > Sent: Tuesday, September 07, 2010 7:36 PM
> > To: user@hbase.apache.org; apurtell@apache.org
> > Subject: Re: Limits on HBase
> >
> > Hi,
> > Thanks for your reply. How about the row size? I read that a row should
> > not be larger than the HDFS file on the region server, which is 256M by
> > default. Is that right? Many thanks.
> >
> >
> > William
> >
> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > In addition to what Jon said please be aware that if compression is
> > > specified in the table schema, it happens at the store file level --
> > > compression happens after write I/O, before read I/O, so if you
> > transmit a
> > > 100MB object that compresses to 30MB, the performance impact is that
> > of
> > > 100MB, not 30MB.
> > >
> > > I also try not to go above 50MB as largest cell size, for the same
> > reason.
> > > I have tried storing objects larger than 100MB but this can cause out
> > of
> > > memory issues on busy regionservers no matter the size of the heap.
> > When/if
> > > HBase RPC can send large objects in smaller chunks, this will be less
> > of an
> > > issue.
> > >
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Why is this email five sentences or less?
> > > http://five.sentenc.es/
> > >
> > >
> > > --- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:
> > >
> > > > From: Jonathan Gray <jg...@facebook.com>
> > > > Subject: RE: Limits on HBase
> > > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > > > Date: Monday, September 6, 2010, 4:10 PM
> > > > I'm not sure what you mean by
> > > > "optimized cell size" or whether you're just asking about
> > > > practical limits?
> > > >
> > > > HBase is generally used with cells in the range of tens of
> > > > bytes to hundreds of kilobytes.  However, I have used
> > > > it with cells that are several megabytes, up to about
> > > > 50MB.  Up at that level, I have seen some weird
> > > > performance issues.
> > > >
> > > > The most important thing is to be sure to tweak all of your
> > > > settings.  If you have 20MB cells, you need to be sure
> > > > to increase the flush size beyond 64MB and the split size
> > > > beyond 256MB.  You also need enough memory to support
> > > > all this large object allocation.
> > > >
> > > > And of course, test test test.  That's the easiest way
> > > > to see if what you want to do will work :)
> > > >
> > > > When you run into problems, e-mail the list.
> > > >
> > > > As far as row size is concerned, the only issue is that a
> > > > row can never span multiple regions so a given row can only
> > > > be in one region and thus be hosted on one server at a
> > > > time.
> > > >
> > > > JG
> > > >
> > > > > -----Original Message-----
> > > > > From: William Kang [mailto:weliam.cloud@gmail.com]
> > > > > Sent: Monday, September 06, 2010 1:57 PM
> > > > > To: hbase-user
> > > > > Subject: Limits on HBase
> > > > >
> > > > > Hi folks,
> > > > > I know this question may have been asked many times,
> > > > but I am wondering
> > > > > if
> > > > > there is any update on the optimized cell size (in
> > > > megabytes) and row
> > > > > size
> > > > > (in megabytes)? Many thanks.
> > > > >
> > > > >
> > > > > William
> > > >
> > >
> > >
> > >
> > >
> > >
>

Re: Limits on HBase

Posted by Ryan Rawson <ry...@gmail.com>.
There are 2 definitions of random access:
1) within a file (HDFS can be less than ideal)
2) randomly getting an entire file (not usually considered random gets)

For the latter, streaming an entire file from HDFS is actually pretty
good.  You can see a substantial percentage (think 80%+) of the raw
disk perf.  I benched HDFS and got 90MB/sec some time last year just
writing raw files.

-ryan


On Tue, Sep 7, 2010 at 9:07 PM, William Kang <we...@gmail.com> wrote:
> Hi,
> What does the performance look like if we put large cells in HDFS vs the
> local file system? Random access to HDFS would be slow, right?
>
>
> William
>
> On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray <jg...@facebook.com> wrote:
>
>> You can go way beyond the max region split / split size.  HBase will never
>> split the region once it is a single row, even if beyond the split size.
>>
>> Also, if you're using large values, you should have region sizes much
>> larger than the default.  It's common to run with 1-2GB regions in many
>> cases.
>>
>> What you may have seen are recommendations that if your cell values are
>> approaching the default block size on HDFS (64MB), you should consider
>> putting the data directly into HDFS rather than HBase.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: William Kang [mailto:weliam.cloud@gmail.com]
>> > Sent: Tuesday, September 07, 2010 7:36 PM
>> > To: user@hbase.apache.org; apurtell@apache.org
>> > Subject: Re: Limits on HBase
>> >
>> > Hi,
>> > Thanks for your reply. How about the row size? I read that a row should
>> > not be larger than the HDFS file on the region server, which is 256M by
>> > default. Is that right? Many thanks.
>> >
>> >
>> > William
>> >
>> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <ap...@apache.org>
>> > wrote:
>> >
>> > > In addition to what Jon said please be aware that if compression is
>> > > specified in the table schema, it happens at the store file level --
>> > > compression happens after write I/O, before read I/O, so if you
>> > transmit a
>> > > 100MB object that compresses to 30MB, the performance impact is that
>> > of
>> > > 100MB, not 30MB.
>> > >
>> > > I also try not to go above 50MB as largest cell size, for the same
>> > reason.
>> > > I have tried storing objects larger than 100MB but this can cause out
>> > of
>> > > memory issues on busy regionservers no matter the size of the heap.
>> > When/if
>> > > HBase RPC can send large objects in smaller chunks, this will be less
>> > of an
>> > > issue.
>> > >
>> > > Best regards,
>> > >
>> > >    - Andy
>> > >
>> > > Why is this email five sentences or less?
>> > > http://five.sentenc.es/
>> > >
>> > >
>> > > --- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:
>> > >
>> > > > From: Jonathan Gray <jg...@facebook.com>
>> > > > Subject: RE: Limits on HBase
>> > > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
>> > > > Date: Monday, September 6, 2010, 4:10 PM
>> > > > I'm not sure what you mean by
>> > > > "optimized cell size" or whether you're just asking about
>> > > > practical limits?
>> > > >
>> > > > HBase is generally used with cells in the range of tens of
>> > > > bytes to hundreds of kilobytes.  However, I have used
>> > > > it with cells that are several megabytes, up to about
>> > > > 50MB.  Up at that level, I have seen some weird
>> > > > performance issues.
>> > > >
>> > > > The most important thing is to be sure to tweak all of your
>> > > > settings.  If you have 20MB cells, you need to be sure
>> > > > to increase the flush size beyond 64MB and the split size
>> > > > beyond 256MB.  You also need enough memory to support
>> > > > all this large object allocation.
>> > > >
>> > > > And of course, test test test.  That's the easiest way
>> > > > to see if what you want to do will work :)
>> > > >
>> > > > When you run into problems, e-mail the list.
>> > > >
>> > > > As far as row size is concerned, the only issue is that a
>> > > > row can never span multiple regions so a given row can only
>> > > > be in one region and thus be hosted on one server at a
>> > > > time.
>> > > >
>> > > > JG
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: William Kang [mailto:weliam.cloud@gmail.com]
>> > > > > Sent: Monday, September 06, 2010 1:57 PM
>> > > > > To: hbase-user
>> > > > > Subject: Limits on HBase
>> > > > >
>> > > > > Hi folks,
>> > > > > I know this question may have been asked many times,
>> > > > but I am wondering
>> > > > > if
>> > > > > there is any update on the optimized cell size (in
>> > > > megabytes) and row
>> > > > > size
>> > > > > (in megabytes)? Many thanks.
>> > > > >
>> > > > >
>> > > > > William
>> > > >
>> > >
>> > >
>> > >
>> > >
>> > >
>>
>

Re: Limits on HBase

Posted by William Kang <we...@gmail.com>.
Hi,
What does the performance look like if we put large cells in HDFS vs the
local file system? Random access to HDFS would be slow, right?


William

On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray <jg...@facebook.com> wrote:

> You can go way beyond the max region split / split size.  HBase will never
> split the region once it is a single row, even if beyond the split size.
>
> Also, if you're using large values, you should have region sizes much
> larger than the default.  It's common to run with 1-2GB regions in many
> cases.
>
> What you may have seen are recommendations that if your cell values are
> approaching the default block size on HDFS (64MB), you should consider
> putting the data directly into HDFS rather than HBase.
>
> JG
>
> > -----Original Message-----
> > From: William Kang [mailto:weliam.cloud@gmail.com]
> > Sent: Tuesday, September 07, 2010 7:36 PM
> > To: user@hbase.apache.org; apurtell@apache.org
> > Subject: Re: Limits on HBase
> >
> > Hi,
> > Thanks for your reply. How about the row size? I read that a row should
> > not be larger than the HDFS file on the region server, which is 256M by
> > default. Is that right? Many thanks.
> >
> >
> > William
> >
> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > In addition to what Jon said please be aware that if compression is
> > > specified in the table schema, it happens at the store file level --
> > > compression happens after write I/O, before read I/O, so if you
> > transmit a
> > > 100MB object that compresses to 30MB, the performance impact is that
> > of
> > > 100MB, not 30MB.
> > >
> > > I also try not to go above 50MB as largest cell size, for the same
> > reason.
> > > I have tried storing objects larger than 100MB but this can cause out
> > of
> > > memory issues on busy regionservers no matter the size of the heap.
> > When/if
> > > HBase RPC can send large objects in smaller chunks, this will be less
> > of an
> > > issue.
> > >
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Why is this email five sentences or less?
> > > http://five.sentenc.es/
> > >
> > >
> > > --- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:
> > >
> > > > From: Jonathan Gray <jg...@facebook.com>
> > > > Subject: RE: Limits on HBase
> > > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > > > Date: Monday, September 6, 2010, 4:10 PM
> > > > I'm not sure what you mean by
> > > > "optimized cell size" or whether you're just asking about
> > > > practical limits?
> > > >
> > > > HBase is generally used with cells in the range of tens of
> > > > bytes to hundreds of kilobytes.  However, I have used
> > > > it with cells that are several megabytes, up to about
> > > > 50MB.  Up at that level, I have seen some weird
> > > > performance issues.
> > > >
> > > > The most important thing is to be sure to tweak all of your
> > > > settings.  If you have 20MB cells, you need to be sure
> > > > to increase the flush size beyond 64MB and the split size
> > > > beyond 256MB.  You also need enough memory to support
> > > > all this large object allocation.
> > > >
> > > > And of course, test test test.  That's the easiest way
> > > > to see if what you want to do will work :)
> > > >
> > > > When you run into problems, e-mail the list.
> > > >
> > > > As far as row size is concerned, the only issue is that a
> > > > row can never span multiple regions so a given row can only
> > > > be in one region and thus be hosted on one server at a
> > > > time.
> > > >
> > > > JG
> > > >
> > > > > -----Original Message-----
> > > > > From: William Kang [mailto:weliam.cloud@gmail.com]
> > > > > Sent: Monday, September 06, 2010 1:57 PM
> > > > > To: hbase-user
> > > > > Subject: Limits on HBase
> > > > >
> > > > > Hi folks,
> > > > > I know this question may have been asked many times,
> > > > but I am wondering
> > > > > if
> > > > > there is any update on the optimized cell size (in
> > > > megabytes) and row
> > > > > size
> > > > > (in megabytes)? Many thanks.
> > > > >
> > > > >
> > > > > William
> > > >
> > >
> > >
> > >
> > >
> > >
>

Re: Limits on HBase

Posted by Sean Bigdatafun <se...@gmail.com>.
On Thu, Oct 14, 2010 at 6:41 PM, Ryan Rawson <ry...@gmail.com> wrote:

> If you have a single row that approaches then exceeds the size of a
> region, eventually you will end up having that row as a single region,
> with the region encompassing only that one row.
>
> The reason for HBase and bigtable is the overhead that HDFS
> has... every file in HDFS uses an amount of namenode RAM that is not
> dependent on the size of the file.  Meaning the more small files you
> have, the more RAM you use, and you run out of namenode scalability.
> So HBase exists to store smaller values. There is some overhead. Thus
> once you start putting in larger values, you might as well avoid the
> overhead and go straight to/from HDFS.


Well, for the scenario that I listed above (millions of small key-value
pairs that together exceed 256MB), storing these key-value pairs directly
in a file in HDFS would not be an option. If we do so, we end up scanning
through the whole file; if we store them in HBase instead, we get to
leverage the index.






>
> -ryan
>
>
> On Thu, Oct 14, 2010 at 5:23 PM, Sean Bigdatafun
> <se...@gmail.com> wrote:
> > Let me ask this question from another angle:
> >
> > The first question is:
> > if I have millions of columns in a column family in the same row, such
> > that the sum of the key-value pairs exceeds 256MB, what will happen?
> >
> > example:
> > I have a column with a key of 256 bytes and a value of 2K; then let's
> > assume (256 + timestamp size + 2056) ~= 2.5K,
> > so I understand I can store at most 256 * 1024 / 2.5 ~= 104,857 columns
> > in this column family at this row.
> >
> > Does anyone have comments on the math I gave above?
> >
> >
> > The second question is:
> > by the way, if I do not turn on LZO, is my data still compressed (by
> > the system)? If so, then the above number will increase a couple of
> > times, but there still exists a limit on how many columns I can put
> > in a row.
> >
> > The third question is:
> > if I do turn on LZO, does that mean the value gets compressed first, and
> > then the HBase mechanism further compresses the key-value pair?
> >
> > Thanks,
> > Sean
> >
> >
> > On Tue, Sep 7, 2010 at 8:30 PM, Jonathan Gray <jg...@facebook.com>
> wrote:
> >
> >> You can go way beyond the max region split / split size.  HBase will
> never
> >> split the region once it is a single row, even if beyond the split size.
> >>
> >> Also, if you're using large values, you should have region sizes much
> >> larger than the default.  It's common to run with 1-2GB regions in many
> >> cases.
> >>
> >> What you may have seen are recommendations that if your cell values are
> >> approaching the default block size on HDFS (64MB), you should consider
> >> putting the data directly into HDFS rather than HBase.
> >>
> >> JG
> >>
> >> > -----Original Message-----
> >> > From: William Kang [mailto:weliam.cloud@gmail.com]
> >>  > Sent: Tuesday, September 07, 2010 7:36 PM
> >> > To: user@hbase.apache.org; apurtell@apache.org
> >> > Subject: Re: Limits on HBase
> >> >
> >> > Hi,
> >> > Thanks for your reply. How about the row size? I read that a row should
> >> > not be larger than the HDFS file on the region server, which is 256M by
> >> > default. Is that right? Many thanks.
> >> >
> >> >
> >> > William
> >> >
> >> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <ap...@apache.org>
> >> > wrote:
> >> >
> >> > > In addition to what Jon said please be aware that if compression is
> >> > > specified in the table schema, it happens at the store file level --
> >> > > compression happens after write I/O, before read I/O, so if you
> >> > transmit a
> >> > > 100MB object that compresses to 30MB, the performance impact is that
> >> > of
> >> > > 100MB, not 30MB.
> >> > >
> >> > > I also try not to go above 50MB as largest cell size, for the same
> >> > reason.
> >> > > I have tried storing objects larger than 100MB but this can cause
> out
> >> > of
> >> > > memory issues on busy regionservers no matter the size of the heap.
> >> > When/if
> >> > > HBase RPC can send large objects in smaller chunks, this will be
> less
> >> > of an
> >> > > issue.
> >> > >
> >> > > Best regards,
> >> > >
> >> > >    - Andy
> >> > >
> >> > > Why is this email five sentences or less?
> >> > > http://five.sentenc.es/
> >> > >
> >> > >
> >> > > --- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:
> >> > >
> >> > > > From: Jonathan Gray <jg...@facebook.com>
> >> > > > Subject: RE: Limits on HBase
> >> > > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> >> > > > Date: Monday, September 6, 2010, 4:10 PM
> >> > > > I'm not sure what you mean by
> >> > > > "optimized cell size" or whether you're just asking about
> >> > > > practical limits?
> >> > > >
> >> > > > HBase is generally used with cells in the range of tens of
> >> > > > bytes to hundreds of kilobytes.  However, I have used
> >> > > > it with cells that are several megabytes, up to about
> >> > > > 50MB.  Up at that level, I have seen some weird
> >> > > > performance issues.
> >> > > >
> >> > > > The most important thing is to be sure to tweak all of your
> >> > > > settings.  If you have 20MB cells, you need to be sure
> >> > > > to increase the flush size beyond 64MB and the split size
> >> > > > beyond 256MB.  You also need enough memory to support
> >> > > > all this large object allocation.
> >> > > >
> >> > > > And of course, test test test.  That's the easiest way
> >> > > > to see if what you want to do will work :)
> >> > > >
> >> > > > When you run into problems, e-mail the list.
> >> > > >
> >> > > > As far as row size is concerned, the only issue is that a
> >> > > > row can never span multiple regions so a given row can only
> >> > > > be in one region and thus be hosted on one server at a
> >> > > > time.
> >> > > >
> >> > > > JG
> >> > > >
> >> > > > > -----Original Message-----
> >> > > > > From: William Kang [mailto:weliam.cloud@gmail.com]
> >> > > > > Sent: Monday, September 06, 2010 1:57 PM
> >> > > > > To: hbase-user
> >> > > > > Subject: Limits on HBase
> >> > > > >
> >> > > > > Hi folks,
> >> > > > > I know this question may have been asked many times,
> >> > > > but I am wondering
> >> > > > > if
> >> > > > > there is any update on the optimized cell size (in
> >> > > > megabytes) and row
> >> > > > > size
> >> > > > > (in megabytes)? Many thanks.
> >> > > > >
> >> > > > >
> >> > > > > William
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >>
> >
>

Re: Limits on HBase

Posted by Ryan Rawson <ry...@gmail.com>.
If you have a single row that approaches then exceeds the size of a
region, eventually you will end up having that row as a single region,
with the region encompassing only that one row.

The reason for HBase and bigtable is the overhead that HDFS
has... every file in HDFS uses an amount of namenode RAM that is not
dependent on the size of the file.  Meaning the more small files you
have, the more RAM you use, and you run out of namenode scalability.
So HBase exists to store smaller values. There is some overhead. Thus
once you start putting in larger values, you might as well avoid the
overhead and go straight to/from HDFS.
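(For rough scale, and treating the exact figure as an assumption: the
commonly cited cost is on the order of 150 bytes of namenode heap per
file/block object, so 100 million small files consume tens of GB of
namenode RAM regardless of how little data they actually hold.)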

-ryan


On Thu, Oct 14, 2010 at 5:23 PM, Sean Bigdatafun
<se...@gmail.com> wrote:
> Let me ask this question from another angle:
>
> The first question is:
> if I have millions of columns in a column family in the same row, such that
> the sum of the key-value pairs exceeds 256MB, what will happen?
>
> example:
> I have a column with a key of 256 bytes and a value of 2K; then let's assume
> (256 + timestamp size + 2056) ~= 2.5K,
> so I understand I can store at most 256 * 1024 / 2.5 ~= 104,857 columns in
> this column family at this row.
>
> Does anyone have comments on the math I gave above?
>
>
> The second question is:
> by the way, if I do not turn on LZO, is my data still compressed (by the
> system)? If so, then the above number will increase a couple of times,
> but there still exists a limit on how many columns I can put in a row.
>
> The third question is:
> if I do turn on LZO, does that mean the value gets compressed first, and then
> the HBase mechanism further compresses the key-value pair?
>
> Thanks,
> Sean
>
>
> On Tue, Sep 7, 2010 at 8:30 PM, Jonathan Gray <jg...@facebook.com> wrote:
>
>> You can go way beyond the max region split / split size.  HBase will never
>> split the region once it is a single row, even if beyond the split size.
>>
>> Also, if you're using large values, you should have region sizes much
>> larger than the default.  It's common to run with 1-2GB regions in many
>> cases.
>>
>> What you may have seen are recommendations that if your cell values are
>> approaching the default block size on HDFS (64MB), you should consider
>> putting the data directly into HDFS rather than HBase.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: William Kang [mailto:weliam.cloud@gmail.com]
>>  > Sent: Tuesday, September 07, 2010 7:36 PM
>> > To: user@hbase.apache.org; apurtell@apache.org
>> > Subject: Re: Limits on HBase
>> >
>> > Hi,
>> > Thanks for your reply. How about the row size? I read that a row should
>> > not be larger than the HDFS file on the region server, which is 256M by
>> > default. Is that right? Many thanks.
>> >
>> >
>> > William
>> >
>> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <ap...@apache.org>
>> > wrote:
>> >
>> > > In addition to what Jon said please be aware that if compression is
>> > > specified in the table schema, it happens at the store file level --
>> > > compression happens after write I/O, before read I/O, so if you
>> > transmit a
>> > > 100MB object that compresses to 30MB, the performance impact is that
>> > of
>> > > 100MB, not 30MB.
>> > >
>> > > I also try not to go above 50MB as largest cell size, for the same
>> > reason.
>> > > I have tried storing objects larger than 100MB but this can cause out
>> > of
>> > > memory issues on busy regionservers no matter the size of the heap.
>> > When/if
>> > > HBase RPC can send large objects in smaller chunks, this will be less
>> > of an
>> > > issue.
>> > >
>> > > Best regards,
>> > >
>> > >    - Andy
>> > >
>> > > Why is this email five sentences or less?
>> > > http://five.sentenc.es/
>> > >
>> > >
>> > > --- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:
>> > >
>> > > > From: Jonathan Gray <jg...@facebook.com>
>> > > > Subject: RE: Limits on HBase
>> > > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
>> > > > Date: Monday, September 6, 2010, 4:10 PM
>> > > > I'm not sure what you mean by
>> > > > "optimized cell size" or whether you're just asking about
>> > > > practical limits?
>> > > >
>> > > > HBase is generally used with cells in the range of tens of
>> > > > bytes to hundreds of kilobytes.  However, I have used
>> > > > it with cells that are several megabytes, up to about
>> > > > 50MB.  Up at that level, I have seen some weird
>> > > > performance issues.
>> > > >
>> > > > The most important thing is to be sure to tweak all of your
>> > > > settings.  If you have 20MB cells, you need to be sure
>> > > > to increase the flush size beyond 64MB and the split size
>> > > > beyond 256MB.  You also need enough memory to support
>> > > > all this large object allocation.
>> > > >
>> > > > And of course, test test test.  That's the easiest way
>> > > > to see if what you want to do will work :)
>> > > >
>> > > > When you run into problems, e-mail the list.
>> > > >
>> > > > As far as row size is concerned, the only issue is that a
>> > > > row can never span multiple regions so a given row can only
>> > > > be in one region and thus be hosted on one server at a
>> > > > time.
>> > > >
>> > > > JG
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: William Kang [mailto:weliam.cloud@gmail.com]
>> > > > > Sent: Monday, September 06, 2010 1:57 PM
>> > > > > To: hbase-user
>> > > > > Subject: Limits on HBase
>> > > > >
>> > > > > Hi folks,
>> > > > > I know this question may have been asked many times,
>> > > > but I am wondering
>> > > > > if
>> > > > > there is any update on the optimized cell size (in
>> > > > megabytes) and row
>> > > > > size
>> > > > > (in megabytes)? Many thanks.
>> > > > >
>> > > > >
>> > > > > William
>> > > >
>> > >
>> > >
>> > >
>> > >
>> > >
>>
>

Re: Limits on HBase

Posted by Sean Bigdatafun <se...@gmail.com>.
Let me ask this question from another angle:

The first question is:
if I have millions of columns in a column family in the same row, such that
the sum of the key-value pairs exceeds 256MB, what will happen?

example:
I have a column with a key of 256 bytes and a value of 2K; then let's assume
(256 + timestamp size + 2056) ~= 2.5K,
so I understand I can store at most 256 * 1024 / 2.5 ~= 104,857 columns in
this column family at this row.

Does anyone have comments on the math I gave above?


The second question is:
by the way, if I do not turn on LZO, is my data still compressed (by the
system)? If so, then the above number will increase a couple of times,
but there still exists a limit on how many columns I can put in a row.

The third question is:
if I do turn on LZO, does that mean the value gets compressed first, and then
the HBase mechanism further compresses the key-value pair?

Thanks,
Sean


On Tue, Sep 7, 2010 at 8:30 PM, Jonathan Gray <jg...@facebook.com> wrote:

> You can go way beyond the max region split / split size.  HBase will never
> split the region once it is a single row, even if beyond the split size.
>
> Also, if you're using large values, you should have region sizes much
> larger than the default.  It's common to run with 1-2GB regions in many
> cases.
>
> What you may have seen are recommendations that if your cell values are
> approaching the default block size on HDFS (64MB), you should consider
> putting the data directly into HDFS rather than HBase.
>
> JG
>
> > -----Original Message-----
> > From: William Kang [mailto:weliam.cloud@gmail.com]
>  > Sent: Tuesday, September 07, 2010 7:36 PM
> > To: user@hbase.apache.org; apurtell@apache.org
> > Subject: Re: Limits on HBase
> >
> > Hi,
> > Thanks for your reply. How about the row size? I read that a row should
> > not be larger than the HDFS file on the region server, which is 256M by
> > default. Is that right? Many thanks.
> >
> >
> > William
> >
> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > In addition to what Jon said please be aware that if compression is
> > > specified in the table schema, it happens at the store file level --
> > > compression happens after write I/O, before read I/O, so if you
> > transmit a
> > > 100MB object that compresses to 30MB, the performance impact is that
> > of
> > > 100MB, not 30MB.
> > >
> > > I also try not to go above 50MB as largest cell size, for the same
> > reason.
> > > I have tried storing objects larger than 100MB but this can cause out
> > of
> > > memory issues on busy regionservers no matter the size of the heap.
> > When/if
> > > HBase RPC can send large objects in smaller chunks, this will be less
> > of an
> > > issue.
> > >
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Why is this email five sentences or less?
> > > http://five.sentenc.es/
> > >
> > >
> > > --- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:
> > >
> > > > From: Jonathan Gray <jg...@facebook.com>
> > > > Subject: RE: Limits on HBase
> > > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > > > Date: Monday, September 6, 2010, 4:10 PM
> > > > I'm not sure what you mean by
> > > > "optimized cell size" or whether you're just asking about
> > > > practical limits?
> > > >
> > > > HBase is generally used with cells in the range of tens of
> > > > bytes to hundreds of kilobytes.  However, I have used
> > > > it with cells that are several megabytes, up to about
> > > > 50MB.  Up at that level, I have seen some weird
> > > > performance issues.
> > > >
> > > > The most important thing is to be sure to tweak all of your
> > > > settings.  If you have 20MB cells, you need to be sure
> > > > to increase the flush size beyond 64MB and the split size
> > > > beyond 256MB.  You also need enough memory to support
> > > > all this large object allocation.
> > > >
> > > > And of course, test test test.  That's the easiest way
> > > > to see if what you want to do will work :)
> > > >
> > > > When you run into problems, e-mail the list.
> > > >
> > > > As far as row size is concerned, the only issue is that a
> > > > row can never span multiple regions so a given row can only
> > > > be in one region and thus be hosted on one server at a
> > > > time.
> > > >
> > > > JG
> > > >
> > > > > -----Original Message-----
> > > > > From: William Kang [mailto:weliam.cloud@gmail.com]
> > > > > Sent: Monday, September 06, 2010 1:57 PM
> > > > > To: hbase-user
> > > > > Subject: Limits on HBase
> > > > >
> > > > > Hi folks,
> > > > > I know this question may have been asked many times,
> > > > but I am wondering
> > > > > if
> > > > > there is any update on the optimized cell size (in
> > > > megabytes) and row
> > > > > size
> > > > > (in megabytes)? Many thanks.
> > > > >
> > > > >
> > > > > William
> > > >
> > >
> > >
> > >
> > >
> > >
>

RE: Limits on HBase

Posted by Jonathan Gray <jg...@facebook.com>.
You can go way beyond the max region split / split size.  HBase will never split the region once it is a single row, even if beyond the split size.

Also, if you're using large values, you should have region sizes much larger than the default.  It's common to run with 1-2GB regions in many cases.

What you may have seen are recommendations that if your cell values are approaching the default block size on HDFS (64MB), you should consider putting the data directly into HDFS rather than HBase.

JG

> -----Original Message-----
> From: William Kang [mailto:weliam.cloud@gmail.com]
> Sent: Tuesday, September 07, 2010 7:36 PM
> To: user@hbase.apache.org; apurtell@apache.org
> Subject: Re: Limits on HBase
> 
> Hi,
> Thanks for your reply. How about the row size? I read that a row should
> not be larger than the HDFS file on the region server, which is 256M by
> default. Is that right? Many thanks.
> 
> 
> William
> 
> On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <ap...@apache.org>
> wrote:
> 
> > In addition to what Jon said please be aware that if compression is
> > specified in the table schema, it happens at the store file level --
> > compression happens after write I/O, before read I/O, so if you
> transmit a
> > 100MB object that compresses to 30MB, the performance impact is that
> of
> > 100MB, not 30MB.
> >
> > I also try not to go above 50MB as largest cell size, for the same
> reason.
> > I have tried storing objects larger than 100MB but this can cause out
> of
> > memory issues on busy regionservers no matter the size of the heap.
> When/if
> > HBase RPC can send large objects in smaller chunks, this will be less
> of an
> > issue.
> >
> > Best regards,
> >
> >    - Andy
> >
> > Why is this email five sentences or less?
> > http://five.sentenc.es/
> >
> >
> > --- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:
> >
> > > From: Jonathan Gray <jg...@facebook.com>
> > > Subject: RE: Limits on HBase
> > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > > Date: Monday, September 6, 2010, 4:10 PM
> > > I'm not sure what you mean by
> > > "optimized cell size" or whether you're just asking about
> > > practical limits?
> > >
> > > HBase is generally used with cells in the range of tens of
> > > bytes to hundreds of kilobytes.  However, I have used
> > > it with cells that are several megabytes, up to about
> > > 50MB.  Up at that level, I have seen some weird
> > > performance issues.
> > >
> > > The most important thing is to be sure to tweak all of your
> > > settings.  If you have 20MB cells, you need to be sure
> > > to increase the flush size beyond 64MB and the split size
> > > beyond 256MB.  You also need enough memory to support
> > > all this large object allocation.
> > >
> > > And of course, test test test.  That's the easiest way
> > > to see if what you want to do will work :)
> > >
> > > When you run into problems, e-mail the list.
> > >
> > > As far as row size is concerned, the only issue is that a
> > > row can never span multiple regions so a given row can only
> > > be in one region and thus be hosted on one server at a
> > > time.
> > >
> > > JG
> > >
> > > > -----Original Message-----
> > > > From: William Kang [mailto:weliam.cloud@gmail.com]
> > > > Sent: Monday, September 06, 2010 1:57 PM
> > > > To: hbase-user
> > > > Subject: Limits on HBase
> > > >
> > > > Hi folks,
> > > > I know this question may have been asked many times,
> > > but I am wondering
> > > > if
> > > > there is any update on the optimized cell size (in
> > > megabytes) and row
> > > > size
> > > > (in megabytes)? Many thanks.
> > > >
> > > >
> > > > William
> > >
> >
> >
> >
> >
> >

Re: Limits on HBase

Posted by William Kang <we...@gmail.com>.
Hi,
Thanks for your reply. How about the row size? I read that a row should not
be larger than the HDFS file on the region server, which is 256M by default.
Is that right? Many thanks.


William

On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <ap...@apache.org> wrote:

> In addition to what Jon said please be aware that if compression is
> specified in the table schema, it happens at the store file level --
> compression happens after write I/O, before read I/O, so if you transmit a
> 100MB object that compresses to 30MB, the performance impact is that of
> 100MB, not 30MB.
>
> I also try not to go above 50MB as largest cell size, for the same reason.
> I have tried storing objects larger than 100MB but this can cause out of
> memory issues on busy regionservers no matter the size of the heap. When/if
> HBase RPC can send large objects in smaller chunks, this will be less of an
> issue.
>
> Best regards,
>
>    - Andy
>
> Why is this email five sentences or less?
> http://five.sentenc.es/
>
>
> --- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:
>
> > From: Jonathan Gray <jg...@facebook.com>
> > Subject: RE: Limits on HBase
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > Date: Monday, September 6, 2010, 4:10 PM
> > I'm not sure what you mean by
> > "optimized cell size" or whether you're just asking about
> > practical limits?
> >
> > HBase is generally used with cells in the range of tens of
> > bytes to hundreds of kilobytes.  However, I have used
> > it with cells that are several megabytes, up to about
> > 50MB.  Up at that level, I have seen some weird
> > performance issues.
> >
> > The most important thing is to be sure to tweak all of your
> > settings.  If you have 20MB cells, you need to be sure
> > to increase the flush size beyond 64MB and the split size
> > beyond 256MB.  You also need enough memory to support
> > all this large object allocation.
> >
> > And of course, test test test.  That's the easiest way
> > to see if what you want to do will work :)
> >
> > When you run into problems, e-mail the list.
> >
> > As far as row size is concerned, the only issue is that a
> > row can never span multiple regions so a given row can only
> > be in one region and thus be hosted on one server at a
> > time.
> >
> > JG
> >
> > > -----Original Message-----
> > > From: William Kang [mailto:weliam.cloud@gmail.com]
> > > Sent: Monday, September 06, 2010 1:57 PM
> > > To: hbase-user
> > > Subject: Limits on HBase
> > >
> > > Hi folks,
> > > I know this question may have been asked many times,
> > but I am wondering
> > > if
> > > there is any update on the optimized cell size (in
> > megabytes) and row
> > > size
> > > (in megabytes)? Many thanks.
> > >
> > >
> > > William
> >
>
>
>
>
>

Re: Limits on HBase

Posted by Himanshu Vashishtha <va...@gmail.com>.
But yes, you will not have different versions of those objects, as they
are not stored as such in a table. So that's the downside. In case your
objects are write-once, read-many types, I think it should work.

Let's see what others say :)

~Himanshu


On Tue, Sep 7, 2010 at 12:49 AM, Himanshu Vashishtha <vashishtha.h@gmail.com
> wrote:

> Assuming you will be using HDFS as the file system: wouldn't saving those
> large objects in the FS and keeping a pointer to them in an HBase table
> serve the purpose?
>
> [I haven't done it myself but I can't see it not working. In fact, I
> remember reading about it somewhere on the list.]
>
> ~Himanshu
>
>
> On Mon, Sep 6, 2010 at 11:40 PM, William Kang <we...@gmail.com>wrote:
>
>> Hi JG,
>> Thanks for your reply. As far as I have read in HBase's documentation and
>> wiki, the cell size is not supposed to be larger than 10 MB. For the row,
>> I am not quite sure, but it looks like 256 MB is the upper limit. I am
>> considering storing some binary data that used to be stored in an RDBMS
>> blob field. The size of those binary objects may vary from hundreds of KB
>> to hundreds of MB. What would be a good way to use HBase for this? We
>> really want to use HBase to avoid that scaling problem.
>> Many thanks.
>>
>>
>> William
>>
>> On Mon, Sep 6, 2010 at 7:10 PM, Jonathan Gray <jg...@facebook.com> wrote:
>>
>> > I'm not sure what you mean by "optimized cell size" or whether you're
>> just
>> > asking about practical limits?
>> >
>> > HBase is generally used with cells in the range of tens of bytes to
>> > hundreds of kilobytes.  However, I have used it with cells that are
>> several
>> > megabytes, up to about 50MB.  Up at that level, I have seen some weird
>> > performance issues.
>> >
>> > The most important thing is to be sure to tweak all of your settings.
>>  If
>> > you have 20MB cells, you need to be sure to increase the flush size
>> beyond
>> > 64MB and the split size beyond 256MB.  You also need enough memory to
>> > support all this large object allocation.
>> >
>> > And of course, test test test.  That's the easiest way to see if what
>> you
>> > want to do will work :)
>> >
>> > When you run into problems, e-mail the list.
>> >
>> > As far as row size is concerned, the only issue is that a row can never
>> > span multiple regions so a given row can only be in one region and thus
>> be
>> > hosted on one server at a time.
>> >
>> > JG
>> >
>> > > -----Original Message-----
>> > > From: William Kang [mailto:weliam.cloud@gmail.com]
>> > > Sent: Monday, September 06, 2010 1:57 PM
>> > > To: hbase-user
>> > > Subject: Limits on HBase
>> > >
>> > > Hi folks,
>> > > I know this question may have been asked many times, but I am
>> wondering
>> > > if
>> > > there is any update on the optimized cell size (in megabytes) and row
>> > > size
>> > > (in megabytes)? Many thanks.
>> > >
>> > >
>> > > William
>> >
>>
>
>

Re: Limits on HBase

Posted by Himanshu Vashishtha <va...@gmail.com>.
Assuming you will be using HDFS as the file system: wouldn't saving those
large objects in the FS and keeping a pointer to them in an HBase table
serve the purpose?

[I haven't done it myself but I can't see it not working. In fact, I
remember reading about it somewhere on the list.]
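In shell terms the pattern is just two writes. A minimal sketch, assuming a
table 'docs' with family 'f' already exists (all names and paths here are
made up for illustration):

# store the large object itself in HDFS
hadoop fs -put big-object.bin /blobs/big-object.bin

# store only the pointer in the HBase table
echo "put 'docs', 'row1', 'f:blobpath', '/blobs/big-object.bin'" | ./bin/hbase shell

A reader then gets 'f:blobpath' for the row and streams the file straight
from HDFS.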

~Himanshu

On Mon, Sep 6, 2010 at 11:40 PM, William Kang <we...@gmail.com>wrote:

> Hi JG,
> Thanks for your reply. As far as I have read in HBase's documentation and
> wiki, the cell size is not supposed to be larger than 10 MB. For the row, I
> am not quite sure, but it looks like 256 MB is the upper limit. I am
> considering storing some binary data that used to be stored in an RDBMS
> blob field. The size of those binary objects may vary from hundreds of KB
> to hundreds of MB. What would be a good way to use HBase for this? We
> really want to use HBase to avoid that scaling problem.
> Many thanks.
>
>
> William
>
> On Mon, Sep 6, 2010 at 7:10 PM, Jonathan Gray <jg...@facebook.com> wrote:
>
> > I'm not sure what you mean by "optimized cell size" or whether you're
> just
> > asking about practical limits?
> >
> > HBase is generally used with cells in the range of tens of bytes to
> > hundreds of kilobytes.  However, I have used it with cells that are
> several
> > megabytes, up to about 50MB.  Up at that level, I have seen some weird
> > performance issues.
> >
> > The most important thing is to be sure to tweak all of your settings.  If
> > you have 20MB cells, you need to be sure to increase the flush size
> beyond
> > 64MB and the split size beyond 256MB.  You also need enough memory to
> > support all this large object allocation.
> >
> > And of course, test test test.  That's the easiest way to see if what you
> > want to do will work :)
> >
> > When you run into problems, e-mail the list.
> >
> > As far as row size is concerned, the only issue is that a row can never
> > span multiple regions so a given row can only be in one region and thus
> be
> > hosted on one server at a time.
> >
> > JG
> >
> > > -----Original Message-----
> > > From: William Kang [mailto:weliam.cloud@gmail.com]
> > > Sent: Monday, September 06, 2010 1:57 PM
> > > To: hbase-user
> > > Subject: Limits on HBase
> > >
> > > Hi folks,
> > > I know this question may have been asked many times, but I am wondering
> > > if
> > > there is any update on the optimized cell size (in megabytes) and row
> > > size
> > > (in megabytes)? Many thanks.
> > >
> > >
> > > William
> >
>

Re: Limits on HBase

Posted by William Kang <we...@gmail.com>.
Hi JG,
Thanks for your reply. As far as I have read in HBase's documentation and
wiki, the cell size is not supposed to be larger than 10 MB. For the row, I
am not quite sure, but it looks like 256 MB is the upper limit. I am
considering storing some binary data that used to be stored in an RDBMS blob
field. The size of those binary objects may vary from hundreds of KB to
hundreds of MB. What would be a good way to use HBase for this? We really
want to use HBase to avoid that scaling problem.
Many thanks.


William

On Mon, Sep 6, 2010 at 7:10 PM, Jonathan Gray <jg...@facebook.com> wrote:

> I'm not sure what you mean by "optimized cell size" or whether you're just
> asking about practical limits?
>
> HBase is generally used with cells in the range of tens of bytes to
> hundreds of kilobytes.  However, I have used it with cells that are several
> megabytes, up to about 50MB.  Up at that level, I have seen some weird
> performance issues.
>
> The most important thing is to be sure to tweak all of your settings.  If
> you have 20MB cells, you need to be sure to increase the flush size beyond
> 64MB and the split size beyond 256MB.  You also need enough memory to
> support all this large object allocation.
>
> And of course, test test test.  That's the easiest way to see if what you
> want to do will work :)
>
> When you run into problems, e-mail the list.
>
> As far as row size is concerned, the only issue is that a row can never
> span multiple regions so a given row can only be in one region and thus be
> hosted on one server at a time.
>
> JG
>
> > -----Original Message-----
> > From: William Kang [mailto:weliam.cloud@gmail.com]
> > Sent: Monday, September 06, 2010 1:57 PM
> > To: hbase-user
> > Subject: Limits on HBase
> >
> > Hi folks,
> > I know this question may have been asked many times, but I am wondering
> > if
> > there is any update on the optimized cell size (in megabytes) and row
> > size
> > (in megabytes)? Many thanks.
> >
> >
> > William
>

RE: Limits on HBase

Posted by Andrew Purtell <ap...@apache.org>.
In addition to what Jon said please be aware that if compression is specified in the table schema, it happens at the store file level -- compression happens after write I/O, before read I/O, so if you transmit a 100MB object that compresses to 30MB, the performance impact is that of 100MB, not 30MB. 
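For reference, compression is declared per column family in the table
schema; a minimal shell sketch, assuming LZO is installed and using made-up
table/family names:

hbase> create 'blobs', {NAME => 'f', COMPRESSION => 'LZO'}

As described above, the cells still cross the wire and sit in the memstore
uncompressed; only the store files on disk shrink.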

I also try not to go above 50MB as largest cell size, for the same reason. I have tried storing objects larger than 100MB but this can cause out of memory issues on busy regionservers no matter the size of the heap. When/if HBase RPC can send large objects in smaller chunks, this will be less of an issue. 

Best regards,

    - Andy

Why is this email five sentences or less?
http://five.sentenc.es/


--- On Mon, 9/6/10, Jonathan Gray <jg...@facebook.com> wrote:

> From: Jonathan Gray <jg...@facebook.com>
> Subject: RE: Limits on HBase
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> Date: Monday, September 6, 2010, 4:10 PM
> I'm not sure what you mean by
> "optimized cell size" or whether you're just asking about
> practical limits?
> 
> HBase is generally used with cells in the range of tens of
> bytes to hundreds of kilobytes.  However, I have used
> it with cells that are several megabytes, up to about
> 50MB.  Up at that level, I have seen some weird
> performance issues.
> 
> The most important thing is to be sure to tweak all of your
> settings.  If you have 20MB cells, you need to be sure
> to increase the flush size beyond 64MB and the split size
> beyond 256MB.  You also need enough memory to support
> all this large object allocation.
> 
> And of course, test test test.  That's the easiest way
> to see if what you want to do will work :)
> 
> When you run into problems, e-mail the list.
> 
> As far as row size is concerned, the only issue is that a
> row can never span multiple regions so a given row can only
> be in one region and thus be hosted on one server at a
> time.
> 
> JG
> 
> > -----Original Message-----
> > From: William Kang [mailto:weliam.cloud@gmail.com]
> > Sent: Monday, September 06, 2010 1:57 PM
> > To: hbase-user
> > Subject: Limits on HBase
> > 
> > Hi folks,
> > I know this question may have been asked many times,
> but I am wondering
> > if
> > there is any update on the optimized cell size (in
> megabytes) and row
> > size
> > (in megabytes)? Many thanks.
> > 
> > 
> > William
> 


      


RE: Limits on HBase

Posted by Jonathan Gray <jg...@facebook.com>.
I'm not sure what you mean by "optimized cell size" or whether you're just asking about practical limits?

HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes.  However, I have used it with cells that are several megabytes, up to about 50MB.  Up at that level, I have seen some weird performance issues.

The most important thing is to be sure to tweak all of your settings.  If you have 20MB cells, you need to be sure to increase the flush size beyond 64MB and the split size beyond 256MB.  You also need enough memory to support all this large object allocation.
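For example, these are the two relevant hbase-site.xml properties (names as
in the 0.20-era configuration that the 64MB/256MB defaults above come from;
the values below are just illustrative picks for large cells):

<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <!-- flush at 256MB instead of the 64MB default -->
  <value>268435456</value>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- split at 1GB instead of the 256MB default -->
  <value>1073741824</value>
</property>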

And of course, test test test.  That's the easiest way to see if what you want to do will work :)

When you run into problems, e-mail the list.

As far as row size is concerned, the only issue is that a row can never span multiple regions so a given row can only be in one region and thus be hosted on one server at a time.

JG

> -----Original Message-----
> From: William Kang [mailto:weliam.cloud@gmail.com]
> Sent: Monday, September 06, 2010 1:57 PM
> To: hbase-user
> Subject: Limits on HBase
> 
> Hi folks,
> I know this question may have been asked many times, but I am wondering
> if
> there is any update on the optimized cell size (in megabytes) and row
> size
> (in megabytes)? Many thanks.
> 
> 
> William