Posted to user@hbase.apache.org by Bryan Keller <br...@gmail.com> on 2011/04/07 09:24:02 UTC

Is there a setting to cap row size?

I have a wide table schema for an HBase table, where I model a one-to-many relationship of purchase orders and line items. Each row is a purchase order, and I add columns for each line item. Under normal circumstances I don't expect more than a few thousand columns per row, totaling less than 1 MB per row in general.
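For illustration, here is a minimal sketch of that layout with the 0.90-era Java client; the "orders" table, the "item" family, and the values are made up for this example, not my real schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "orders");             // one row per purchase order

Put put = new Put(Bytes.toBytes("PO-12345"));           // row key = purchase order id
// one column per line item: family "item", qualifier = line item id
put.add(Bytes.toBytes("item"), Bytes.toBytes("0001"), Bytes.toBytes("sku=4711,qty=2"));
put.add(Bytes.toBytes("item"), Bytes.toBytes("0002"), Bytes.toBytes("sku=4712,qty=1"));
table.put(put);
table.close();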

In one of my stress tests, I was inserting many line items into the same row. Eventually, the region server hosting that row shut down. Its log showed an IOException about the write-ahead log failing to close, followed by the region server aborting.

Once in this state, the only way I could manage to get my system functional again was to wipe the /hbase directory in HDFS and start from scratch.

To avoid leaving my system susceptible to total data loss because of some bad import data or the like, I'd like to limit the size of a row so that an exception is simply thrown once it reaches a certain size (either in bytes or in number of columns). Does such a setting exist?

Re: Is there a setting to cap row size?

Posted by Ryan Rawson <ry...@gmail.com>.
Sounds like you are having an HDFS-related problem.  Check those
datanode logs for errors.

As for a setting for max row size, this might not be so easy to do,
since at Put time we don't actually know anything about the existing
row data. To find that out we'd have to go and read the row first and
then make a decision.
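If you wanted to enforce that on the client side, that read-then-decide
check might look roughly like this (the "item" family and the cap are
just placeholders; it costs an extra read per write and isn't safe
against concurrent writers to the same row):

// sketch of a client-side column-count cap applied before a Put
void cappedPut(HTable table, byte[] rowKey, Put put, int maxColumns) throws IOException {
  Get get = new Get(rowKey);
  get.addFamily(Bytes.toBytes("item"));      // only the wide family matters here
  Result existing = table.get(get);          // the extra read on every write
  if (existing.size() + put.size() > maxColumns) {
    throw new IOException("row would exceed " + maxColumns + " columns");
  }
  table.put(put);
}

Note that once a row gets really wide, the Get itself becomes expensive,
which is part of why there's no cheap server-side setting for this.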

There are some write-ups on how HBase stores data on disk; we are also
very similar to the Bigtable paper.

-ryan

On Thu, Apr 7, 2011 at 7:24 AM, Bryan Keller <br...@gmail.com> wrote:
> I have a wide table schema for an HBase table, where I model a one-to-many relationship of purchase orders and line items. Each row is a purchase order, and I add columns for each line item. Under normal circumstances I don't expect more than a few thousand columns per row, totaling less than 1 MB per row in general.
>
> In one of my stress tests, I was inserting many line items into the same row. Eventually, the region server hosting that row shut down. Its log showed an IOException about the write-ahead log failing to close, followed by the region server aborting.
>
> Once in this state, the only way I could manage to get my system functional again was to wipe the /hbase directory in HDFS and start from scratch.
>
> To avoid leaving my system susceptible to total data loss because of some bad import data or the like, I'd like to limit the size of a row so that an exception is simply thrown once it reaches a certain size (either in bytes or in number of columns). Does such a setting exist?