Posted to issues@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2010/03/12 02:21:27 UTC

[jira] Commented: (HBASE-2007) handle overly large column family in one row

    [ https://issues.apache.org/jira/browse/HBASE-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844318#action_12844318 ] 

Jonathan Gray commented on HBASE-2007:
--------------------------------------

I personally don't think it's necessary to support rows spanning regions.  It would be extremely non-trivial to implement (as I see it) for a completely controllable use case.

I think it's fine to have the story that a row can never cross regions, and therefore can never be distributed.  If you need distribution, you have to use finer-grained keys.  I don't think any DHT would allow the value of a key to span servers.
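
To illustrate the point about key granularity, here is a minimal sketch that spreads one logical wide row over several physical row keys, so that the pieces can land in different regions. This is not HBase API: the class name, the "|" separator, and the numBuckets parameter are all hypothetical choices made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper (not part of HBase): maps (logical row, column qualifier)
// to a bucketed physical row key so a huge logical row becomes many smaller rows.
final class BucketedRowKey {
    private BucketedRowKey() {}

    // numBuckets is an assumed application-level tuning knob, not an HBase setting.
    static String rowKeyFor(String logicalRow, String qualifier, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // String.hashCode() is specified by the Java language, so the mapping is stable.
        int bucket = Math.floorMod(qualifier.hashCode(), numBuckets);
        // Zero-padded bucket suffix keeps the physical rows of one logical row adjacent.
        return String.format(Locale.ROOT, "%s|%04d", logicalRow, bucket);
    }

    // Row keys are byte arrays in HBase; UTF-8 is an assumption of this sketch.
    static byte[] rowKeyBytes(String logicalRow, String qualifier, int numBuckets) {
        return rowKeyFor(logicalRow, qualifier, numBuckets).getBytes(StandardCharsets.UTF_8);
    }
}
```

The trade-off is that reading the whole logical row turns into a scan over the key prefix, and atomicity only holds within each physical row, not across buckets.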

There are other issues with gigantic rows, such as seeking and a read path that are non-optimal for them, that could be tackled and that I think are more important.


What is it that actually made the RS crash?  Too many StoreFiles or too many HFile indexes?

Also, how do you envision we detect this kind of issue?  We could block writes to a region once the total region size exceeds 2X the max region size or something.  But that could create another issue for other users who want to push it to 3-4X.
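
A rough sketch of that check, with the multiplier left configurable so users who want 3-4X are not stuck with 2X. Nothing here is existing HBase code: the class, method, and parameter names are hypothetical, and the sizes in the usage note are example values only.

```java
// Hypothetical guard (not existing HBase code): decides whether a region should
// refuse further writes because it has grown far past the configured maximum.
final class RegionWriteGuard {
    private RegionWriteGuard() {}

    // blockMultiplier is an assumed per-cluster or per-table setting, e.g. 2.0 or 4.0.
    static boolean shouldBlockWrites(long regionSizeBytes,
                                     long maxRegionSizeBytes,
                                     double blockMultiplier) {
        if (maxRegionSizeBytes <= 0) {
            throw new IllegalArgumentException("maxRegionSizeBytes must be positive");
        }
        if (blockMultiplier < 1.0) {
            throw new IllegalArgumentException("blockMultiplier must be at least 1.0");
        }
        // Block only when the region is strictly larger than multiplier * max size.
        return regionSizeBytes > maxRegionSizeBytes * blockMultiplier;
    }
}
```

For example, with a 256 MB max region size, a 600 MB region would be blocked at a multiplier of 2.0 but still accept writes at 3.0.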

> handle overly large column family in one row
> --------------------------------------------
>
>                 Key: HBASE-2007
>                 URL: https://issues.apache.org/jira/browse/HBASE-2007
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>             Fix For: 0.20.4, 0.21.0
>
>
> From a team in TM:
> {quote}
> I tried to create the large column in one row and one family. The value of each column has only 10 bytes. When I created the 804226850th column, the region server was crashed. I find that all columns of one row and one family will in the same region. The region server will crash, because this region is too large. And then I want to reboot HBase, after several minutes, the region servers will crash one by one.
> If one row and one family cannot split, then no matter how many machines are in HBase system, the capacity of HBase will be limited by one machine. I want to know whether this problem is a bug. If the column quantity in one row and one family is limited, can you tell me the safe range? 
> {quote}
> Currently a row cannot be split. So an individual row can expand only to some finite limit constrained by the region server capability. 
> I am impressed that a row was able to successfully contain 804,226,849 columns. 
> The HBase storage capability goals are currently "billions of rows, millions of columns, thousands of tables". A test involving hundreds of millions of columns is very challenging. 
> Most important, HBase should not accept input beyond some limit which produces a cascading failure. 
> I think we also do want to have the architectural discussion about rows that must span region servers due to immensity. 
