Posted to user@hbase.apache.org by Jeremy Smith <xp...@gmail.com> on 2011/08/07 20:17:16 UTC

Help with HBase table design needed

I plan to store my data in an HBase table and then query it using Elasticsearch. I'm new to both technologies, so some guidance on how to set up my data models would be great.


The primary table will potentially hold hundreds of millions of rows, and the amount of data per user varies and can run into the millions. Each row will consist mainly of roughly 30 key/value fields representing different states, plus hundreds of boolean fields.


Most of the querying will be ad hoc, real-time queries that aggregate the boolean fields into percentages, filtered by date, by state conditions, and by an arbitrary set of conditions on the booleans. The other common query type filters only by date and state conditions, again with the booleans aggregated into percentages.
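
The aggregation described above boils down to one piece of arithmetic: among the rows that pass the filter, the percentage in which each boolean is true. Below is a minimal in-memory Java sketch of just that arithmetic. It is not HBase or Elasticsearch code, and the Row class, the "status" state field and the flag names "a" and "b" are hypothetical. It assumes only true booleans are recorded, so an absent flag counts as false.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

public final class FlagPercentages {

    /** Hypothetical in-memory row: a date, the state fields, and the names of the booleans that are true. */
    static final class Row {
        final LocalDate date;
        final Map<String, String> states;
        final Set<String> trueFlags; // a false boolean is simply absent

        Row(LocalDate date, Map<String, String> states, Set<String> trueFlags) {
            this.date = date;
            this.states = states;
            this.trueFlags = trueFlags;
        }
    }

    /**
     * For each name in allFlags, the percentage of rows passing the filter in
     * which that boolean is true. Returns an empty map when no row matches.
     */
    static Map<String, Double> percentages(List<Row> rows, Predicate<Row> filter, List<String> allFlags) {
        Map<String, Integer> trueCounts = new TreeMap<>();
        for (String flag : allFlags) {
            trueCounts.put(flag, 0);
        }
        int matched = 0;
        for (Row row : rows) {
            if (!filter.test(row)) {
                continue;
            }
            matched++;
            for (String flag : row.trueFlags) {
                trueCounts.computeIfPresent(flag, (k, n) -> n + 1); // flags not in allFlags are ignored
            }
        }
        Map<String, Double> result = new TreeMap<>();
        if (matched == 0) {
            return result;
        }
        for (Map.Entry<String, Integer> e : trueCounts.entrySet()) {
            result.put(e.getKey(), 100.0 * e.getValue() / matched);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(LocalDate.of(2011, 8, 1), Map.of("status", "active"), Set.of("a", "b")),
            new Row(LocalDate.of(2011, 8, 2), Map.of("status", "active"), Set.of("a")),
            new Row(LocalDate.of(2011, 8, 3), Map.of("status", "inactive"), Set.of("b")),
            new Row(LocalDate.of(2011, 8, 4), Map.of("status", "active"), Set.of()));

        // Date range + state condition + an arbitrary condition on the booleans ("b" is false).
        Predicate<Row> filter = r -> !r.date.isBefore(LocalDate.of(2011, 8, 2))
            && !r.date.isAfter(LocalDate.of(2011, 8, 4))
            && "active".equals(r.states.get("status"))
            && !r.trueFlags.contains("b");

        System.out.println(percentages(rows, filter, List.of("a", "b"))); // prints {a=50.0, b=0.0}
    }
}
```

Because only the true booleans are stored, the per-row work is proportional to the 20-50 set flags rather than to the hundreds of possible ones.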


So my basic question is what to do with the boolean fields. On a given row, only 20-50 of the hundreds of fields are likely to be true. But I don't understand the query language yet, so I don't know whether I can just have a single "booleans" column holding an array of all the true booleans, and query against that.


If I do have to create a column for each boolean field, does it make sense that this would be its own column family?




Re: Help with HBase table design needed

Posted by Stack <st...@duboce.net>.
On Sun, Aug 7, 2011 at 11:17 AM, Jeremy Smith <xp...@gmail.com> wrote:
> So my basic question is what to do with the boolean fields. On a given row, only 20-50 of the hundreds of fields are likely to be true. But I don't understand the query language yet, so I don't know whether I can just have a single "booleans" column holding an array of all the true booleans, and query against that.
>

You could do this: put all the booleans into a single cell.  How are
the updates done?  Do you set a bunch of booleans in one go, or just
one at a time?  If one at a time, it could be expensive pulling the
whole cell across to the client each time you set a single attribute.
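
To make the single-cell trade-off concrete, here is a minimal Java sketch. It swaps the "array of all true booleans" for a bitmap (java.util.BitSet), which is an assumption for illustration, and the FLAGS list is a hypothetical schema. The byte[] stands in for the cell value; no HBase API is used. Note that withFlagSet needs the current value: against a real table that means reading the cell before writing the whole value back.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public final class PackedFlagsCell {

    // Hypothetical fixed schema: a boolean's position in this list is its bit index.
    static final List<String> FLAGS = List.of("opted_in", "verified", "churned", "premium");

    private static int bitOf(String flag) {
        int i = FLAGS.indexOf(flag);
        if (i < 0) {
            throw new IllegalArgumentException("unknown flag: " + flag);
        }
        return i;
    }

    /** Encodes the true booleans as the bytes that would be stored in the single cell. */
    static byte[] encode(Iterable<String> trueFlags) {
        BitSet bits = new BitSet(FLAGS.size());
        for (String flag : trueFlags) {
            bits.set(bitOf(flag));
        }
        return bits.toByteArray();
    }

    /** Decodes a cell value back into the names of the true booleans, in schema order. */
    static List<String> decode(byte[] cell) {
        BitSet bits = BitSet.valueOf(cell);
        List<String> names = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            names.add(FLAGS.get(i));
        }
        return names;
    }

    /**
     * Setting ONE boolean still needs the cell's current value: the client reads
     * it, sets a bit, and writes the whole value back (read-modify-write).
     */
    static byte[] withFlagSet(byte[] currentCell, String flag) {
        BitSet bits = BitSet.valueOf(currentCell);
        bits.set(bitOf(flag));
        return bits.toByteArray();
    }

    public static void main(String[] args) {
        byte[] cell = encode(List.of("verified", "premium"));
        System.out.println(cell.length + " byte(s): " + decode(cell)); // prints 1 byte(s): [verified, premium]
        byte[] updated = withFlagSet(cell, "opted_in");
        System.out.println(decode(updated)); // prints [opted_in, verified, premium]
    }
}
```

Under this encoding even hundreds of booleans fit in tens of bytes, so the per-update cost is mostly the extra read before each write. Also, two clients doing this read-modify-write on the same row at the same time can overwrite each other's changes.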

>
> If I do have to create a column for each boolean field, does it make sense that this would be its own column family?
>

Probably.
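
Here is a sketch of what a separate family for the booleans might look like: one column qualifier per boolean, with only the true ones stored. It is an in-memory stand-in written in plain Java, not the HBase client API. The family names "s" and "f" and the class FlagFamilyRow are hypothetical. Setting or clearing one boolean touches a single cell with no prior read. Because HBase stores each column family in its own files, a Get or Scan restricted to the flags family does not have to read the state fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.NavigableMap;
import java.util.TreeMap;

public final class FlagFamilyRow {

    // Hypothetical family names: "s" holds the ~30 state fields, "f" holds the booleans.
    static final String STATE_FAMILY = "s";
    static final String FLAG_FAMILY = "f";

    // family -> qualifier -> value, kept sorted as HBase sorts cells within a row.
    private final NavigableMap<String, NavigableMap<String, byte[]>> families = new TreeMap<>();

    /** Like a single-cell Put: writes one cell without reading the rest of the row. */
    void put(String family, String qualifier, byte[] value) {
        families.computeIfAbsent(family, k -> new TreeMap<>()).put(qualifier, value);
    }

    /** Only true booleans are stored; the qualifier is the boolean's name and the value is empty. */
    void setFlag(String flag) {
        put(FLAG_FAMILY, flag, new byte[0]);
    }

    /** Like deleting one cell; a missing qualifier reads as false. */
    void clearFlag(String flag) {
        NavigableMap<String, byte[]> cells = families.get(FLAG_FAMILY);
        if (cells != null) {
            cells.remove(flag);
        }
    }

    boolean isSet(String flag) {
        NavigableMap<String, byte[]> cells = families.get(FLAG_FAMILY);
        return cells != null && cells.containsKey(flag);
    }

    /** Returns only the cells of one family, as a Get or Scan restricted to that family would. */
    NavigableMap<String, byte[]> family(String family) {
        return families.getOrDefault(family, new TreeMap<>());
    }

    public static void main(String[] args) {
        FlagFamilyRow row = new FlagFamilyRow();
        row.put(STATE_FAMILY, "status", "active".getBytes(StandardCharsets.UTF_8));
        row.setFlag("verified");
        row.setFlag("premium");
        row.clearFlag("premium");
        System.out.println(row.family(FLAG_FAMILY).keySet()); // prints [verified]
    }
}
```

Two families (states and booleans) stays within the small number of column families the HBase reference guide recommends.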

St.Ack