Posted to user@hbase.apache.org by Piyush Goel <pi...@gmail.com> on 2009/07/13 12:31:01 UTC

help needed with base schema

Hi,

I am trying to design a high-scale key-value storage system. The HBase table
for it is outlined below:

{
  "userid1" : {
    "update" : {
        t3 : "some update1",
        t2 : "some update2",
        t1 : "some update3"
    },
    "sender" : {
        t3 : "sender3",
        t2 : "sender2",
        t1 : "sender1"
    }
  },

  "userid2" : {
    "update" : {
        t9 : "some update9",
        t6 : "some update534",
        t1 : "some update343"
    },
    "sender" : {
        t9 : "sender3",
        t6 : "sender2",
        t1 : "sender1"
    }
  }
}

The system is going to have around 15-20M users with around 3-4M put (write)
operations per day (which automatically rules out MySQL). The maximum number of
entries in the "update" and "sender" columns will be around 1000 (around one
week's updates).

My queries would be like "For a given userid, return the top 20 updates and
senders based on timestamp". Is there a way to make a secondary index on
"userid, timestamp" which can help speed up my "get" calls? Or how can I change
my schema design to minimize response time for get calls?
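
To make the query concrete, here is a minimal in-memory sketch in Python. It
is not HBase client code: the nested dictionary only stands in for the table
outlined above (row key, then column, then timestamp), and the names put and
top_n are hypothetical helpers invented for this illustration. It shows the
intended result of "top 20 by timestamp" for one userid.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the table:
# row key (userid) -> column ("update" / "sender") -> {timestamp: value}
table = defaultdict(lambda: defaultdict(dict))


def put(userid, column, timestamp, value):
    """Store one timestamped entry for a user."""
    table[userid][column][timestamp] = value


def top_n(userid, column, n=20):
    """Return up to n (timestamp, value) pairs for one user, newest first."""
    cells = table[userid][column]
    newest = sorted(cells, reverse=True)[:n]
    return [(ts, cells[ts]) for ts in newest]


# Sample data mirroring "userid1" in the outline, with t1 < t2 < t3.
put("userid1", "update", 1, "some update3")
put("userid1", "update", 2, "some update2")
put("userid1", "update", 3, "some update1")
put("userid1", "sender", 1, "sender1")
put("userid1", "sender", 2, "sender2")
put("userid1", "sender", 3, "sender3")

print(top_n("userid1", "update", n=2))
print(top_n("userid1", "sender", n=2))
```

In this model the lookup by userid is a plain key access, and the only sorting
is over the (at most about 1000) entries of a single user.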


Regards,

Piyush Goel
Software Engineer
Yahoo! Software Development India Pvt. Ltd.
Bangalore, India
Ph : +91 80 66949816 (O)
            9980616752  (M)

If you're not failing every now and again, it's a sign you're not doing
anything very innovative.  - Woody Allen

Re: help needed with base schema

Posted by Erik Holstad <er...@gmail.com>.
Hi Piyush!
First I have to ask: what version of HBase are you on?
For the new HBase 0.20 we have made some major rewrites of the internal
structure to get much greater read and scan speeds.

Depending on whether you store your updates as versions or use the
qualifier as a timestamp, there are two different ways to query; you might
know this already. You can either set the number of versions that you wish
to get returned or use a filter.
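
As a rough illustration of the "qualifier as a timestamp" option, the Python
sketch below models a single row as a dictionary keyed by qualifier bytes. It
assumes, as one possible encoding and not something prescribed in this thread,
that the qualifier holds a reversed timestamp (MAX_TS minus the timestamp) so
that plain byte-order sorting puts the newest entries first. The names
reversed_qualifier, decode_qualifier and first_n are hypothetical, and nothing
here calls the HBase API.

```python
import struct

MAX_TS = 2**63 - 1  # largest signed 64-bit value


def reversed_qualifier(timestamp_ms):
    """Encode a non-negative timestamp so newer entries sort first as bytes."""
    return struct.pack(">q", MAX_TS - timestamp_ms)


def decode_qualifier(qualifier):
    """Recover the original timestamp from an encoded qualifier."""
    return MAX_TS - struct.unpack(">q", qualifier)[0]


def first_n(row, n=20):
    """Take the first n cells of a row in byte order of their qualifiers."""
    return [(decode_qualifier(q), row[q]) for q in sorted(row)[:n]]


# One user's "update" cells, inserted out of order on purpose.
row = {
    reversed_qualifier(ts): value
    for ts, value in [(1000, "oldest"), (3000, "newest"), (2000, "middle")]
}

print(first_n(row, n=2))
```

With the other option, storing updates as versions of one column, the ordering
comes from the cell timestamps themselves and the read only has to cap the
number of versions returned.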

Regards Erik