Posted to user@hbase.apache.org by Marcus Herou <ma...@tailsweep.com> on 2008/07/22 15:33:44 UTC

HBase one-to-many, many-to-one, many-to-many

Hi.

What is the best practice in HBase when it comes to creating "mapping"
tables between objects?

Let's say you want to create two tables named "User" and "Role" where the
user can be in many roles.

User->Role

I guess you could create some special, proprietary cells like role:someuid
which contain the ref to the Role table, but this seems a little strange.

Another quite normal example (for me at least) is to tag various content.

Eg:
BlogEntry<-BlogEntryCategory->Category

where in an RDBMS the BlogEntryCategory table would just contain two
columns, blogEntryId and categoryId.

How do you model that with column families?

Right now I'm creating serializers which can convert arrays to bytes and back.

E.g. StringArraySerializer:
    public byte[] serialize(Object object) throws IOException
    {
        String[] a = (String[])object;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < a.length; i++)
        {
            sb.append(a[i]);
            if(i < (a.length - 1))
            {
                sb.append(this.delimiter);
            }
        }
        return sb.toString().getBytes("UTF-8");
    }

    public Object deserialize(byte[] bytes) throws IOException
    {
        String str = new String(bytes, "UTF-8");
        StringTokenizer st = new StringTokenizer(str, delimiter);

        List<String> list = new ArrayList<String>();
        while(st.hasMoreTokens())
        {
            String token = st.nextToken();
            list.add(token);
        }
        return list.toArray(new String[list.size()]);
    }


and then I store the byte[] in HBase. Ugly...
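
One caveat with the delimiter approach above: a value that contains the
delimiter, or an empty string, does not survive the round trip, because
StringTokenizer splits on every delimiter and skips empty tokens. A minimal
sketch of a length-prefixed alternative follows. The class name
LengthPrefixedStrings and the int-length layout are assumptions made for
illustration, not part of any HBase API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: writes [count][len][bytes][len][bytes]... so that no
// delimiter is needed and empty strings are preserved.
final class LengthPrefixedStrings {
    static byte[] serialize(String[] values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(values.length);            // number of elements
        for (String value : values) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length);          // byte length of this element
            out.write(utf8);
        }
        out.flush();
        return buffer.toByteArray();
    }

    static String[] deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        String[] values = new String[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);                 // read exactly one element
            values[i] = new String(utf8, StandardCharsets.UTF_8);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Includes a value with a comma and an empty string on purpose.
        String[] original = {"a,b", "", "c"};
        String[] copy = deserialize(serialize(original));
        System.out.println(Arrays.equals(original, copy));
    }
}
```

The trade-off is that the stored bytes are no longer human-readable, which
may or may not matter for your cells.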

Please guide my sorry ass.

Kindly

//Marcus




-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/

Re: HBase one-to-many, many-to-one, many-to-many

Posted by Marcus Herou <ma...@tailsweep.com>.
Yep, I _really_ understand denormalization... But! Sometimes you still want
to have the choice of whether to denormalize or not. I prefer to normalize
first, measure the bottlenecks, and then get pragmatic :)

I'm not so worried about disk space as I am about shuffling unnecessary data
around, making request round-trip times longer than necessary.

Consider this:

Papa
   Lots and lots of columns and data

Daughter
  Few columns

If 90%+ of the time I'm only interested in the Daughters of the Papa, I want
to have the choice of not fetching Papa's data.
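
One possible way to get that choice inside a single row, sketched under
assumptions: HBase stores each column family separately, so a read that is
restricted to one family should not need to touch the others. The code below
only models a row as a sorted map of "family:qualifier" keys in memory. The
class name FamilyReadSketch and the family names "data" and "daughter" are
made up for illustration, and no HBase client call is used.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of one row: column key ("family:qualifier") -> value.
final class FamilyReadSketch {
    // Returns a view of only the columns in the given family. In a sorted
    // map they are contiguous: every key in ["family:", "family;") starts
    // with "family:" because ';' is the character right after ':'.
    static NavigableMap<String, String> family(NavigableMap<String, String> row,
                                               String family) {
        return row.subMap(family + ":", true, family + ";", false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> papa = new TreeMap<>();
        papa.put("data:biography", "lots and lots of text ...");
        papa.put("data:photo", "lots and lots of bytes ...");
        papa.put("daughter:anna", "");
        papa.put("daughter:eva", "");

        // Only the small "daughter" family is returned; "data" is not read.
        System.out.println(family(papa, "daughter").keySet());
    }
}
```

If the Daughters need many columns of their own, a separate Daughter table
keyed by the Papa's row key plus the daughter's id is the other obvious
option.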

Typically I want to store normalized data in a db and denormalize like hell
with Lucene indexes for searching, since Lucene beats the crap out of db
indexing. Get me?

As I write this, I have already written a simple ORM for HBase with lazy
fetching, one-to-many, many-to-one, etc. :)

More about that later if you or the group are interested.


Kindly

//Marcus

On Tue, Jul 22, 2008 at 4:54 PM, Jean-Daniel Cryans <jd...@gmail.com>
wrote:

> Marcus,
>
> Denormalization implies duplication. See this excellent article on the
> subject:
>
> http://highscalability.com/how-i-learned-stop-worrying-and-love-using-lot-disk-space-scale
>
> In your case, you could keep the "role:" family that contains the row keys
> to all roles (a user has) as a column key and value (or the value could be
> the description) and if you have to know who has a particular role, have a
> new family in Role named "user:" that would map the other way.
>
> Same thing with category.
>
> J-D




Re: HBase one-to-many, many-to-one, many-to-many

Posted by Jean-Daniel Cryans <jd...@gmail.com>.
Marcus,

Denormalization implies duplication. See this excellent article on the
subject:
http://highscalability.com/how-i-learned-stop-worrying-and-love-using-lot-disk-space-scale

In your case, you could keep a "role:" family in User whose column keys are
the row keys of all the roles the user has (the value could repeat the key
or hold the role's description). If you also need to know who has a
particular role, add a family named "user:" to Role that maps the other way.

Same thing with category.
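
The layout described above can be sketched roughly as follows. This is an
in-memory model only: the two tables are plain sorted maps, the class name
RoleMappingSketch and the methods grant, rolesOf and usersIn are invented for
illustration, and none of it is the HBase client API. The point is just that
every relation is written twice, once in each direction.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Stand-in for two tables: row key -> (column key -> value).
final class RoleMappingSketch {
    final Map<String, TreeMap<String, String>> userTable = new TreeMap<>();
    final Map<String, TreeMap<String, String>> roleTable = new TreeMap<>();

    // Write the relation in both directions (the duplication that
    // denormalization implies).
    void grant(String userId, String roleId, String roleDescription) {
        userTable.computeIfAbsent(userId, k -> new TreeMap<>())
                 .put("role:" + roleId, roleDescription);
        roleTable.computeIfAbsent(roleId, k -> new TreeMap<>())
                 .put("user:" + userId, "");
    }

    // Collect the qualifiers of every column of one family in one row.
    static Set<String> qualifiers(Map<String, TreeMap<String, String>> table,
                                  String rowKey, String familyPrefix) {
        Set<String> result = new TreeSet<>();
        TreeMap<String, String> row = table.get(rowKey);
        if (row == null) {
            return result;                      // unknown row: no relations
        }
        for (String column : row.keySet()) {
            if (column.startsWith(familyPrefix)) {
                result.add(column.substring(familyPrefix.length()));
            }
        }
        return result;
    }

    Set<String> rolesOf(String userId) { return qualifiers(userTable, userId, "role:"); }
    Set<String> usersIn(String roleId) { return qualifiers(roleTable, roleId, "user:"); }

    public static void main(String[] args) {
        RoleMappingSketch db = new RoleMappingSketch();
        db.grant("marcus", "admin", "Administrator");
        db.grant("marcus", "editor", "Editor");
        db.grant("jd", "admin", "Administrator");
        System.out.println(db.rolesOf("marcus")); // roles of one user
        System.out.println(db.usersIn("admin"));  // users holding one role
    }
}
```

For BlogEntry and Category the same shape would apply, with a "category:"
family in BlogEntry and a "blogentry:" family in Category (family names
assumed here). Revoking a relation has to delete both columns, and nothing
in this sketch makes the two writes atomic.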

J-D
