Posted to user@hbase.apache.org by Amit Sela <am...@infolinks.com> on 2013/11/28 14:01:50 UTC

HBase value design

There are a lot of discussions here regarding the row design but I have a
question about the value design:

Say I have a table t1 with rows r1,r2...rn and family f.
I also have qualifiers q1,q2...,qm

For each (ri, f, qj) tuple I want to store a value v that is a data blob
that implements Writable and has two members:
Integer countInt
Float countFloat

Would you change the design so that I'll have 2m qualifiers, i.e.
q1_countInt, q1_countFloat, etc., with IntWritable and FloatWritable
values (respectively), or stay with the data blob?
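
For illustration, the two layouts in the question can be sketched as
follows. This is only a rough sketch: the class name CountBlob and the
helper splitQualifiers are hypothetical, and a plain ByteBuffer stands in
for a Writable (it produces the same big-endian int and float bytes that
DataOutput.writeInt and writeFloat would) so that the example runs without
Hadoop on the classpath.

```java
import java.nio.ByteBuffer;

/** Hypothetical value type holding the two members from the question. */
final class CountBlob {
    final int countInt;
    final float countFloat;

    CountBlob(int countInt, float countFloat) {
        this.countInt = countInt;
        this.countFloat = countFloat;
    }

    /** Blob layout: both members packed into a single 8-byte cell value. */
    byte[] toBlob() {
        return ByteBuffer.allocate(Integer.BYTES + Float.BYTES)
                .putInt(countInt)      // 4 bytes, big-endian
                .putFloat(countFloat)  // 4 bytes, IEEE 754 bit pattern
                .array();
    }

    /** Reads the two members back in the order they were written. */
    static CountBlob fromBlob(byte[] blob) {
        ByteBuffer in = ByteBuffer.wrap(blob);
        return new CountBlob(in.getInt(), in.getFloat());
    }

    /** Split layout: each logical qualifier becomes two qualifiers (2m in total). */
    static String[] splitQualifiers(String qualifier) {
        return new String[] { qualifier + "_countInt", qualifier + "_countFloat" };
    }

    public static void main(String[] args) {
        CountBlob decoded = fromBlob(new CountBlob(42, 1.5f).toBlob());
        System.out.println(decoded.countInt + " " + decoded.countFloat);
        System.out.println(String.join(", ", splitQualifiers("q1")));
    }
}
```

With the blob layout a reader always gets both members in one cell; with
the split layout each member can be requested on its own.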

Thanks,

Amit.

Re: HBase value design

Posted by Asaf Mesika <as...@gmail.com>.
On our project we store nested record structures with 10-40 fields. We
decided to save on storage and write throughput by writing a serialized
Avro record as the value, with one byte placed in front of it to allow
versioning. We did this because each column is written together with its
row key, column family (cf), column qualifier (cq) and timestamp, so your
write throughput can be severely impacted if you write each field as a
separate column, as Phoenix does.
We addressed the read side only partially: we still read the entire record,
since you can't read part of a value yet, and send back only the fields we
need. This was achieved using a coprocessor we wrote.
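
As a rough illustration of the version-byte idea (not our actual code), the
sketch below prepends one byte to an opaque payload. The class and method
names are hypothetical, and the payload is just a byte array standing in
for a serialized Avro record, so no Avro dependency is needed.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper: one version byte followed by an opaque serialized record. */
final class VersionedValue {
    /** Assumed current format version; the real numbering scheme is up to the application. */
    static final byte CURRENT_VERSION = 1;

    /** Builds the cell value: [version][payload bytes...]. */
    static byte[] wrap(byte version, byte[] payload) {
        return ByteBuffer.allocate(1 + payload.length)
                .put(version)
                .put(payload)
                .array();
    }

    /** Returns the leading version byte so a reader can pick the matching decoder. */
    static byte versionOf(byte[] value) {
        if (value.length == 0) {
            throw new IllegalArgumentException("empty value has no version byte");
        }
        return value[0];
    }

    /** Returns the serialized record without the version byte. */
    static byte[] payloadOf(byte[] value) {
        if (value.length == 0) {
            throw new IllegalArgumentException("empty value has no payload");
        }
        return Arrays.copyOfRange(value, 1, value.length);
    }

    public static void main(String[] args) {
        byte[] record = {10, 20, 30}; // stand-in for a serialized record
        byte[] value = wrap(CURRENT_VERSION, record);
        System.out.println("version=" + versionOf(value)
                + " payload=" + Arrays.toString(payloadOf(value)));
    }
}
```

A reader would typically switch on versionOf(value) and hand the payload to
the decoder for that version.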

In your case, if it's only two fields, I'm not sure I would bother; I would
simply use columns.
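
To put rough numbers on the per-column overhead, here is a back-of-the-
envelope sketch. It assumes the classic KeyValue layout (4-byte key length,
4-byte value length, 2-byte row length, 1-byte family length, 8-byte
timestamp, 1-byte key type) and ignores tags, compression and block
encoding, so treat the results as an estimate only. The 16-byte row key and
the class name are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

/** Rough, uncompressed per-cell size estimate under the assumptions stated above. */
final class CellSizeEstimate {
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1)
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    static int utf8Len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Size of one cell: fixed fields plus row key, family, qualifier and value bytes. */
    static int cellSize(int rowKeyLen, String family, String qualifier, int valueLen) {
        return FIXED_OVERHEAD + rowKeyLen + utf8Len(family) + utf8Len(qualifier) + valueLen;
    }

    public static void main(String[] args) {
        int rowKeyLen = 16; // assumed row key size in bytes

        // One cell holding an 8-byte blob (int + float).
        int blob = cellSize(rowKeyLen, "f", "q1", 8);

        // Two cells, each holding a 4-byte value under a longer qualifier.
        int split = cellSize(rowKeyLen, "f", "q1_countInt", 4)
                  + cellSize(rowKeyLen, "f", "q1_countFloat", 4);

        System.out.println("blob layout:  " + blob + " bytes");
        System.out.println("split layout: " + split + " bytes");
    }
}
```

The row key, family and timestamp are repeated for every cell, which is why
the split layout grows with the number of fields. With only two small
fields the absolute difference stays modest.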

We have plans to open source the query layer but it will only happen in
2014 :)

Re: HBase value design

Posted by Ted Yu <yu...@gmail.com>.
It means you may have a new member other than the following two:

> Integer countInt
> Float countFloat


Re: HBase value design

Posted by Amit Sela <am...@infolinks.com>.
I am using some sort of schema that allows me to expand my data blob if
needed.
However, I'm considering testing Phoenix (or maybe prestoDB once it gets an
HBase connector) and I was wondering if the common practice is "simple
type" values and not data blobs because I saw that Phoenix doesn't support
data blob values.

What does "If there is a possibility a new member would be added to the
tuple" mean?

Thanks.



Re: HBase value design

Posted by Ted Yu <yu...@gmail.com>.
Amit:
In your example you use Writable for serialization.
In 0.96 and beyond, protobuf is used in place of Writable.

If there is a possibility a new member would be added to the tuple,
consider using some scheme that allows the expansion.
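
One possible shape for such a scheme, as a minimal sketch only: store each
member as a (tag, length, value) triple so that a reader which knows only
countInt and countFloat can skip members added later. The tag numbers and
class name below are hypothetical; protobuf or Avro provide the same
property with far more care.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical tag-length-value encoding that tolerates members added later. */
final class TaggedRecord {
    static final byte TAG_COUNT_INT = 1;   // assumed tag numbers
    static final byte TAG_COUNT_FLOAT = 2;

    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    static byte[] floatBytes(float v) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(v).array();
    }

    /** Writes each field as: tag (1 byte), length (1 byte), value bytes. */
    static byte[] encode(Map<Byte, byte[]> fields) {
        int size = 0;
        for (byte[] value : fields.values()) {
            if (value.length > 255) {
                throw new IllegalArgumentException("field longer than 255 bytes");
            }
            size += 2 + value.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (Map.Entry<Byte, byte[]> e : fields.entrySet()) {
            byte tag = e.getKey();
            out.put(tag).put((byte) e.getValue().length).put(e.getValue());
        }
        return out.array();
    }

    /** Reads every field; the length byte lets callers ignore tags they do not know. */
    static Map<Byte, byte[]> decode(byte[] blob) {
        Map<Byte, byte[]> fields = new LinkedHashMap<>();
        ByteBuffer in = ByteBuffer.wrap(blob);
        while (in.hasRemaining()) {
            byte tag = in.get();
            byte[] value = new byte[in.get() & 0xFF];
            in.get(value);
            fields.put(tag, value);
        }
        return fields;
    }

    static int readCountInt(byte[] blob) {
        return ByteBuffer.wrap(decode(blob).get(TAG_COUNT_INT)).getInt();
    }

    static float readCountFloat(byte[] blob) {
        return ByteBuffer.wrap(decode(blob).get(TAG_COUNT_FLOAT)).getFloat();
    }

    public static void main(String[] args) {
        Map<Byte, byte[]> fields = new LinkedHashMap<>();
        fields.put(TAG_COUNT_INT, intBytes(7));
        fields.put(TAG_COUNT_FLOAT, floatBytes(2.5f));
        fields.put((byte) 3, new byte[] {1, 2, 3}); // a member added later
        byte[] blob = encode(fields);
        // A reader that knows only the two original members still works.
        System.out.println(readCountInt(blob) + " " + readCountFloat(blob));
    }
}
```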

Please take a look at this as well:
HBASE-8089 Add type support

Cheers


Re: HBase value design

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Amit,

It all depends on your use case ;)

If you always access countInt and countFloat together when you read a
value, then store them together to avoid having to do two calls, a scan or
a multi-get. But if you never access them together, you might want to
separate them to reduce RPC transfer, etc.


JM

