Posted to user@hbase.apache.org by Varun Sharma <va...@pinterest.com> on 2012/12/03 20:58:38 UTC

Long row + column keys

Hi,

I have a schema where the rows are 8 bytes long and the columns are 12
bytes long (roughly 1000 columns per row). The value is 0 bytes. Is this
going to be space inefficient in terms of HFile size (large index +
blocks)? The total key size, as far as I know, would be 8 + 12 + 8
(timestamp) = 28 bytes. I am using HBase 0.94.0, which has HFile v2.

Also, should I be using one of the encoding techniques provided by HBase
(like PrefixDeltaEncoding) to get the number of bytes down?

Thanks !
Varun

Re: Long row + column keys

Posted by Varun Sharma <va...@pinterest.com>.
Hi Anoop,

I agree - I am not so concerned about the savings on disk; rather, I am
thinking about the savings inside the block cache. I am not sure how stable
PrefixDeltaEncoding is or who else uses it. If not that, are there people
using FastDiff encoding? It seems like any form of encoding scheme would
get us huge wins.

Thanks !
Varun


RE: Long row + column keys

Posted by Anoop Sam John <an...@huawei.com>.
Hi Varun
                 It looks very clear that you need to use some sort of encoding scheme. PrefixDeltaEncoding may well be fine. You can also look at the other algorithms, like FastDiff, and see how much space each can save in your case. I also suggest using the encoding for the data on disk as well as in memory (block cache).
>The total key size, as far as I know, would be 8 + 12 + 8 (timestamp) = 28 bytes
For every KV that gets stored, the actual size would be
4 (key length) + 4 (value length) + 2 (row key length) + 8 (row key) + 1 (cf length) + 12 (cf + qualifier) + 8 (timestamp) + 1 (type: Put/Delete...) + value (0 bytes) = 40 bytes.

Just making it clear for you :)

-Anoop-
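[Editor's note: the per-field breakdown above can be turned into a quick back-of-the-envelope sketch. The helper below is my own illustration following Anoop's figures for the HBase 0.94 KeyValue layout, with no data block encoding applied.]

```python
# Serialized size of one HBase KeyValue (HFile v2, no data block
# encoding), following the per-field breakdown above.
def keyvalue_size(row_len, cf_plus_qualifier_len, value_len):
    # key = row-key-length(2) + row key + cf-length(1) + cf + qualifier
    #       + timestamp(8) + type(1)
    key_len = 2 + row_len + 1 + cf_plus_qualifier_len + 8 + 1
    # KeyValue = key-length(4) + value-length(4) + key + value
    return 4 + 4 + key_len + value_len

per_cell = keyvalue_size(8, 12, 0)  # Varun's schema: empty values
per_row = 1000 * per_cell           # roughly 1000 columns per row
print(per_cell, per_row)            # 40 bytes per cell, 40000 bytes per row
```

So only 20 of the 40 bytes per cell are the user's actual row and qualifier bytes; the rest is fixed KeyValue framing, which is exactly the overhead the data block encodings target.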

Re: Long row + column keys

Posted by Varun Sharma <va...@pinterest.com>.
Hi Marcos,

Thanks for the links. We have gone through these and thought about the
schema. My question is about whether using PrefixDeltaEncoding makes sense
in our situation...

Varun


Re: Long row + column keys

Posted by Marcos Ortiz <ml...@uci.cu>.
Regards, Varun.
I think that you can watch Benoit Sigoure's (@tsuna) talk called
"Lessons learned from OpenTSDB" from the last HBaseCon. [1]
He explained in great detail how to design your schema to obtain the
best performance from HBase.

Other recommended talks are: "HBase Internals" from Lars, and "HBase 
Schema Design" from Ian
[2][3]

[1] http://www.slideshare.net/cloudera/4-opentsdb-hbasecon
[2] 
http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final/
[3] http://www.slideshare.net/cloudera/5-h-base-schemahbasecon2012

On 12/03/2012 02:58 PM, Varun Sharma wrote:
> Hi,
>
> I have a schema where the rows are 8 bytes long and the columns are 12
> bytes long (roughly 1000 columns per row). The value is 0 bytes. Is this
> going to be space inefficient in terms of HFile size (large index + blocks)
> ? The total key size, as far as i know, would be 8 + 12 + 8 (timestamp) =
> 28 bytes. I am using hbase 0.94.0 which has HFile v2.
Yes, like you said, HFile v2 is included in 0.94, and that is what is
in trunk right now; but you should keep following the development of
HBase, in particular HBASE-5313 and HBASE-5521, because the development
team is working on a new file storage format called HFile v3, based on
Trevni, a columnar format created by Doug Cutting for Avro.[4][5][6][7]


[4] https://issues.apache.org/jira/browse/HBASE-5313
[5] https://issues.apache.org/jira/browse/HBASE-5521
[6] https://github.com/cutting/trevni
[7] https://issues.apache.org/jira/browse/AVRO-806


>
> Also, should I be using an encoding technique to get the number of bytes
> down (like PrefixDeltaEncoding) which is provided by hbase ?
Read Cloudera's blog post called "HBase I/O - HFile" to see how the
Prefix and Diff encodings work, and decide which is more suitable for
you.[8]


[8] http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

I hope all this information helps you.
Best wishes
>


10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci