Posted to user@hbase.apache.org by Christian Schäfer <sy...@yahoo.de> on 2012/07/06 12:21:46 UTC

Question about compression

Hi there,

I have two beginner questions concerning compression:

a) Where does compression (like Snappy) actually occur?

I set Snappy on a column family and filled it with some data (30 MB): a 640x480 array of 11-bit values.

After flushing the memstore, the size of the data stayed exactly the same, but flushing was 10x faster than for the table without compression.

So is it "only" the transfer that is compressed? Or is there a way to apply compression to the HFiles?
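For context: compression is configured per column family. A minimal HBase shell sketch of how this is typically set (the table and family names here are made-up examples, not from the thread):

```ruby
# Enable Snappy on a column family at table-creation time (names are examples)
create 'frames', {NAME => 'data', COMPRESSION => 'SNAPPY'}

# Or on an existing table -- on 0.90 the table must be disabled first
disable 'frames'
alter 'frames', {NAME => 'data', COMPRESSION => 'SNAPPY'}
enable 'frames'
```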

(I'm still using 0.90.4-cdh3u2 because the upgrade instructions seem quite tedious to me)


b) Is there a way to apply delta compression in HBase to minimize disk usage caused by duplicated data?

Does it have to be added or built separately, or is it already included in HBase?


Thanks for any feedback.

regards
Chris


Re: Question about compression

Posted by Vinithra Varadharajan <vi...@cloudera.com>.
Christian,

In order to upgrade HBase that is managed by CM, you do not need to
uninstall CM. For that matter, even if you want to upgrade CM you don't
need to uninstall it first. Here are docs on how to upgrade CM and CDH. You
might want to consider upgrading to CM 4.0 based on its new feature list:
https://ccp.cloudera.com/display/ENT4DOC/Upgrade+from+Cloudera+Manager+3.7.x+to+Cloudera+Manager+4.0

-Vinithra

On Sun, Jul 8, 2012 at 6:00 AM, Christian Schäfer <sy...@yahoo.de> wrote:

>
>
> Thanks J-D for your information.
> In the region server web UI (http://host:60030/regionserver.jsp) I saw the
> same value for memstoreSizeMB and storefileSizeMB after flushing.
> I will check out the logs for the size, as you proposed.
>
>> (I'm still using 0.90.4-cdh3u2 because the upgrade instructions seem
> quite tedious to me)
>
> > Stop everything, deploy new version, restart.
>
>
> I'm using Cloudera Manager Free 3.7, and on cloudera.com there are some
> instructions that require several steps, especially concerning uninstallation.
> Not that I'm too lazy to do that, but I don't want to lose my working
> test system due to an error during the upgrade process.
> But someday I will have to do that because we need to evaluate
> coprocessors for our use case.
>
> Sorry for the "let me google that for you" situation, but I only added
> that point to the post expecting that not every feature is listed on
> JIRA... now I know better :-)
> Thanks anyway for the JIRA links.
>
> regards
> Chris
>
>
> ________________________________
> From: Jean-Daniel Cryans <jd...@apache.org>
> To: user@hbase.apache.org; Christian Schäfer <sy...@yahoo.de>
> Sent: Friday, July 6, 2012, 23:53
> Subject: Re: Question about compression
>
> Inline.
>
> J-D
>
> On Fri, Jul 6, 2012 at 3:21 AM, Christian Schäfer <sy...@yahoo.de>
> wrote:
> > a) Where does compression (like Snappy) actually occur?
> >
> > I set Snappy on a column family and filled it with some data (30 MB):
> a 640x480 array of 11-bit values.
> >
> > After flushing the memstore, the size of the data stayed exactly the
> same, but flushing was 10x faster than for the table without compression.
> >
> > So is it "only" the transfer that is compressed? Or is there a way to
> apply compression to the HFiles?
>
> The files are compressed on flush/compact, and it's done per 64KB
> block. I doubt the file was the same size as the memstore; look at
> your log, which gives the numbers for each flush.
>
> >
> > (I'm still using 0.90.4-cdh3u2 because the upgrade instructions seem
> quite tedious to me)
>
> Stop everything, deploy new version, restart.
>
> >
> >
> > b) Is there a way to apply delta compression in HBase to minimize
> disk usage caused by duplicated data?
> >
> > Does it have to be added or built separately, or is it already included in HBase?
>
> The first hit when googling "hbase delta compression" returns this:
> https://issues.apache.org/jira/browse/HBASE-4218
>
> As you can see it was included in 0.94 (no clue how that translates
> for CDH... CDH5??)
>
> There is also prefix compression in the pipeline:
> https://issues.apache.org/jira/browse/HBASE-4676
>
> Hope this helps,
>
> J-D
>

RE: Question about compression

Posted by Christian Schäfer <sy...@yahoo.de>.

Thanks J-D for your information.
In the region server web UI (http://host:60030/regionserver.jsp) I saw the same value for memstoreSizeMB and storefileSizeMB after flushing.
I will check out the logs for the size, as you proposed.

>> (I'm still using 0.90.4-cdh3u2 because the upgrade instructions seem quite tedious to me)

> Stop everything, deploy new version, restart.


I'm using Cloudera Manager Free 3.7, and on cloudera.com there are some instructions that require several steps, especially concerning uninstallation.
Not that I'm too lazy to do that, but I don't want to lose my working test system due to an error during the upgrade process.
But someday I will have to do that because we need to evaluate coprocessors for our use case.

Sorry for the "let me google that for you" situation, but I only added that point to the post expecting that not every feature is listed on JIRA... now I know better :-)
Thanks anyway for the JIRA links.

regards
Chris


________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
To: user@hbase.apache.org; Christian Schäfer <sy...@yahoo.de>
Sent: Friday, July 6, 2012, 23:53
Subject: Re: Question about compression

Inline.

J-D

On Fri, Jul 6, 2012 at 3:21 AM, Christian Schäfer <sy...@yahoo.de> wrote:
> a) Where does compression (like Snappy) actually occur?
>
> I set Snappy on a column family and filled it with some data (30 MB): a 640x480 array of 11-bit values.
>
> After flushing the memstore, the size of the data stayed exactly the same, but flushing was 10x faster than for the table without compression.
>
> So is it "only" the transfer that is compressed? Or is there a way to apply compression to the HFiles?

The files are compressed on flush/compact, and it's done per 64KB
block. I doubt the file was the same size as the memstore; look at
your log, which gives the numbers for each flush.

>
> (I'm still using 0.90.4-cdh3u2 because the upgrade instructions seem quite tedious to me)

Stop everything, deploy new version, restart.

>
>
> b) Is there a way to apply delta compression in HBase to minimize disk usage caused by duplicated data?
>
> Does it have to be added or built separately, or is it already included in HBase?

The first hit when googling "hbase delta compression" returns this:
https://issues.apache.org/jira/browse/HBASE-4218

As you can see it was included in 0.94 (no clue how that translates
for CDH... CDH5??)

There is also prefix compression in the pipeline:
https://issues.apache.org/jira/browse/HBASE-4676

Hope this helps,

J-D

Re: Question about compression

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Inline.

J-D

On Fri, Jul 6, 2012 at 3:21 AM, Christian Schäfer <sy...@yahoo.de> wrote:
> a) Where does compression (like Snappy) actually occur?
>
> I set Snappy on a column family and filled it with some data (30 MB): a 640x480 array of 11-bit values.
>
> After flushing the memstore, the size of the data stayed exactly the same, but flushing was 10x faster than for the table without compression.
>
> So is it "only" the transfer that is compressed? Or is there a way to apply compression to the HFiles?

The files are compressed on flush/compact, and it's done per 64KB
block. I doubt the file was the same size as the memstore; look at
your log, which gives the numbers for each flush.
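The per-block behavior described here can be sketched in a few lines of Python, with zlib standing in for Snappy (the block size matches the HFile default; the payload is made-up repetitive data, not the poster's actual dataset):

```python
import zlib

BLOCK_SIZE = 64 * 1024  # HFile default block size

def write_compressed_blocks(data):
    """Split the payload into BLOCK_SIZE chunks and compress each chunk
    independently, roughly as an HFile writer does on flush/compact."""
    return [
        zlib.compress(data[off:off + BLOCK_SIZE])
        for off in range(0, len(data), BLOCK_SIZE)
    ]

# Made-up, highly repetitive payload (e.g. duplicated sensor readings).
payload = b"\x00\x0b" * (512 * 1024)  # 1 MB
blocks = write_compressed_blocks(payload)

# The in-memory payload (memstore) is uncompressed; only the written
# blocks shrink -- repetitive data compresses dramatically.
print(len(payload), sum(len(b) for b in blocks))
```

The memstore itself is never compressed, which is why a web UI that reports in whole megabytes can show the same number before and after a flush even though the store file on disk is much smaller.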

>
> (I'm still using 0.90.4-cdh3u2 because the upgrade instructions seem quite tedious to me)

Stop everything, deploy new version, restart.
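For a plain (non-CM) tarball install, that recipe might look roughly like this; the paths and the deployment step are assumptions, not from the thread:

```
# Hypothetical tarball-install upgrade sketch
$HBASE_HOME/bin/stop-hbase.sh        # stop master and region servers
# ...unpack the new release on every node and point HBASE_HOME at it...
$HBASE_HOME/bin/start-hbase.sh       # start on the new version
```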

>
>
> b) Is there a way to apply delta compression in HBase to minimize disk usage caused by duplicated data?
>
> Does it have to be added or built separately, or is it already included in HBase?

The first hit when googling "hbase delta compression" returns this:
https://issues.apache.org/jira/browse/HBASE-4218

As you can see it was included in 0.94 (no clue how that translates
for CDH... CDH5??)

There is also prefix compression in the pipeline:
https://issues.apache.org/jira/browse/HBASE-4676
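The idea behind prefix/delta encoding can be illustrated with a small front-coding sketch over sorted row keys (Python; this illustrates the general technique, not HBase's actual on-disk encoding):

```python
def prefix_encode(sorted_keys):
    """Front-coding: store each key as (shared-prefix length, suffix)
    relative to the previous key, exploiting duplicated prefixes."""
    encoded, prev = [], ""
    for key in sorted_keys:
        common = 0
        while common < min(len(prev), len(key)) and prev[common] == key[common]:
            common += 1
        encoded.append((common, key[common:]))
        prev = key
    return encoded

def prefix_decode(encoded):
    """Rebuild the full keys from the (prefix length, suffix) pairs."""
    keys, prev = [], ""
    for common, suffix in encoded:
        key = prev[:common] + suffix
        keys.append(key)
        prev = key
    return keys

# Example keys with heavily duplicated prefixes (hypothetical layout).
keys = ["row-0001:cf:col1", "row-0001:cf:col2", "row-0002:cf:col1"]
enc = prefix_encode(keys)
assert prefix_decode(enc) == keys  # lossless round trip
```

Because HBase stores the full key with every cell, sorted neighbors share long prefixes, which is exactly what these encodings exploit.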

Hope this helps,

J-D