Posted to user@hbase.apache.org by jeremy p <at...@gmail.com> on 2014/06/13 19:35:27 UTC

Does compression ever improve performance?

Hey all,

Right now, I'm not using compression on any of my tables, because our data
doesn't take up a huge amount of space.  However, I would turn on
compression if there was a chance it would improve HBase's performance.  By
performance, I'm talking about the speed with which HBase responds to
requests and retrieves data.

Should I turn compression on?

--Jeremy

Re: Does compression ever improve performance?

Posted by Michael Segel <mi...@hotmail.com>.
That works since you don’t need a region to be splittable… 

On Jun 14, 2014, at 4:36 PM, Kevin O'dell <ke...@cloudera.com> wrote:

> Hi Jeremy,
> 
>  I always recommend turning on Snappy compression; I have seen ~20%
> performance increases.
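A hedged reading of Michael's point (an interpretation, not something spelled out in the thread): HBase compresses HFile data one block at a time, and MapReduce over HBase splits work by region rather than by byte offsets inside a compressed file. So a codec such as Snappy, which is not splittable as a whole-file stream, is still fine. The Java sketch below illustrates per-block compression. It uses the JDK's `java.util.zip.Deflater` as a stand-in for Snappy (the JDK ships no Snappy codec), and `PerBlockCompression` and its methods are hypothetical names, not HBase APIs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch (hypothetical class, not HBase code): data is cut into
// fixed-size blocks and each block is compressed independently, so any single
// block can be decompressed without reading the blocks before it.
class PerBlockCompression {
    // 64 KB, the default HBase column family block size.
    static final int BLOCK_SIZE = 64 * 1024;

    static List<byte[]> compressBlocks(byte[] data) {
        List<byte[]> blocks = new ArrayList<>();
        byte[] buf = new byte[4096];
        for (int off = 0; off < data.length; off += BLOCK_SIZE) {
            // A fresh Deflater per block: no state is shared between blocks.
            Deflater deflater = new Deflater();
            deflater.setInput(data, off, Math.min(BLOCK_SIZE, data.length - off));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
            blocks.add(out.toByteArray());
        }
        return blocks;
    }

    // Decompresses only block k; no other block is touched.
    static byte[] readBlock(List<byte[]> blocks, int k) {
        Inflater inflater = new Inflater();
        inflater.setInput(blocks.get(k));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated block " + k);
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt block " + k, e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Hypothetical sample data: 29-byte rows with long shared prefixes.
    static byte[] sampleRows(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(String.format(Locale.ROOT, "row-%08d/cf:qual/value-%d\n", i, i % 10));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, `compressBlocks(sampleRows(10000))` produces five independently compressed blocks from the 290,000 bytes of sample rows, and `readBlock(blocks, 2)` inflates only the third one.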


Re: Does compression ever improve performance?

Posted by lars hofhansl <la...@apache.org>.
Unfortunately it's not quite that simple.
Currently the HBase scanning guts expect all KeyValues to be laid out contiguously in memory, so with encoding they need to be decoded and copied in memory to get that layout. We're working on fixing it, but this is currently the way it is.

So on the one hand you fit more data into the block cache (unlike compression, where the data is decompressed before the blocks get cached), but on the other hand much more garbage is produced during scanning and more CPU and memory bandwidth is used. So you need to test for your use case.

-- Lars



________________________________
 From: Ted Yu <yu...@gmail.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org> 
Sent: Sunday, June 15, 2014 3:46 PM
Subject: Re: Does compression ever improve performance?
 

Data block encoding enables block cache to hold more entries, thereby
lifting performance.

You can find coverage of data block encoding in this wiki as well:
https://blogs.apache.org/hbase/

Cheers
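Lars's trade-off can be sketched in Java. The class below is a deliberately simplified prefix encoding in the spirit of HBase's PREFIX data block encoding; it is not the real on-disk format, and `PrefixEncodedBlock` is a hypothetical name. Each sorted key stores only the length of the prefix it shares with the previous key, plus its remaining suffix. The encoded block is smaller, which is why more of it fits in the block cache as Ted describes, but reading it back means allocating and copying every key into its own contiguous array, which is the extra garbage and memory traffic Lars mentions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified prefix encoding (hypothetical, NOT HBase's real format).
// Each sorted key is written as: shared-prefix length, suffix length, suffix.
class PrefixEncodedBlock {

    static byte[] encode(List<byte[]> sortedKeys) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] prev = new byte[0];
        try {
            for (byte[] key : sortedKeys) {
                int common = commonPrefix(prev, key);
                out.writeShort(common);              // bytes shared with previous key
                out.writeShort(key.length - common); // bytes stored for this key
                out.write(key, common, key.length - common);
                prev = key;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decoding materializes every key as a new contiguous array: the prefix
    // is copied from the previous key and the suffix is read after it.
    // These per-key allocations and copies are the extra cost of scanning.
    static List<byte[]> decode(byte[] block) {
        List<byte[]> keys = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
        byte[] prev = new byte[0];
        try {
            while (in.available() > 0) {
                int common = in.readUnsignedShort();
                int suffixLen = in.readUnsignedShort();
                byte[] key = new byte[common + suffixLen];
                System.arraycopy(prev, 0, key, 0, common);
                in.readFully(key, common, suffixLen);
                keys.add(key);
                prev = key;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return keys;
    }

    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }
}
```

With 1,000 keys of the form `row-00000000/cf:qual` (20,000 bytes raw), the encoded block comes out at about 13 KB, while `decode` hands back 1,000 freshly allocated arrays.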

Re: Does compression ever improve performance?

Posted by Ted Yu <yu...@gmail.com>.
Data block encoding enables block cache to hold more entries, thereby
lifting performance.

You can find coverage of data block encoding in this wiki as well:
https://blogs.apache.org/hbase/

Cheers


On Sun, Jun 15, 2014 at 2:00 PM, Tom Brown <to...@gmail.com> wrote:

> I don't mean to hijack the thread, but this question seems relevant:
>
> Does data block encoding also help performance, or does it just enable more
> efficient compression?
>
> --Tom

Re: Does compression ever improve performance?

Posted by Tom Brown <to...@gmail.com>.
I don't mean to hijack the thread, but this question seems relevant:

Does data block encoding also help performance, or does it just enable more
efficient compression?

--Tom

On Saturday, June 14, 2014, Guillermo Ortiz <ko...@gmail.com> wrote:

> I would like to see the timings they got for scans or gets in the
> compression and block encoding benchmark, to figure out how much time
> you save when your data is smaller but has to be decompressed.

Re: Does compression ever improve performance?

Posted by Guillermo Ortiz <ko...@gmail.com>.
I would like to see the timings they got for scans or gets in the
compression and block encoding benchmark, to figure out how much time
you save when your data is smaller but has to be decompressed.

On Saturday, June 14, 2014, Kevin O'dell <ke...@cloudera.com>
wrote:

> Hi Jeremy,
>
>   I always recommend turning on Snappy compression; I have seen ~20%
> performance increases.
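Guillermo asks for measured times, and the thread does not include any. As a hedged starting point for gathering your own numbers, the Java micro-benchmark sketch below compares copying a raw 64 KB block with inflating a compressed copy of it. It uses `java.util.zip` (DEFLATE) rather than Snappy, ignores disk I/O and the block cache, and does only a crude JIT warm-up, so its numbers are indicative at best; `BlockReadTiming` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical micro-benchmark sketch: average cost of reading one block
// stored raw (a plain copy) versus stored deflated (must be inflated).
// Disk I/O, caching and proper JIT warm-up are not modeled.
class BlockReadTiming {
    static volatile long sink; // consumes results so the work is not optimized away

    static byte[] deflate(byte[] data) {
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        d.setInput(data);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] compressed) {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && inf.needsInput()) throw new IllegalArgumentException("truncated block");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt block", e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    // Average nanoseconds per copy of the raw block.
    static long copyNanos(byte[] block, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += block.clone().length;
        return (System.nanoTime() - start) / iterations;
    }

    // Average nanoseconds per decompression of the same block.
    static long inflateNanos(byte[] compressed, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += inflate(compressed).length;
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        // Hypothetical sample block of roughly 64 KB of row-like text.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; sb.length() < 64 * 1024; i++) {
            sb.append("row-").append(i).append("/cf:qual/value\n");
        }
        byte[] block = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(block);
        inflateNanos(compressed, 1_000); // crude warm-up
        System.out.printf("raw %d B, deflated %d B%n", block.length, compressed.length);
        System.out.printf("copy: %d ns/block, inflate: %d ns/block%n",
                copyNanos(block, 10_000), inflateNanos(compressed, 10_000));
    }
}
```

A more faithful comparison would run your own scans and gets against tables with and without compression or encoding, since, as Lars says, the answer depends on the use case.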

Re: Does compression ever improve performance?

Posted by Kevin O'dell <ke...@cloudera.com>.
Hi Jeremy,

  I always recommend turning on Snappy compression; I have seen ~20%
performance increases.
On Jun 14, 2014 10:25 AM, "Ted Yu" <yu...@gmail.com> wrote:

> You may have read Doug Meil's writeup, where he tried out different
> ColumnFamily compression settings:
>
> https://blogs.apache.org/hbase/
>
> Cheers

Re: Does compression ever improve performance?

Posted by Ted Yu <yu...@gmail.com>.
You may have read Doug Meil's writeup, where he tried out different ColumnFamily
compression settings:

https://blogs.apache.org/hbase/

Cheers


On Fri, Jun 13, 2014 at 11:33 AM, jeremy p <at...@gmail.com>
wrote:

> Thank you -- I'll go ahead and try compression.
>
> --Jeremy

Re: Does compression ever improve performance?

Posted by jeremy p <at...@gmail.com>.
Thank you -- I'll go ahead and try compression.

--Jeremy


On Fri, Jun 13, 2014 at 10:59 AM, Dima Spivak <ds...@cloudera.com> wrote:

> I'd highly recommend it. In general, compressing your column families will
> improve performance by reducing the resources required to get data from
> disk (even when taking into account the CPU overhead of compressing and
> decompressing).
>
> -Dima

Re: Does compression ever improve performance?

Posted by Dima Spivak <ds...@cloudera.com>.
I'd highly recommend it. In general, compressing your column families will
improve performance by reducing the resources required to get data from
disk (even when taking into account the CPU overhead of compressing and
decompressing).

-Dima


On Fri, Jun 13, 2014 at 10:35 AM, jeremy p <at...@gmail.com>
wrote:

> Hey all,
>
> Right now, I'm not using compression on any of my tables, because our data
> doesn't take up a huge amount of space.  However, I would turn on
> compression if there was a chance it would improve HBase's performance.  By
> performance, I'm talking about the speed with which HBase responds to
> requests and retrieves data.
>
> Should I turn compression on?
>
> --Jeremy
>
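Dima's argument can be written as a back-of-the-envelope cost model. Reading R uncompressed bytes from disk at bandwidth D takes R/D seconds. With compression ratio c and decompression throughput P (in output bytes per second), it takes R/(cD) + R/P, so compression wins whenever 1/(cD) + 1/P < 1/D. The Java sketch below encodes that inequality; the model, the class name `CompressionCostModel`, and all example numbers are assumptions for illustration, not measurements. Per Lars's reply in this thread, HBase caches blocks decompressed, so the decompression term applies only to block-cache misses.

```java
// Back-of-the-envelope model (an assumption, not measured HBase behavior):
// compare reading raw bytes from disk with reading compressed bytes and
// decompressing them on the CPU.
class CompressionCostModel {

    // Seconds to read rawBytes uncompressed at diskBytesPerSec.
    static double uncompressedSeconds(double rawBytes, double diskBytesPerSec) {
        return rawBytes / diskBytesPerSec;
    }

    // Seconds to read rawBytes / ratio compressed bytes from disk, then
    // decompress them at decompressBytesPerSec (measured in output bytes).
    static double compressedSeconds(double rawBytes, double ratio,
                                    double diskBytesPerSec, double decompressBytesPerSec) {
        return rawBytes / ratio / diskBytesPerSec + rawBytes / decompressBytesPerSec;
    }

    static boolean compressionWins(double rawBytes, double ratio,
                                   double diskBytesPerSec, double decompressBytesPerSec) {
        return compressedSeconds(rawBytes, ratio, diskBytesPerSec, decompressBytesPerSec)
                < uncompressedSeconds(rawBytes, diskBytesPerSec);
    }

    public static void main(String[] args) {
        double mb = 1024.0 * 1024.0;
        // Hypothetical numbers: 64 MB of data, 3x compression,
        // 100 MB/s disk, 500 MB/s decompression.
        System.out.printf("uncompressed: %.3f s%n", uncompressedSeconds(64 * mb, 100 * mb));
        System.out.printf("compressed:   %.3f s%n",
                compressedSeconds(64 * mb, 3.0, 100 * mb, 500 * mb));
    }
}
```

With these hypothetical numbers the sketch prints 0.640 s uncompressed versus 0.341 s compressed. With a 1.1x ratio and a 1,000 MB/s read path instead, compression loses (about 0.186 s versus 0.064 s), which is why poorly compressible data or very fast storage can tip the balance the other way.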