Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/02/19 04:30:43 UTC

HBase without compactions?

Hello,

It's kind of funny: we run SPM, which includes SPM for HBase (essentially a
performance monitoring service/tool for HBase), and we currently store all
performance metrics in HBase.

I see a ton of HBase development activity, which is great, but it just
occurred to me that I don't recall seeing anything about getting rid of
compactions.  Yet compactions are the one thing that I know hurts us the
most, and the one thing that MapR somehow got rid of in their implementation.

Have there been any discussions, attempts, or thoughts about finding a way
to avoid compactions?

Thanks,
Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html

Re: HBase without compactions?

Posted by Enis Söztutar <en...@gmail.com>.
From some of their presentations, I've gathered that they implement
B-Trees instead of LSMs on top of their file system, which allows random
writes. They also claim that they convert random mutation requests to the
B-Tree leaves into sequential writes. They also talk about mini-WALs to do
this, so there might be mini-LSMs going on. Not sure.

In any case, agreed: if there are LSMs, there are compactions. The LSM vs.
B-Tree tradeoffs are well understood.
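
To illustrate why LSMs imply compactions, here is a minimal sketch in Python
(not HBase code; the class TinyLSM and its memtable_limit parameter are
hypothetical names made up for this example). Writes go to an in-memory table
that is flushed to a new immutable sorted run, so runs pile up and reads have
to check more and more of them until something merges them.

```python
class TinyLSM:
    """Toy LSM store: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.runs = []  # immutable sorted runs, oldest first

    def put(self, key, value):
        # Writes only ever touch memory; disk files are never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Each flush adds one more immutable run (one more "file").
        if self.memtable:
            self.runs.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable, then every run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            for k, v in run:  # linear scan keeps the sketch short
                if k == key:
                    return v
        return None
```

With memtable_limit=2, six puts over three distinct keys leave three runs
behind, even though only three keys are live. Without a merge step the number
of runs a read may have to check keeps growing.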

Enis



Re: HBase without compactions?

Posted by lars hofhansl <la...@apache.org>.
If you store data in LSM trees you need compactions.
The advantage is that your data files are immutable.
MapR has a mutable file system and they probably store their data in something more akin to B-Trees...?
Or maybe they somehow avoid the expensive merge sorting of many small files. It seems that it has to be one or the other.

(Maybe somebody from MapR reads this and can explain how it actually works.)

Compactions let you trade random IO for sequential IO (just to state the obvious). It seems that you can't have it both ways.
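
As an illustration of that trade (a minimal sketch in Python, not HBase code): a compaction reads several sorted, immutable files sequentially and writes one merged file, so that a later point read touches one file instead of several. The names compact and lookup and the list-of-tuples layout are assumptions made up for this example.

```python
import heapq


def compact(runs):
    """Merge sorted, immutable runs (oldest first) into one sorted run.

    Each run is a list of (key, value) pairs sorted by key, with unique
    keys inside a run. When a key appears in several runs, the value
    from the newest run wins.
    """
    # Tag each entry with the negated run index so that, for equal keys,
    # entries from newer runs come first in the merged stream.
    tagged = (
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # an entry from a newer run was already emitted
        merged.append((key, value))
    return merged


def lookup(runs, key):
    """Point read: probe runs from newest to oldest, counting probes."""
    probes = 0
    for run in reversed(runs):
        probes += 1
        for k, v in run:  # linear scan keeps the sketch short
            if k == key:
                return v, probes
    return None, probes
```

Before compaction, a lookup may have to probe every run; after compact(runs), the same lookup probes a single run. The cost is that the merge rewrites all the data it reads.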

-- Lars




Re: HBase without compactions?

Posted by Andrew Purtell <ap...@apache.org>.
MapR had a distributed key-value store internal to the FS for its
metadata. Eventually they got the idea to put an API on it that mimics the
HBase client API. This is not "removing compactions". I can't say for sure,
but I feel pretty comfortable stating that it's an alternate architecture to
BigTable: there was never a need for compactions in the first place, but
instead some other tradeoffs. Apples and oranges.




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: HBase without compactions?

Posted by Michael Segel <mi...@hotmail.com>.
Well,

M7 is supposed to be in open public beta, I think.

I haven't had time to play with it, but MapR has a lot of nice features that can't really be done in HDFS.
It's basically the benefit of being almost POSIX compliant.

The reason I mention M7 is that they supposedly get rid of compactions; however, I haven't seen it in action.

In theory I can see this happening: if you have a read-write filesystem, why would you need write-once files and then a compaction in which those write-once files are merged?


I would imagine that with some thought and time, HDFS could evolve to this ...

It's an interesting evolution on MapR's part, and you have to give those guys credit for doing something cool.



Re: HBase without compactions?

Posted by Stack <st...@duboce.net>.
On Mon, Feb 18, 2013 at 7:30 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Have there been any discussions, attempts, or thoughts about finding a way
> to avoid compactions?
>


Any ideas on how it would work, Otis?

Anyone know what M7 does?

St.Ack

Re: HBase without compactions?

Posted by Otis Gospodnetic <ot...@gmail.com>.
And MapR has their own, completely reimplemented HDFS without these
deficiencies... and I can stop dreaming about a compactionless Apache HBase?

Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html




Re: HBase without compactions?

Posted by Michael Segel <mi...@hotmail.com>.
In a single word, yes.

Or rather, you can't have a compactionless HBase without fixing the deficiencies in HDFS.



Re: HBase without compactions?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,


On Mon, Feb 18, 2013 at 11:50 PM, Michael Segel
<mi...@hotmail.com>wrote:

> Take a look at MapR's M7
>
> For Apache based Hadoop and HBase, you will need to evolve HDFS.
>


What do you mean by evolve HDFS?  You mean HDFS would need to change if
Apache HBase were to become compactionless?

Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html




Re: HBase without compactions?

Posted by Michael Segel <mi...@hotmail.com>.
Take a look at MapR's M7.

For Apache-based Hadoop and HBase, you will need to evolve HDFS.




Re: HBase without compactions?

Posted by Michael Segel <mi...@hotmail.com>.
He asked for a compactionless version.
You still have compactions with stripe compaction.
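
For context, a rough sketch in Python of the general idea behind stripe compaction as I understand it: the key range is split into stripes and each stripe is compacted on its own, so a compaction rewrites only part of the data, but it is still a compaction. This is not the HBASE-7667 implementation; the function names, the stripe boundaries, and the dict-based runs are assumptions for illustration.

```python
import bisect


def stripe_of(boundaries, key):
    """Return the index of the stripe (key range) that contains key."""
    return bisect.bisect_right(boundaries, key)


def flush(stripes, boundaries, memtable):
    """Split a memtable by stripe and add one new run to each stripe hit."""
    parts = {}
    for key, value in sorted(memtable.items()):
        parts.setdefault(stripe_of(boundaries, key), {})[key] = value
    for index, run in parts.items():
        stripes[index].append(run)


def compact_stripe(stripes, index):
    """Merge the runs of one stripe; all other stripes are left untouched.

    Returns the number of entries that had to be rewritten.
    """
    merged = {}
    for run in stripes[index]:  # runs are ordered oldest to newest
        merged.update(run)      # newer values overwrite older ones
    rewritten = sum(len(run) for run in stripes[index])
    stripes[index] = [dict(sorted(merged.items()))]
    return rewritten
```

With a single boundary at "m", for example, keys below "m" land in stripe 0 and the rest in stripe 1. Compacting stripe 0 merges only the runs in that stripe and leaves the runs in stripe 1 as they were.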


On Feb 18, 2013, at 10:54 PM, Ted Yu <yu...@gmail.com> wrote:

> Take a look at HBASE-7667 Support stripe compaction
> 
> Cheers


Re: HBase without compactions?

Posted by Ted Yu <yu...@gmail.com>.
We will do some testing along with code reviews.

You can watch HBASE-7667 for further development.

You're right in that its goal is not to get rid of compaction.

Thanks


Re: HBase without compactions?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

HBASE-7667 sounds like an improvement whose details I don't fully
understand, but it is not quite the same as eliminating compactions.  And I
don't understand HBASE-7667 well enough to have a feel for how much less
painful compactions would become after this.  Is there any way to quantify that?

Thanks,
Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html



On Mon, Feb 18, 2013 at 11:54 PM, Ted Yu <yu...@gmail.com> wrote:

> Take a look at HBASE-7667 Support stripe compaction
>
> Cheers

Re: HBase without compactions?

Posted by Ted Yu <yu...@gmail.com>.
Take a look at HBASE-7667 Support stripe compaction

Cheers
