Posted to user@hbase.apache.org by 林煒清 <th...@gmail.com> on 2013/12/02 07:00:13 UTC

What is HBase compaction-queue-size at all?

Does anyone know what the compaction queue size actually means?

According to the docs:

*9.2.5.* hbase.regionserver.compactionQueueSize Size of the compaction
queue. This is the number of stores in the region that have been targeted
for compaction.


   - Is it the *number of Stores* on the regionserver that need to be major
   compacted, or the number that are *being* compacted currently?

I have a job writing data in a hotspot style, using sequential
(non-distributed) keys with 1 column family, so there is 1 Store per region.

I discovered that at some point the *regionserver
compaction-queue-size = 4* (I checked it in Ambari). That seems theoretically
impossible, since I have only *one Store* being written to (sequential keys)
at any time; incurring only one major compaction would be more reasonable.


   - Then I dug into the logs and found nothing hinting at a queue size > 0.
   Every major compaction just says *"This selection was in queue for
   0sec"*. I don't really understand what that means. Is it saying HBase
   has nothing in the compaction queue?

013-11-26 12:28:00,778 INFO
[regionserver60020-smallCompactions-1385440028938] regionserver.HStore:
Completed major compaction of 3 file(s) in f1 of myTable.key.md5.... into
md5....(size=607.8 M), total size for store is 645.8 M.*This selection was
in queue for 0sec*, and took 39sec to execute.


   - Even more confusing: wasn't multi-threaded compaction enabled in an
   earlier version, so that each compaction job is allocated to a thread?
   If so, why is there a compaction queue waiting to be processed?

Re: What is HBase compaction-queue-size at all?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
> So I think it includes both running compactions and those in queue. Am I
> missing something?
Yes, that's correct. A major compaction is just a compaction that includes
all the store files of a store, so a region server counts it like any other
compaction. But it can also be a minor compaction that the RS is seeing. So
it is not necessarily a major, but it can be.



Re: What is HBase compaction-queue-size at all?

Posted by Bharath Vissapragada <bh...@cloudera.com>.
Hi,


On Mon, Dec 2, 2013 at 8:07 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org
> wrote:

>    - Is it the *number of Stores* on the regionserver that need to be major
>    compacted, or the number that are *being* compacted currently?
>
>
> This is the number currently in the pipeline. It doesn't mean they are
> compacting right now, only that they are queued for compaction, and not
> necessarily for a major compaction. A compaction is major only when it
> includes all the store files of a store.
>

Are you sure about this? I had a quick look at the code and this value is
the sum of the sizes of the largeCompactions and smallCompactions queues.
The code doesn't keep track of whether they are running or waiting in the
queue. So I think it includes both running compactions and those in queue.
Am I missing something?
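To make the "sum of two queue sizes" idea concrete, here is a minimal Java sketch. It is not HBase code: the class CompactionQueueSketch, the helper names, and the use of plain ThreadPoolExecutor pools are assumptions for illustration only. It shows that if a metric is computed as the sum of the two executors' getQueue().size() values, tasks that a worker thread has already picked up are not counted, only the ones still waiting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CompactionQueueSketch {

    // Hypothetical stand-in for one of a region server's compaction pools.
    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Sum of the tasks still waiting in both pools.
    // Tasks already handed to a worker thread are not part of getQueue().
    static int queueSize(ThreadPoolExecutor large, ThreadPoolExecutor small) {
        return large.getQueue().size() + small.getQueue().size();
    }

    // Submits `requests` blocking tasks to the small pool and returns the
    // summed queue size while every worker thread is busy.
    public static int queuedWhileBusy(int threads, int requests) {
        ThreadPoolExecutor large = newPool(1);
        ThreadPoolExecutor small = newPool(threads);
        CountDownLatch started = new CountDownLatch(Math.min(threads, requests));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < requests; i++) {
                small.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // simulate a long-running compaction
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await(); // each worker thread now holds one task
            return queueSize(large, small);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            large.shutdown();
            small.shutdown();
        }
    }

    public static void main(String[] args) {
        // One worker thread, five requests: one runs, four wait.
        System.out.println(queuedWhileBusy(1, 5)); // prints 4
    }
}
```

Under these assumptions, a single compaction thread with five pending requests would report a queue size of 4 while the first request is running. Whether the real metric behaves the same way depends on how the region server actually computes it, which this sketch does not confirm.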


> "I discovered that at some point the *regionserver
> compaction-queue-size = 4* (I checked it in Ambari). That seems
> theoretically impossible, since I have only *one Store* being written to
> (sequential keys) at any time; incurring only one major compaction would
> be more reasonable."
>

Adding to what JMS said, compaction is a per-region thing. If your write
test creates multiple regions, it is possible for multiple compactions to
happen at the same time, since they are queued.





-- 
Bharath Vissapragada
<http://www.cloudera.com>

Re: What is HBase compaction-queue-size at all?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
   - Is it the *number of Stores* on the regionserver that need to be major
   compacted, or the number that are *being* compacted currently?


This is the number currently in the pipeline. It doesn't mean they are
compacting right now, only that they are queued for compaction, and not
necessarily for a major compaction. A compaction is major only when it
includes all the store files of a store.

"I discovered that at some point the *regionserver
compaction-queue-size = 4* (I checked it in Ambari). That seems theoretically
impossible, since I have only *one Store* being written to (sequential keys)
at any time; incurring only one major compaction would be more reasonable."

Why is this "impossible"? A store file is a dump of HBase's in-memory data
(the memstore) written to disk. This holds even if you write to a single
region of a single table, with keys that are all close together (or even
all the same exact key): when the data in memory reaches a threshold, it is
flushed to disk, and when more than x store files (3 is the default) are on
disk, a compaction is launched.
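The flush-then-compact cycle described above can be sketched as a toy model in Java. This is not HBase code: the class StoreFlushSketch, its fields, and the sizes used are hypothetical, and the trigger is modeled as "at least `compactionThreshold` files", which is an assumption chosen to match the log line showing a compaction of 3 file(s). The point is only that a single store fed by a single writer still triggers compactions repeatedly.

```java
import java.util.ArrayList;
import java.util.List;

public class StoreFlushSketch {
    private final long flushSizeBytes;     // memstore size that forces a flush
    private final int compactionThreshold; // store-file count that triggers a compaction
    private long memstoreBytes = 0;
    private final List<Long> storeFiles = new ArrayList<>();
    private int compactionsTriggered = 0;

    StoreFlushSketch(long flushSizeBytes, int compactionThreshold) {
        this.flushSizeBytes = flushSizeBytes;
        this.compactionThreshold = compactionThreshold;
    }

    // Every write lands in memory first; a full memstore is flushed to a new file.
    void write(long bytes) {
        memstoreBytes += bytes;
        if (memstoreBytes >= flushSizeBytes) {
            flush();
        }
    }

    private void flush() {
        storeFiles.add(memstoreBytes);
        memstoreBytes = 0;
        // Assumed trigger: compact once the threshold number of files exists.
        if (storeFiles.size() >= compactionThreshold) {
            compact();
        }
    }

    // Merge all current files into one, as a compaction of every file would.
    private void compact() {
        long total = 0;
        for (long size : storeFiles) {
            total += size;
        }
        storeFiles.clear();
        storeFiles.add(total);
        compactionsTriggered++;
    }

    // Runs `writes` equal-sized writes against one store and counts compactions.
    public static int compactionsAfter(int writes, long bytesPerWrite,
                                       long flushSizeBytes, int threshold) {
        StoreFlushSketch store = new StoreFlushSketch(flushSizeBytes, threshold);
        for (int i = 0; i < writes; i++) {
            store.write(bytesPerWrite);
        }
        return store.compactionsTriggered;
    }

    public static void main(String[] args) {
        // 70 one-byte writes, flush every 10 bytes, compact at 3 files.
        System.out.println(compactionsAfter(70, 1, 10, 3)); // prints 3
    }
}
```

In this model the first compaction happens after three flushes and each later one after two more, because the merged file from the previous compaction still counts toward the threshold.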

   - Even more confusing: wasn't multi-threaded compaction enabled in an
   earlier version, so that each compaction job is allocated to a thread?
   If so, why is there a compaction queue waiting to be processed?

Yes, compaction is done on a separate thread, but there is one single
queue. You don't want to take 100% of your RS resources to do compactions...

If you are doing mostly writes and almost no reads, you might want to tweak
some parameters. You might also want to look into bulk loading...

Last, maybe you should review your key design and its distribution.

And last again ;) What is your table definition? Having many column
families can also sometimes lead to this kind of issue...

JM



