Posted to issues@hbase.apache.org by "Matt Corgan (Created) (JIRA)" <ji...@apache.org> on 2012/02/26 01:51:48 UTC

[jira] [Created] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Postpone CompactionSelection to compaction execution time
---------------------------------------------------------

                 Key: HBASE-5479
                 URL: https://issues.apache.org/jira/browse/HBASE-5479
             Project: HBase
          Issue Type: New Feature
          Components: io, performance, regionserver
            Reporter: Matt Corgan


It is common for regionservers to develop long compaction queues, meaning a CompactionRequest may execute hours after it was created. The CompactionRequest holds a CompactionSelection that was selected at request time but may no longer be the optimal selection. The CompactionSelection should be created at compaction execution time rather than at compaction request time.

The current mechanism breaks down during high-volume insertion. The inefficiency is clearest when the inserts are finished. Inserting for 5 hours may build up 50 storefiles and a 40-element compaction queue. When finished inserting, you would prefer that the next compaction merge all 50 files (or some large subset), but the current system will churn through each of the 40 compaction requests, the first of which may be hours old. This ends up re-compacting the same data many times.

The current system is especially inefficient when dealing with time series data, where the data in the storefiles has minimal overlap. With time series data there is even less benefit to intermediate merges, because most storefiles can be eliminated based on their key range during a read, even without bloomfilters. The only goal should be to reduce file count, not to minimize the number of files merged for each read.

There are other aspects of the current queuing mechanism that would need to be looked at. You would want to avoid having the same Store in the queue multiple times, and you would want the completion of one compaction to possibly queue another compaction request for the store.

An alternative architecture to the current style of queues would be to have each Store (all open in memory) keep a compactionPriority score up to date after events like flushes, compactions, schema changes, etc. Then you create a "CompactionPriorityComparator implements Comparator<Store>" and stick all the Stores into a PriorityQueue (with a synchronized remove/add from the queue when the value changes). The async compaction threads would keep pulling off the head of that queue as long as the head has compactionPriority > X.
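
As a rough sketch of what that could look like (StoreShim, X, and the method names here are invented stand-ins for illustration, not existing HBase code):

{code:java}
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

/** Sketch only: StoreShim stands in for the real Store class. */
class StoreShim {
  final String name;
  volatile double compactionPriority; // refreshed after flushes, compactions, schema changes

  StoreShim(String name, double compactionPriority) {
    this.name = name;
    this.compactionPriority = compactionPriority;
  }
}

public class CompactionPrioritySketch {
  static final double X = 1.0; // the urgency threshold from the proposal

  // The "CompactionPriorityComparator": highest compactionPriority first.
  static final Comparator<StoreShim> COMPARATOR =
      Comparator.comparingDouble((StoreShim s) -> s.compactionPriority).reversed();

  private final PriorityBlockingQueue<StoreShim> queue =
      new PriorityBlockingQueue<>(16, COMPARATOR);

  /** Synchronized remove/add so the heap re-sorts when a store's score changes. */
  public synchronized void updatePriority(StoreShim store, double newPriority) {
    queue.remove(store);
    store.compactionPriority = newPriority;
    queue.add(store);
  }

  /** Async compaction threads call this in a loop; returns null when the head is not urgent. */
  public synchronized StoreShim pollIfUrgent() {
    StoreShim head = queue.peek();
    return (head != null && head.compactionPriority > X) ? queue.poll() : null;
  }
}
{code}

Under this scheme the CompactionSelection would only be computed after pollIfUrgent() returns a store, which is exactly the selection-at-execution-time behavior this issue proposes.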

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217358#comment-13217358 ] 

Nicolas Spiegelberg commented on HBASE-5479:
--------------------------------------------

@Ted: right now, compaction priority is based upon congestion, so you want a single CF to get the top queue entries if it's the only one congested. The real problem is that we need to calculate the IO overhead of not compacting files and prioritize the queue so that the compaction providing the maximum IO savings is done first. That relates to Todd's comment in point #3. That said, we need to write the scaffolding that makes it easy for us to calculate IO per file and average per-file seek redundancy. This is a very complicated problem, not a trivial feature, so manual tuning is the best strategy for now, until someone wants to dedicate enough time & resources to conquer that problem.
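
To make that concrete, here is one hypothetical shape such scaffolding could take. The formula (seeks saved per byte rewritten) and every name below are assumptions for illustration; nothing here exists in HBase:

{code:java}
import java.util.List;

/** Toy IO-savings score for a candidate compaction; illustrative assumptions only. */
public class IoSavingsScore {
  /**
   * Rough model: every read of this store currently pays one seek per selected file,
   * so merging n files saves (n - 1) seeks per read, at a one-time cost of
   * rewriting their combined bytes.
   */
  static double score(List<Long> selectedFileSizes, double readsPerSecond) {
    int n = selectedFileSizes.size();
    if (n < 2) {
      return 0.0; // nothing to merge, nothing saved
    }
    long rewriteBytes = 0;
    for (long size : selectedFileSizes) {
      rewriteBytes += size;
    }
    double seeksSavedPerSecond = readsPerSecond * (n - 1);
    return seeksSavedPerSecond / rewriteBytes; // bigger = more IO saved per byte of work
  }

  public static void main(String[] args) {
    // Many small, heavily read files score higher than a few huge, rarely read ones.
    System.out.println(score(List.of(1L << 20, 1L << 20, 1L << 20), 100.0));
    System.out.println(score(List.of(1L << 33, 1L << 33), 5.0));
  }
}
{code}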

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Matt Corgan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217410#comment-13217410 ] 

Matt Corgan commented on HBASE-5479:
------------------------------------

{quote}you need to do a bulk import MR (vs Put-based) or you have your compaction algorithm tuned incorrectly... you probably want to switch your compaction ratio to 0.125 and play with it from there{quote}
Yeah, just using it as an opportunity to push HBase with real data and see what breaks first. I hesitate to change the global compaction ratio when only a couple of our ~20 tables are affected.

Agreed, pluggable compaction strategies would be great, as would many other per-CF settings. Making them pluggable would be far more useful than trying to perfect a single general algorithm.

Is there a quick fix that could deal with outdated requests? For example, ignoring a CompactionRequest if the files in its CompactionSelection are no longer all present. Or, when pulling a CompactionRequest from the head of the queue, iterating the rest of the queue to check whether there is a newer CompactionRequest for the same Store.
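
A rough sketch of that first check, assuming a simplified request object (StaleCheck, CompactionRequestShim, and currentFiles are hypothetical names, not HBase APIs):

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of the proposed staleness check; not actual HBase code. */
public class StaleCheck {
  static class CompactionRequestShim {
    final List<String> selectedFiles; // paths chosen at request time
    CompactionRequestShim(List<String> selectedFiles) {
      this.selectedFiles = selectedFiles;
    }
  }

  /** Drop the request if any selected file has since been compacted away. */
  static boolean isStale(CompactionRequestShim request, Set<String> currentFiles) {
    return !currentFiles.containsAll(request.selectedFiles);
  }

  public static void main(String[] args) {
    CompactionRequestShim req = new CompactionRequestShim(List.of("f1", "f2", "f3"));
    Set<String> storeNow = new HashSet<>(List.of("f2", "f3", "f4")); // f1 already compacted
    System.out.println(isStale(req, storeNow)); // true -> skip this request
  }
}
{code}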

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Matt Corgan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217655#comment-13217655 ] 

Matt Corgan commented on HBASE-5479:
------------------------------------

Rather than postponing file selection until execution time, what do you think about halting new CompactionRequests for a Store if there is already one in the queue? That would allow files to build up into bigger batches.

It seems rare that the hbase.hstore.compaction.max setting will ever come into play under the current system.

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Matt Corgan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217550#comment-13217550 ] 

Matt Corgan commented on HBASE-5479:
------------------------------------

Re: outdated requests: I now see that Store.requestCompaction eliminates already-queued files from consideration, so the files in a request will never have disappeared between when a compaction is requested and when it is executed.

Let me take another stab at explaining the problem. Say you have hbase.hstore.compactionThreshold=3 and hbase.hstore.compaction.max=20. You are flushing a particular memstore every minute, and compactions are backed up by an hour for whatever reason. After 3 minutes of inserting, the CompactSplitThread will create a CompactionRequest for the first 3 StoreFiles. During the next hour, while that first CompactionRequest sits in the queue, 60 new StoreFiles are added and 20 additional CompactionRequests are queued.

Finally, the first CompactionRequest makes it to the head of the queue and is ready to be executed. At this point there are 63 small StoreFiles in the Store. While this original CompactionRequest was correct at the time it was created, I would now prefer that it compact the first 20 files, not just the first 3.

Maybe it could abort a CompactionRequest if there are already items in Store.filesCompacting.
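
A hedged sketch of that guard, combined with the earlier idea of not queuing a second request per Store. Store.filesCompacting is real, but StoreShim, requestQueued, and the harness around them are invented for illustration:

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Illustrative harness; not HBase's actual CompactSplitThread. */
public class CompactionGuardSketch {
  static class StoreShim {
    final List<String> filesCompacting = new ArrayList<>(); // files claimed by a running compaction
    boolean requestQueued; // hypothetical flag: at most one outstanding request per Store
  }

  private final Queue<StoreShim> compactionQueue = new ArrayDeque<>();

  /** Halt new requests for a Store that already has one queued, so files batch up. */
  public synchronized void maybeRequestCompaction(StoreShim store) {
    if (!store.requestQueued) {
      store.requestQueued = true;
      compactionQueue.add(store);
    }
  }

  /** At execution time: abort if a compaction already claimed files in this Store. */
  public synchronized StoreShim pollExecutable() {
    StoreShim store = compactionQueue.poll();
    if (store == null) {
      return null;
    }
    store.requestQueued = false;
    if (!store.filesCompacting.isEmpty()) {
      return null; // abort; the in-flight compaction can re-request on completion
    }
    return store; // caller selects files now, against the Store's current state
  }
}
{code}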

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246405#comment-13246405 ] 

Keith Turner commented on HBASE-5479:
-------------------------------------

Accumulo does something similar to what this ticket describes. It has a priority queue of tablets/regions that need to be major compacted. A thread scans all tablets every 30 seconds to see whether a compaction is needed and, if so, throws the tablet on the queue. (It should probably also check after flushes and bulk imports.) I do not think multiple entries are placed on the queue for the same tablet. When something is pulled off of the queue, it decides at that point which files to compact.

The priority queue is sorted on compaction type and then on the number of files per tablet. User-requested compactions come first, then chops (a special compaction for merging tablets), then system-initiated compactions, then idle compactions. Among compactions of the same type, it will take the tablet/region with the most files. To find that tablet/region, it does a linear scan of all of the tablets in the queue. I do not like the linear scan, but I am not sure of a better way to do this, since the number of files could change while something is in the queue. Once we started taking the tablet w/ the most files, it really helped overall query performance by keeping the average files per tablet and the standard deviation as low as possible.
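
A minimal sketch of that ordering (the enum values follow the description above; the class and field names are invented for illustration, not Accumulo source):

{code:java}
import java.util.Comparator;

/** Illustrative sketch of the described queue ordering; not Accumulo code. */
public class TabletCompactionOrder {
  // Ordinal order encodes precedence: user first, then chop, system, idle.
  enum CompactionType { USER, CHOP, SYSTEM, IDLE }

  static class TabletShim {
    final CompactionType type;
    volatile int fileCount; // may change while the tablet sits in the queue
    TabletShim(CompactionType type, int fileCount) {
      this.type = type;
      this.fileCount = fileCount;
    }
  }

  // Sort by type first, then most files first within the same type.
  static final Comparator<TabletShim> ORDER =
      Comparator.comparing((TabletShim t) -> t.type)
                .thenComparing(Comparator.comparingInt((TabletShim t) -> t.fileCount).reversed());
}
{code}

Because fileCount can change after enqueueing, a heap keyed on it goes stale, which is presumably why the dequeue side falls back to the linear scan described above.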

One other wrinkle is that Accumulo will only compact up to 10 files at a time (configurable). If a tablet has 30 files, it will compact the smallest 10 files and throw the tablet back on the major compaction queue. From a tablet/region server perspective, this also helps keep the total number of files on the server down. We used to do compaction depth first, where the tablet with 30 files would be compacted all the way to one file. However, this could take a long time, and a lot of compaction work could back up. Doing compactions breadth first and taking the tablet with the most files has really helped keep the number of files manageable under continuous ingest. Our continuous ingest test tracks statistics (min, max, avg, std dev) on files per tablet over time, and we plot this info using gnuplot at the end of the test. Doing this type of test and looking at the data helped us formulate our current strategy. I would encourage starting with such a test.


[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217456#comment-13217456 ] 

Nicolas Spiegelberg commented on HBASE-5479:
--------------------------------------------

@Matt: Also see HBASE-5335, which will allow you to change the compaction ratio on a per-CF basis for multi-flow clusters. I am currently working on that JIRA, so I suggest you watch it.

In regard to outdated requests: having outdated requests would indicate an HBase bug, not a design decision. A compaction request should lock all of the StoreFiles in question. Those storefiles should only be removed by the compaction itself, and compaction requests should be disjoint. Any break of this contract is a bug :P. Did this arise because of splitting?

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216608#comment-13216608 ] 

stack commented on HBASE-5479:
------------------------------

Todd suggests something like a scoring scheme over here, Matt: https://issues.apache.org/jira/browse/HBASE-2457?focusedCommentId=12857705&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12857705

Let's verify that we do indeed do selection at queuing time. That's my suspicion. If that's the case, it for sure needs fixing. Thanks for filing this one, Matt.

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217572#comment-13217572 ] 

Nicolas Spiegelberg commented on HBASE-5479:
--------------------------------------------

I think that adding mutability to compaction enqueuing is a bit of an advanced topic that masks the true problem of wrong compaction settings. The most important thing is to ensure that we eventually get down to 1 file over time, instead of minimizing IO for a bad configuration that inflates IO far more dramatically.

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411799#comment-13411799 ] 

Mikhail Bautin commented on HBASE-5479:
---------------------------------------

HBASE-6361 (closed as duplicate) has additional relevant discussion and ideas.

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246443#comment-13246443 ] 

Keith Turner commented on HBASE-5479:
-------------------------------------

There is an exception to what I said above. User-requested compactions are still done depth first, with an optimization. If a user requests that a tablet with 30 files be compacted, Accumulo will allocate a compaction thread to compact that tablet down to one file. It still only does up to 10 files at a time, though. For a tablet with 30 files, the schedule looks like this (a sketch of the implied step-size rule follows the list):
 
 * compact 10 smallest, results in 21 files
 * compact 10 smallest, results in 12 files
 * compact 3 smallest, results in 10 files  <--this is the optimization to avoid redundant work
 * compact 10 smallest, results in 1 file
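
Here is the step-size rule that sequence implies, as I read it (my inference from the example above, not Accumulo's actual code): when between max and 2*max - 1 files remain, merge just enough of the smallest files to land exactly on max, so the final pass produces one file without rewriting a large intermediate.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** My inference of the step-size rule from the example above; not Accumulo source. */
public class DepthFirstSchedule {
  /** Plan compaction step sizes to reduce `files` to 1, merging at most `max` at a time. */
  static List<Integer> plan(int files, int max) {
    List<Integer> steps = new ArrayList<>();
    while (files > 1) {
      int k;
      if (files <= max) {
        k = files;           // final pass: merge everything down to 1 file
      } else if (files < 2 * max) {
        k = files - max + 1; // merge the fewest (smallest) files to land exactly on max
      } else {
        k = max;             // plenty left: merge a full batch of the smallest files
      }
      steps.add(k);
      files = files - k + 1; // k inputs become 1 output
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println(plan(30, 10)); // [10, 10, 3, 10], matching the sequence above
  }
}
{code}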

[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216639#comment-13216639 ] 

Zhihong Yu commented on HBASE-5479:
-----------------------------------

I think the compactionPriority score needs to be designed in such a way that, when multiple column families are involved, no single column family consistently comes off the head of the PriorityQueue for an extended period of time.

[jira] [Issue Comment Edited] (HBASE-5479) Postpone CompactionSelection to compaction execution time

Posted by "Zhihong Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216639#comment-13216639 ] 

Zhihong Yu edited comment on HBASE-5479 at 2/26/12 5:46 AM:
------------------------------------------------------------

I think the compactionPriority score needs to be designed in such a way that, when multiple column families are involved, no single column family exclusively comes off the head of the PriorityQueue for an extended period of time.
      was (Author: zhihyu@ebaysf.com):
    I think the compactionPriority score needs to be designed in such a way that, when multiple column families are involved, no single column family consistently comes off the head of the PriorityQueue for an extended period of time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira