Posted to issues@hbase.apache.org by "Karthik Ranganathan (JIRA)" <ji...@apache.org> on 2011/09/22 23:55:26 UTC
[jira] [Created] (HBASE-4463) Run more aggressive compactions
during off peak hours
Run more aggressive compactions during off peak hours
-----------------------------------------------------
Key: HBASE-4463
URL: https://issues.apache.org/jira/browse/HBASE-4463
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan
The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.3 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.
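To make the proposal concrete, here is a minimal sketch of how a larger ratio widens a ratio-based selection. It is not the code from the patch: the class and method names are hypothetical, the rule is simplified (an older file is skipped while it is larger than ratio times the total size of the newer files), and the peak ratio of 1.2 follows the later revision of this description.

```java
import java.util.Arrays;

/** Simplified, hypothetical sketch of ratio-based compaction selection. */
final class RatioSelectionSketch {
    static final double PEAK_RATIO = 1.2;     // assumed hbase.hstore.compaction.ratio default
    static final double OFF_PEAK_RATIO = 5.0; // assumed hbase.hstore.compaction.ratio.offpeak default

    /**
     * sizes holds store file sizes ordered oldest to newest. Returns the index of
     * the first selected file; files [index, sizes.length) would be compacted.
     */
    static int firstSelected(long[] sizes, double ratio) {
        long newerSum = Arrays.stream(sizes).sum();
        int start = 0;
        while (start < sizes.length) {
            newerSum -= sizes[start]; // now the total size of files newer than sizes[start]
            if (sizes[start] <= ratio * newerSum) {
                break;                // this file is small enough relative to the newer ones
            }
            start++;                  // too large: leave it out of this compaction
        }
        return start;
    }

    /** Picks the ratio by time of day, then runs the same selection. */
    static int firstSelected(long[] sizes, boolean offPeak) {
        return firstSelected(sizes, offPeak ? OFF_PEAK_RATIO : PEAK_RATIO);
    }
}
```

With file sizes {1000, 300, 100, 50, 50}, the peak ratio selects only the three newest files, while the off-peak ratio selects all five, which is how the larger ratio reduces the number of files per store.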
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160481#comment-13160481 ]
Phabricator commented on HBASE-4463:
------------------------------------
Karthik has committed the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
REVISION DETAIL
https://reviews.facebook.net/D471
COMMIT
https://reviews.facebook.net/rHBASE1208885
> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
> Key: HBASE-4463
> URL: https://issues.apache.org/jira/browse/HBASE-4463
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: Karthik Ranganathan
> Assignee: Karthik Ranganathan
> Fix For: 0.94.0
>
> Attachments: HBASE-4463.D471.1.patch, HBASE-4463.D471.2.patch
>
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112949#comment-13112949 ]
Ted Yu commented on HBASE-4463:
-------------------------------
This is a great idea.
How do we determine the off-peak hours?
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113665#comment-13113665 ]
stack commented on HBASE-4463:
------------------------------
@Dhruba We need this too. How would we do it?
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160459#comment-13160459 ]
Phabricator commented on HBASE-4463:
------------------------------------
nspiegelberg has accepted the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
REVISION DETAIL
https://reviews.facebook.net/D471
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113847#comment-13113847 ]
Karthik Ranganathan commented on HBASE-4463:
--------------------------------------------
@Stack - we can find the exact amount of data we are writing to the DFS (only HFile blocks contribute to this during compactions), so adding a threshold like this is not too hard. But the pressure could be on disk IOPS instead of network bandwidth, and detecting that would be hard, so we would still need to set an off-peak time.
I was trying to come up with a more generic solution, but that involves setting up a feedback loop inside the regionserver: keep track of the max, min and average latencies over the last k days (this would have to be stored in META or some other location, as it needs to persist beyond restarts), and remove any spikes in the values. When we run an aggressive compaction, we need to make sure the latencies are still acceptable; otherwise, we don't run aggressive compactions. This is much harder to get right though.
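A rough sketch of the feedback loop described above, under several assumptions: a fixed-size in-memory window stands in for "the last k days", a sample counts as a spike when it exceeds a configurable multiple of the current average, and persistence in META is left out entirely. All names are hypothetical; nothing here comes from the patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: gate aggressive compactions on recent latencies. */
final class LatencyFeedbackSketch {
    private final Deque<Double> samples = new ArrayDeque<>();
    private final int capacity;       // samples retained; stand-in for "last k days"
    private final double spikeFactor; // samples above spikeFactor * average are dropped

    LatencyFeedbackSketch(int capacity, double spikeFactor) {
        this.capacity = capacity;
        this.spikeFactor = spikeFactor;
    }

    /** Records a latency sample unless it looks like a spike. */
    void record(double latencyMs) {
        if (!samples.isEmpty() && latencyMs > spikeFactor * average()) {
            return; // spike: ignore it so it does not skew the baseline
        }
        if (samples.size() == capacity) {
            samples.removeFirst(); // evict the oldest sample
        }
        samples.addLast(latencyMs);
    }

    double average() {
        return samples.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    double min() {
        return samples.stream().mapToDouble(Double::doubleValue).min().orElse(0.0);
    }

    double max() {
        return samples.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    /** Allows an aggressive compaction only while latency stays near the baseline. */
    boolean allowAggressive(double currentLatencyMs, double tolerance) {
        return !samples.isEmpty() && currentLatencyMs <= tolerance * average();
    }
}
```

Even this toy version needs three tuning knobs (window size, spike factor, tolerance), which hints at why the approach is harder to get right than a fixed off-peak window.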
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151856#comment-13151856 ]
Phabricator commented on HBASE-4463:
------------------------------------
lhofhansl has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
Great idea.
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1074 Or use an iterator with a conditional break?
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1106 Should we move functionality like this into CompactSelection?
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1120 Might be good to have a CompactSelection.numFiles() (or similar) method.
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:992 Add a method to CompactSelection so that we do not constantly "leak" filesToCompact?
REVISION DETAIL
https://reviews.facebook.net/D471
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160458#comment-13160458 ]
Phabricator commented on HBASE-4463:
------------------------------------
Karthik has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
Hey guys, I'm skipping the cosmetic clean-ups for now; we can do those as a follow-up if needed. I have addressed the core issues that Nicolas and Ted pointed out.
REVISION DETAIL
https://reviews.facebook.net/D471
[jira] [Updated] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-4463:
-------------------------------
Attachment: HBASE-4463.D471.2.patch
Karthik updated the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
Reviewers: JIRA, Kannan, nspiegelberg, mbautin, stack
Fixed Nicolas' comment about the time computation in the unit test.
Fixed Ted's comment about repeated compact selection line.
REVISION DETAIL
https://reviews.facebook.net/D471
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java
src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151849#comment-13151849 ]
Phabricator commented on HBASE-4463:
------------------------------------
nspiegelberg has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
@ted: I think allowing minute-level granularity might be a little over-optimizing. The end of the off-peak window only means that aggressive compact selection ends then. An aggressive compaction can be sitting in the compaction queue and actually execute after the end time. Therefore, you want to disable aggressive compactions a decent bit of time before your traffic begins to pick up.
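For reference, an hour-level window check can be sketched as below. This is an illustration, not the patch: the method name, the half-open [startHour, endHour) convention, and treating an equal start and end as "disabled" are all assumptions.

```java
/** Hypothetical sketch of an hour-granularity off-peak window check. */
final class OffPeakWindowSketch {
    /**
     * Returns true when hour (0-23) falls inside [startHour, endHour).
     * A window whose start is later than its end wraps past midnight.
     */
    static boolean isOffPeakHour(int hour, int startHour, int endHour) {
        if (startHour == endHour) {
            return false; // assumption: an empty window disables off-peak behaviour
        }
        if (startHour < endHour) {
            return hour >= startHour && hour < endHour;
        }
        return hour >= startHour || hour < endHour; // window wraps past midnight
    }
}
```

A check like this would run when a compaction is selected, not when it executes, so a request selected at 05:59 under a 22-to-6 window could still be running well after 06:00. Choosing an end hour earlier than the real traffic ramp leaves room for that.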
REVISION DETAIL
https://reviews.facebook.net/D471
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160692#comment-13160692 ]
Hudson commented on HBASE-4463:
-------------------------------
Integrated in HBase-TRUNK-security #17 (See [https://builds.apache.org/job/HBase-TRUNK-security/17/])
HBASE-4463 [jira] Run more aggressive compactions during off peak hours
Summary:
HBASE-4463 Run more aggressive compactions during off peak hours
Increases the compact selection ratio from 1.3 to 5 at off-peak hours. This
will help utilize the available iops and bandwidth to decrease average num of
files per store. Only one such aggressive compaction is queued per store at any
point.
The number of iops on the disk and the top of the rack bandwidth utilization at
off peak hours is much lower than at peak hours depending on the application
usage pattern. We can utilize this knowledge to improve the performance of the
HBase cluster by increasing the compact selection ratio to a much larger value
during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio
(1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will
help reduce the average number of files per store.
Test Plan: Started running the unit tests.
Reviewers: JIRA, Kannan, nspiegelberg, mbautin, stack
Reviewed By: nspiegelberg
CC: nspiegelberg, tedyu, lhofhansl, Karthik
Differential Revision: 471
karthik :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
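The summary above says that only one aggressive compaction is queued per store at any point. One way such a guard could look is sketched here; it is not taken from the committed code, and the class, its methods and the use of an AtomicBoolean are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-store guard allowing a single queued off-peak compaction. */
final class OffPeakSlotSketch {
    private final AtomicBoolean offPeakQueued = new AtomicBoolean(false);

    /**
     * Returns the off-peak ratio only if it is off-peak now and this store has no
     * aggressive compaction queued already; otherwise returns the peak ratio.
     */
    double chooseRatio(boolean offPeakNow, double peakRatio, double offPeakRatio) {
        if (offPeakNow && offPeakQueued.compareAndSet(false, true)) {
            return offPeakRatio; // caller now owns the slot and must release() it later
        }
        return peakRatio;
    }

    /** Called once the aggressive compaction finishes or is abandoned. */
    void release() {
        offPeakQueued.set(false);
    }
}
```

The compare-and-set keeps two concurrent selections on the same store from both claiming the aggressive ratio.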
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151710#comment-13151710 ]
Phabricator commented on HBASE-4463:
------------------------------------
nspiegelberg has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
INLINE COMMENTS
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java:248-250 We need to add 24 here or else it will fail at night, correct?
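The concern is easy to reproduce in isolation: in Java, subtracting from hour 0 and applying % 24 yields a negative value. The helper below is a hypothetical sketch of hour arithmetic that stays within 0-23; it is not the code under review.

```java
/** Hypothetical helper for hour-of-day arithmetic that wraps around midnight. */
final class HourArithmeticSketch {
    /** Naive version: (0 - 1) % 24 evaluates to -1 in Java, which is not a valid hour. */
    static int naiveOffset(int hour, int delta) {
        return (hour + delta) % 24;
    }

    /** Wrapping version: floorMod always returns a value in 0..23 for a modulus of 24. */
    static int hourOffset(int hour, int delta) {
        return Math.floorMod(hour + delta, 24);
    }
}
```

For non-negative hours and a delta of at least -24, `(hour + delta + 24) % 24` gives the same result, which matches the "+24" suggested in the comment.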
REVISION DETAIL
https://reviews.facebook.net/D471
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114116#comment-13114116 ]
Kannan Muthukkaruppan commented on HBASE-4463:
----------------------------------------------
Dhruba: Just wanted to point out that one knob that exists today to throttle the number of concurrent compactions is the size of the compaction thread pools. We can drop it all the way down to 1 thread. If further throttling is needed, then it'll require some new scheme.
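The effect of the pool size can be shown with a plain java.util.concurrent executor. This sketch does not use HBase's compaction thread pools; the class name and the fake sleeping "compactions" are stand-ins chosen for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: pool size caps how many compactions run at once. */
final class CompactionPoolSketch {
    /** Runs fake compactions on a fixed pool and returns the peak concurrency observed. */
    static int peakConcurrency(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                // Track how many tasks are in flight and remember the maximum.
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(5); // stand-in for compaction work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("fake compactions did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }
}
```

With a single thread, the queued tasks run strictly one after another. This caps concurrency but not the I/O rate of each compaction, which is why further throttling would need a new scheme.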
[jira] [Updated] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Ranganathan updated HBASE-4463:
---------------------------------------
Description: The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store. (was: The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.3 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.)
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151829#comment-13151829 ]
Phabricator commented on HBASE-4463:
------------------------------------
tedyu has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
This is a useful feature.
Currently off-peak time is at hour level. It would be nice to support minute-level off-peak time.
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:989 filesToCompact is overwritten here.
What about the assignment at line 980 above?
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1074 compactSelection.getFilesToCompact() is called repeatedly.
Shall we store the return value in a variable?
src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java:62 Should read 'Off-peak time'
src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java:153 This method isn't used.
Shall we remove it?
REVISION DETAIL
https://reviews.facebook.net/D471
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160636#comment-13160636 ]
Hudson commented on HBASE-4463:
-------------------------------
Integrated in HBase-TRUNK #2504 (See [https://builds.apache.org/job/HBase-TRUNK/2504/])
HBASE-4463 [jira] Run more aggressive compactions during off peak hours
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113040#comment-13113040 ]
dhruba borthakur commented on HBASE-4463:
-----------------------------------------
Could we add the ability to throttle the maximum bandwidth per server allowed for compaction? (HDFS uses a similar approach to ensure that background replication does not swamp the network.)
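As a rough sketch of what such a throttle could look like (this is not part of the patch, and all names here are hypothetical), a compaction writer could track bytes written and pause whenever it runs ahead of a configured bytes-per-second limit:

```java
// Illustrative only: a simple bandwidth throttle; not an existing HBase or HDFS API.
final class CompactionThrottleSketch {
    /**
     * Returns how long the writer should pause so that its average rate
     * stays at or below maxBytesPerSec. A non-positive limit disables throttling.
     */
    static long sleepMillisFor(long bytesWritten, long elapsedMillis, long maxBytesPerSec) {
        if (maxBytesPerSec <= 0) {
            return 0L;
        }
        // Minimum time the writes so far should have taken at the allowed rate.
        long minMillis = bytesWritten * 1000L / maxBytesPerSec;
        return Math.max(0L, minMillis - elapsedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        long limit = 10L * 1024 * 1024; // assumed limit: 10 MiB/s
        long start = System.currentTimeMillis();
        long written = 0;
        for (int chunk = 0; chunk < 4; chunk++) {
            written += 1024 * 1024; // pretend a 1 MiB chunk was just written
            long pause = sleepMillisFor(written, System.currentTimeMillis() - start, limit);
            if (pause > 0) {
                Thread.sleep(pause);
            }
        }
        System.out.println("wrote " + written + " bytes in "
            + (System.currentTimeMillis() - start) + " ms");
    }
}
```

A limit like this is orthogonal to the off-peak ratio: the ratio decides which files are compacted, while the throttle bounds how fast the resulting I/O is issued.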
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151846#comment-13151846 ]
Lars Hofhansl commented on HBASE-4463:
--------------------------------------
Re: previous discussions. Automatically detecting a good time to perform more aggressive compactions would be nice, but my vote is to get this simpler time-based scheme in first.
@Ted: Isn't hour-granularity good enough?
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151855#comment-13151855 ]
Phabricator commented on HBASE-4463:
------------------------------------
tedyu has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
Hour-granularity is good for now.
Nicolas' comment makes sense.
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:1061 Should we log the value of "hbase.offpeak.end.hour" so that we know the (possible) delay beyond the end time?
REVISION DETAIL
https://reviews.facebook.net/D471
[jira] [Commented] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113017#comment-13113017 ]
Karthik Ranganathan commented on HBASE-4463:
--------------------------------------------
Initially we are going to specify a start and stop hour for the off-peak window. More automatic detection based on response latencies and data read/transferred could be done, but it is much harder to get right.
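A minimal sketch of such a start/stop check at hour granularity follows. The class and method names are hypothetical. The review thread mentions an "hbase.offpeak.end.hour" setting; the matching start-hour setting, the exclusive end hour, and the rule that an unset or invalid hour disables the feature are assumptions made for this example.

```java
import java.time.LocalTime;

// Illustrative only: an hour-granularity off-peak window check.
final class OffPeakWindowSketch {
    private static boolean validHour(int hour) {
        return hour >= 0 && hour <= 23;
    }

    /**
     * @param currentHour hour of day, 0..23
     * @param startHour   first off-peak hour (inclusive); anything outside 0..23 disables the window
     * @param endHour     end of the off-peak window (exclusive, by assumption)
     */
    static boolean isOffPeak(int currentHour, int startHour, int endHour) {
        if (!validHour(startHour) || !validHour(endHour) || startHour == endHour) {
            return false; // window not configured
        }
        if (startHour < endHour) {
            return currentHour >= startHour && currentHour < endHour;
        }
        // Window wraps past midnight, e.g. 22 -> 6.
        return currentHour >= startHour || currentHour < endHour;
    }

    public static void main(String[] args) {
        int now = LocalTime.now().getHour();
        System.out.println("hour " + now + " off-peak (22 -> 6)? " + isOffPeak(now, 22, 6));
    }
}
```

Handling the wrap past midnight matters because a typical off-peak window runs overnight, so the start hour is often numerically larger than the end hour.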
[jira] [Resolved] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Karthik Ranganathan (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Ranganathan resolved HBASE-4463.
----------------------------------------
Resolution: Fixed
Fix Version/s: 0.94.0
[jira] [Updated] (HBASE-4463) Run more aggressive compactions
during off peak hours
Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HBASE-4463:
-------------------------------
Attachment: HBASE-4463.D471.1.patch
Karthik requested code review of "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
Reviewers: JIRA
HBASE-4463 Run more aggressive compactions during off peak hours
Increases the compact selection ratio from 1.2 to 5 at off-peak hours. This
will help utilize the available iops and bandwidth to decrease the average number
of files per store. Only one such aggressive compaction is queued per store at
any point.
The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.
TEST PLAN
Started running the unit tests.
REVISION DETAIL
https://reviews.facebook.net/D471
AFFECTED FILES
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java
src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
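The rule that only one aggressive compaction is queued per store at any point could be enforced with a simple per-store flag. The sketch below is one possible way to do it, not the approach taken in the patch. The class and method names are hypothetical, and only the two ratio values come from the description.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: allow at most one outstanding off-peak compaction per store.
final class OffPeakCompactionGuard {
    static final double PEAK_RATIO = 1.2;
    static final double OFF_PEAK_RATIO = 5.0;

    // One guard instance is assumed to exist per store.
    private final AtomicBoolean offPeakQueued = new AtomicBoolean(false);

    /**
     * Picks the ratio for the next compaction request. The off-peak ratio is
     * handed out only if no other off-peak compaction is currently outstanding.
     */
    double chooseRatio(boolean offPeakHour) {
        if (offPeakHour && offPeakQueued.compareAndSet(false, true)) {
            return OFF_PEAK_RATIO;
        }
        return PEAK_RATIO;
    }

    /** Called when the off-peak compaction finishes or is cancelled. */
    void release() {
        offPeakQueued.set(false);
    }

    public static void main(String[] args) {
        OffPeakCompactionGuard guard = new OffPeakCompactionGuard();
        System.out.println("first request:  " + guard.chooseRatio(true));
        System.out.println("second request: " + guard.chooseRatio(true));
        guard.release();
        System.out.println("after release:  " + guard.chooseRatio(true));
    }
}
```

Using compareAndSet keeps the check and the claim atomic, so two concurrent requests for the same store cannot both receive the off-peak ratio.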