Posted to issues@hbase.apache.org by "Karthik Ranganathan (JIRA)" <ji...@apache.org> on 2011/09/22 23:55:26 UTC

[jira] [Created] (HBASE-4463) Run more aggressive compactions during off peak hours

Run more aggressive compactions during off peak hours
-----------------------------------------------------

                 Key: HBASE-4463
                 URL: https://issues.apache.org/jira/browse/HBASE-4463
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
            Reporter: Karthik Ranganathan
            Assignee: Karthik Ranganathan


The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.3 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.
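To make the effect of the ratio concrete, here is a minimal Java sketch of ratio-based selection. It is not the code from the patch: the class name OffPeakRatioSketch, the method names, and the simplified rule (a file qualifies once its size is at most ratio times the combined size of the newer files) are assumptions for illustration. Only the two property names and their defaults come from this issue; the peak default is shown as 1.2, the value the description was later revised to.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the code from the HBASE-4463 patch. */
final class OffPeakRatioSketch {
    // Defaults quoted in this issue (peak default as later revised to 1.2).
    static final double PEAK_RATIO = 1.2;     // hbase.hstore.compaction.ratio
    static final double OFF_PEAK_RATIO = 5.0; // hbase.hstore.compaction.ratio.offpeak

    /** Picks the selection ratio for the current traffic period. */
    static double ratioFor(boolean offPeak) {
        return offPeak ? OFF_PEAK_RATIO : PEAK_RATIO;
    }

    /**
     * Simplified ratio rule (an assumption): walking from the oldest file,
     * skip files larger than ratio * (total size of all newer files);
     * everything from the first file that passes is selected.
     */
    static List<Long> select(long[] sizesOldestFirst, double ratio) {
        long newerTotal = 0;
        for (long size : sizesOldestFirst) {
            newerTotal += size;
        }
        int start = 0;
        for (; start < sizesOldestFirst.length; start++) {
            // After this subtraction, newerTotal covers only files newer than 'start'.
            newerTotal -= sizesOldestFirst[start];
            if (sizesOldestFirst[start] <= ratio * newerTotal) {
                break;
            }
        }
        List<Long> selected = new ArrayList<>();
        for (int i = start; i < sizesOldestFirst.length; i++) {
            selected.add(sizesOldestFirst[i]);
        }
        return selected;
    }
}
```

With file sizes 100, 30, 10, 10 (oldest first), the peak ratio selects only the two newest files in this sketch, while the off-peak ratio selects all four, which is how a larger ratio lowers the file count per store.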

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160481#comment-13160481 ] 

Phabricator commented on HBASE-4463:
------------------------------------

Karthik has committed the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".

REVISION DETAIL
  https://reviews.facebook.net/D471

COMMIT
  https://reviews.facebook.net/rHBASE1208885

                
> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
>                 Key: HBASE-4463
>                 URL: https://issues.apache.org/jira/browse/HBASE-4463
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>             Fix For: 0.94.0
>
>         Attachments: HBASE-4463.D471.1.patch, HBASE-4463.D471.2.patch
>
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.


        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112949#comment-13112949 ] 

Ted Yu commented on HBASE-4463:
-------------------------------

This is a great idea.
How do we determine the off-peak hours?


        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113665#comment-13113665 ] 

stack commented on HBASE-4463:
------------------------------

@Dhruba We need this too.  How would we do it?


        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160459#comment-13160459 ] 

Phabricator commented on HBASE-4463:
------------------------------------

nspiegelberg has accepted the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".

REVISION DETAIL
  https://reviews.facebook.net/D471

                

        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113847#comment-13113847 ] 

Karthik Ranganathan commented on HBASE-4463:
--------------------------------------------

@Stack - we can find the exact amount of data we are writing to the dfs (only hfile blocks will contribute to this during compactions). So adding a threshold like this is not too hard... but there could be disk iops pressure (instead of network bandwidth) and detecting that would be hard. So we would still need to set off-peak time.

I was trying to come up with a more generic solution, but that involves setting up a feedback loop inside the regionserver: keep track of max, min and average latencies over the last k days (these would have to be stored in META or some other location, as they need to persist beyond restarts). We would need to remove any spikes in the values. When we run an aggressive compaction, we need to make sure the latencies are still acceptable; otherwise, don't run aggressive compactions. This is much harder to get right, though.
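
The feedback loop described above might look roughly like the following. This is a hypothetical sketch: the class LatencyFeedbackSketch, its parameters, and the spike rule (drop a sample above spikeFactor times the running average) are assumptions, and persistence across restarts is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the latency feedback loop; no such class exists in HBase. */
final class LatencyFeedbackSketch {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int capacity;        // number of samples kept (stands in for "last k days")
    private final double spikeFactor;  // samples above spikeFactor * average are dropped
    private final double acceptableMs; // latency ceiling for aggressive compactions

    LatencyFeedbackSketch(int capacity, double spikeFactor, double acceptableMs) {
        this.capacity = capacity;
        this.spikeFactor = spikeFactor;
        this.acceptableMs = acceptableMs;
    }

    /** Records a sample unless it looks like a spike relative to the current average. */
    void record(double latencyMs) {
        if (!window.isEmpty() && latencyMs > spikeFactor * average()) {
            return; // treated as a spike and ignored
        }
        if (window.size() == capacity) {
            window.removeFirst(); // evict the oldest sample
        }
        window.addLast(latencyMs);
    }

    double average() {
        return window.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    double min() {
        return window.stream().mapToDouble(Double::doubleValue).min().orElse(0.0);
    }

    double max() {
        return window.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    /** Aggressive compactions are allowed only while the average stays acceptable. */
    boolean allowAggressiveCompaction() {
        return !window.isEmpty() && average() <= acceptableMs;
    }
}
```

A regionserver would call record() as it measures latencies and consult allowAggressiveCompaction() before selecting with the off-peak ratio. Choosing the thresholds is the part that is hard to get right.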


        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151856#comment-13151856 ] 

Phabricator commented on HBASE-4463:
------------------------------------

lhofhansl has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".

  Great idea.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1074 Or use an iterator with a conditioned break?
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1106 Should we move functionality like this into CompactSelection?
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1120 Might be good to have a CompactSelection.numFiles() (or similar) method
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:992 Add a method to CompactSelection so that we do not constantly "leak" filesToCompact?

REVISION DETAIL
  https://reviews.facebook.net/D471

                

        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160458#comment-13160458 ] 

Phabricator commented on HBASE-4463:
------------------------------------

Karthik has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".

  Hey guys, I'm skipping the cosmetic clean-ups for now; we can do those as a follow-up if needed. I addressed the core issues that Nicolas and Ted pointed out.

REVISION DETAIL
  https://reviews.facebook.net/D471

                

        

[jira] [Updated] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-4463:
-------------------------------

    Attachment: HBASE-4463.D471.2.patch

Karthik updated the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
Reviewers: JIRA, Kannan, nspiegelberg, mbautin, stack

  Fixed Nicolas' comment about the time computation in the unit test.
  Fixed Ted's comment about repeated compact selection line.

REVISION DETAIL
  https://reviews.facebook.net/D471

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java
  src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java

                

        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151849#comment-13151849 ] 

Phabricator commented on HBASE-4463:
------------------------------------

nspiegelberg has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".

  @ted: I think allowing minute-level granularity might be a little over-optimizing. The end time only means that aggressive compact selection ends then. An aggressive compaction can still be sitting in the compaction queue and actually execute after the end time. Therefore, you want to disable aggressive compactions a decent bit of time before your traffic begins to pick up.
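
  One way to act on this advice is to end aggressive selection a fixed margin before traffic is expected to resume. The sketch below assumes an hour-granularity window that may wrap past midnight; the class, method, and parameter names are hypothetical and are not taken from the patch.

```java
/** Hypothetical sketch of an hour-level off-peak window with a safety margin. */
final class OffPeakWindowSketch {
    /** True if hour is in [startHour, endHour); the window may wrap past midnight. */
    static boolean inWindow(int hour, int startHour, int endHour) {
        if (startHour <= endHour) {
            return hour >= startHour && hour < endHour;
        }
        return hour >= startHour || hour < endHour;
    }

    /**
     * Ends aggressive selection marginHours before traffic is expected to
     * pick up, leaving time for already queued compactions to drain.
     */
    static boolean selectAggressively(int hour, int startHour,
                                      int trafficResumesHour, int marginHours) {
        // floorMod keeps the result in 0..23 even if the subtraction goes negative.
        int endHour = Math.floorMod(trafficResumesHour - marginHours, 24);
        return inWindow(hour, startHour, endHour);
    }
}
```

  For example, with off-peak starting at 22, traffic resuming at 6, and a two-hour margin, selection is aggressive at hour 3 but no longer at hour 4.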

REVISION DETAIL
  https://reviews.facebook.net/D471

                

        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160692#comment-13160692 ] 

Hudson commented on HBASE-4463:
-------------------------------

Integrated in HBase-TRUNK-security #17 (See [https://builds.apache.org/job/HBase-TRUNK-security/17/])
    HBASE-4463 [jira] Run more aggressive compactions during off peak hours

Summary:
HBASE-4463 Run more aggressive compactions during off peak hours

Increases the compact selection ratio from 1.3 to 5 at off-peak hours. This
will help utilize the available iops and bandwidth to decrease average num of
files per store. Only one such aggressive compaction is queued per store at any
point.

The number of iops on the disk and the top of the rack bandwidth utilization at
off peak hours is much lower than at peak hours depending on the application
usage pattern. We can utilize this knowledge to improve the performance of the
HBase cluster by increasing the compact selection ratio to a much larger value
during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio
(1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will
help reduce the average number of files per store.

Test Plan: Started running the unit tests.

Reviewers: JIRA, Kannan, nspiegelberg, mbautin, stack

Reviewed By: nspiegelberg

CC: nspiegelberg, tedyu, lhofhansl, Karthik

Differential Revision: 471

karthik : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
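
The summary above says that only one aggressive compaction is queued per store at any point. A minimal sketch of such a guard follows, assuming a per-store flag; the class OffPeakSlotSketch is hypothetical, and the patch may implement this differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-store guard; not the mechanism used by the actual patch. */
final class OffPeakSlotSketch {
    private final AtomicBoolean offPeakQueued = new AtomicBoolean(false);

    /** Returns true if the caller won the store's single off-peak slot. */
    boolean tryAcquire() {
        // compareAndSet succeeds for exactly one caller while the slot is free.
        return offPeakQueued.compareAndSet(false, true);
    }

    /** Called when the aggressive compaction finishes or is cancelled. */
    void release() {
        offPeakQueued.set(false);
    }
}
```

A selection that fails tryAcquire() would simply fall back to the regular ratio.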

                

        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151710#comment-13151710 ] 

Phabricator commented on HBASE-4463:
------------------------------------

nspiegelberg has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".

INLINE COMMENTS
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java:248-250 Need to add 24 here, or else it will fail at night, correct?
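
  The concern here is presumably hour arithmetic that goes negative around midnight. In Java the % operator keeps the sign of the dividend, so subtracting from hour 0 without adding 24 yields a negative value. A small sketch (the helper names are hypothetical, not from TestCompactSelection):

```java
/** Hypothetical helpers for hour-of-day arithmetic that wraps at midnight. */
final class HourWrapSketch {
    /** Naive version: returns -1 for hour 0 and delta 1, which is not a valid hour. */
    static int naiveHoursBefore(int hour, int delta) {
        return (hour - delta) % 24;
    }

    /** Adding 24 before the modulo keeps the result in 0..23 for 0 <= delta <= 24. */
    static int hoursBefore(int hour, int delta) {
        return (hour - delta + 24) % 24;
    }

    /** Addition only needs the modulo, since the sum never goes negative. */
    static int hoursAfter(int hour, int delta) {
        return (hour + delta) % 24;
    }
}
```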

REVISION DETAIL
  https://reviews.facebook.net/D471

                

        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114116#comment-13114116 ] 

Kannan Muthukkaruppan commented on HBASE-4463:
----------------------------------------------

Dhruba: Just wanted to point out that one knob that exists today to throttle the number of concurrent compactions is the size of the compaction thread pools. We can drop it all the way down to 1 thread. If further throttling is needed, then it'll require some new scheme.
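
As an illustration of this knob, the sketch below uses a plain java.util.concurrent fixed thread pool as a stand-in for the compaction thread pool. The class and method names are hypothetical, and the sleeping task only simulates a compaction; with one thread, work never overlaps.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a compaction pool; not HBase's own classes. */
final class CompactionPoolSketch {
    /** Runs 'tasks' simulated compactions on 'threads' threads; returns the peak concurrency seen. */
    static int peakConcurrency(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest overlap
                try {
                    Thread.sleep(10); // simulated compaction work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

The pool size caps how many compactions run at once, but not how much I/O each one performs, which is why further throttling would need a new scheme.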


        

[jira] [Updated] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan updated HBASE-4463:
---------------------------------------

    Description: The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.  (was: The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.3 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.)


       

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151829#comment-13151829 ] 

Phabricator commented on HBASE-4463:
------------------------------------

tedyu has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".

  This is a useful feature.
  Currently off-peak time is at hour level. It would be nice to support minute-level off-peak time.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:989 filesToCompact is overwritten here.
  What about the assignment at line 980 above?
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1074 compactSelection.getFilesToCompact() is called repeatedly.
  Shall we store the return value in a variable?
  src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java:62 Should read 'Off-peak time'
  src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java:153 This method isn't used.
  Shall we remove it?

REVISION DETAIL
  https://reviews.facebook.net/D471

                

        

[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160636#comment-13160636 ] 

Hudson commented on HBASE-4463:
-------------------------------

Integrated in HBase-TRUNK #2504 (See [https://builds.apache.org/job/HBase-TRUNK/2504/])
    HBASE-4463 [jira] Run more aggressive compactions during off peak hours

Summary:
HBASE-4463 Run more aggressive compactions during off peak hours

Increases the compact selection ratio from 1.3 to 5 at off-peak hours. This
will help utilize the available iops and bandwidth to decrease the average
number of files per store. Only one such aggressive compaction is queued per
store at any point.

The number of iops on the disk and the top of the rack bandwidth utilization at
off peak hours is much lower than at peak hours depending on the application
usage pattern. We can utilize this knowledge to improve the performance of the
HBase cluster by increasing the compact selection ratio to a much larger value
during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio
(1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will
help reduce the average number of files per store.

Test Plan: Started running the unit tests.

Reviewers: JIRA, Kannan, nspiegelberg, mbautin, stack

Reviewed By: nspiegelberg

CC: nspiegelberg, tedyu, lhofhansl, Karthik

Differential Revision: 471

karthik : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
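
As a rough illustration of the change summarized above, the following Java sketch picks between the two ratios and applies a ratio-based file selection. It is not the code from the patch: the class and method names (CompactionRatioSketch, ratioFor, firstSelectedIndex) are hypothetical, and the selection rule is a simplified assumption. Only the two property names and their stated defaults (1.2 and 5) come from the issue description.

```java
/** Hypothetical sketch; names and the simplified rule are assumptions, not the committed patch. */
final class CompactionRatioSketch {
    // Defaults as quoted in the issue description.
    static final double DEFAULT_RATIO = 1.2;          // hbase.hstore.compaction.ratio
    static final double DEFAULT_OFFPEAK_RATIO = 5.0;  // hbase.hstore.compaction.ratio.offpeak

    /** Returns the ratio to use for the current selection. */
    static double ratioFor(boolean offPeak) {
        return offPeak ? DEFAULT_OFFPEAK_RATIO : DEFAULT_RATIO;
    }

    /**
     * Simplified ratio-based selection (an assumption for illustration).
     * sizes is ordered oldest to newest. An older file is skipped while it is
     * larger than ratio * (total size of all newer files). The returned index
     * is the first file kept; every file from that index to the end is selected.
     * A return value equal to sizes.length means nothing was selected.
     */
    static int firstSelectedIndex(long[] sizes, double ratio) {
        long newerSum = 0;
        for (long size : sizes) {
            newerSum += size;
        }
        int start = 0;
        while (start < sizes.length) {
            newerSum -= sizes[start];           // now the sum of files newer than 'start'
            if (sizes[start] <= ratio * newerSum) {
                break;                          // this file is small enough to include
            }
            start++;
        }
        return start;
    }
}
```

With file sizes {1000, 300, 100, 100}, the default ratio selects only the two newest files (index 2 onward), while the off-peak ratio selects all four, which is how a larger ratio reduces the number of files per store.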

                
> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
>                 Key: HBASE-4463
>                 URL: https://issues.apache.org/jira/browse/HBASE-4463
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>             Fix For: 0.94.0
>
>         Attachments: HBASE-4463.D471.1.patch, HBASE-4463.D471.2.patch
>
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.


[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113040#comment-13113040 ] 

dhruba borthakur commented on HBASE-4463:
-----------------------------------------

Can we add something like the ability to throttle the maximum bandwidth per server allowed for compaction? (A similar philosophy is used in HDFS to ensure that background replication does not swamp the network.)

> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
>                 Key: HBASE-4463
>                 URL: https://issues.apache.org/jira/browse/HBASE-4463
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.


[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151846#comment-13151846 ] 

Lars Hofhansl commented on HBASE-4463:
--------------------------------------

re: previous discussions. Automatically detecting a good time to perform more aggressive compactions would be nice, but my vote would be to get this simpler time-based scheme in first.

@Ted: Isn't hour-granularity good enough?

                
> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
>                 Key: HBASE-4463
>                 URL: https://issues.apache.org/jira/browse/HBASE-4463
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>         Attachments: HBASE-4463.D471.1.patch
>
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.


[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151855#comment-13151855 ] 

Phabricator commented on HBASE-4463:
------------------------------------

tedyu has commented on the revision "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".

  Hour-granularity is good for now.
  Nicolas' comment makes sense.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:1061 Shall we log the value for "hbase.offpeak.end.hour" so that we know the (possible) delay beyond the end time?

REVISION DETAIL
  https://reviews.facebook.net/D471

                
> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
>                 Key: HBASE-4463
>                 URL: https://issues.apache.org/jira/browse/HBASE-4463
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>         Attachments: HBASE-4463.D471.1.patch
>
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.


[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Karthik Ranganathan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113017#comment-13113017 ] 

Karthik Ranganathan commented on HBASE-4463:
--------------------------------------------

Initially we are going to specify a start and stop for off-peak hours. More automatic detection based on response latencies and data read/transferred could be done, but it is much harder to get right.
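
A minimal Java sketch of such a start/stop check, assuming hour granularity and a window that may wrap past midnight. The class and method names (OffPeakWindowSketch, isOffPeakHour) are hypothetical, and the choices of an inclusive start, an exclusive end, and "equal hours means disabled" are assumptions made for illustration, not behavior confirmed by the patch.

```java
/** Hypothetical sketch of a configured off-peak window; not the committed patch. */
final class OffPeakWindowSketch {
    /**
     * Hours are 0-23. The start hour is inclusive and the end hour is exclusive
     * (an assumption). A window such as 22 -> 6 wraps past midnight.
     */
    static boolean isOffPeakHour(int startHour, int endHour, int currentHour) {
        if (startHour == endHour) {
            return false;   // empty window: treated here as "off-peak disabled"
        }
        if (startHour < endHour) {
            // Window within one day, e.g. 1 -> 5.
            return currentHour >= startHour && currentHour < endHour;
        }
        // Window that wraps past midnight, e.g. 22 -> 6.
        return currentHour >= startHour || currentHour < endHour;
    }
}
```

In this sketch the current hour would typically come from something like java.util.Calendar.getInstance().get(Calendar.HOUR_OF_DAY), and the result would decide which compaction ratio is applied.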

> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
>                 Key: HBASE-4463
>                 URL: https://issues.apache.org/jira/browse/HBASE-4463
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.3 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.


[jira] [Resolved] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Karthik Ranganathan (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan resolved HBASE-4463.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.94.0
    
> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
>                 Key: HBASE-4463
>                 URL: https://issues.apache.org/jira/browse/HBASE-4463
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>             Fix For: 0.94.0
>
>         Attachments: HBASE-4463.D471.1.patch, HBASE-4463.D471.2.patch
>
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.


[jira] [Updated] (HBASE-4463) Run more aggressive compactions during off peak hours

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-4463:
-------------------------------

    Attachment: HBASE-4463.D471.1.patch

Karthik requested code review of "HBASE-4463 [jira] Run more aggressive compactions during off peak hours".
Reviewers: JIRA

  HBASE-4463 Run more aggressive compactions during off peak hours

  Increases the compact selection ratio from 1.3 to 5 at off-peak hours. This
  will help utilize the available iops and bandwidth to decrease the average
  number of files per store. Only one such aggressive compaction is queued per
  store at any point.

  The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.

TEST PLAN
  Started running the unit tests.

REVISION DETAIL
  https://reviews.facebook.net/D471

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactSelection.java
  src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java


                
> Run more aggressive compactions during off peak hours
> -----------------------------------------------------
>
>                 Key: HBASE-4463
>                 URL: https://issues.apache.org/jira/browse/HBASE-4463
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>         Attachments: HBASE-4463.D471.1.patch
>
>
> The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store.
