You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Nicolas Spiegelberg (Created) (JIRA)" <ji...@apache.org> on 2012/02/04 01:51:53 UTC

[jira] [Created] (HBASE-5332) Deterministic Compaction Jitter

Deterministic Compaction Jitter
-------------------------------

                 Key: HBASE-5332
                 URL: https://issues.apache.org/jira/browse/HBASE-5332
             Project: HBase
          Issue Type: Improvement
            Reporter: Nicolas Spiegelberg
            Priority: Minor


Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Prakash Khemani (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212858#comment-13212858 ] 

Prakash Khemani commented on HBASE-5332:
----------------------------------------

The major compactions are jittered so that too many of them don't happen at the same time. Rather than relying on random jitter, why can't the compaction thread simply ensure that it doesn't schedule too many compactions at the same time?
                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200233#comment-13200233 ] 

Nicolas Spiegelberg commented on HBASE-5332:
--------------------------------------------

The current proposal is to seed the random number generator with existing persistent data:
{code}
 java.util.Random(hash(Region Name + oldest file timestamp)).nextDouble()  
{code}

Note that that the currently algorithm is normally an annoyance in 92 and above because major compactions will normally go in the large compaction queue.  However, 90 and users who have the large compaction queue incorrectly configured could experience high latency after restart because a lot of major compactions will pile up.
                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210751#comment-13210751 ] 

Lars Hofhansl commented on HBASE-5332:
--------------------------------------

I kinda like the "simpleness" of the random jitter. Part of the problem seems to be that we only get a few random choices with "delay + jitter*(1 - 2*Math.random())"

What if we just change this to "delay + jitter*(2 - 4*Math.random())" or "delay + jitter*(3 - 6*Math.random())" and decrease jitter accordingly?

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-5332:
---------------------------------------

    Status: Patch Available  (was: Open)

resubmitting after creating trunk-specific patch file
                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>             Fix For: 0.94.0
>
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch, HBASE-5332.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5332:
-------------------------------

    Attachment: D1785.1.patch

nspiegelberg requested code review of "[jira] [HBASE-5332] Deterministic Compaction Jitter".
Reviewers: JIRA, Kannan, aaiyer, stack

  Changing the jitter in major compactions to be deterministic,
  so reboots don't cause a time-based compaction storm.  This
  implementation seeds a random number generator with HDFS data for
  persistence.

TEST PLAN
   - mvn test -Dtest=TestCompaction,TestCompactSelection,TestHeapSize

REVISION DETAIL
  https://reviews.facebook.net/D1785

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/3807/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-5332:
---------------------------------------

    Attachment: HBASE-5332.patch

trunk patch
                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>             Fix For: 0.94.0
>
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch, HBASE-5332.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211641#comment-13211641 ] 

Phabricator commented on HBASE-5332:
------------------------------------

nspiegelberg has commented on the revision "[jira] [HBASE-5332] Deterministic Compaction Jitter".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1067 I can null check.  note that this variable is not null-checked in a lot of places in Store.java.  Wish there was a way to guarantee non-null
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1065 I was worried about a case where this check happened while a major compaction was finishing up  (see Store.completeCompaction).  Besides per-file compactions, the HRegionServer.CompactionChecker thread is another area where this race can occur.

  Maybe this isn't a practical problem since we don't cache this calculation anymore & the major compaction interval should normally be very large so a race condition won't trigger a major compaction.
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1068 Thinking about it more, the pathname alone should suffice since it's a 'crypographically strong' random UUID

REVISION DETAIL
  https://reviews.facebook.net/D1785

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5332:
-------------------------------

    Attachment: D1785.3.patch

nspiegelberg updated the revision "[jira] [HBASE-5332] Deterministic Compaction Jitter".
Reviewers: JIRA, Kannan, aaiyer, stack

  Addressed Kannan's peer review.  Raised compaction delay because of false positives when the jitter brought it to 3 sec.

REVISION DETAIL
  https://reviews.facebook.net/D1785

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-5332:
---------------------------------------

    Fix Version/s: 0.94.0
           Status: Patch Available  (was: Open)
    
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>             Fix For: 0.94.0
>
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-5332:
---------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
    
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>             Fix For: 0.94.0
>
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch, HBASE-5332.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213318#comment-13213318 ] 

Phabricator commented on HBASE-5332:
------------------------------------

Kannan has accepted the revision "[jira] [HBASE-5332] Deterministic Compaction Jitter".

  lgtm!

REVISION DETAIL
  https://reviews.facebook.net/D1785

BRANCH
  HBASE-5332

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-5332:
---------------------------------------

    Status: Open  (was: Patch Available)
    
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>             Fix For: 0.94.0
>
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch, HBASE-5332.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210725#comment-13210725 ] 

Phabricator commented on HBASE-5332:
------------------------------------

Kannan has commented on the revision "[jira] [HBASE-5332] Deterministic Compaction Jitter".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1065 s/update/updated



REVISION DETAIL
  https://reviews.facebook.net/D1785

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211050#comment-13211050 ] 

Phabricator commented on HBASE-5332:
------------------------------------

Kannan has commented on the revision "[jira] [HBASE-5332] Deterministic Compaction Jitter".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1067 Also, once the read lock is removed, we should assign (snapshot) the immutable list into another variable so that the various accesses on it are working on the same/consistent view.

  So, something like:

   storefilesSnapShot = this.storefiles;
   if (storefilesSnapshot != null && !storefilesSnapshot.isEmpty()) {
        ... storefilesSnapshot.get(0)....
   }


REVISION DETAIL
  https://reviews.facebook.net/D1785

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210721#comment-13210721 ] 

Phabricator commented on HBASE-5332:
------------------------------------

Kannan has commented on the revision "[jira] [HBASE-5332] Deterministic Compaction Jitter".

  Nicolas -- excellent! Minor comments...

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1068 storeNameStr appears to be the CF name only. Did you intend to take the regionName also into the seed computation?

  Not a big deal.. the fact that you are taking the path name of the first file should itself be good enough since that's randomized enough. We may not even need the storeNameStr.


  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1067 maybe safer to:

  if (storefiles != null && !storefiles.isEmpty()) {
    ..
  }

  Maybe the constructor always guarantees that this is initialized to a 0-element list. But nevertheless...
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:1065 I think this lock here is unnecessary.

  storeFiles points to a immutable list, something that's update wholesale rather than in-parts at the end of a flush or compaction.

  See other uses of this like: getStorefilesIndexSize() which don't take a lock either.

REVISION DETAIL
  https://reviews.facebook.net/D1785

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213994#comment-13213994 ] 

Phabricator commented on HBASE-5332:
------------------------------------

nspiegelberg has committed the revision "[jira] [HBASE-5332] Deterministic Compaction Jitter".

REVISION DETAIL
  https://reviews.facebook.net/D1785

COMMIT
  https://reviews.facebook.net/rHBASE1292495

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212876#comment-13212876 ] 

Nicolas Spiegelberg commented on HBASE-5332:
--------------------------------------------

@Prakash: are you thinking some sort of priority queue where we'll issue a major compaction if the queue is empty and a minimum time has elapsed?  That's a possibility but would require a stateful class to handle the problem.  We could just reject a compaction request if the queue is too full to avoid state.  

However, the main issue is that the non-persistence causes large waves of major compactions versus a small batch of 3 or 4.  The random jitter isn't really an issue because the compactions should be put in the large compaction queue and not affect normal operation.  Really, this is more of an inconvenience right now, so I think a small, non-disruptive fix is best.  Other thoughts?
                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211642#comment-13211642 ] 

Nicolas Spiegelberg commented on HBASE-5332:
--------------------------------------------

@Lars: note that the operation you describe is operating on a double and converting to a long at the end, so there is no loss of granularity.  Also note that the other proposed changes would 2x and 3x the jitter from the user-configured percentage value.
                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5332:
-------------------------------

    Attachment: D1785.2.patch

nspiegelberg updated the revision "[jira] [HBASE-5332] Deterministic Compaction Jitter".
Reviewers: JIRA, Kannan, aaiyer, stack

  Remove unnecessary logs added during debug

REVISION DETAIL
  https://reviews.facebook.net/D1785

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg reassigned HBASE-5332:
------------------------------------------

    Assignee: Nicolas Spiegelberg
    
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213987#comment-13213987 ] 

Nicolas Spiegelberg commented on HBASE-5332:
--------------------------------------------

Patch works for trunk as well.  Needed to fix a single conflict with Store.FIXED_OVERHEAD.
                
> Deterministic Compaction Jitter
> -------------------------------
>
>                 Key: HBASE-5332
>                 URL: https://issues.apache.org/jira/browse/HBASE-5332
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>         Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch
>
>
> Currently, we add jitter to a compaction using "delay + jitter*(1 - 2*Math.random())".  Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to "delay + jitter" will now be set to "delay - jitter".  We need a more deterministic way to jitter major compactions so this information can persist across server restarts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira