Posted to issues@hbase.apache.org by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org> on 2010/09/28 03:01:33 UTC

[jira] Created: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

'hbase-daemon.sh stop regionserver' should kill compactions that are in progress
--------------------------------------------------------------------------------

                 Key: HBASE-3043
                 URL: https://issues.apache.org/jira/browse/HBASE-3043
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.89.20100621
            Reporter: Nicolas Spiegelberg
            Assignee: Nicolas Spiegelberg
             Fix For: 0.89.20100924


During rolling restarts, we'll occasionally get into a situation with our 100-node cluster where a RS stop takes 5-10 minutes.  The problem is that the RS is undergoing a compaction and won't stop until it is complete.  In a stop situation, it would be preferable to preempt the compaction, delete the newly-created compaction file, and try again once the cluster is restarted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917414#action_12917414 ] 

stack commented on HBASE-3043:
------------------------------

Oh, one thought: if we are interrupted, we do not seem to clean up after ourselves.  Is the thought that cleanup happens when the region is next opened?



[jira] Updated: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-3043:
---------------------------------------

    Attachment:     (was: HBASE-3043_0.89.patch)



[jira] Updated: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-3043:
---------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-3043:
---------------------------------------

        Fix Version/s: 0.90.0
    Affects Version/s: 0.90.0



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917678#action_12917678 ] 

stack commented on HBASE-3043:
------------------------------

@Nicolas Excellent... Let me apply (running tests now to check nothing broke)



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917275#action_12917275 ] 

Nicolas Spiegelberg commented on HBASE-3043:
--------------------------------------------

NOTE: The default writeCheckInterval of 10MB came out to about 2sec between checks on my system.  I chose a high number to minimize the performance impact on stores while still keeping the stop delay reasonably short.
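
To illustrate the note above, here is a minimal sketch of a byte-count-based check, not the code from the patch. The class name, the compact() method, the array of cell sizes, and the exact constant (10 * 1000 * 1000 bytes) are assumptions made for the example; only the idea of testing a writes-enabled flag every ~10MB written comes from this thread.

```java
import java.io.InterruptedIOException;

// Hypothetical stand-in for a compaction write loop; not HBase's actual code.
class CloseCheckSketch {
  // Assumed value: look at the flag after roughly every 10 MB written.
  static final long CHECK_INTERVAL_BYTES = 10L * 1000 * 1000;

  // Set to false by the thread that is closing the region.
  static volatile boolean writesEnabled = true;

  /**
   * Walks over cells of the given sizes and returns how many times the flag was checked.
   * Throws InterruptedIOException if writes were disabled at a check point.
   */
  static int compact(long[] cellSizes) throws InterruptedIOException {
    long bytesSinceCheck = 0;
    int checks = 0;
    for (long size : cellSizes) {
      // A real compaction would append the cell to the new store file here.
      bytesSinceCheck += size;
      if (bytesSinceCheck >= CHECK_INTERVAL_BYTES) {
        bytesSinceCheck = 0;
        checks++;
        if (!writesEnabled) {
          throw new InterruptedIOException("Aborting compaction: region is closing");
        }
      }
    }
    return checks;
  }
}
```

A larger interval means fewer volatile reads on the write path but a longer worst-case wait before a stop request is noticed, which is the trade-off described in the note.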



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917423#action_12917423 ] 

Nicolas Spiegelberg commented on HBASE-3043:
--------------------------------------------

Pranav's comments: 
1) On WriteState variable privacy: 6 of one, half-dozen of the other.  I made sure the WriteState variable was package private.  I was looking at possibly some more unit tests dealing with our write state, so I didn't want to write a bunch of accessors just to deal with unit tests.  In the unit test case, we don't really need to worry about synchronization either.  My thought was to add accessor methods if we're going to use it outside of a unit test.  Okay?
2) The lack of unlock() actually could have caused some extremely rare deadlock conditions, but only on exit, so probably no one has run across it.  I mainly wanted to fix poor practice.

Stack's comment:
Your thought is correct.  However, I do need to make a small change that I had done internally, but lost when I refactored.  This works because of some subtle interactions between server.stopRequested(), CompactSplitThread.lock, & HRegion.writeState.writesEnabled.  States that can happen:
1) We get the lock & interrupt compactionQueue.poll().  It throws an InterruptedException, which calls continue, which fails the next while() check, which finishes the close
2) We get the lock & interrupt, but the thread is somewhere between the poll() and the lock().  [In new patch] CompactSplitThread.run() queries stopRequested() immediately after getting the lock(), which skips the compact/split code to return to the while() check and ...
3) We don't get the lock.  HRegionServer.run() calls closeAllRegions(), which calls HRegion.close(), which sets the writeState.  The compaction sees this and throws an InterruptedIOE, which aborts the current compaction, goes to the while() check in CompactSplitThread.run() and ...
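
The three states above can be sketched roughly as follows. This is a simplified, hypothetical worker written for illustration, not the CompactSplitThread code from the patch: the class name, the Runnable work items, the 100 ms poll timeout, and the completed counter are all assumptions. Only the overall shape (poll, lock, re-check the stop flag, interrupt only when the lock is free) follows the description.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified worker; names echo the discussion but this is not HBase code.
class CompactWorkerSketch extends Thread {
  final BlockingQueue<Runnable> compactionQueue = new LinkedBlockingQueue<>();
  final ReentrantLock lock = new ReentrantLock();
  volatile boolean stopRequested = false;
  volatile int completed = 0; // written only by this worker thread

  @Override
  public void run() {
    while (!stopRequested) {
      Runnable work;
      try {
        work = compactionQueue.poll(100, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        continue; // state 1: interrupted while idle; the while() check ends the loop
      }
      if (work == null) {
        continue; // nothing queued, re-check the stop flag
      }
      lock.lock();
      try {
        if (stopRequested) {
          continue; // state 2: the stop arrived between poll() and lock()
        }
        work.run(); // state 3: a running compaction has to notice the close itself
        completed++;
      } finally {
        lock.unlock();
      }
    }
  }

  // Called by the stopping thread: interrupt only if no compaction holds the lock.
  void interruptIfNecessary() {
    if (lock.tryLock()) {
      try {
        interrupt();
      } finally {
        lock.unlock();
      }
    }
  }
}
```

In this sketch the stopping thread would set stopRequested, call interruptIfNecessary(), and then join the worker; in state 3 the work item itself is expected to abort once it observes that writes are disabled.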



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915862#action_12915862 ] 

Nicolas Spiegelberg commented on HBASE-3043:
--------------------------------------------

I'm developing this patch against the 0.89 branch that we're running.  It should work fine with 0.90, but I just wanted to note that in case any problems occur.



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919385#action_12919385 ] 

Kannan Muthukkaruppan commented on HBASE-3043:
----------------------------------------------

I love how this patch also speeds up operations like table disables, region reassignments, etc., which might otherwise be stuck waiting for compactions to finish.




[jira] Updated: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3043:
-------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.89.20100924)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed.  Thanks for the sweet patch Nicolas.



[jira] Updated: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-3043:
---------------------------------------

    Attachment:     (was: HBASE-3043_0.90.patch)



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Pranav Khaitan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917308#action_12917308 ] 

Pranav Khaitan commented on HBASE-3043:
---------------------------------------

Patch looks good!

One possibility is to maintain the variable 'WriteState writestate' as private and add a set method.

Surprising that the following lock was never being unlocked before.
   void interruptIfNecessary() {
     if (lock.tryLock()) {
-      this.interrupt();
+      try {
+        this.interrupt();
+      } finally {
+        lock.unlock();
+      }
     }
   }
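
To show why the missing unlock() matters, here is a small self-contained sketch. The class and method names are made up for the example and the Runnable stands in for this.interrupt(); it only contrasts the two shapes of the method in the diff above.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; contrasts the old and patched shapes of the method.
class TryLockSketch {
  // Old shape: tryLock() succeeds and the lock is never released.
  static void leaky(ReentrantLock lock, Runnable action) {
    if (lock.tryLock()) {
      action.run();
    }
  }

  // Patched shape: the lock is released in finally, even if the action throws.
  static void balanced(ReentrantLock lock, Runnable action) {
    if (lock.tryLock()) {
      try {
        action.run();
      } finally {
        lock.unlock();
      }
    }
  }
}
```

After leaky() returns, lock.isLocked() is still true, so any other thread that later calls lock() on the same lock blocks for as long as the caller stays alive. After balanced() returns, the lock is free again.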




[jira] Updated: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-3043:
---------------------------------------

    Attachment: HBASE-3043_0.90.patch
                HBASE-3043_0.89.patch



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915612#action_12915612 ] 

Jean-Daniel Cryans commented on HBASE-3043:
-------------------------------------------

+1, although I don't think it's going to be included in 0924 unless the RC's sunk.



[jira] Updated: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-3043:
---------------------------------------

    Attachment: HBASE-3043_0.89.patch
                HBASE-3043_0.90.patch

Patches, updated after peer review.  Also changed closeCheckInterval to static to fix TestHeapSize (no current reason for it to be per-instance)



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917413#action_12917413 ] 

stack commented on HBASE-3043:
------------------------------

Looks grand to me Nicolas.  Answer Pranav above and then I'll commit.



[jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress

Posted by "Pranav Khaitan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917430#action_12917430 ] 

Pranav Khaitan commented on HBASE-3043:
---------------------------------------

Nicolas: great catch about that lock! That may have led to a deadlock.
The access issue can be ignored, as it was just a trivial thing.
