You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2010/08/19 23:59:41 UTC

Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.

flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.

Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.


This addresses bug HBASE-2915.
    http://issues.apache.org/jira/browse/HBASE-2915


Diffs
-----

  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
  /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 

Diff: http://review.cloudera.org/r/691/diff


Testing
-------

5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.


Thanks,

Jean-Daniel


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Jean-Daniel Cryans <jd...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review973
-----------------------------------------------------------


Further testing shows that I completely forgot to properly lock incrementColumnValue, thus missing the point of the title of the jira :S Uploading a new patch soon that addresses that, along with comments from the reviews.

- Jean-Daniel


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I'd like to but without this patch the region server deadlocks like I
described in today's comment in HBASE-2915. I personally don't expect
to see any noticeable difference as most of the locks are just
replaced.

J-D

On Thu, Aug 19, 2010 at 3:34 PM, Ted Yu <yu...@gmail.com> wrote:
> J-D:
> Can you collect performance numbers in your cluster after deployment ?
>
> Thanks
>
> On Thu, Aug 19, 2010 at 2:59 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> http://review.cloudera.org/r/691/
>> -----------------------------------------------------------
>>
>> Review request for hbase.
>>
>>
>> Summary
>> -------
>>
>> This patch removes newScannerLock and renames splitAndClose lock to just
>> "lock". Every operation is now required to obtain the read lock on "lock"
>> before doing anything (including getting a row lock). This is done by
>> calling openRegionTransaction inside a try statement and by calling
>> closeRegionTransaction in finally.
>>
>> flushcache got refactored some more in order to do the locking in the
>> proper order; first get the read lock, then do the writestate handling.
>>
>> Finally, it removes the need to have a writeLock when flushing when
>> subclassers give atomic work do to via internalPreFlushcacheCommit. This
>> means that this patch breaks external contribs. This is required to keep our
>> whole locking mechanism simpler.
>>
>>
>> This addresses bug HBASE-2915.
>>    http://issues.apache.org/jira/browse/HBASE-2915
>>
>>
>> Diffs
>> -----
>>
>>  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
>> 987300
>>  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
>> 987300
>>  /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
>> 987300
>>
>> Diff: http://review.cloudera.org/r/691/diff
>>
>>
>> Testing
>> -------
>>
>> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also
>> in the process of deploying it on a cluster.
>>
>>
>> Thanks,
>>
>> Jean-Daniel
>>
>>
>

Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Ted Yu <yu...@gmail.com>.
J-D:
Can you collect performance numbers in your cluster after deployment ?

Thanks

On Thu, Aug 19, 2010 at 2:59 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
>
> Review request for hbase.
>
>
> Summary
> -------
>
> This patch removes newScannerLock and renames splitAndClose lock to just
> "lock". Every operation is now required to obtain the read lock on "lock"
> before doing anything (including getting a row lock). This is done by
> calling openRegionTransaction inside a try statement and by calling
> closeRegionTransaction in finally.
>
> flushcache got refactored some more in order to do the locking in the
> proper order; first get the read lock, then do the writestate handling.
>
> Finally, it removes the need to have a writeLock when flushing when
> subclassers give atomic work do to via internalPreFlushcacheCommit. This
> means that this patch breaks external contribs. This is required to keep our
> whole locking mechanism simpler.
>
>
> This addresses bug HBASE-2915.
>    http://issues.apache.org/jira/browse/HBASE-2915
>
>
> Diffs
> -----
>
>  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
> 987300
>  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
> 987300
>  /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
> 987300
>
> Diff: http://review.cloudera.org/r/691/diff
>
>
> Testing
> -------
>
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also
> in the process of deploying it on a cluster.
>
>
> Thanks,
>
> Jean-Daniel
>
>

Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Ted Yu <te...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review969
-----------------------------------------------------------



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3155>

    Or regionOperationProlog()



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3156>

    and regionOperationEpilog()


- Ted


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Jean-Daniel Cryans <jd...@apache.org>.

> On 2010-08-20 15:58:23, stack wrote:
> > oh, one other thing, in discussions, we talked of no longer needing to wait on row locks to expire... I don't see this being excised from the close method.  Should that be in here?

Yep, it needs to go.


- Jean-Daniel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review975
-----------------------------------------------------------


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by st...@duboce.net.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review975
-----------------------------------------------------------


oh, one other thing, in discussions, we talked of no longer needing to wait on row locks to expire... I don't see this being excised from the close method.  Should that be in here?

- stack


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Jean-Daniel Cryans <jd...@apache.org>.

> On 2010-08-20 15:57:34, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 507
> > <http://review.cloudera.org/r/691/diff/1/?file=7612#file7612line507>
> >
> >     I suppose this order is ok if the first thing we do on entrance to HRegion is get the read lock before check of closing.

So I just redid that part. setClosing is first taken so that when the client threads arrive they can fast fail by looking at closing.get before trying to get the readLock.


> On 2010-08-20 15:57:34, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 712
> > <http://review.cloudera.org/r/691/diff/1/?file=7612#file7612line712>
> >
> >     Seems like you could use your opentransaction/closetransaction methods here and in flush too to be consistent?

Yeah the issue with compact and flush is that the callers don't expect to see NSRE, the want null values.


> On 2010-08-20 15:57:34, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1115
> > <http://review.cloudera.org/r/691/diff/1/?file=7612#file7612line1115>
> >
> >     Aren't these lines unnecessary?  openRegionTransaction does it?

Good catch.


> On 2010-08-20 15:57:34, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 3142
> > <http://review.cloudera.org/r/691/diff/1/?file=7612#file7612line3142>
> >
> >     So, this javadoc is good but do you think we need some more doc?  Does there need to be more detail on new locking regime? Maybe there is no more to be said that what is here in this paragraph.  You've done all the work unravelling our lock mess.  With time your nice unravelling will rot unless its clear what the pattern is.    I'm just trying to think of ways of preventing that happening.

Yeah I'll included some more javadoc, maybe with code examples?


- Jean-Daniel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review974
-----------------------------------------------------------


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by st...@duboce.net.

> On 2010-08-20 15:57:34, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 507
> > <http://review.cloudera.org/r/691/diff/1/?file=7612#file7612line507>
> >
> >     I suppose this order is ok if the first thing we do on entrance to HRegion is get the read lock before check of closing.
> 
> Jean-Daniel Cryans wrote:
>     So I just redid that part. setClosing is first taken so that when the client threads arrive they can fast fail by looking at closing.get before trying to get the readLock.
> 
> stack wrote:
>     Don't you have to check again the setClosing after you get the read lock?
> 
> Jean-Daniel Cryans wrote:
>     I'm about to post a new patch, but it looks like this and it has to be called just before "try" instead of inside:
>          if (this.closing.get()) {
>           throw new NotServingRegionException(regionInfo.getRegionNameAsString() +
>               " is closing");
>         }
>         lock.readLock().lock();
>         if (this.closed.get()) {
>           throw new NotServingRegionException(regionInfo.getRegionNameAsString() +
>               " is closed");
>         }

That looks right.


> On 2010-08-20 15:57:34, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 712
> > <http://review.cloudera.org/r/691/diff/1/?file=7612#file7612line712>
> >
> >     Seems like you could use your opentransaction/closetransaction methods here and in flush too to be consistent?
> 
> Jean-Daniel Cryans wrote:
>     Yeah the issue with compact and flush is that the callers don't expect to see NSRE, the want null values.
> 
> stack wrote:
>     OK. Not important. This is deep internal stuff or make a version that takes a flag on whether to throw exception (default throws exception .. might get messy though... not important).
> 
> Jean-Daniel Cryans wrote:
>     I'm afraid those little methods could be clogged fast.

Yeah. Not important.


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review974
-----------------------------------------------------------


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by st...@duboce.net.

> On 2010-08-20 15:57:34, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 507
> > <http://review.cloudera.org/r/691/diff/1/?file=7612#file7612line507>
> >
> >     I suppose this order is ok if the first thing we do on entrance to HRegion is get the read lock before check of closing.
> 
> Jean-Daniel Cryans wrote:
>     So I just redid that part. setClosing is first taken so that when the client threads arrive they can fast fail by looking at closing.get before trying to get the readLock.

Don't you have to check again the setClosing after you get the read lock?


> On 2010-08-20 15:57:34, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 712
> > <http://review.cloudera.org/r/691/diff/1/?file=7612#file7612line712>
> >
> >     Seems like you could use your opentransaction/closetransaction methods here and in flush too to be consistent?
> 
> Jean-Daniel Cryans wrote:
>     Yeah the issue with compact and flush is that the callers don't expect to see NSRE, the want null values.

OK. Not important. This is deep internal stuff or make a version that takes a flag on whether to throw exception (default throws exception .. might get messy though... not important).


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review974
-----------------------------------------------------------


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by st...@duboce.net.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review974
-----------------------------------------------------------

Ship it!


+1  Nice fix.  A few comments below.


/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3163>

    I suppose this order is ok if the first thing we do on entrance to HRegion is get the read lock before check of closing.



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3164>

    Seems like you could use your opentransaction/closetransaction methods here and in flush too to be consistent?



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3165>

    Yeah, just remove.



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3166>

    Aren't these lines unnecessary?  openRegionTransaction does it?



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3167>

    So, this javadoc is good but do you think we need some more doc?  Does there need to be more detail on new locking regime? Maybe there is no more to be said that what is here in this paragraph.  You've done all the work unravelling our lock mess.  With time your nice unravelling will rot unless its clear what the pattern is.    I'm just trying to think of ways of preventing that happening.


- stack


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Jean-Daniel Cryans <jd...@apache.org>.

> On 2010-08-20 20:49:15, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 716
> > <http://review.cloudera.org/r/691/diff/2/?file=7680#file7680line716>
> >
> >     Any reason this does not follow the pattern used elsewhere?  You are not testing closing before getting the read lock?

mmm didn't want to copy the same chunk of code over there, but yeah it will also give us a fail-fast behavior which will speed up flushing.


> On 2010-08-20 20:49:15, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 785
> > <http://review.cloudera.org/r/691/diff/2/?file=7680#file7680line785>
> >
> >     ditto

ditto


> On 2010-08-20 20:49:15, stack wrote:
> > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2960
> > <http://review.cloudera.org/r/691/diff/2/?file=7680#file7680line2960>
> >
> >     Is this supposed to be inside the try?

Per that method's javadoc, no. The reason is that if it's all in a try block, and that closing is true, then you won't hold a readLock and you can't do a isHeldByCurrent thread on that lock. So I thought we could catch IllegalStateException in closeRegionOperation, but that would be really ugly. This leaves us to the current situation where if an exception is thrown when we are on this line:
     if (this.closed.get()) {
that it would leave the readLock locked by the thread, although in this case the region is closing so we're getting rid of it anyways. Although, thinking about it, I should probably do this instead:

    if (this.closed.get()) {
      lock.readLock().unlock();
      throw new NotServingRegionException(regionInfo.getRegionNameAsString() +
          " is closed");
    }


- Jean-Daniel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review988
-----------------------------------------------------------


On 2010-08-20 17:43:52, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-20 17:43:52)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987355 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987355 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987355 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by st...@duboce.net.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review988
-----------------------------------------------------------

Ship it!


+1

if below are issues address on commit.


/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3194>

    Any reason this does not follow the pattern used elsewhere?  You are not testing closing before getting the read lock?



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3195>

    ditto



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3196>

    Is this supposed to be inside the try?


- stack


On 2010-08-20 17:43:52, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-20 17:43:52)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987355 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987355 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987355 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Jean-Daniel Cryans <jd...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/
-----------------------------------------------------------

(Updated 2010-08-20 17:43:52.668799)


Review request for hbase.


Changes
-------

Updated patch with "closing" handling and ICV locking, still needs more javadoc.


Summary
-------

This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.

flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.

Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.


This addresses bug HBASE-2915.
    http://issues.apache.org/jira/browse/HBASE-2915


Diffs (updated)
-----

  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987355 
  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987355 
  /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987355 

Diff: http://review.cloudera.org/r/691/diff


Testing
-------

5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.


Thanks,

Jean-Daniel


Re: Review Request: HBASE-2915 Deadlock between HRegion.ICV and HRegion.close

Posted by Ryan Rawson <ry...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review966
-----------------------------------------------------------



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3144>

    oh wow i cant believe this was ever here



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3145>

    i thought we agreed that the closing flag had to be set BEFORE the write lock was acquired to prevent race conditions?



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3146>

    lets just excise this and break compile time compatibility.  Also remove internalPreFlushcacheCommit too



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3147>

    ditto remove this whole try/finally bit



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
<http://review.cloudera.org/r/691/#comment3148>

    maybe we shouldnt call this 'transaction' might confuse people into thinking we support real transactions... not sure what to call it at this moment tho


- Ryan


On 2010-08-19 14:59:41, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/691/
> -----------------------------------------------------------
> 
> (Updated 2010-08-19 14:59:41)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch removes newScannerLock and renames splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and by calling closeRegionTransaction in finally.
> 
> flushcache got refactored some more in order to do the locking in the proper order; first get the read lock, then do the writestate handling.
> 
> Finally, it removes the need to have a writeLock when flushing when subclassers give atomic work do to via internalPreFlushcacheCommit. This means that this patch breaks external contribs. This is required to keep our whole locking mechanism simpler.
> 
> 
> This addresses bug HBASE-2915.
>     http://issues.apache.org/jira/browse/HBASE-2915
> 
> 
> Diffs
> -----
> 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300 
>   /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300 
>   /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300 
> 
> Diff: http://review.cloudera.org/r/691/diff
> 
> 
> Testing
> -------
> 
> 5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>