Posted to dev@hbase.apache.org by 鈴木俊裕 <br...@gmail.com> on 2015/09/08 11:01:35 UTC

Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Hi,

We upgraded our cluster from CDH5.3.1(HBase0.98.6) to CDH5.4.5(HBase1.0.0)
and we are experiencing a slowdown in increment operations.

Here's an extract from a thread dump of a RegionServer in our cluster:

Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
  State: BLOCKED
  Blocked count: 21689888
  Waited count: 39828360
  Blocked on java.util.LinkedList@3474e4b2
  Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
    org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
    org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
    org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    java.lang.Thread.run(Thread.java:745)

There are many similar threads in the thread dump.

I read the source code and I think this is caused by changes to
MultiVersionConsistencyControl.
A region lock (not a row lock) seems to occur in
waitForPreviousTransactionsComplete().


We also wrote performance test code for the increment operation that used
100 threads, and ran it in local mode.

The results are shown below:

CDH5.3.1(HBase0.98.6)
Throughput(op/s): 12757, Latency(ms): 7.975072509210629

CDH5.4.5(HBase1.0.0)
Throughput(op/s): 2027, Latency(ms): 49.11840157868772
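
For reference, the test harness was roughly of the following shape (a
minimal sketch against the 1.0-style client API; the table and column names
are illustrative, not our actual test code):

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementBench {
  public static void main(String[] args) throws Exception {
    final int threads = 100;
    final int opsPerThread = 1000;
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      final AtomicLong latencyNanos = new AtomicLong();
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      long start = System.nanoTime();
      for (int t = 0; t < threads; t++) {
        final byte[] row = Bytes.toBytes("row-" + t);
        pool.execute(() -> {
          // Each thread issues increments in a tight loop and records
          // the per-operation latency.
          try (Table table = conn.getTable(TableName.valueOf("bench"))) {
            for (int i = 0; i < opsPerThread; i++) {
              long opStart = System.nanoTime();
              table.incrementColumnValue(row, Bytes.toBytes("f"),
                  Bytes.toBytes("c"), 1L);
              latencyNanos.addAndGet(System.nanoTime() - opStart);
            }
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
      long totalOps = (long) threads * opsPerThread;
      double elapsedSec = (System.nanoTime() - start) / 1e9;
      System.out.printf("Throughput(op/s): %.0f, Latency(ms): %f%n",
          totalOps / elapsedSec, latencyNanos.get() / 1e6 / totalOps);
    }
  }
}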


Thanks,

Toshihiro Suzuki

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Elliott Clark <ec...@apache.org>.
Commented on the jira. But I think there's a pretty easy solution we can
do that should be possible in the near future. We will continue to have
issues in situations that are highly contended on just a small number of
rows, but there's not a whole lot that I can see to make that situation
much faster.

On Mon, Sep 21, 2015 at 10:14 PM, Stack <st...@duboce.net> wrote:

> Back to this problem. Simple tests confirm that as is, the
> single-queue-backed MVCC instance can slow Region ops if some other row is
> slow to complete. In particular Increment, checkAndPut, and batch mutations
> are affected. I opened HBASE-14460 to start in on a fix up. Let's see if we
> can somehow scope mvcc to row or at least shard mvcc so not all Region ops
> are paused.
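>
> To make the sharding idea concrete, here is a purely illustrative sketch
> (not the HBASE-14460 design): hash each row to one of N independent mvcc
> queues, so a slow write stalls only the handlers that land on the same
> shard rather than every write in the Region.
>
> // Illustrative only: choose an mvcc shard per row.
> int mvccShardFor(byte[] row, int numShards) {
>   return (java.util.Arrays.hashCode(row) & 0x7fffffff) % numShards;
> }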
>
> St.Ack
>
>
> On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>
> > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > Makes sense. Let me hack up a test case to help with the illustration. It
> > > is as though the mvcc should be scoped to a row only... Writes against
> > > other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> > > scope so we can see which on-going writes pertain to current operation?
> > Thank you St.Ack! I think this approach would work.
> >
> > > You need to read back the increment and have it be 'correct' at
> > > increment time?
> > Yes, we need it.
> >
> > I would like to help if there is anything I can do.
> >
> > Thanks,
> > Toshihiro Suzuki
> >
> >
> > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> >
> > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > Makes sense. Let me hack up a test case to help with the illustration. It
> > > is as though the mvcc should be scoped to a row only... Writes against
> > > other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> > > scope so we can see which on-going writes pertain to current operation?
> > >
> > > You need to read back the increment and have it be 'correct' at
> > > increment time?
> > >
> > > (This is a good one)
> > >
> > > Thank you Toshihiro Suzuki
> > > St.Ack
> > >
> > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > >
> > > > St.Ack,
> > > >
> > > > Thank you for your response.
> > > >
> > > > Why I think that "A region lock (not a row lock) seems to occur in
> > > > waitForPreviousTransactionsComplete()" is as follows:
> > > >
> > > > An increment operation has 3 procedures for MVCC.
> > > >
> > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > >
> > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > >
> > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > >
> > > >
> > > > I think that MultiVersionConsistencyControl's writeQueue can cause a
> > > > region lock.
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > >
> > > >
> > > > Step 2 adds a WriteEntry to writeQueue.
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > >
> > > > Step 3 removes the WriteEntry from writeQueue.
> > > >
> > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > >
> > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> > > > waits until writeQueue is empty or writeQueue.getFirst() == w.
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > >
> > > >
> > > > I think when a handler thread is processing between step 2 and step 3,
> > > > the other handler threads can wait at step 1 until the thread completes
> > > > step 3. This is depicted as follows:
> > > >
> > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > >
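> > > > To make this concrete, here is a condensed paraphrase of the queueing
> > > > (not the real class, just its shape; see the links above for the
> > > > actual code):
> > > >
> > > > import java.util.LinkedList;
> > > >
> > > > // Condensed paraphrase of the single per-region writeQueue. Because
> > > > // the queue is shared by every write in the region, a slow entry at
> > > > // the head stalls all waiting handlers, whichever rows they touch.
> > > > class SimplifiedMvcc {
> > > >   static class WriteEntry { boolean completed; }
> > > >
> > > >   private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();
> > > >
> > > >   // Step 2: append an entry for this write.
> > > >   WriteEntry beginMemstoreInsert() {
> > > >     synchronized (writeQueue) {
> > > >       WriteEntry e = new WriteEntry();
> > > >       writeQueue.add(e);
> > > >       return e;
> > > >     }
> > > >   }
> > > >
> > > >   // Step 1: enqueue a marker and block until it reaches the head,
> > > >   // i.e. until every previously queued write (any row!) completes.
> > > >   void waitForPreviousTransactionsComplete() throws InterruptedException {
> > > >     WriteEntry w = beginMemstoreInsert();
> > > >     synchronized (writeQueue) {
> > > >       while (writeQueue.getFirst() != w) {
> > > >         writeQueue.wait();
> > > >       }
> > > >       w.completed = true; // the marker carries no write of its own
> > > >       advance();
> > > >     }
> > > >   }
> > > >
> > > >   // Step 3: mark our write done and let the queue drain from the head.
> > > >   void completeMemstoreInsert(WriteEntry w) {
> > > >     synchronized (writeQueue) {
> > > >       w.completed = true;
> > > >       advance();
> > > >     }
> > > >   }
> > > >
> > > >   private void advance() { // caller must hold the writeQueue monitor
> > > >     while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
> > > >       writeQueue.removeFirst();
> > > >     }
> > > >     writeQueue.notifyAll();
> > > >   }
> > > > }
> > > >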
> > > > Actually, in the thread dump of our region server, many handler
> > > > threads (RW.default.writeRpcServer.handler) wait at Step 1
> > > > (waitForPreviousTransactionsComplete()).
> > > >
> > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > >
> > > > Many handler threads wait at this:
> > > >
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > >
> > > >
> > > > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > > > possible that all these threads are trying to get to the same row to
> > > > > update it? Could the app behavior have changed?  Or are you thinking
> > > > > increment itself has slowed significantly?
> > > > We have just upgraded HBase, not changed the app behavior. We are
> > > > thinking increment itself has slowed significantly.
> > > > Before upgrading HBase, we had good throughput and latency.
> > > > Currently, to cope with this problem, we split the regions finely.
> > > >
> > > > Thanks,
> > > >
> > > > Toshihiro Suzuki
> > > >
> > > >
> > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > >
> > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > >
> > > > > > Ted,
> > > > > >
> > > > > > Thank you for your response.
> > > > > >
> > > > > > I uploaded the complete stack trace to Gist.
> > > > > >
> > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > >
> > > > > >
> > > > > > I think that the increment operation works as follows:
> > > > > >
> > > > > > 1. get row lock
> > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior
> > > > > > MVCC transactions to finish
> > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > 4. get previous values
> > > > > > 5. create KVs
> > > > > > 6. write to Memstore
> > > > > > 7. write to WAL
> > > > > > 8. release row lock
> > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > > > >
> > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
> > > > > > writes named writeQueue.
> > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue
> > > > > > is empty or writeQueue.getFirst() == w.
> > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > > > > WriteEntry from writeQueue.
> > > > > >
> > > > > > I think that when a handler thread is processing between step 2 and
> > > > > > step 9, the other handler threads can wait until the thread
> > > > > > completes step 9.
> > > > > >
> > > > > >
> > > > > That is right. We need to read, after all outstanding updates are
> > > > > done... because we need to read the latest update before we go to
> > > > > modify/increment it.
> > > > >
> > > > > How do you make that out?
> > > > >
> > > > > "A region lock (not a row lock) seems to occur in
> > > > > waitForPreviousTransactionsComplete()."
> > > > >
> > > > > In 0.98.x we did this:
> > > > >
> > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > >
> > > > > ... and in 1.0 we do this:
> > > > >
> > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > >
> > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > +  }
> > > > >
> > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc
> > > > > and region sequenceid would spin independent of each other. Perhaps
> > > > > this is responsible for some of the slowdown.
> > > > >
> > > > > That said, looking at your thread dump, we seem to be down in the
> > > > > Get. If you do a bunch of thread dumps in a row, where is the
> > > > > lock-holding thread? In Get or writing Increment... or waiting on
> > > > > sequence id?
> > > > >
> > > > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > > > possible that all these threads are trying to get to the same row to
> > > > > update it? Could the app behavior have changed?  Or are you thinking
> > > > > increment itself has slowed significantly?
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Toshihiro Suzuki
> > > > > >
> > > > > >
> > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > > > >
> > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > >
> > > > > > >     try {
> > > > > > >       rowLock = getRowLock(row);
> > > > > > >
> > > > > > > Can you pastebin the complete stack trace?
> > > > > > >
> > > > > > > Thanks
> > > > > > >

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
On Mon, Nov 30, 2015 at 9:16 PM, Bryan Beaudreault
<bbeaudreault@hubspot.com> wrote:

> I'll try to get another one.  We are currently not seeing the issue due to
> lack of contention (it is off hours for our customers).
>
> Note that the stack trace I gave you was taken with a tool we have which
> aggregates common stacks. The one at the bottom occurred 122 times (out of
> 128 handlers -- this was pre-tuning, before we added 1000 handlers and the
> read vs write split). So to me it looks like 122 of 128 handlers were
> waiting on:
>
> if (!existingContext.latch.await(this.rowLockWaitDuration,
>     TimeUnit.MILLISECONDS)) {
>   throw new IOException("Timed out waiting for lock for row: " + rowKey);
> }
>
>
>
That would explain the nice numbers. Looks like a useful tool. Can I have
the unadulterated trace? Are all increments trying to go against the same
row?  If only the one thread is doing an actual increment and it is up in
the memstore doing a family compare, is it stuck not letting anyone else
in? It is only using 5% CPU though... so maybe not....

St.Ack




> On Tue, Dec 1, 2015 at 12:08 AM Stack <st...@duboce.net> wrote:
>
> > Looking at that stack trace, nothing is showing as blocked or slowed by
> > another operation. Do you have others I could look at, Bryan?
> > St.Ack
> >
> > On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault
> > <bbeaudreault@hubspot.com> wrote:
> >
> > > Yea, sorry if I was misleading.  The nonce loglines we saw only
> > > happened on full cluster restart; it may have been the HLogs replaying,
> > > not sure.
> > >
> > > We are still seeing slow Increments. Where Gets and Mutates will be on
> > > the order of 50-150ms according to metrics, Increment will be in the
> > > 1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
> > >
> > >
> > > https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
> > >
> > >
> > >
> > > On Mon, Nov 30, 2015 at 11:26 PM Stack <st...@duboce.net> wrote:
> > >
> > > > Still slow increments though?
> > > >
> > > > On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault
> > > > <bbeaudreault@hubspot.com> wrote:
> > > >
> > > > > Those log lines have settled down; they may have been related to a
> > > > > cluster-wide forced restart at the time.
> > > > >
> > > > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault
> > > > > <bbeaudreault@hubspot.com> wrote:
> > > > >
> > > > > > We've been doing more debugging of this and have set up the read
> > > > > > vs write handlers to try to at least segment this away so reads can
> > > > > > work. We have pretty beefy servers, and are running with the
> > > > > > following settings:
> > > > > >
> > > > > > hbase.regionserver.handler.count=1000
> > > > > > hbase.ipc.server.read.threadpool.size=50
> > > > > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > > > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > > > > hbase.ipc.server.callqueue.scan.ratio=0.5
> > > > > >
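> > > > > > Roughly how we read those knobs (back-of-the-envelope only; the
> > > > > > exact rounding inside the RPC scheduler may differ):
> > > > > >
> > > > > > int handlers = 1000;
> > > > > > int queues = (int) (handlers * 0.025);       // handler.factor -> 25 call queues
> > > > > > int readQueues = (int) (queues * 0.6);       // read.ratio -> 15 read, 10 write queues
> > > > > > int readHandlers = (int) (handlers * 0.6);   // -> 600 read handlers
> > > > > > int writeHandlers = handlers - readHandlers; // -> the 400 write handlers below
> > > > > > int scanQueues = (int) (readQueues * 0.5);   // scan.ratio -> ~7 of the read queues
> > > > > >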
> > > > > > We are seeing all 400 write handlers taken up by row locks for the
> > > > > > most part. The read handlers are mostly idle. We're thinking of
> > > > > > changing the ratio here, but are not sure it will help if they are
> > > > > > all blocked on a row lock.  We just enabled DEBUG logging on all
> > > > > > our servers and notice the following:
> > > > > >
> > > > > > 2015-12-01 00:56:09,015 DEBUG
> > > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > > detected
> > > > > > by nonce: [-687451119961178644:7664336281906118656], [state 0,
> > > hasWait
> > > > > > false, activity 00:54:36.240]
> > > > > > 2015-12-01 00:56:09,015 DEBUG
> > > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > > detected
> > > > > > by nonce: [-687451119961178644:-7119840249342174227], [state 0,
> > > hasWait
> > > > > > false, activity 00:54:36.256]
> > > > > > 2015-12-01 00:56:09,268 DEBUG
> > > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > > detected
> > > > > > by nonce: [-5946137511131403479:2112661701888365489], [state 0,
> > > hasWait
> > > > > > false, activity 00:55:01.259]
> > > > > > 2015-12-01 00:56:09,279 DEBUG
> > > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > > detected
> > > > > > by nonce: [4165332617675853029:6256955295384472057], [state 0,
> > > hasWait
> > > > > > false, activity 00:53:58.151]
> > > > > > 2015-12-01 00:56:09,279 DEBUG
> > > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > > detected
> > > > > > by nonce: [4165332617675853029:4961178013070912522], [state 0,
> > > hasWait
> > > > > > false, activity 00:53:58.162]
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > > > > bbeaudreault@hubspot.com> wrote:
> > > > > >
> > > > > >> Sorry the second link should be
> > > > > >>
> > > > > >> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > > > > >>
> > > > > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > > > > >> bbeaudreault@hubspot.com> wrote:
> > > > > >>
> > > > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > > > > >>>
> > > > > >>> An active handler:
> > > > > >>>
> > > > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > > > > >>> One that is locked:
> > > > > >>>
> > > > > >>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > > > > >>>
> > > > > >>> The difference between pre-rollback and post is that previously
> > > > > >>> we were seeing things blocked in mvcc.  Now we are seeing them
> > > > > >>> blocked on the upsert.
> > > > > >>>
> > > > > >>> It always follows the same pattern of 1 active handler in the
> > > > > >>> upsert and the rest blocked waiting for it.
> > > > > >>>
> > > > > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> > > > > >>>
> > > > > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault
> > > > > >>>> <bbeaudreault@hubspot.com> wrote:
> > > > > >>>>
> > > > > >>>> > The rollback seems to have mostly solved the issue for one of
> > > > > >>>> > our clusters, but another one is still seeing long increment
> > > > > >>>> > times:
> > > > > >>>> >
> > > > > >>>> > "slowIncrementCount": 52080,
> > > > > >>>> > "Increment_num_ops": 325236,
> > > > > >>>> > "Increment_min": 1,
> > > > > >>>> > "Increment_max": 6162,
> > > > > >>>> > "Increment_mean": 465.68678129112396,
> > > > > >>>> > "Increment_median": 216,
> > > > > >>>> > "Increment_75th_percentile": 450.25,
> > > > > >>>> > "Increment_95th_percentile": 1052.6499999999999,
> > > > > >>>> > "Increment_99th_percentile": 1635.2399999999998
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > Any ideas if there are other changes that may be causing a
> > > > > >>>> > performance regression for increments between CDH4.7.1 and
> > > > > >>>> > CDH5.3.8?
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> No.
> > > > > >>>>
> > > > > >>>> Post a thread dump Bryan and it might prompt something.
> > > > > >>>>
> > > > > >>>> St.Ack
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> >
> > > > > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> > > > > >>>> >
> > > > > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > > > >>>> > > bbeaudreault@hubspot.com> wrote:
> > > > > >>>> > >
> > > > > >>>> > > > Should this be added as a known issue in the CDH or hbase
> > > > > >>>> > > > documentation? It was a severe performance hit for us; all
> > > > > >>>> > > > of our regionservers were sitting at a few thousand queued
> > > > > >>>> > > > requests.
> > > > > >>>> > > >
> > > > > >>>> > > >
> > > > > >>>> > > Let me take care of that.
> > > > > >>>> > > St.Ack
> > > > > >>>> > >
> > > > > >>>> > >
> > > > > >>>> > >
> > > > > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > > > >>>> > > > bbeaudreault@hubspot.com>
> > > > > >>>> > > > wrote:
> > > > > >>>> > > >
> > > > > >>>> > > > > Yea, they are all over the place and called from client
> > > > > >>>> > > > > and coprocessor code. We ended up having no other option
> > > > > >>>> > > > > but to roll back, and aside from a few NoSuchMethodErrors
> > > > > >>>> > > > > due to API changes (Put#add vs Put#addColumn), it seems
> > > > > >>>> > > > > to be working and fixing our problem.
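> > > > > >>>> > > > >
> > > > > >>>> > > > > For anyone else rolling back, the signature difference
> > > > > >>>> > > > > looks like this (row/family/qualifier names are made up):
> > > > > >>>> > > > >
> > > > > >>>> > > > > Put p = new Put(Bytes.toBytes("row"));
> > > > > >>>> > > > > // 0.98-era call we had to restore on rollback:
> > > > > >>>> > > > > p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
> > > > > >>>> > > > > // 1.0-era replacement that had crept into our code:
> > > > > >>>> > > > > p.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));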
> > > > > >>>> > > > >
> > > > > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack
> > > > > >>>> > > > > <stack@duboce.net> wrote:
> > > > > >>>> > > > >
> > > > > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to
> > > > > >>>> > > > >> work on this now. Where are your counters Bryan? In
> > > > > >>>> > > > >> their own column family or scattered about in a row
> > > > > >>>> > > > >> with other Cell types?
> > > > > >>>> > > > >> St.Ack
> > > > > >>>> > > > >>
> > > > > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > > > >>>> > > > >> bbeaudreault@hubspot.com> wrote:
> > > > > >>>> > > > >>
> > > > > >>>> > > > >> > Is there any update to this? We just upgraded all of
> > > > > >>>> > > > >> > our production clusters from CDH4 to CDH5.4.7 and, not
> > > > > >>>> > > > >> > seeing this JIRA listed in the known issues, did not
> > > > > >>>> > > > >> > know about this.  Now we are seeing performance issues
> > > > > >>>> > > > >> > across all clusters, as we make heavy use of
> > > > > >>>> > > > >> > increments.
> > > > > >>>> > > > >> >
> > > > > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only
> > > > > >>>> > > > >> > hope to roll back to CDH 5.3.1 (if that is possible)?
> > > > > >>>> > > > >> >
> > > > > >>>> > > > >> >
> > > > > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕
> > > > > >>>> > > > >> > <brfrn169@gmail.com> wrote:
> > > > > >>>> > > > >> >
> > > > > >>>> > > > >> > > Thank you St.Ack!
> > > > > >>>> > > > >> > >
> > > > > >>>> > > > >> > > I would like to follow the ticket.
> > > > > >>>> > > > >> > >
> > > > > >>>> > > > >> > > Toshihiro Suzuki
> > the
> > > > > >>>> thread
> > > > > >>>> > dump.
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > > I read the source code and I
> think
> > > this
> > > > > is
> > > > > >>>> > caused
> > > > > >>>> > > by
> > > > > >>>> > > > >> > > changes
> > > > > >>>> > > > >> > > > of
> > > > > >>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > > > > >>>> > > > >> > > > > > > > > > > A region lock (not a row lock)
> > seems
> > > to
> > > > > >>>> occur in
> > > > > >>>> > > > >> > > > > > > > > > >
> > > waitForPreviousTransactionsComplete().
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > > Also we wrote performance test
> code
> > > for
> > > > > >>>> > increment
> > > > > >>>> > > > >> > operation
> > > > > >>>> > > > >> > > > > that
> > > > > >>>> > > > >> > > > > > > > > included
> > > > > >>>> > > > >> > > > > > > > > > > 100 threads and ran it in local
> > mode.
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > > The result is shown below:
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757,
> > Latency(ms):
> > > > > >>>> > > > >> 7.975072509210629
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027,
> > Latency(ms):
> > > > > >>>> > > > 49.11840157868772
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > > Thanks,
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
> > > > > >>>> > > > >> > > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > > >
> > > > > >>>> > > > >> > > > > > > > >
> > > > > >>>> > > > >> > > > > > > >
> > > > > >>>> > > > >> > > > > > >
> > > > > >>>> > > > >> > > > > >
> > > > > >>>> > > > >> > > > >
> > > > > >>>> > > > >> > > >
> > > > > >>>> > > > >> > >
> > > > > >>>> > > > >> >
> > > > > >>>> > > > >>
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
On Mon, Nov 30, 2015 at 9:16 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> I'll try to get another one.  We are currently not seeing the issue due to
> lack of contention (it is off hours for our customers).
>
> Note that the stack trace I gave you was taken with a tool we have that
> aggregates common stacks. The one at the bottom occurred 122 times (out of
> 128 handlers -- this was pre-tuning, before we added 1000 handlers and the
> read vs. write split).  So to me it looks like 122 of 128 handlers were
> waiting on:
>
> if (!existingContext.latch.await(this.rowLockWaitDuration,
>     TimeUnit.MILLISECONDS)) {
>   throw new IOException("Timed out waiting for lock for row: " + rowKey);
> }
>
>
>
That would explain the nice numbers. Looks like a useful tool. Can I have an
unadulterated trace? Are all increments trying to go against the same row?
If the one thread doing an actual increment is up in the memstore doing a
family compare, is it stuck not letting anyone else in? Only using 5% CPU
though... so maybe not....

St.Ack
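
For anyone reproducing this, a minimal probe of the same-row question above
(hypothetical code, not from this thread; it assumes the HBase 1.0 client
API and an existing table "t" with family "f"): run it once in "hot" mode,
where all threads increment a single row, and once in spread mode, with one
row per thread. If only the hot mode is slow, the bottleneck is per-row
serialization rather than increments in general.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementContentionProbe {
  public static void main(String[] args) throws Exception {
    final boolean hotRow = args.length > 0 && "hot".equals(args[0]);
    final int threads = 100;
    final int opsPerThread = 1000;
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      final AtomicLong ops = new AtomicLong();
      final long start = System.currentTimeMillis();
      for (int t = 0; t < threads; t++) {
        final int id = t;
        pool.submit(new Runnable() {
          @Override
          public void run() {
            // Each worker gets its own Table; the Connection is shared.
            try (Table table = conn.getTable(TableName.valueOf("t"))) {
              for (int i = 0; i < opsPerThread; i++) {
                // hot mode: everyone hammers row-0 and serializes on its
                // row lock; spread mode: one row per thread, no contention.
                byte[] row = Bytes.toBytes(hotRow ? "row-0" : "row-" + id);
                table.incrementColumnValue(row, Bytes.toBytes("f"),
                    Bytes.toBytes("q"), 1L);
                ops.incrementAndGet();
              }
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
      long elapsedMs = Math.max(1, System.currentTimeMillis() - start);
      System.out.println("throughput (ops/s): " + (ops.get() * 1000 / elapsedMs));
    }
  }
}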





Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
I'll try to get another one.  We are currently not seeing the issue due to
lack of contention (it is off hours for our customers).

Note that the stack trace I gave you was taken with a tool we have that
aggregates common stacks. The one at the bottom occurred 122 times (out of
128 handlers -- this was pre-tuning, before we added 1000 handlers and the
read vs. write split).  So to me it looks like 122 of 128 handlers were
waiting on:

if (!existingContext.latch.await(this.rowLockWaitDuration,
    TimeUnit.MILLISECONDS)) {
  throw new IOException("Timed out waiting for lock for row: " + rowKey);
}
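
That snippet looks like the region row-lock path. As a rough sketch of the
pattern behind it (simplified; not the actual HRegion code): each row key
maps to a lock context holding a CountDownLatch, the first writer in owns
the row, and every later handler touching the same row parks on that latch
until the owner releases it or the wait times out (30s by default via
hbase.rowlock.wait.duration). That is exactly the shape of 122 handlers
queued behind one:

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class RowLockSketch {
  private final ConcurrentHashMap<String, CountDownLatch> locks =
      new ConcurrentHashMap<String, CountDownLatch>();
  private final long rowLockWaitDuration = 30000; // ms, the HBase default

  void lockRow(String rowKey) throws IOException, InterruptedException {
    while (true) {
      CountDownLatch ourLatch = new CountDownLatch(1);
      CountDownLatch existing = locks.putIfAbsent(rowKey, ourLatch);
      if (existing == null) {
        return; // we now hold the row lock
      }
      // Another handler owns this row; park on its latch, exactly like
      // the 122 blocked handlers in the aggregated stack above.
      if (!existing.await(rowLockWaitDuration, TimeUnit.MILLISECONDS)) {
        throw new IOException("Timed out waiting for lock for row: " + rowKey);
      }
      // Latch released; loop and race to take the lock ourselves.
    }
  }

  void unlockRow(String rowKey) {
    CountDownLatch latch = locks.remove(rowKey);
    if (latch != null) {
      latch.countDown(); // wake every waiter; they retry putIfAbsent
    }
  }
}

A single slow lock holder (for example one stuck in a WAL sync) therefore
stalls every other write to that row, which is why reducing contention or
spreading counters across rows helps.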


On Tue, Dec 1, 2015 at 12:08 AM Stack <st...@duboce.net> wrote:

> Looking at that stack trace, nothing showing as blocked or slowed by
> another operation. You have others I could look at Bryan?
> St.Ack
>
> On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
> > Yea sorry if I was misleading.  The nonce loglines we saw only happened
> > on full cluster restart, it may have been the HLog's replaying, not
> > sure.
> >
> > We are still seeing slow Increments. Where Gets and Mutates will be on
> > the order of 50-150ms according to metrics, Increment will be in the
> > 1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
> >
> > https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
> >
> > On Mon, Nov 30, 2015 at 11:26 PM Stack <st...@duboce.net> wrote:
> >
> > > Still slow increments though?
> > >
> > > On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <
> > > bbeaudreault@hubspot.com> wrote:
> > >
> > > > Those log lines have settled down, they may have been related to a
> > > > cluster-wide forced restart at the time.
> > > >
> > > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
> > > > bbeaudreault@hubspot.com> wrote:
> > > >
> > > > > We've been doing more debugging of this and have set up the read
> > > > > vs write handlers to try to at least segment this away so reads
> > > > > can work. We have pretty beefy servers, and are running with the
> > > > > following settings:
> > > > >
> > > > > hbase.regionserver.handler.count=1000
> > > > > hbase.ipc.server.read.threadpool.size=50
> > > > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > > > hbase.ipc.server.callqueue.scan.ratio=0.5
> > > > >
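
A back-of-the-envelope reading of those settings (a sketch of the
arithmetic only; the real split is done by RpcExecutor over call queues,
so treat the numbers as approximate):

public class CallQueueMath {
  public static void main(String[] args) {
    int handlerCount = 1000;      // hbase.regionserver.handler.count
    double handlerFactor = 0.025; // hbase.ipc.server.callqueue.handler.factor
    double readRatio = 0.6;       // hbase.ipc.server.callqueue.read.ratio
    double scanRatio = 0.5;       // hbase.ipc.server.callqueue.scan.ratio

    int callQueues = (int) (handlerCount * handlerFactor); // 25 queues
    int readQueues = (int) (callQueues * readRatio);       // 15 of them for reads
    int scanQueues = (int) (readQueues * scanRatio);       // 7 of those for scans
    int readHandlers = (int) (handlerCount * readRatio);   // ~600 read handlers
    int writeHandlers = handlerCount - readHandlers;       // ~400 write handlers,
    // which lines up with the "all 400 write handlers" observation below.
    System.out.println(callQueues + " queues (" + readQueues + " read, "
        + scanQueues + " scan), " + readHandlers + " read / "
        + writeHandlers + " write handlers");
  }
}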
> > > > > We are seeing all 400 write handlers taken up by row locks for
> > > > > the most part. The read handlers are mostly idle. We're thinking
> > > > > of changing the ratio here, but are not sure it will help if they
> > > > > are all blocked on a row lock.  We just enabled DEBUG logging on
> > > > > all our servers and notice the following:
> > > > >
> > > > > 2015-12-01 00:56:09,015 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager:
> > > > > Conflict detected by nonce: [-687451119961178644:7664336281906118656],
> > > > > [state 0, hasWait false, activity 00:54:36.240]
> > > > > 2015-12-01 00:56:09,015 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager:
> > > > > Conflict detected by nonce: [-687451119961178644:-7119840249342174227],
> > > > > [state 0, hasWait false, activity 00:54:36.256]
> > > > > 2015-12-01 00:56:09,268 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager:
> > > > > Conflict detected by nonce: [-5946137511131403479:2112661701888365489],
> > > > > [state 0, hasWait false, activity 00:55:01.259]
> > > > > 2015-12-01 00:56:09,279 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager:
> > > > > Conflict detected by nonce: [4165332617675853029:6256955295384472057],
> > > > > [state 0, hasWait false, activity 00:53:58.151]
> > > > > 2015-12-01 00:56:09,279 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager:
> > > > > Conflict detected by nonce: [4165332617675853029:4961178013070912522],
> > > > > [state 0, hasWait false, activity 00:53:58.162]
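
A "Conflict detected by nonce" line means the server saw the same
(nonceGroup, nonce) pair twice. That happens when a client times out and
retries an increment the server already ran, and also when nonces are
re-reported during WAL replay, which fits the restart correlation Bryan
mentions above. A simplified sketch of the dedup idea (hypothetical code;
not the actual ServerNonceManager):

import java.util.concurrent.ConcurrentHashMap;

class NonceDedupSketch {
  // Key is "nonceGroup:nonce"; presence means that operation has already
  // started (or finished), so a retry must not be applied a second time.
  private final ConcurrentHashMap<String, Boolean> seen =
      new ConcurrentHashMap<String, Boolean>();

  boolean startOperation(long nonceGroup, long nonce) {
    String key = nonceGroup + ":" + nonce;
    if (seen.putIfAbsent(key, Boolean.TRUE) != null) {
      // Duplicate: log a conflict and skip, instead of incrementing twice.
      return false;
    }
    return true; // first time we see this operation; go ahead and apply it
  }
}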
> > > > >
> > > > >
> > > > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > > > bbeaudreault@hubspot.com> wrote:
> > > > >
> > > > >> Sorry the second link should be
> > > > >>
> > > >
> > >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > > > >>
> > > > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > > > >> bbeaudreault@hubspot.com> wrote:
> > > > >>
> > > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > > > >>>
> > > > >>> An active handler:
> > > > >>>
> > > >
> > >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > > > >>> One that is locked:
> > > > >>>
> > > >
> > >
> >
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > > > >>>
> > > > >>> The difference between pre-rollback and post is that previously
> we
> > > were
> > > > >>> seeing things blocked in mvcc.  Now we are seeing them blocked on
> > the
> > > > >>> upsert.
> > > > >>>
> > > > >>> It always follows the same pattern: 1 active handler in the upsert
> > > > >>> and the rest blocked waiting for it.
> > > > >>>
> > > > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> > > > >>>
> > > > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > > > >>>> bbeaudreault@hubspot.com
> > > > >>>> > wrote:
> > > > >>>>
> > > > >>>> > The rollback seems to have mostly solved the issue for one of
> > our
> > > > >>>> clusters,
> > > > >>>> > but another one is still seeing long increment times:
> > > > >>>> >
> > > > >>>> > "slowIncrementCount": 52080,
> > > > >>>> > "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
> > > > >>>> > "Increment_mean": 465.68678129112396, "Increment_median": 216,
> > > > >>>> > "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
> > > > >>>> > "Increment_99th_percentile": 1635.2399999999998
> > > > >>>> >
> > > > >>>> >
> > > > >>>> > Any ideas if there are other changes that may be causing a
> > > > performance
> > > > >>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
> > > > >>>> >
> > > > >>>> >
> > > > >>>> >
> > > > >>>> No.
> > > > >>>>
> > > > >>>> Post a thread dump Bryan and it might prompt something.
> > > > >>>>
> > > > >>>> St.Ack
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> >
> > > > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net>
> wrote:
> > > > >>>> >
> > > > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > > >>>> > > bbeaudreault@hubspot.com> wrote:
> > > > >>>> > >
> > > > >>>> > > > Should this be added as a known issue in the CDH or hbase
> > > > >>>> > > > documentation? It was a severe performance hit for us; all of
> > > > >>>> > > > our regionservers were sitting at a few thousand queued requests.
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> > > Let me take care of that.
> > > > >>>> > > St.Ack
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > > >>>> > > > bbeaudreault@hubspot.com>
> > > > >>>> > > > wrote:
> > > > >>>> > > >
> > > > >>>> > > > > Yea, they are all over the place and called from client and
> > > > >>>> > > > > coprocessor code. We ended up having no other option but to
> > > > >>>> > > > > rollback, and aside from a few NoSuchMethodErrors due to API
> > > > >>>> > > > > changes (Put#add vs Put#addColumn), it seems to be working
> > > > >>>> > > > > and fixing our problem.
> > > > >>>> > > > >
> > > > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <stack@duboce.net> wrote:
> > > > >>>> > > > >
> > > > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on
> > > > >>>> > > > >> this now. Where are your counters Bryan? In their own column
> > > > >>>> > > > >> family or scattered about in a row with other Cell types?
> > > > >>>> > > > >> St.Ack
> > > > >>>> > > > >>
> > > > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > > >>>> > > > >> bbeaudreault@hubspot.com> wrote:
> > > > >>>> > > > >>
> > > > >>>> > > > >> > Is there any update to this? We just upgraded all of our
> > > > >>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing
> > > > >>>> > > > >> > this JIRA listed in the known issues, did not know about
> > > > >>>> > > > >> > this.  Now we are seeing performance issues across all
> > > > >>>> > > > >> > clusters, as we make heavy use of increments.
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope
> > > > >>>> > > > >> > to roll back to CDH 5.3.1 (if that is possible)?
> > > > >>>> > > > >> >
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > > Thank you St.Ack!
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > I would like to follow the ticket.
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > Toshihiro Suzuki
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <stack@duboce.net>:
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > > Back to this problem. Simple tests confirm that as is,
> > > > >>>> > > > >> > > > the single-queue-backed MVCC instance can slow Region ops
> > > > >>>> > > > >> > > > if some other row is slow to complete. In particular
> > > > >>>> > > > >> > > > Increment, checkAndPut, and batch mutations are affected.
> > > > >>>> > > > >> > > > I opened HBASE-14460 to start in on a fix up. Lets see if
> > > > >>>> > > > >> > > > we can somehow scope mvcc to row or at least shard mvcc
> > > > >>>> > > > >> > > > so not all Region ops are paused.
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > St.Ack
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > > > >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> > > > >>>> > > > >> > > > > > case to help with the illustration. It is as though
> > > > >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> > > > >>>> > > > >> > > > > > against other rows should not hold up my read of my
> > > > >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > > > >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> > > > >>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > > You need to read back the increment and have it be
> > > > >>>> > > > >> > > > > > 'correct' at increment time?
> > > > >>>> > > > >> > > > > Yes, we need it.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > I would like to help if there is anything I can do.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > Thanks,
> > > > >>>> > > > >> > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <stack@duboce.net>:
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > > > >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> > > > >>>> > > > >> > > > > > case to help with the illustration. It is as though
> > > > >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> > > > >>>> > > > >> > > > > > against other rows should not hold up my read of my
> > > > >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > > > >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > You need to read back the increment and have it be
> > > > >>>> > > > >> > > > > > 'correct' at increment time?
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > (This is a good one)
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > Thank you Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > St.Ack
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > > St.Ack,
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Thank you for your response.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > The reason I think that "A region lock (not a row
> > > > >>>> > > > >> > > > > > > lock) seems to occur in
> > > > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's
> > > > >>>> > > > >> > > > > > > writeQueue can cause a region lock.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert()
> > > > >>>> > > > >> > > > > > > to writeQueue and waits until writeQueue is empty
> > > > >>>> > > > >> > > > > > > or writeQueue.getFirst() == w.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
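
To make the queue discipline described above concrete, here is a toy
re-implementation (my own simplification for illustration, not the real
MultiVersionConsistencyControl): step 2 appends a marker, step 3 completes
it, and step 1 blocks until its own marker reaches the head of the queue,
so one slow writer anywhere in the region stalls every later caller, no
matter which row it touches.

import java.util.LinkedList;

public class ToyMvcc {
  private static final class WriteEntry {
    boolean completed = false;
  }

  private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

  // Analogue of step 2 (beginMemstoreInsertWithSeqNum): enqueue a marker
  // for this transaction.
  public synchronized WriteEntry begin() {
    WriteEntry e = new WriteEntry();
    writeQueue.add(e);
    return e;
  }

  // Analogue of step 3 (completeMemstoreInsertWithSeqNum): mark the
  // transaction done, drain the completed prefix, wake all waiters.
  public synchronized void complete(WriteEntry e) {
    e.completed = true;
    while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
      writeQueue.removeFirst();
    }
    notifyAll();
  }

  // Analogue of step 1: enqueue our own marker, then block until every
  // entry ahead of it has completed. Note the wait is region-wide; no
  // row is consulted anywhere.
  public synchronized void waitForPreviousTransactionsComplete()
      throws InterruptedException {
    WriteEntry w = begin();
    while (writeQueue.getFirst() != w) {
      wait();
    }
    complete(w);
  }
}

If any thread sits between begin() and complete() -- say, in a slow WAL
sync -- every thread entering waitForPreviousTransactionsComplete() queues
up behind it, which is exactly the pile-up of
RW.default.writeRpcServer.handler threads visible in the dump.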
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > I think when a handler thread is processing
> > > > >>>> > > > >> > > > > > > between step 2 and step 3, the other handler
> > > > >>>> > > > >> > > > > > > threads can wait at step 1 until the thread
> > > > >>>> > > > >> > > > > > > completes step 3. This is depicted as follows:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Actually, in the thread dump of our region server,
> > > > >>>> > > > >> > > > > > > many handler threads (RW.default.writeRpcServer.handler)
> > > > >>>> > > > >> > > > > > > wait at Step 1 (waitForPreviousTransactionsComplete()).
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Many handler threads wait at this:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > > Is it possible you are contending on a counter
> > > > >>>> > > > >> > > > > > > > post-upgrade? Is it possible that all these
> > > > >>>> > > > >> > > > > > > > threads are trying to get to the same row to
> > > > >>>> > > > >> > > > > > > > update it? Could the app behavior have changed?
> > > > >>>> > > > >> > > > > > > > Or are you thinking increment itself has slowed
> > > > >>>> > > > >> > > > > > > > significantly?
> > > > >>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app
> > > > >>>> > > > >> > > > > > > behavior. We are thinking increment itself has
> > > > >>>> > > > >> > > > > > > slowed significantly. Before upgrading HBase,
> > > > >>>> > > > >> > > > > > > throughput and latency were good. Currently, to
> > > > >>>> > > > >> > > > > > > cope with this problem, we split the regions finely.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Thanks,
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net>:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Ted,
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Thank you for your response.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I think that the increment operation works as follows:
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > 1. get row lock
> > > > >>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> > > > >>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > >>>> > > > >> > > > > > > > > 4. get previous values
> > > > >>>> > > > >> > > > > > > > > 5. create KVs
> > > > >>>> > > > >> > > > > > > > > 6. write to Memstore
> > > > >>>> > > > >> > > > > > > > > 7. write to WAL
> > > > >>>> > > > >> > > > > > > > > 8. release row lock
> > > > >>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl
> > > > >>>> > > > >> > > > > > > > > has a pending queue of writes named writeQueue.
> > > > >>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and
> > > > >>>> > > > >> > > > > > > > > waits until writeQueue is empty or
> > > > >>>> > > > >> > > > > > > > > writeQueue.getFirst() == w.
> > > > >>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step
> > > > >>>> > > > >> > > > > > > > > 9 removes the WriteEntry from writeQueue.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I think that when a handler thread is processing
> > > > >>>> > > > >> > > > > > > > > between step 2 and step 9, the other handler
> > > > >>>> > > > >> > > > > > > > > threads can wait until the thread completes step 9.
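
A small runnable toy (all names invented; it borrows nothing from HBase
except the shape of the queue) that shows the consequence of the steps
above: handler A stalls between step 3 and step 9, and handler B, touching
a completely different row, still spends the whole stall inside its step-2
wait.

import java.util.LinkedList;

public class RegionWideWaitDemo {
  private static final LinkedList<Object> writeQueue = new LinkedList<>();
  private static final Object lock = new Object();

  static Object begin() {                      // like step 3
    synchronized (lock) {
      Object e = new Object();
      writeQueue.add(e);
      return e;
    }
  }

  static void complete(Object e) {             // like step 9
    synchronized (lock) {
      writeQueue.remove(e);
      lock.notifyAll();
    }
  }

  static void waitForPrevious() throws InterruptedException {  // like step 2
    synchronized (lock) {
      Object e = new Object();
      writeQueue.add(e);
      while (writeQueue.getFirst() != e) {
        lock.wait();                           // parked behind A's entry
      }
      writeQueue.remove(e);
      lock.notifyAll();
    }
  }

  public static void main(String[] args) throws Exception {
    Object a = begin();                        // handler A, row X
    Thread b = new Thread(() -> {
      try {
        long start = System.nanoTime();
        waitForPrevious();                     // handler B, row Y
        System.out.printf("B waited %d ms%n",
            (System.nanoTime() - start) / 1_000_000);
      } catch (InterruptedException ignored) { }
    });
    b.start();
    Thread.sleep(2000);                        // A is slow (e.g. WAL sync)
    complete(a);                               // only now can B proceed
    b.join();                                  // prints roughly 2000 ms
  }
}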
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > That is right. We need to read, after all
> > > > >>>> > > > >> > > > > > > > outstanding updates are done... because we need
> > > > >>>> > > > >> > > > > > > > to read the latest update before we go to
> > > > >>>> > > > >> > > > > > > > modify/increment it.
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > How do you make out this?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur
> > > > >>>> > > > >> > > > > > > > in waitForPreviousTransactionsComplete()."
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > In 0.98.x we did this:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > ... and in 1.0 we do this:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > >>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > >>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > >>>> > > > >> > > > > > > > +  }
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
> > > > >>>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763).
> > > > >>>> > > > >> > > > > > > > Previously mvcc and region sequenceid would spin
> > > > >>>> > > > >> > > > > > > > independent of each other. Perhaps this is
> > > > >>>> > > > >> > > > > > > > responsible for some slow down.
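
(My reading, offered tentatively: both versions wait for earlier
transactions, but once mvcc and the region sequenceid were merged, a
WriteEntry is not complete until its edit has taken its place in the WAL,
so the wait above can be extended by another handler's WAL append/sync.
That would line up with the FSHLog#syncer stacks Bryan reports later in
the thread; I have not verified this against the source.)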
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem
> > > > >>>> > > > >> > > > > > > > to be down in the Get. If you do a bunch of
> > > > >>>> > > > >> > > > > > > > thread dumps in a row, where is the lock-holding
> > > > >>>> > > > >> > > > > > > > thread? In Get or writing Increment... or
> > > > >>>> > > > >> > > > > > > > waiting on sequence id?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > Is it possible you are contending on a counter
> > > > >>>> > > > >> > > > > > > > post-upgrade? Is it possible that all these
> > > > >>>> > > > >> > > > > > > > threads are trying to get to the same row to
> > > > >>>> > > > >> > > > > > > > update it? Could the app behavior have changed?
> > > > >>>> > > > >> > > > > > > > Or are you thinking increment itself has slowed
> > > > >>>> > > > >> > > > > > > > significantly?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > St.Ack
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Thanks,
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >     try {
> > > > >>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > Thanks
> > > > >>>> > > > >> > > > > > > > > >

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
I'll try to get another one.  We are currently not seeing the issue due to
lack of contention (it is off hours for our customers).

Note that the stack trace I gave you was taken with a tool we have which
aggregates common stacks. The one at the bottom occurred 122 times (out of
128 handlers -- this was pre-tuning, before we added 1000 handlers and the
read vs write split).  So to me it looks like 122 of 128 handlers were waiting on:

// From HRegion#getRowLock: the holder of the row lock owns a latch; any
// other handler wanting the same row parks here until the holder releases
// it, or gives up after this.rowLockWaitDuration:
if (!existingContext.latch.await(this.rowLockWaitDuration,
TimeUnit.MILLISECONDS)) {
  throw new IOException("Timed out waiting for lock for row: " + rowKey);
}
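
(If the default hbase.rowlock.wait.duration of 30000 ms is in effect -- I
am quoting that default from memory, so check hbase-site.xml -- then 122
handlers parked on one latch points at a single hot counter row: the lone
handler inside the upsert holds the row lock and everyone else queues
behind it.)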


On Tue, Dec 1, 2015 at 12:08 AM Stack <st...@duboce.net> wrote:

> Looking at that stack trace, nothing showing as blocked or slowed by
> another operation. You have others I could look at Bryan?
> St.Ack
>
> On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com
> > wrote:
>
> > Yea, sorry if I was misleading.  The nonce loglines we saw only happened
> > on full cluster restart; it may have been the HLogs replaying, not sure.
> >
> > We are still seeing slow Increments. Where Gets and Mutates will be on
> the
> > order of 50-150ms according to metrics, Increment will be in the
> > 1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
> >
> >
> > https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
> >
> >
> >
> > On Mon, Nov 30, 2015 at 11:26 PM Stack <st...@duboce.net> wrote:
> >
> > > Still slow increments though?
> > >
> > > On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <
> > > bbeaudreault@hubspot.com
> > > > wrote:
> > >
> > > > Those log lines have settled down, they may have been related to a
> > > > cluster-wide forced restart at the time.
> > > >
> > > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
> > > > bbeaudreault@hubspot.com>
> > > > wrote:
> > > >
> > > > > We've been doing more debugging of this and have set up the read vs
> > > write
> > > > > handlers to try to at least segment this away so reads can work. We
> > > have
> > > > > pretty beefy servers, and are running wiht the following settings:
> > > > >
> > > > > hbase.regionserver.handler.count=1000
> > > > > hbase.ipc.server.read.threadpool.size=50
> > > > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > > > hbase.ipc.server.callqueue.scan.ratio=0.5
> > > > >
> > > > > We are seeing all 400 write handlers taken up by row locks for the
> > most
> > > > > part. The read handlers are mostly idle. We're thinking of changing
> > the
> > > > > ratio here, but are not sure it will help if they are all blocked
> on
> > a
> > > > row
> > > > > lock.  We just enabled DEBUG logging on all our servers and notice
> > the
> > > > > following:
> > > > >
> > > > > 2015-12-01 00:56:09,015 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > detected
> > > > > by nonce: [-687451119961178644:7664336281906118656], [state 0,
> > hasWait
> > > > > false, activity 00:54:36.240]
> > > > > 2015-12-01 00:56:09,015 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > detected
> > > > > by nonce: [-687451119961178644:-7119840249342174227], [state 0,
> > hasWait
> > > > > false, activity 00:54:36.256]
> > > > > 2015-12-01 00:56:09,268 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > detected
> > > > > by nonce: [-5946137511131403479:2112661701888365489], [state 0,
> > hasWait
> > > > > false, activity 00:55:01.259]
> > > > > 2015-12-01 00:56:09,279 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > detected
> > > > > by nonce: [4165332617675853029:6256955295384472057], [state 0,
> > hasWait
> > > > > false, activity 00:53:58.151]
> > > > > 2015-12-01 00:56:09,279 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > > detected
> > > > > by nonce: [4165332617675853029:4961178013070912522], [state 0,
> > hasWait
> > > > > false, activity 00:53:58.162]
> > > > >
> > > > >
> > > > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > > > bbeaudreault@hubspot.com> wrote:
> > > > >
> > > > >> Sorry the second link should be
> > > > >>
> > > >
> > >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > > > >>
> > > > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > > > >> bbeaudreault@hubspot.com> wrote:
> > > > >>
> > > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > > > >>>
> > > > >>> An active handler:
> > > > >>>
> > > >
> > >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > > > >>> One that is locked:
> > > > >>>
> > > >
> > >
> >
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > > > >>>
> > > > >>> The difference between pre-rollback and post is that previously
> we
> > > were
> > > > >>> seeing things blocked in mvcc.  Now we are seeing them blocked on
> > the
> > > > >>> upsert.
> > > > >>>
> > > > >>> It always follows the same pattern, of 1 active handler in the
> > upsert
> > > > >>> and the rest blocked waiting for it.
> > > > >>>
> > > > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> > > > >>>
> > > > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > > > >>>> bbeaudreault@hubspot.com
> > > > >>>> > wrote:
> > > > >>>>
> > > > >>>> > The rollback seems to have mostly solved the issue for one of
> > our
> > > > >>>> clusters,
> > > > >>>> > but another one is still seeing long increment times:
> > > > >>>> >
> > > > >>>> > "slowIncrementCount": 52080,
> > > > >>>> > "Increment_num_ops": 325236,"Increment_min":
> 1,"Increment_max":
> > > > 6162,"
> > > > >>>> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
> > > > >>>> > Increment_75th_percentile":
> 450.25,"Increment_95th_percentile":
> > > > >>>> > 1052.6499999999999,"Increment_99th_percentile":
> > 1635.2399999999998
> > > > >>>> >
> > > > >>>> >
> > > > >>>> > Any ideas if there are other changes that may be causing a
> > > > performance
> > > > >>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
> > > > >>>> >
> > > > >>>> >
> > > > >>>> >
> > > > >>>> No.
> > > > >>>>
> > > > >>>> Post a thread dump Bryan and it might prompt something.
> > > > >>>>
> > > > >>>> St.Ack
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> >
> > > > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net>
> wrote:
> > > > >>>> >
> > > > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > > >>>> > > bbeaudreault@hubspot.com> wrote:
> > > > >>>> > >
> > > > >>>> > > > Should this be added as a known issue in the CDH or hbase
> > > > >>>> > documentation?
> > > > >>>> > > It
> > > > >>>> > > > was a severe performance hit for us, all of our
> > regionservers
> > > > were
> > > > >>>> > > sitting
> > > > >>>> > > > at a few thousand queued requests.
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> > > Let me take care of that.
> > > > >>>> > > St.Ack
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > > >>>> > > > bbeaudreault@hubspot.com>
> > > > >>>> > > > wrote:
> > > > >>>> > > >
> > > > >>>> > > > > Yea, they are all over the place and called from client
> > and
> > > > >>>> > coprocessor
> > > > >>>> > > > > code. We ended up having no other option but to
> rollback,
> > > and
> > > > >>>> aside
> > > > >>>> > > from
> > > > >>>> > > > a
> > > > >>>> > > > > few NoSuchMethodErrors due to API changes (Put#add vs
> > > > >>>> Put#addColumn),
> > > > >>>> > > it
> > > > >>>> > > > > seems to be working and fixing our problem.
> > > > >>>> > > > >
> > > > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <stack@duboce.net
> >
> > > > wrote:
> > > > >>>> > > > >
> > > > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to
> work
> > on
> > > > >>>> this
> > > > >>>> > now.
> > > > >>>> > > > >> Where
> > > > >>>> > > > >> are your counters Bryan? In their own column family or
> > > > >>>> scattered
> > > > >>>> > about
> > > > >>>> > > > in
> > > > >>>> > > > >> a
> > > > >>>> > > > >> row with other Cell types?
> > > > >>>> > > > >> St.Ack
> > > > >>>> > > > >>
> > > > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > > >>>> > > > >> bbeaudreault@hubspot.com> wrote:
> > > > >>>> > > > >>
> > > > >>>> > > > >> > Is there any update to this? We just upgraded all of
> > our
> > > > >>>> > production
> > > > >>>> > > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this
> > JIRA
> > > > >>>> listed in
> > > > >>>> > > the
> > > > >>>> > > > >> > known issues, did not not about this.  Now we are
> > seeing
> > > > >>>> > perfomance
> > > > >>>> > > > >> issues
> > > > >>>> > > > >> > across all clusters, as we make heavy use of
> > increments.
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only
> > hope
> > > > to
> > > > >>>> roll
> > > > >>>> > > back
> > > > >>>> > > > >> to
> > > > >>>> > > > >> > CDH 5.3.1 (if that is possible)?
> > > > >>>> > > > >> >
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <
> > brfrn169@gmail.com
> > > >
> > > > >>>> wrote:
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > > Thank you St.Ack!
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > I would like to follow the ticket.
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > Toshihiro Suzuki
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <stack@duboce.net
> >:
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > > Back to this problem. Simple tests confirm that
> as
> > > is,
> > > > >>>> the
> > > > >>>> > > > >> > > > single-queue-backed MVCC instance can slow Region
> > ops
> > > > if
> > > > >>>> some
> > > > >>>> > > > other
> > > > >>>> > > > >> row
> > > > >>>> > > > >> > > is
> > > > >>>> > > > >> > > > slow to complete. In particular Increment,
> > > checkAndPut,
> > > > >>>> and
> > > > >>>> > > batch
> > > > >>>> > > > >> > > mutations
> > > > >>>> > > > >> > > > are effected. I opened HBASE-14460 to start in
> on a
> > > fix
> > > > >>>> up.
> > > > >>>> > Lets
> > > > >>>> > > > >> see if
> > > > >>>> > > > >> > > we
> > > > >>>> > > > >> > > > can somehow scope mvcc to row or at least shard
> > mvcc
> > > so
> > > > >>>> not
> > > > >>>> > all
> > > > >>>> > > > >> Region
> > > > >>>> > > > >> > > ops
> > > > >>>> > > > >> > > > are paused.
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > St.Ack
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <
> > > > >>>> brfrn169@gmail.com>
> > > > >>>> > > wrote:
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > > > Thank you for the below reasoning (with
> > > > accompanying
> > > > >>>> > helpful
> > > > >>>> > > > >> > > diagram).
> > > > >>>> > > > >> > > > > > Makes sense. Let me hack up a test case to
> help
> > > > with
> > > > >>>> the
> > > > >>>> > > > >> > > illustration.
> > > > >>>> > > > >> > > > It
> > > > >>>> > > > >> > > > > > is as though the mvcc should be scoped to a
> row
> > > > >>>> only...
> > > > >>>> > > Writes
> > > > >>>> > > > >> > > against
> > > > >>>> > > > >> > > > > > other rows should not hold up my read of my
> > row.
> > > > Tag
> > > > >>>> an
> > > > >>>> > mvcc
> > > > >>>> > > > >> with a
> > > > >>>> > > > >> > > > 'row'
> > > > >>>> > > > >> > > > > > scope so we can see which on-going writes
> > pertain
> > > > to
> > > > >>>> > current
> > > > >>>> > > > >> > > operation?
> > > > >>>> > > > >> > > > > Thank you St.Ack! I think this approach would
> > work.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > > You need to read back the increment and have
> it
> > > be
> > > > >>>> > 'correct'
> > > > >>>> > > > at
> > > > >>>> > > > >> > > > increment
> > > > >>>> > > > >> > > > > > time?
> > > > >>>> > > > >> > > > > Yes, we need it.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > I would like to help if there is anything I can
> > do.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > Thanks,
> > > > >>>> > > > >> > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <
> > stack@duboce.net
> > > >:
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > > Thank you for the below reasoning (with
> > > > accompanying
> > > > >>>> > helpful
> > > > >>>> > > > >> > > diagram).
> > > > >>>> > > > >> > > > > > Makes sense. Let me hack up a test case to
> help
> > > > with
> > > > >>>> the
> > > > >>>> > > > >> > > illustration.
> > > > >>>> > > > >> > > > It
> > > > >>>> > > > >> > > > > > is as though the mvcc should be scoped to a
> row
> > > > >>>> only...
> > > > >>>> > > Writes
> > > > >>>> > > > >> > > against
> > > > >>>> > > > >> > > > > > other rows should not hold up my read of my
> > row.
> > > > Tag
> > > > >>>> an
> > > > >>>> > mvcc
> > > > >>>> > > > >> with a
> > > > >>>> > > > >> > > > 'row'
> > > > >>>> > > > >> > > > > > scope so we can see which on-going writes
> > pertain
> > > > to
> > > > >>>> > current
> > > > >>>> > > > >> > > operation?
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > You need to read back the increment and have
> it
> > > be
> > > > >>>> > 'correct'
> > > > >>>> > > > at
> > > > >>>> > > > >> > > > increment
> > > > >>>> > > > >> > > > > > time?
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > (This is a good one)
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > Thank you Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > St.Ack
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <
> > > > >>>> brfrn169@gmail.com
> > > > >>>> > >
> > > > >>>> > > > >> wrote:
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > > St.Ack,
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Thank you for your response.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Why I make out that "A region lock (not a
> row
> > > > lock)
> > > > >>>> > seems
> > > > >>>> > > to
> > > > >>>> > > > >> > occur
> > > > >>>> > > > >> > > in
> > > > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is
> as
> > > > >>>> follows:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > A increment operation has 3 procedures for
> > > MVCC.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 1.
> > mvcc.waitForPreviousTransactionsComplete();
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 2. w =
> > > > mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w,
> > > > >>>> walKey);
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > I think that
> MultiVersionConsistencyControl's
> > > > >>>> writeQueue
> > > > >>>> > > can
> > > > >>>> > > > >> > cause
> > > > >>>> > > > >> > > a
> > > > >>>> > > > >> > > > > > region
> > > > >>>> > > > >> > > > > > > lock.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>
> Step 2 adds a WriteEntry to writeQueue.
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>
> Step 3 removes the WriteEntry from writeQueue:
> completeMemstoreInsertWithSeqNum(w, walKey) -> waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>
> Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and waits until writeQueue is empty or writeQueue.getFirst() == w.
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
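>
> To make the queue discipline concrete, here is a minimal sketch of the idea (my own simplification for illustration, not the actual HBase code; ToyMvcc and its method names are invented):
>
> import java.util.LinkedList;
>
> /** Toy model: one queue per region; writes to every row share it. */
> class ToyMvcc {
>   static final class WriteEntry { volatile boolean done; }
>   private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();
>
>   /** Step-2 analog: register an in-flight write (for any row). */
>   synchronized WriteEntry begin() {
>     WriteEntry e = new WriteEntry();
>     writeQueue.add(e);
>     return e;
>   }
>
>   /** Step-3 analog: mark done and pop finished entries off the head. */
>   synchronized void complete(WriteEntry e) {
>     e.done = true;
>     while (!writeQueue.isEmpty() && writeQueue.getFirst().done) {
>       writeQueue.removeFirst();
>     }
>     notifyAll();
>   }
>
>   /** Step-1 analog: block until every earlier write in the region is done. */
>   synchronized void waitForPreviousTransactionsComplete() throws InterruptedException {
>     WriteEntry e = begin();
>     while (writeQueue.getFirst() != e) {
>       wait();  // a slow writer on ANY other row keeps us parked here
>     }
>     complete(e);  // e reached the head; pop it and wake the next waiter
>   }
> }
>
> In this shape, a handler that is slow between begin() and complete() stalls every other handler's waitForPreviousTransactionsComplete(), regardless of which row each handler is updating.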
>
> I think when a handler thread is processing between step 2 and step 3, the other handler threads can wait at step 1 until the thread completes step 3. This is depicted as follows:
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>
> Actually, in the thread dump of our region server, many handler threads (RW.default.writeRpcServer.handler) wait at Step 1 (waitForPreviousTransactionsComplete()):
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>
> Many handler threads wait at this:
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>
>> Is it possible you are contending on a counter post-upgrade? Is it possible
>> that all these threads are trying to get to the same row to update it?
>> Could the app behavior have changed? Or are you thinking increment itself
>> has slowed significantly?
>
> We have just upgraded HBase, not changed the app behavior. We are thinking increment itself has slowed significantly. Before upgrading HBase, we had good throughput and latency. Currently, to cope with this problem, we split the regions finely.
>
> Thanks,
>
> Toshihiro Suzuki
>
> 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net>:
>
>> On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>
>>> Ted,
>>>
>>> Thank you for your response.
>>>
>>> I uploaded the complete stack trace to Gist.
>>> https://gist.github.com/brfrn169/cb4f2c157129330cd932
>>>
>>> I think that the increment operation works as follows:
>>>
>>> 1. get row lock
>>> 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
>>> 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
>>> 4. get previous values
>>> 5. create KVs
>>> 6. write to Memstore
>>> 7. write to WAL
>>> 8. release row lock
>>> 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
>>>
>>> An instance of MultiVersionConsistencyControl has a pending queue of writes named writeQueue.
>>> Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is empty or writeQueue.getFirst() == w.
>>> Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry from writeQueue.
>>>
>>> I think that when a handler thread is processing between step 2 and step 9, the other handler threads can wait until the thread completes step 9.
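>
> (To see that effect in isolation, here is a toy driver for the ToyMvcc sketch above: handler B, working on row B, is stuck at step 2 while handler A, on row A, sits between step 2 and step 9. Again, an illustration, not HBase code:)
>
> public class ToyMvccDemo {
>   public static void main(String[] args) throws Exception {
>     ToyMvcc mvcc = new ToyMvcc();
>     ToyMvcc.WriteEntry a = mvcc.begin();  // handler A, row A: write in flight
>     Thread handlerB = new Thread(() -> {
>       try {
>         long t0 = System.nanoTime();
>         mvcc.waitForPreviousTransactionsComplete();  // handler B, row B: step 2
>         System.out.printf("B waited %d ms on A%n", (System.nanoTime() - t0) / 1_000_000);
>       } catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
>     });
>     handlerB.start();
>     Thread.sleep(2000);  // A is slow here (e.g. a slow WAL sync)
>     mvcc.complete(a);    // step 9: only now can B proceed
>     handlerB.join();
>   }
> }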
>>
>> That is right. We need to read, after all outstanding updates are done... because we need to read the latest update before we go to modify/increment it.
>>
>> How do you make out this?
>> "A region lock (not a row lock) seems to occur in waitForPreviousTransactionsComplete()."
>>
>> In 0.98.x we did this:
>>
>>   mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>>
>> ... and in 1.0 we do this:
>>
>>   mvcc.waitForPreviousTransactionsComplete() which is this....
>>
>>   +  public void waitForPreviousTransactionsComplete() {
>>   +    WriteEntry w = beginMemstoreInsert();
>>   +    waitForPreviousTransactionsComplete(w);
>>   +  }
>>
>> The mvcc and region sequenceid were merged in 1.0 (https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and region sequenceid would spin independent of each other. Perhaps this is responsible for some of the slowdown.
>>
>> That said, looking in your thread dump, we seem to be down in the Get. If you do a bunch of thread dumps in a row, where is the lock-holding thread? In Get or writing Increment... or waiting on sequence id?
>>
>> Is it possible you are contending on a counter post-upgrade? Is it possible that all these threads are trying to get to the same row to update it? Could the app behavior have changed? Or are you thinking increment itself has slowed significantly?
>>
>> St.Ack
>>
>>> Thanks,
>>>
>>> Toshihiro Suzuki
>>>
>>> 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
>>>
>>>> In HRegion#increment(), we lock the row (not region):
>>>>
>>>>     try {
>>>>       rowLock = getRowLock(row);
>>>>
>>>> Can you pastebin the complete stack trace ?
>>>>
>>>> Thanks

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Looking at that stack trace, nothing shows as blocked or slowed by
another operation. Do you have others I could look at, Bryan?
St.Ack

On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> Yea sorry if I was misleading.  The nonce loglines we saw only happened on
> full cluster restart, it may have been the HLog's replaying, not sure.
>
> We are still seeing slow Increments. Whereas Gets and Mutates are on the
> order of 50-150ms according to metrics, Increments are in the
> 1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
>
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
>
>
>
> On Mon, Nov 30, 2015 at 11:26 PM Stack <st...@duboce.net> wrote:
>
> > Still slow increments though?
> >
> > On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <
> > bbeaudreault@hubspot.com
> > > wrote:
> >
> > > Those log lines have settled down, they may have been related to a
> > > cluster-wide forced restart at the time.
> > >
> > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
> > > bbeaudreault@hubspot.com>
> > > wrote:
> > >
> > > > We've been doing more debugging of this and have set up the read vs write
> > > > handlers to try to at least segment this away so reads can work. We have
> > > > pretty beefy servers, and are running with the following settings:
> > > >
> > > > hbase.regionserver.handler.count=1000
> > > > hbase.ipc.server.read.threadpool.size=50
> > > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > > hbase.ipc.server.callqueue.scan.ratio=0.5
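> > > >
> > > > (For reference, rough math on how I understand those ratios to carve up
> > > > the pool — a back-of-envelope sketch, assuming the ratios apply to both
> > > > queues and handlers; the class and variable names are made up:)
> > > >
> > > > public class CallQueueMath {
> > > >   public static void main(String[] args) {
> > > >     int handlers = 1000;
> > > >     double handlerFactor = 0.025, readRatio = 0.6, scanRatio = 0.5;
> > > >     int queues = Math.max(1, (int) (handlers * handlerFactor)); // 25 call queues
> > > >     int readQueues = (int) (queues * readRatio);                // 15 read queues
> > > >     int scanQueues = (int) (readQueues * scanRatio);            // 7 of those for scans
> > > >     int writeQueues = queues - readQueues;                      // 10 write queues
> > > >     int readHandlers = (int) (handlers * readRatio);            // 600 read handlers
> > > >     int writeHandlers = handlers - readHandlers;                // the 400 below
> > > >     System.out.printf("queues=%d (read=%d, scan=%d, write=%d), handlers(read=%d, write=%d)%n",
> > > >         queues, readQueues, scanQueues, writeQueues, readHandlers, writeHandlers);
> > > >   }
> > > > }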
> > > >
> > > > We are seeing all 400 write handlers taken up by row locks for the most
> > > > part. The read handlers are mostly idle. We're thinking of changing the
> > > > ratio here, but are not sure it will help if they are all blocked on a
> > > > row lock.  We just enabled DEBUG logging on all our servers and notice
> > > > the following:
> > > >
> > > > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait false, activity 00:54:36.240]
> > > > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait false, activity 00:54:36.256]
> > > > 2015-12-01 00:56:09,268 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait false, activity 00:55:01.259]
> > > > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait false, activity 00:53:58.151]
> > > > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait false, activity 00:53:58.162]
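> > > >
> > > > (As I read those lines, the server is seeing the same [nonceGroup:nonce]
> > > > pair more than once — i.e. client retries of the same increment. A toy
> > > > version of that dedupe check, just to show the idea — not the real
> > > > ServerNonceManager code:)
> > > >
> > > > import java.util.concurrent.ConcurrentHashMap;
> > > >
> > > > class ToyNonceManager {
> > > >   private final ConcurrentHashMap<String, Long> seen = new ConcurrentHashMap<>();
> > > >
> > > >   /** Returns true if the op should run; false if it is a duplicate/retry. */
> > > >   boolean startOperation(long nonceGroup, long nonce) {
> > > >     String key = nonceGroup + ":" + nonce;
> > > >     if (seen.putIfAbsent(key, System.currentTimeMillis()) != null) {
> > > >       System.out.println("Conflict detected by nonce: [" + key + "]");
> > > >       return false;  // the real server instead waits for, or returns, the first attempt's outcome
> > > >     }
> > > >     return true;
> > > >   }
> > > > }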
> > > >
> > > >
> > > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > > bbeaudreault@hubspot.com> wrote:
> > > >
> > > >> Sorry the second link should be
> > > >>
> > >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > > >>
> > > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > > >> bbeaudreault@hubspot.com> wrote:
> > > >>
> > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > > >>>
> > > >>> An active handler:
> > > >>>
> > >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > > >>> One that is locked:
> > > >>>
> > >
> >
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > > >>>
> > > >>> The difference between pre-rollback and post is that previously we
> > were
> > > >>> seeing things blocked in mvcc.  Now we are seeing them blocked on
> the
> > > >>> upsert.
> > > >>>
> > > >>> It always follows the same pattern: 1 active handler in the upsert
> > > >>> and the rest blocked waiting for it.
> > > >>>
> > > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> > > >>>
> > > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > > >>>> bbeaudreault@hubspot.com
> > > >>>> > wrote:
> > > >>>>
> > > >>>> > The rollback seems to have mostly solved the issue for one of
> our
> > > >>>> clusters,
> > > >>>> > but another one is still seeing long increment times:
> > > >>>> >
> > > >>>> > "slowIncrementCount": 52080,
> > > >>>> > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max":
> > > 6162,"
> > > >>>> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
> > > >>>> > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
> > > >>>> > 1052.6499999999999,"Increment_99th_percentile":
> 1635.2399999999998
> > > >>>> >
> > > >>>> >
> > > >>>> > Any ideas if there are other changes that may be causing a
> > > performance
> > > >>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
> > > >>>> >
> > > >>>> >
> > > >>>> >
> > > >>>> No.
> > > >>>>
> > > >>>> Post a thread dump Bryan and it might prompt something.
> > > >>>>
> > > >>>> St.Ack
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> >
> > > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> > > >>>> >
> > > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > >>>> > > bbeaudreault@hubspot.com> wrote:
> > > >>>> > >
> > > >>>> > > > Should this be added as a known issue in the CDH or hbase
> > > >>>> > documentation?
> > > >>>> > > It
> > > >>>> > > > was a severe performance hit for us, all of our
> regionservers
> > > were
> > > >>>> > > sitting
> > > >>>> > > > at a few thousand queued requests.
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > Let me take care of that.
> > > >>>> > > St.Ack
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > >>>> > > > bbeaudreault@hubspot.com>
> > > >>>> > > > wrote:
> > > >>>> > > >
> > > >>>> > > > > Yea, they are all over the place and called from client
> and
> > > >>>> > coprocessor
> > > >>>> > > > > code. We ended up having no other option but to rollback,
> > and
> > > >>>> aside
> > > >>>> > > from
> > > >>>> > > > a
> > > >>>> > > > > few NoSuchMethodErrors due to API changes (Put#add vs
> > > >>>> Put#addColumn),
> > > >>>> > > it
> > > >>>> > > > > seems to be working and fixing our problem.
> > > >>>> > > > >
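> > > >>>> > > > > (For anyone else who rolls back: the incompatibility looks like
> > > >>>> > > > > this, as I remember the client APIs — a sketch, check your jars:)
> > > >>>> > > > >
> > > >>>> > > > > import org.apache.hadoop.hbase.client.Put;
> > > >>>> > > > > import org.apache.hadoop.hbase.util.Bytes;
> > > >>>> > > > >
> > > >>>> > > > > public class PutCompat {
> > > >>>> > > > >   public static void main(String[] args) {
> > > >>>> > > > >     Put put = new Put(Bytes.toBytes("row1"));
> > > >>>> > > > >     // 1.0+ spelling; NoSuchMethodError against older client jars:
> > > >>>> > > > >     // put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
> > > >>>> > > > >     // pre-1.0 spelling (still present, though deprecated, in 1.0):
> > > >>>> > > > >     put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
> > > >>>> > > > >   }
> > > >>>> > > > > }
> > > >>>> > > > >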
> > > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net>
> > > wrote:
> > > >>>> > > > >
> > > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work
> on
> > > >>>> this
> > > >>>> > now.
> > > >>>> > > > >> Where
> > > >>>> > > > >> are your counters Bryan? In their own column family or
> > > >>>> scattered
> > > >>>> > about
> > > >>>> > > > in
> > > >>>> > > > >> a
> > > >>>> > > > >> row with other Cell types?
> > > >>>> > > > >> St.Ack
> > > >>>> > > > >>
> > > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > >>>> > > > >> bbeaudreault@hubspot.com> wrote:
> > > >>>> > > > >>
> > > >>>> > > > >> > Is there any update to this? We just upgraded all of our
> > > >>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing
> > > >>>> > > > >> > this JIRA listed in the known issues, did not know about
> > > >>>> > > > >> > this.  Now we are seeing performance issues across all
> > > >>>> > > > >> > clusters, as we make heavy use of increments.
> > > >>>> > > > >> >
> > > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope
> > > >>>> > > > >> > to roll back to CDH 5.3.1 (if that is possible)?
> > > >>>> > > > >> >
> > > >>>> > > > >> >
> > > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > >>>> > > > >> > >
> > > >>>> > > > >> > > Thank you St.Ack!
> > > >>>> > > > >> > >
> > > >>>> > > > >> > > I would like to follow the ticket.
> > > >>>> > > > >> > >
> > > >>>> > > > >> > > Toshihiro Suzuki
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > >
> > > >>>> > > > >> > > > >
> > > >>>> > > > >> > > >
> > > >>>> > > > >> > >
> > > >>>> > > > >> >
> > > >>>> > > > >>
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > >
> > > >>>> > > >
> > > >>>>
> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > >>>> > > > >> > > > > > > > > > >
> >  java.lang.Thread.run(Thread.java:745)
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > > There are many similar threads in the
> > > >>>> thread
> > > >>>> > dump.
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > > I read the source code and I think
> this
> > > is
> > > >>>> > caused
> > > >>>> > > by
> > > >>>> > > > >> > > changes
> > > >>>> > > > >> > > > of
> > > >>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > > >>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems
> to
> > > >>>> occur in
> > > >>>> > > > >> > > > > > > > > > >
> waitForPreviousTransactionsComplete().
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > > Also we wrote performance test code
> for
> > > >>>> > increment
> > > >>>> > > > >> > operation
> > > >>>> > > > >> > > > > that
> > > >>>> > > > >> > > > > > > > > included
> > > >>>> > > > >> > > > > > > > > > > 100 threads and ran it in local mode.
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > > The result is shown below:
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> > > >>>> > > > >> 7.975072509210629
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> > > >>>> > > > 49.11840157868772
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > > Thanks,
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
> > > >>>> > > > >> > > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > >
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > >
> > > >>>> > > > >> > > > >
> > > >>>> > > > >> > > >
> > > >>>> > > > >> > >
> > > >>>> > > > >> >
> > > >>>> > > > >>
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > > >>>
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Looking at that stack trace, nothing showing as blocked or slowed by
another operation. You have others I could look at Bryan?
St.Ack

On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault <bbeaudreault@hubspot.com
> wrote:

> Yea, sorry if I was misleading.  The nonce log lines we saw only happened on
> full cluster restart; it may have been the HLogs replaying, not sure.
>
> We are still seeing slow Increments. Whereas Gets and Mutates will be on the
> order of 50-150ms according to metrics, Increment will be in the
> 1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
>
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
>
>
>
> On Mon, Nov 30, 2015 at 11:26 PM Stack <st...@duboce.net> wrote:
>
> > Still slow increments though?
> >
> > On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <
> > bbeaudreault@hubspot.com
> > > wrote:
> >
> > > Those log lines have settled down; they may have been related to a
> > > cluster-wide forced restart at the time.
> > >
> > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
> > > bbeaudreault@hubspot.com>
> > > wrote:
> > >
> > > > We've been doing more debugging of this and have set up the read vs
> > write
> > > > handlers to try to at least segment this away so reads can work. We
> > have
> > > > pretty beefy servers, and are running with the following settings:
> > > >
> > > > hbase.regionserver.handler.count=1000
> > > > hbase.ipc.server.read.threadpool.size=50
> > > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > > hbase.ipc.server.callqueue.scan.ratio=0.5
> > > >
> > > > We are seeing all 400 write handlers taken up by row locks for the most
> > > > part. The read handlers are mostly idle. We're thinking of changing the
> > > > ratio here, but are not sure it will help if they are all blocked on a
> > > > row lock.  We just enabled DEBUG logging on all our servers and notice
> > > > the following:
> > > >
> > > > 2015-12-01 00:56:09,015 DEBUG
> > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > detected
> > > > by nonce: [-687451119961178644:7664336281906118656], [state 0,
> hasWait
> > > > false, activity 00:54:36.240]
> > > > 2015-12-01 00:56:09,015 DEBUG
> > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > detected
> > > > by nonce: [-687451119961178644:-7119840249342174227], [state 0,
> hasWait
> > > > false, activity 00:54:36.256]
> > > > 2015-12-01 00:56:09,268 DEBUG
> > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > detected
> > > > by nonce: [-5946137511131403479:2112661701888365489], [state 0,
> hasWait
> > > > false, activity 00:55:01.259]
> > > > 2015-12-01 00:56:09,279 DEBUG
> > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > detected
> > > > by nonce: [4165332617675853029:6256955295384472057], [state 0,
> hasWait
> > > > false, activity 00:53:58.151]
> > > > 2015-12-01 00:56:09,279 DEBUG
> > > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > > detected
> > > > by nonce: [4165332617675853029:4961178013070912522], [state 0,
> hasWait
> > > > false, activity 00:53:58.162]
> > > >
> > > >
> > > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > > bbeaudreault@hubspot.com> wrote:
> > > >
> > > >> Sorry, the second link should be
> > > >> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > > >>
> > > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > > >> bbeaudreault@hubspot.com> wrote:
> > > >>
> > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > > >>>
> > > >>> An active handler:
> > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > > >>> One that is locked:
> > > >>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > > >>>
> > > >>> The difference between pre-rollback and post is that previously we
> > > >>> were seeing things blocked in mvcc.  Now we are seeing them blocked
> > > >>> on the upsert.
> > > >>>
> > > >>> It always follows the same pattern, of 1 active handler in the upsert
> > > >>> and the rest blocked waiting for it.
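To pull this signal out programmatically rather than eyeballing successive jstack dumps, the JDK's ThreadMXBean can report which handler threads are BLOCKED and who holds the lock. A minimal sketch; the thread-name filter is an assumption based on the handler names seen in the dumps in this thread, and it must run inside the JVM in question (or be adapted to attach over JMX):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedHandlers {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // Dump all threads, including locked monitors and ownable synchronizers.
    for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
      // Report handler threads that are parked on a monitor someone else holds.
      if (ti.getThreadState() == Thread.State.BLOCKED
          && ti.getThreadName().contains("RpcServer.handler")) {
        System.out.printf("%s blocked on %s held by %s%n",
            ti.getThreadName(), ti.getLockName(), ti.getLockOwnerName());
      }
    }
  }
}

Run repeatedly, this makes the "1 active handler, rest blocked" pattern show up as one recurring lock-owner name.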
> > > >>>
> > > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> > > >>>
> > > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > > >>>> bbeaudreault@hubspot.com
> > > >>>> > wrote:
> > > >>>>
> > > >>>> > The rollback seems to have mostly solved the issue for one of our
> > > >>>> > clusters, but another one is still seeing long increment times:
> > > >>>> >
> > > >>>> > "slowIncrementCount": 52080,
> > > >>>> > "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
> > > >>>> > "Increment_mean": 465.68678129112396, "Increment_median": 216,
> > > >>>> > "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
> > > >>>> > "Increment_99th_percentile": 1635.2399999999998
> > > >>>> >
> > > >>>> >
> > > >>>> > Any ideas if there are other changes that may be causing a performance
> > > >>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
> > > >>>> >
> > > >>>> >
> > > >>>> >
> > > >>>> No.
> > > >>>>
> > > >>>> Post a thread dump Bryan and it might prompt something.
> > > >>>>
> > > >>>> St.Ack
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> >
> > > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> > > >>>> >
> > > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > >>>> > > bbeaudreault@hubspot.com> wrote:
> > > >>>> > >
> > > >>>> > > > Should this be added as a known issue in the CDH or hbase
> > > >>>> > > > documentation? It was a severe performance hit for us; all of our
> > > >>>> > > > regionservers were sitting at a few thousand queued requests.
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > Let me take care of that.
> > > >>>> > > St.Ack
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > >>>> > > > bbeaudreault@hubspot.com>
> > > >>>> > > > wrote:
> > > >>>> > > >
> > > >>>> > > > > Yea, they are all over the place and called from client and
> > > >>>> > > > > coprocessor code. We ended up having no other option but to
> > > >>>> > > > > rollback, and aside from a few NoSuchMethodErrors due to API
> > > >>>> > > > > changes (Put#add vs Put#addColumn), it seems to be working and
> > > >>>> > > > > fixing our problem.
> > > >>>> > > > >
> > > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> > > >>>> > > > >
> > > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on
> > > >>>> > > > >> this now. Where are your counters Bryan? In their own column
> > > >>>> > > > >> family or scattered about in a row with other Cell types?
> > > >>>> > > > >> St.Ack
> > > >>>> > > > >>
> > > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > >>>> > > > >> bbeaudreault@hubspot.com> wrote:
> > > >>>> > > > >>
> > > >>>> > > > >> > Is there any update to this? We just upgraded all of our
> > > >>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing
> > > >>>> > > > >> > this JIRA listed in the known issues, did not know about
> > > >>>> > > > >> > this.  Now we are seeing performance issues across all
> > > >>>> > > > >> > clusters, as we make heavy use of increments.
> > > >>>> > > > >> >
> > > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
> > > >>>> > > > >> > roll back to CDH 5.3.1 (if that is possible)?
> > > >>>> > > > >> >
> > > >>>> > > > >> >
> > > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > >>>> > > > >> >
> > > >>>> > > > >> > > Thank you St.Ack!
> > > >>>> > > > >> > >
> > > >>>> > > > >> > > I would like to follow the ticket.
> > > >>>> > > > >> > >
> > > >>>> > > > >> > > Toshihiro Suzuki
> > > >>>> > > > >> > >
> > > >>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > > >>>> > > > >> > >
> > > >>>> > > > >> > > > Back to this problem. Simple tests confirm that, as is,
> > > >>>> > > > >> > > > the single-queue-backed MVCC instance can slow Region ops
> > > >>>> > > > >> > > > if some other row is slow to complete. In particular
> > > >>>> > > > >> > > > Increment, checkAndPut, and batch mutations are affected.
> > > >>>> > > > >> > > > I opened HBASE-14460 to start in on a fix up. Let's see
> > > >>>> > > > >> > > > if we can somehow scope mvcc to row or at least shard
> > > >>>> > > > >> > > > mvcc so not all Region ops are paused.
> > > >>>> > > > >> > > >
> > > >>>> > > > >> > > > St.Ack
> > > >>>> > > > >> > > >
> > > >>>> > > > >> > > >
> > > >>>> > > > >> > > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > >>>> > > > >> > > > >
> > > >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > > >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> > > >>>> > > > >> > > > > > case to help with the illustration. It is as though
> > > >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> > > >>>> > > > >> > > > > > against other rows should not hold up my read of my
> > > >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > > >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> > > >>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
> > > >>>> > > > >> > > > >
> > > >>>> > > > >> > > > > > You need to read back the increment and have it be
> > > >>>> > > > >> > > > > > 'correct' at increment time?
> > > >>>> > > > >> > > > > Yes, we need it.
> > > >>>> > > > >> > > > >
> > > >>>> > > > >> > > > > I would like to help if there is anything I can do.
> > > >>>> > > > >> > > > >
> > > >>>> > > > >> > > > > Thanks,
> > > >>>> > > > >> > > > > Toshihiro Suzuki
> > > >>>> > > > >> > > > >
> > > >>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <stack@duboce.net>:
> > > >>>> > > > >> > > > >
> > > >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > > >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> > > >>>> > > > >> > > > > > case to help with the illustration. It is as though
> > > >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> > > >>>> > > > >> > > > > > against other rows should not hold up my read of my
> > > >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > > >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> > > >>>> > > > >> > > > > >
> > > >>>> > > > >> > > > > > You need to read back the increment and have it be
> > > >>>> > > > >> > > > > > 'correct' at increment time?
> > > >>>> > > > >> > > > > >
> > > >>>> > > > >> > > > > > (This is a good one)
> > > >>>> > > > >> > > > > >
> > > >>>> > > > >> > > > > > Thank you Toshihiro Suzuki
> > > >>>> > > > >> > > > > > St.Ack
> > > >>>> > > > >> > > > > >
> > > >>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > >>>> > > > >> > > > > >
> > > >>>> > > > >> > > > > > > St.Ack,
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > Thank you for your response.
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > The reason I concluded that "A region lock (not a
> > > >>>> > > > >> > > > > > > row lock) seems to occur in
> > > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's
> > > >>>> > > > >> > > > > > > writeQueue can cause a region lock.
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert()
> > > >>>> > > > >> > > > > > > to writeQueue and waits until writeQueue is empty
> > > >>>> > > > >> > > > > > > or writeQueue.getFirst() == w.
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > I think when a handler thread is processing between
> > > >>>> > > > >> > > > > > > step 2 and step 3, the other handler threads can
> > > >>>> > > > >> > > > > > > wait at step 1 until the thread completes step 3.
> > > >>>> > > > >> > > > > > > This is depicted as follows:
> > > >>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > Actually, in the thread dump of our region server,
> > > >>>> > > > >> > > > > > > many handler threads (RW.default.writeRpcServer.handler)
> > > >>>> > > > >> > > > > > > wait at Step 1 (waitForPreviousTransactionsComplete()).
> > > >>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > Many handler threads wait at this:
> > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > > Is it possible you are contending on a counter
> > > >>>> > > > >> > > > > > > > post-upgrade? Is it possible that all these
> > > >>>> > > > >> > > > > > > > threads are trying to get to the same row to
> > > >>>> > > > >> > > > > > > > update it? Could the app behavior have changed?
> > > >>>> > > > >> > > > > > > > Or are you thinking increment itself has slowed
> > > >>>> > > > >> > > > > > > > significantly?
> > > >>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app
> > > >>>> > > > >> > > > > > > behavior. We are thinking increment itself has
> > > >>>> > > > >> > > > > > > slowed significantly. Before upgrading HBase, it
> > > >>>> > > > >> > > > > > > had good throughput and latency. Currently, to cope
> > > >>>> > > > >> > > > > > > with this problem, we split the regions finely.
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > Thanks,
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > Toshihiro Suzuki
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net>:
> > > >>>> > > > >> > > > > > >
> > > >>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > > Ted,
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > Thank you for your response.
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> > > >>>> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > I think that the increment operation works as follows:
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > 1. get row lock
> > > >>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> > > >>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > >>>> > > > >> > > > > > > > > 4. get previous values
> > > >>>> > > > >> > > > > > > > > 5. create KVs
> > > >>>> > > > >> > > > > > > > > 6. write to Memstore
> > > >>>> > > > >> > > > > > > > > 7. write to WAL
> > > >>>> > > > >> > > > > > > > > 8. release row lock
> > > >>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
> > > >>>> > > > >> > > > > > > > > writes named writeQueue.
> > > >>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
> > > >>>> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
> > > >>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry
> > > >>>> > > > >> > > > > > > > > from writeQueue.
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > I think that when a handler thread is processing between step 2 and
> > > >>>> > > > >> > > > > > > > > step 9, the other handler threads can wait until the thread completes
> > > >>>> > > > >> > > > > > > > > step 9.
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > That is right. We need to read, after all outstanding updates are
> > > >>>> > > > >> > > > > > > > done... because we need to read the latest update before we go to
> > > >>>> > > > >> > > > > > > > modify/increment it.
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > How do you make out this?
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
> > > >>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > In 0.98.x we did this:
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > ... and in 1.0 we do this:
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > >>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > >>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > >>>> > > > >> > > > > > > > +  }
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
> > > >>>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previously mvcc and
> > > >>>> > > > >> > > > > > > > region sequenceid would spin independent of each other. Perhaps this is
> > > >>>> > > > >> > > > > > > > responsible for some of the slowdown.
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem to be down in the Get.
> > > >>>> > > > >> > > > > > > > If you do a bunch of thread dumps in a row, where is the lock-holding
> > > >>>> > > > >> > > > > > > > thread? In Get or writing Increment... or waiting on sequence id?
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
> > > >>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row to
> > > >>>> > > > >> > > > > > > > update it? Could the app behavior have changed? Or are you thinking
> > > >>>> > > > >> > > > > > > > increment itself has slowed significantly?
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > St.Ack
> > > >>>> > > > >> > > > > > > >
> > > >>>> > > > >> > > > > > > > > Thanks,
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > Toshihiro Suzuki
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
> > > >>>> > > > >> > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > >>>> > > > >> > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > >     try {
> > > >>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> > > >>>> > > > >> > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
> > > >>>> > > > >> > > > > > > > > >
> > > >>>> > > > >> > > > > > > > > > Thanks
> > > >>>> > > > >> > > > > > > > > >
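To make the single-queue behavior discussed above concrete, here is a minimal sketch of the pattern; this is a simplified illustration, not the actual MultiVersionConsistencyControl code. Every caller funnels through one region-wide LinkedList, so a transaction that is slow to complete on any row stalls every thread parked in the wait loop, which matches the handlers above showing BLOCKED on java.util.LinkedList:

import java.util.LinkedList;

// Simplified single-queue MVCC, for illustration only.
class SimpleMvcc {
  static final class WriteEntry {
    boolean completed = false;
  }

  private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

  // Analogous to beginMemstoreInsert(): enqueue an entry for this transaction.
  synchronized WriteEntry begin() {
    WriteEntry e = new WriteEntry();
    writeQueue.add(e);
    return e;
  }

  // Mark the entry done, pop finished entries off the head, wake up waiters.
  synchronized void complete(WriteEntry e) {
    e.completed = true;
    while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
      writeQueue.removeFirst();
    }
    notifyAll();
  }

  // Analogous to waitForPreviousTransactionsComplete(w): block until every
  // entry queued ahead of ours is done. The queue is region-wide, so a slow
  // write on ANY row holds up every caller here, whatever row they touch.
  synchronized void waitForPrevious(WriteEntry e) throws InterruptedException {
    while (writeQueue.getFirst() != e) {
      wait();
    }
  }
}

An increment in this model runs roughly begin() -> waitForPrevious(w) -> apply the edit -> complete(w); it is the waitForPrevious() step that serializes handlers across unrelated rows, and scoping or sharding the queue (the HBASE-14460 direction) is what would break that coupling.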

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Yea, sorry if I was misleading.  The nonce log lines we saw only happened on
full cluster restart; it may have been the HLogs replaying, not sure.

We are still seeing slow Increments. Whereas Gets and Mutates will be on the
order of 50-150ms according to metrics, Increment will be in the
1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
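A client-side loop along these lines can reproduce contended-increment latency; it is a sketch in the spirit of the 100-thread test earlier in the thread, and the table name, column family, qualifier, and 10-row spread are made-up parameters:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementBench {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    final int threads = 100, opsPerThread = 1000;
    final AtomicLong totalNanos = new AtomicLong();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      CountDownLatch done = new CountDownLatch(threads);
      for (int t = 0; t < threads; t++) {
        final int id = t;
        pool.submit(() -> {
          try (Table table = conn.getTable(TableName.valueOf("bench"))) {
            for (int i = 0; i < opsPerThread; i++) {
              // Spread the load over 10 rows to force row/mvcc contention.
              byte[] row = Bytes.toBytes("row-" + (id % 10));
              long start = System.nanoTime();
              table.incrementColumnValue(row, Bytes.toBytes("f"), Bytes.toBytes("c"), 1L);
              totalNanos.addAndGet(System.nanoTime() - start);
            }
          } catch (Exception e) {
            e.printStackTrace();
          } finally {
            done.countDown();
          }
        });
      }
      done.await();
      pool.shutdown();
      System.out.printf("avg increment latency: %.2f ms%n",
          totalNanos.get() / 1e6 / (threads * opsPerThread));
    }
  }
}

Widening the row spread should shrink per-row lock contention and leave mostly the mvcc/WAL serialization visible.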



On Mon, Nov 30, 2015 at 11:26 PM Stack <st...@duboce.net> wrote:

> Still slow increments though?
>
> On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com
> > wrote:
>
> > Those log lines have settled down; they may have been related to a
> > cluster-wide forced restart at the time.
> >
> > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
> > bbeaudreault@hubspot.com>
> > wrote:
> >
> > > We've been doing more debugging of this and have set up the read vs
> write
> > > handlers to try to at least segment this away so reads can work. We
> have
> > > pretty beefy servers, and are running with the following settings:
> > >
> > > hbase.regionserver.handler.count=1000
> > > hbase.ipc.server.read.threadpool.size=50
> > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > hbase.ipc.server.callqueue.scan.ratio=0.5
> > >
> > > We are seeing all 400 write handlers taken up by row locks for the most
> > > part. The read handlers are mostly idle. We're thinking of changing the
> > > ratio here, but are not sure it will help if they are all blocked on a row
> > > lock.  We just enabled DEBUG logging on all our servers and notice the
> > > following:
> > >
> > > 2015-12-01 00:56:09,015 DEBUG
> > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > detected
> > > by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait
> > > false, activity 00:54:36.240]
> > > 2015-12-01 00:56:09,015 DEBUG
> > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > detected
> > > by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait
> > > false, activity 00:54:36.256]
> > > 2015-12-01 00:56:09,268 DEBUG
> > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > detected
> > > by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait
> > > false, activity 00:55:01.259]
> > > 2015-12-01 00:56:09,279 DEBUG
> > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > detected
> > > by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait
> > > false, activity 00:53:58.151]
> > > 2015-12-01 00:56:09,279 DEBUG
> > > org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict
> > detected
> > > by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait
> > > false, activity 00:53:58.162]
> > >
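As a sanity check, the queue and handler split implied by the settings quoted above can be worked out directly. A rough back-of-the-envelope sketch; the comments mirror the hbase-site.xml property names, and the rounding/split rules are an assumption about how the ratios are applied, not the exact RpcExecutor code:

public class HandlerSplit {
  public static void main(String[] args) {
    int handlerCount = 1000;      // hbase.regionserver.handler.count
    double queueFactor = 0.025;   // hbase.ipc.server.callqueue.handler.factor
    double readRatio = 0.6;       // hbase.ipc.server.callqueue.read.ratio
    double scanRatio = 0.5;       // hbase.ipc.server.callqueue.scan.ratio

    int queues = (int) Math.round(handlerCount * queueFactor); // 25 call queues
    int readQueues = (int) Math.round(queues * readRatio);     // 15 queues for reads
    int writeQueues = queues - readQueues;                     // 10 queues for writes
    int scanQueues = (int) Math.round(readQueues * scanRatio); // 8 of the read queues for scans
    int readHandlers = (int) Math.round(handlerCount * readRatio); // 600
    int writeHandlers = handlerCount - readHandlers;               // 400

    System.out.printf("queues=%d (read=%d, scan=%d, write=%d), handlers: read=%d write=%d%n",
        queues, readQueues, scanQueues, writeQueues, readHandlers, writeHandlers);
  }
}

That 400 lines up with the observation above that all 400 write handlers were taken up by row locks; changing callqueue.read.ratio shifts handlers between the pools, but, as noted, that will not help if the write handlers all pile up on the same lock.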
> > >
> > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > bbeaudreault@hubspot.com> wrote:
> > >
> > >> Sorry, the second link should be
> > >> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > >>
> > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > >> bbeaudreault@hubspot.com> wrote:
> > >>
> > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > >>>
> > >>> An active handler:
> > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > >>> One that is locked:
> > >>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > >>>
> > >>> The difference between pre-rollback and post is that previously we were
> > >>> seeing things blocked in mvcc.  Now we are seeing them blocked on the
> > >>> upsert.
> > >>>
> > >>> It always follows the same pattern, of 1 active handler in the upsert
> > >>> and the rest blocked waiting for it.
> > >>>
> > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> > >>>
> > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > >>>> bbeaudreault@hubspot.com
> > >>>> > wrote:
> > >>>>
> > >>>> > The rollback seems to have mostly solved the issue for one of our
> > >>>> > clusters, but another one is still seeing long increment times:
> > >>>> >
> > >>>> > "slowIncrementCount": 52080,
> > >>>> > "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
> > >>>> > "Increment_mean": 465.68678129112396, "Increment_median": 216,
> > >>>> > "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
> > >>>> > "Increment_99th_percentile": 1635.2399999999998
> > >>>> >
> > >>>> >
> > >>>> > Any ideas if there are other changes that may be causing a performance
> > >>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
> > >>>> >
> > >>>> >
> > >>>> >
> > >>>> No.
> > >>>>
> > >>>> Post a thread dump Bryan and it might prompt something.
> > >>>>
> > >>>> St.Ack
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> >
> > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> > >>>> >
> > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > >>>> > > bbeaudreault@hubspot.com> wrote:
> > >>>> > >
> > >>>> > > > Should this be added as a known issue in the CDH or hbase
> > >>>> > > > documentation? It was a severe performance hit for us; all of our
> > >>>> > > > regionservers were sitting at a few thousand queued requests.
> > >>>> > > >
> > >>>> > > >
> > >>>> > > Let me take care of that.
> > >>>> > > St.Ack
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > >>>> > > > bbeaudreault@hubspot.com>
> > >>>> > > > wrote:
> > >>>> > > >
> > >>>> > > > > Yea, they are all over the place and called from client and
> > >>>> > > > > coprocessor code. We ended up having no other option but to
> > >>>> > > > > rollback, and aside from a few NoSuchMethodErrors due to API
> > >>>> > > > > changes (Put#add vs Put#addColumn), it seems to be working and
> > >>>> > > > > fixing our problem.
> > >>>> > > > >
> > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> > >>>> > > > >
> > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this
> > >>>> > > > >> now. Where are your counters Bryan? In their own column family
> > >>>> > > > >> or scattered about in a row with other Cell types?
> > >>>> > > > >> St.Ack
> > >>>> > > > >>
> > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > >>>> > > > >> bbeaudreault@hubspot.com> wrote:
> > >>>> > > > >>
> > >>>> > > > >> > Is there any update to this? We just upgraded all of our
> > >>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing this
> > >>>> > > > >> > JIRA listed in the known issues, did not know about this.  Now
> > >>>> > > > >> > we are seeing performance issues across all clusters, as we
> > >>>> > > > >> > make heavy use of increments.
> > >>>> > > > >> >
> > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
> > >>>> > > > >> > roll back to CDH 5.3.1 (if that is possible)?
> > >>>> > > > >> >
> > >>>> > > > >> >
> > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > >>>> > > > >> >
> > >>>> > > > >> > > Thank you St.Ack!
> > >>>> > > > >> > >
> > >>>> > > > >> > > I would like to follow the ticket.
> > >>>> > > > >> > >
> > >>>> > > > >> > > Toshihiro Suzuki
> > >>>> > > > >> > >
> > >>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > >>>> > > > >> > >
> > >>>> > > > >> > > > Back to this problem. Simple tests confirm that, as is,
> > >>>> > > > >> > > > the single-queue-backed MVCC instance can slow Region ops
> > >>>> > > > >> > > > if some other row is slow to complete. In particular
> > >>>> > > > >> > > > Increment, checkAndPut, and batch mutations are affected.
> > >>>> > > > >> > > > I opened HBASE-14460 to start in on a fix up. Let's see if
> > >>>> > > > >> > > > we can somehow scope mvcc to row or at least shard mvcc so
> > >>>> > > > >> > > > not all Region ops are paused.
> > >>>> > > > >> > > >
> > >>>> > > > >> > > > St.Ack
> > >>>> > > > >> > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > >>>> > > > >> > > >
> > >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> > >>>> > > > >> > > > > > case to help with the illustration. It is as though
> > >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> > >>>> > > > >> > > > > > against other rows should not hold up my read of my
> > >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> > >>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > > > > You need to read back the increment and have it be
> > >>>> > > > >> > > > > > 'correct' at increment time?
> > >>>> > > > >> > > > > Yes, we need it.
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > > > I would like to help if there is anything I can do.
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > > > Thanks,
> > >>>> > > > >> > > > > Toshihiro Suzuki
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <stack@duboce.net>:
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> > >>>> > > > >> > > > > > case to help with the illustration. It is as though
> > >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> > >>>> > > > >> > > > > > against other rows should not hold up my read of my
> > >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > > > You need to read back the increment and have it be
> > >>>> > > > >> > > > > > 'correct' at increment time?
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > > > (This is a good one)
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > > > Thank you Toshihiro Suzuki
> > >>>> > > > >> > > > > > St.Ack
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > > > > St.Ack,
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Thank you for your response.
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Why I make out that "A region lock (not a row
> > lock)
> > >>>> > seems
> > >>>> > > to
> > >>>> > > > >> > occur
> > >>>> > > > >> > > in
> > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as
> > >>>> follows:
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > A increment operation has 3 procedures for
> MVCC.
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > 2. w =
> > mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w,
> > >>>> walKey);
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's
> > >>>> writeQueue
> > >>>> > > can
> > >>>> > > > >> > cause
> > >>>> > > > >> > > a
> > >>>> > > > >> > > > > > region
> > >>>> > > > >> > > > > > > lock.
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Step 2 adds to a WriteEntry to writeQueue.
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) ->
> > >>>> > > advanceMemstore(w)
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in
> > >>>> beginMemstoreInsert() to
> > >>>> > > > >> writeQueue
> > >>>> > > > >> > > and
> > >>>> > > > >> > > > > > waits
> > >>>> > > > >> > > > > > > until writeQueue is empty or
> > writeQueue.getFirst()
> > >>>> == w.
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > I think when a handler thread is processing
> > between
> > >>>> > step 2
> > >>>> > > > and
> > >>>> > > > >> > step
> > >>>> > > > >> > > > 3,
> > >>>> > > > >> > > > > > the
> > >>>> > > > >> > > > > > > other handler threads can wait at step 1 until
> > the
> > >>>> > thread
> > >>>> > > > >> > completes
> > >>>> > > > >> > > > > step
> > >>>> > > > >> > > > > > 3
> > >>>> > > > >> > > > > > > This is depicted as follows:
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Actually, in the thread dump of our region
> > server,
> > >>>> many
> > >>>> > > > >> handler
> > >>>> > > > >> > > > threads
> > >>>> > > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at
> Step
> > 1
> > >>>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Many handler threads wait at this:
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > > Is it possible you are contending on a
> counter
> > >>>> > > > post-upgrade?
> > >>>> > > > >> > Is
> > >>>> > > > >> > > it
> > >>>> > > > >> > > > > > > > possible that all these threads are trying to
> > >>>> get to
> > >>>> > the
> > >>>> > > > >> same
> > >>>> > > > >> > row
> > >>>> > > > >> > > > to
> > >>>> > > > >> > > > > > > update
> > >>>> > > > >> > > > > > > > it? Could the app behavior have changed?  Or
> > are
> > >>>> you
> > >>>> > > > >> thinking
> > >>>> > > > >> > > > > increment
> > >>>> > > > >> > > > > > > > itself has slowed significantly?
> > >>>> > > > >> > > > > > > We have just upgraded HBase, not changed the
> app
> > >>>> > behavior.
> > >>>> > > > We
> > >>>> > > > >> are
> > >>>> > > > >> > > > > > thinking
> > >>>> > > > >> > > > > > > increment itself has slowed significantly.
> > >>>> > > > >> > > > > > > Before upgrading HBase, it was good throughput
> > and
> > >>>> > > latency.
> > >>>> > > > >> > > > > > > Currently, to cope with this problem, we split
> > the
> > >>>> > regions
> > >>>> > > > >> > finely.
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Thanks,
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > Toshihiro Suzuki
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <
> > stack@duboce.net
> > >>>> >:
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
> > >>>> > > brfrn169@gmail.com
> > >>>> > > > >
> > >>>> > > > >> > > wrote:
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > > Ted,
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > Thank you for your response.
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > I uploaded the complete stack trace to
> Gist.
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > I think that increment operation works as
> > >>>> follows:
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > 1. get row lock
> > >>>> > > > >> > > > > > > > > 2.
> mvcc.waitForPreviousTransactionsComplete()
> > >>>> //
> > >>>> > wait
> > >>>> > > > for
> > >>>> > > > >> all
> > >>>> > > > >> > > > prior
> > >>>> > > > >> > > > > > > MVCC
> > >>>> > > > >> > > > > > > > > transactions to finish
> > >>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() //
> > >>>> start a
> > >>>> > > > >> > transaction
> > >>>> > > > >> > > > > > > > > 4. get previous values
> > >>>> > > > >> > > > > > > > > 5. create KVs
> > >>>> > > > >> > > > > > > > > 6. write to Memstore
> > >>>> > > > >> > > > > > > > > 7. write to WAL
> > >>>> > > > >> > > > > > > > > 8. release row lock
> > >>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum()
> //
> > >>>> > complete
> > >>>> > > > the
> > >>>> > > > >> > > > > > transaction
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > A instance of
> MultiVersionConsistencyControl
> > >>>> has a
> > >>>> > > > pending
> > >>>> > > > >> > > queue
> > >>>> > > > >> > > > of
> > >>>> > > > >> > > > > > > > writes
> > >>>> > > > >> > > > > > > > > named writeQueue.
> > >>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue
> and
> > >>>> waits
> > >>>> > > until
> > >>>> > > > >> > > > writeQueue
> > >>>> > > > >> > > > > > is
> > >>>> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
> > >>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and
> > >>>> step 9
> > >>>> > > > removes
> > >>>> > > > >> the
> > >>>> > > > >> > > > > > > WriteEntry
> > >>>> > > > >> > > > > > > > > from writeQueue.
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > I think that when a handler thread is
> > >>>> processing
> > >>>> > > between
> > >>>> > > > >> > step 2
> > >>>> > > > >> > > > and
> > >>>> > > > >> > > > > > > step
> > >>>> > > > >> > > > > > > > 9,
> > >>>> > > > >> > > > > > > > > the other handler threads can wait until
> the
> > >>>> thread
> > >>>> > > > >> completes
> > >>>> > > > >> > > > step
> > >>>> > > > >> > > > > 9.
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > That is right. We need to read, after all
> > >>>> outstanding
> > >>>> > > > >> updates
> > >>>> > > > >> > are
> > >>>> > > > >> > > > > > done...
> > >>>> > > > >> > > > > > > > because we need to read the latest update
> > before
> > >>>> we go
> > >>>> > > to
> > >>>> > > > >> > > > > > > modify/increment
> > >>>> > > > >> > > > > > > > it.
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > How do you make out this?
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to
> occur
> > in
> > >>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > In 0.98.x we did this:
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > ... and in 1.0 we do this:
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete()
> > which
> > >>>> is
> > >>>> > > > this....
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > +  public void
> > >>>> waitForPreviousTransactionsComplete() {
> > >>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > >>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > >>>> > > > >> > > > > > > > +  }
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in
> > >>>> 1.0 (
> > >>>> > > > >> > > > > > > >
> > https://issues.apache.org/jira/browse/HBASE-8763
> > >>>> ).
> > >>>> > > > Previous
> > >>>> > > > >> > mvcc
> > >>>> > > > >> > > > and
> > >>>> > > > >> > > > > > > > region
> > >>>> > > > >> > > > > > > > sequenceid would spin independent of each
> > other.
> > >>>> > Perhaps
> > >>>> > > > >> this
> > >>>> > > > >> > > > > > responsible
> > >>>> > > > >> > > > > > > > for some slow down.
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > That said, looking in your thread dump, we
> seem
> > >>>> to be
> > >>>> > > down
> > >>>> > > > >> in
> > >>>> > > > >> > the
> > >>>> > > > >> > > > > Get.
> > >>>> > > > >> > > > > > If
> > >>>> > > > >> > > > > > > > you do a bunch of thread dumps in a row,
> where
> > >>>> is the
> > >>>> > > > >> > > lock-holding
> > >>>> > > > >> > > > > > > thread?
> > >>>> > > > >> > > > > > > > In Get or writing Increment... or waiting on
> > >>>> sequence
> > >>>> > > id?
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > Is it possible you are contending on a
> counter
> > >>>> > > > post-upgrade?
> > >>>> > > > >> > Is
> > >>>> > > > >> > > it
> > >>>> > > > >> > > > > > > > possible that all these threads are trying to
> > >>>> get to
> > >>>> > the
> > >>>> > > > >> same
> > >>>> > > > >> > row
> > >>>> > > > >> > > > to
> > >>>> > > > >> > > > > > > update
> > >>>> > > > >> > > > > > > > it? Could the app behavior have changed?  Or
> > are
> > >>>> you
> > >>>> > > > >> thinking
> > >>>> > > > >> > > > > increment
> > >>>> > > > >> > > > > > > > itself has slowed significantly?
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > St.Ack
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > > > > Thanks,
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > Toshihiro Suzuki
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <
> > >>>> > yuzhihong@gmail.com
> > >>>> > > >:
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row
> > (not
> > >>>> > > region):
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >     try {
> > >>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > Can you pastebin the complete stack
> trace ?
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > Thanks
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
> > >>>> > > > >> brfrn169@gmail.com>
> > >>>> > > > >> > > > wrote:
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Hi,
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > We upgraded our cluster from
> > >>>> > CDH5.3.1(HBase0.98.6)
> > >>>> > > > to
> > >>>> > > > >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > >>>> > > > >> > > > > > > > > > > and we experience slowdown in increment
> > >>>> > operation.
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Here's an extract from thread dump of
> the
> > >>>> > > > >> RegionServer of
> > >>>> > > > >> > > our
> > >>>> > > > >> > > > > > > > cluster:
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Thread 68
> > >>>> > > > >> > > > > > >
> > >>>> > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > >>>> > > > >> > > > > > > > > > >   State: BLOCKED
> > >>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
> > >>>> > > > >> > > > > > > > > > >   Waited count: 39828360
> > >>>> > > > >> > > > > > > > > > >   Blocked on
> > java.util.LinkedList@3474e4b2
> > >>>> > > > >> > > > > > > > > > >   Blocked by 63
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > >>>> > > > >> > > > > > > > > > >   Stack:
> > >>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native
> Method)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> >
> > >>>>
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > >
> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > >
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > >
> > >>>> > > >
> > >>>> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > >>>> > > > >> > > > > > > > > > >
>  java.lang.Thread.run(Thread.java:745)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > There are many similar threads in the
> > >>>> thread
> > >>>> > dump.
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > I read the source code and I think this
> > is
> > >>>> > caused
> > >>>> > > by
> > >>>> > > > >> > > changes
> > >>>> > > > >> > > > of
> > >>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > >>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to
> > >>>> occur in
> > >>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Also we wrote performance test code for
> > >>>> > increment
> > >>>> > > > >> > operation
> > >>>> > > > >> > > > > that
> > >>>> > > > >> > > > > > > > > included
> > >>>> > > > >> > > > > > > > > > > 100 threads and ran it in local mode.
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > The result is shown below:
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> > >>>> > > > >> 7.975072509210629
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> > >>>> > > > 49.11840157868772
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Thanks,
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> > >>>
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Yea, sorry if I was misleading. The nonce log lines we saw only happened on
full cluster restart; it may have been the HLogs replaying, not sure.

We are still seeing slow Increments. Where Gets and Mutates are on the
order of 50-150ms according to metrics, Increments are in the
1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
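
For anyone trying to reproduce the client side of this, a minimal timing
loop along these lines is enough to see the spread (table, family, and
qualifier names below are placeholders, not our actual schema):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementLatency {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("counters"))) {
      for (int i = 0; i < 1000; i++) {
        // Hammer a small set of rows so increments contend, as in production.
        Increment inc = new Increment(Bytes.toBytes("row-" + (i % 10)));
        inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c"), 1L);
        long start = System.nanoTime();
        table.increment(inc);
        long ms = (System.nanoTime() - start) / 1_000_000;
        if (ms > 1000) {
          System.out.println("slow increment: " + ms + "ms");
        }
      }
    }
  }
}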



On Mon, Nov 30, 2015 at 11:26 PM Stack <st...@duboce.net> wrote:

> Still slow increments though?
>
> On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com
> > wrote:
>
> > Those log lines have settled down, they may have been related to a
> > cluster-wide forced restart at the time.
> >
> > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
> > bbeaudreault@hubspot.com>
> > wrote:
> >
> > > We've been doing more debugging of this and have set up the read vs
> > > write handlers to try to at least segment this away so reads can work.
> > > We have pretty beefy servers, and are running with the following
> > > settings:
> > >
> > > hbase.regionserver.handler.count=1000
> > > hbase.ipc.server.read.threadpool.size=50
> > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > hbase.ipc.server.callqueue.scan.ratio=0.5
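> > >
> > > (Back-of-envelope on how those settings carve up queues and handlers,
> > > as we read the RWQueueRpcExecutor math -- this is where the 400 write
> > > handlers mentioned below come from; treat it as our understanding, not
> > > gospel:)
> > >
> > > public class QueueMath {
> > >   public static void main(String[] args) {
> > >     int handlers = 1000;                                 // handler.count
> > >     int queues = (int) Math.round(handlers * 0.025);     // 25 call queues
> > >     int readQueues = (int) Math.round(queues * 0.6);     // ~15 serve reads
> > >     int writeQueues = queues - readQueues;               // ~10 serve writes
> > >     int scanQueues = (int) Math.round(readQueues * 0.5); // ~8 of those take scans
> > >     int readHandlers = (int) Math.round(handlers * 0.6); // ~600 read handlers
> > >     int writeHandlers = handlers - readHandlers;         // ~400 write handlers
> > >     System.out.println(writeQueues + " write queues, "
> > >         + writeHandlers + " write handlers");
> > >   }
> > > }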
> > >
> > > We are seeing all 400 write handlers taken up by row locks for the most
> > > part (with a 0.6 read ratio, 400 of the 1000 handlers serve writes). The
> > > read handlers are mostly idle. We're thinking of changing the ratio here,
> > > but are not sure it will help if they are all blocked on a row lock. We
> > > just enabled DEBUG logging on all our servers and noticed the following:
> > >
> > > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait false, activity 00:54:36.240]
> > > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait false, activity 00:54:36.256]
> > > 2015-12-01 00:56:09,268 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait false, activity 00:55:01.259]
> > > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait false, activity 00:53:58.151]
> > > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait false, activity 00:53:58.162]
> > >
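> > > (Our reading of those lines: every non-idempotent op like an increment
> > > carries a client-generated nonce, and the server remembers recently
> > > seen nonces so a retried increment is flagged rather than applied
> > > twice. Roughly this bookkeeping -- a simplified sketch, not the actual
> > > ServerNonceManager code:)
> > >
> > > import java.util.concurrent.ConcurrentHashMap;
> > >
> > > public class NonceSketch {
> > >   // Keyed by "nonceGroup:nonce"; value is first-seen time (for expiry).
> > >   private final ConcurrentHashMap<String, Long> seen =
> > >       new ConcurrentHashMap<String, Long>();
> > >
> > >   /** Returns false on a conflict, i.e. the same op was seen before. */
> > >   public boolean startOperation(long nonceGroup, long nonce) {
> > >     String key = nonceGroup + ":" + nonce;
> > >     return seen.putIfAbsent(key, System.currentTimeMillis()) == null;
> > >   }
> > > }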
> > >
> > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > bbeaudreault@hubspot.com> wrote:
> > >
> > >> Sorry, the second link should be
> > >> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > >>
> > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > >> bbeaudreault@hubspot.com> wrote:
> > >>
> > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > >>>
> > >>> An active handler:
> > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > >>>
> > >>> One that is locked:
> > >>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > >>>
> > >>> The difference between pre-rollback and post is that previously we
> > >>> were seeing things blocked in mvcc. Now we are seeing them blocked
> > >>> on the upsert.
> > >>>
> > >>> It always follows the same pattern: 1 active handler in the upsert
> > >>> and the rest blocked waiting for it.
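> > >>>
> > >>> (A toy illustration of that pattern, if it helps anyone picture it:
> > >>> increments against one hot row serialize on that row's lock, so a
> > >>> dump always shows one thread in the upsert and the rest parked. This
> > >>> is not HBase code, just the shape of the contention:)
> > >>>
> > >>> import java.util.concurrent.locks.ReentrantLock;
> > >>>
> > >>> public class HotRowDemo {
> > >>>   public static void main(String[] args) {
> > >>>     final ReentrantLock rowLock = new ReentrantLock(); // one hot row
> > >>>     for (int i = 0; i < 100; i++) {                    // 100 "handlers"
> > >>>       new Thread(new Runnable() {
> > >>>         public void run() {
> > >>>           rowLock.lock();            // 99 threads park here, as in the dump
> > >>>           try {
> > >>>             Thread.sleep(5);         // stand-in for the memstore upsert
> > >>>           } catch (InterruptedException e) {
> > >>>             Thread.currentThread().interrupt();
> > >>>           } finally {
> > >>>             rowLock.unlock();
> > >>>           }
> > >>>         }
> > >>>       }).start();
> > >>>     }
> > >>>   }
> > >>> }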
> > >>>
> > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> > >>>
> > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > >>>> bbeaudreault@hubspot.com
> > >>>> > wrote:
> > >>>>
> > >>>> > The rollback seems to have mostly solved the issue for one of our
> > >>>> > clusters, but another one is still seeing long increment times:
> > >>>> >
> > >>>> > "slowIncrementCount": 52080,
> > >>>> > "Increment_num_ops": 325236,
> > >>>> > "Increment_min": 1,
> > >>>> > "Increment_max": 6162,
> > >>>> > "Increment_mean": 465.68678129112396,
> > >>>> > "Increment_median": 216,
> > >>>> > "Increment_75th_percentile": 450.25,
> > >>>> > "Increment_95th_percentile": 1052.6499999999999,
> > >>>> > "Increment_99th_percentile": 1635.2399999999998
> > >>>> >
> > >>>> >
> > >>>> > Any ideas if there are other changes that may be causing a
> > >>>> > performance regression for increments between CDH4.7.1 and CDH5.3.8?
> > >>>> >
> > >>>> >
> > >>>> >
> > >>>> No.
> > >>>>
> > >>>> Post a thread dump Bryan and it might prompt something.
> > >>>>
> > >>>> St.Ack
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> >
> > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> > >>>> >
> > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > >>>> > > bbeaudreault@hubspot.com> wrote:
> > >>>> > >
> > >>>> > > > Should this be added as a known issue in the CDH or hbase
> > >>>> > > > documentation? It was a severe performance hit for us; all of
> > >>>> > > > our regionservers were sitting at a few thousand queued requests.
> > >>>> > > >
> > >>>> > > >
> > >>>> > > Let me take care of that.
> > >>>> > > St.Ack
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > >>>> > > > bbeaudreault@hubspot.com>
> > >>>> > > > wrote:
> > >>>> > > >
> > >>>> > > > > Yea, they are all over the place and called from client and
> > >>>> > > > > coprocessor code. We ended up having no other option but to
> > >>>> > > > > roll back, and aside from a few NoSuchMethodErrors due to API
> > >>>> > > > > changes (Put#add vs Put#addColumn), it seems to be working and
> > >>>> > > > > fixing our problem.
> > >>>> > > > >
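> > >>>> > > > > (For anyone hitting the same NoSuchMethodError, the change is
> > >>>> > > > > just the method name. A minimal before/after, with made-up
> > >>>> > > > > family/qualifier/value bytes:)
> > >>>> > > > >
> > >>>> > > > > import org.apache.hadoop.hbase.client.Put;
> > >>>> > > > > import org.apache.hadoop.hbase.util.Bytes;
> > >>>> > > > >
> > >>>> > > > > public class PutApiChange {
> > >>>> > > > >   static void fill(Put put) {
> > >>>> > > > >     byte[] f = Bytes.toBytes("f");
> > >>>> > > > >     byte[] q = Bytes.toBytes("q");
> > >>>> > > > >     byte[] v = Bytes.toBytes("v");
> > >>>> > > > >     // 0.98-era API: put.add(f, q, v);
> > >>>> > > > >     put.addColumn(f, q, v); // 1.0+ API; swap when rolling back
> > >>>> > > > >   }
> > >>>> > > > > }
> > >>>> > > > >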
> > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> > >>>> > > > >
> > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on
> > >>>> > > > >> this now. Where are your counters Bryan? In their own column
> > >>>> > > > >> family or scattered about in a row with other Cell types?
> > >>>> > > > >> St.Ack
> > >>>> > > > >>
> > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > >>>> > > > >> bbeaudreault@hubspot.com> wrote:
> > >>>> > > > >>
> > >>>> > > > >> > Is there any update to this? We just upgraded all of our
> > >>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing
> > >>>> > > > >> > this JIRA listed in the known issues, did not know about
> > >>>> > > > >> > this. Now we are seeing performance issues across all
> > >>>> > > > >> > clusters, as we make heavy use of increments.
> > >>>> > > > >> >
> > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope
> > >>>> > > > >> > to roll back to CDH 5.3.1 (if that is possible)?
> > >>>> > > > >> >
> > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > >>>> > > > >> >
> > >>>> > > > >> > > Thank you St.Ack!
> > >>>> > > > >> > >
> > >>>> > > > >> > > I would like to follow the ticket.
> > >>>> > > > >> > >
> > >>>> > > > >> > > Toshihiro Suzuki
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > >
> > >>>> > > >
> > >>>> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > >>>> > > > >> > > > > > > > > > >
>  java.lang.Thread.run(Thread.java:745)
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > There are many similar threads in the
> > >>>> thread
> > >>>> > dump.
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > I read the source code and I think this
> > is
> > >>>> > caused
> > >>>> > > by
> > >>>> > > > >> > > changes
> > >>>> > > > >> > > > of
> > >>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > >>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to
> > >>>> occur in
> > >>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Also we wrote performance test code for
> > >>>> > increment
> > >>>> > > > >> > operation
> > >>>> > > > >> > > > > that
> > >>>> > > > >> > > > > > > > > included
> > >>>> > > > >> > > > > > > > > > > 100 threads and ran it in local mode.
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > The result is shown below:
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> > >>>> > > > >> 7.975072509210629
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> > >>>> > > > 49.11840157868772
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Thanks,
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
> > >>>> > > > >> > > > > > > > > > >
> > >>>> > > > >> > > > > > > > > >
> > >>>> > > > >> > > > > > > > >
> > >>>> > > > >> > > > > > > >
> > >>>> > > > >> > > > > > >
> > >>>> > > > >> > > > > >
> > >>>> > > > >> > > > >
> > >>>> > > > >> > > >
> > >>>> > > > >> > >
> > >>>> > > > >> >
> > >>>> > > > >>
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> > >>>
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Still slow increments though?

On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <bbeaudreault@hubspot.com
> wrote:

> Those log lines have settled down, they may have been related to a
> cluster-wide forced restart at the time.
>
> On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
> bbeaudreault@hubspot.com>
> wrote:
>
> > We've been doing more debugging of this and have set up the read vs write
> > handlers to try to at least segment this away so reads can work. We have
> > pretty beefy servers, and are running with the following settings:
> >
> > hbase.regionserver.handler.count=1000
> > hbase.ipc.server.read.threadpool.size=50
> > hbase.ipc.server.callqueue.handler.factor=0.025
> > hbase.ipc.server.callqueue.read.ratio=0.6
> > hbase.ipc.server.callqueue.scan.ratio=0.5
> >
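
A rough sanity check on how those settings compose (assuming the ratios
apply to handler.count the way the HBase reference guide describes):
0.025 * 1000 gives roughly 25 call queues; a read ratio of 0.6 steers
about 60% of handlers to reads, leaving about 400 for writes, which
matches the "400 write handlers" reported below; the scan ratio of 0.5
then splits the read side evenly between gets and scans.
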
> > We are seeing all 400 write handlers taken up by row locks for the most
> > part. The read handlers are mostly idle. We're thinking of changing the
> > ratio here, but are not sure it will help if they are all blocked on a
> > row lock.  We just enabled DEBUG logging on all our servers and notice the
> > following:
> >
> > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait false, activity 00:54:36.240]
> > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait false, activity 00:54:36.256]
> > 2015-12-01 00:56:09,268 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait false, activity 00:55:01.259]
> > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait false, activity 00:53:58.151]
> > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait false, activity 00:53:58.162]
> >
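
(Background on those messages, for readers following along: the nonces
come from HBASE-3787 and are meant to keep non-idempotent operations
such as increment from being applied twice when a client retries. A
"Conflict detected by nonce" line means a retry arrived for an operation
the server already has a record of; the roughly 55-minute "activity"
values here suggest the original attempts had been pending for a very
long time, consistent with handlers stuck behind row locks.)
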
> >
> > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > bbeaudreault@hubspot.com> wrote:
> >
> >> Sorry the second link should be
> >>
> >> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> >>
> >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> >> bbeaudreault@hubspot.com> wrote:
> >>
> >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> >>>
> >>> An active handler:
> >>>
> >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> >>> One that is locked:
> >>>
> >>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> >>>
> >>> The difference between pre-rollback and post is that previously we were
> >>> seeing things blocked in mvcc.  Now we are seeing them blocked on the
> >>> upsert.
> >>>
> >>> It always follows the same pattern, of 1 active handler in the upsert
> >>> and the rest blocked waiting for it.
> >>>
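
("The upsert" here presumably refers to the memstore upsert step inside
the increment path, where HRegion applies the new counter values to the
memstore: one handler holds the row lock and applies its increment while
every other handler aimed at the same row queues up behind that lock,
giving exactly the one-active-many-blocked pattern described above.)
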
> >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> >>>
> >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault
> >>>> <bbeaudreault@hubspot.com> wrote:
> >>>>
> >>>> > The rollback seems to have mostly solved the issue for one of our
> >>>> > clusters, but another one is still seeing long increment times:
> >>>> >
> >>>> > "slowIncrementCount": 52080,
> >>>> > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
> >>>> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
> >>>> > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
> >>>> > 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
> >>>> >
> >>>> > Any ideas if there are other changes that may be causing a performance
> >>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
> >>>> >
> >>>> No.
> >>>>
> >>>> Post a thread dump Bryan and it might prompt something.
> >>>>
> >>>> St.Ack
> >>>>
> >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> >>>> >
> >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault
> >>>> > > <bbeaudreault@hubspot.com> wrote:
> >>>> > >
> >>>> > > > Should this be added as a known issue in the CDH or hbase
> >>>> > > > documentation? It was a severe performance hit for us, all of our
> >>>> > > > regionservers were sitting at a few thousand queued requests.
> >>>> > > >
> >>>> > > Let me take care of that.
> >>>> > > St.Ack
> >>>> > >
> >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault
> >>>> > > > <bbeaudreault@hubspot.com> wrote:
> >>>> > > >
> >>>> > > > > Yea, they are all over the place and called from client and
> >>>> > > > > coprocessor code. We ended up having no other option but to
> >>>> > > > > rollback, and aside from a few NoSuchMethodErrors due to API
> >>>> > > > > changes (Put#add vs Put#addColumn), it seems to be working and
> >>>> > > > > fixing our problem.
> >>>> > > > >
> >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> >>>> > > > >
> >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this
> >>>> > > > >> now. Where are your counters Bryan? In their own column family
> >>>> > > > >> or scattered about in a row with other Cell types?
> >>>> > > > >> St.Ack
> >>>> > > > >>
> >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault
> >>>> > > > >> <bbeaudreault@hubspot.com> wrote:
> >>>> > > > >>
> >>>> > > > >> > Is there any update to this? We just upgraded all of our
> >>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing this
> >>>> > > > >> > JIRA listed in the known issues, did not know about this.  Now
> >>>> > > > >> > we are seeing performance issues across all clusters, as we
> >>>> > > > >> > make heavy use of increments.
> >>>> > > > >> >
> >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
> >>>> > > > >> > roll back to CDH 5.3.1 (if that is possible)?
> >>>> > > > >> >
> >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> >>>> > > > >> >
> >>>> > > > >> > > Thank you St.Ack!
> >>>> > > > >> > >
> >>>> > > > >> > > I would like to follow the ticket.
> >>>> > > > >> > >
> >>>> > > > >> > > Toshihiro Suzuki
> >>>> > > > >> > >
> >>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> >>>> > > > >> > >
> >>>> > > > >> > > > Back to this problem. Simple tests confirm that as is, the
> >>>> > > > >> > > > single-queue-backed MVCC instance can slow Region ops if
> >>>> > > > >> > > > some other row is slow to complete. In particular
> >>>> > > > >> > > > Increment, checkAndPut, and batch mutations are effected.
> >>>> > > > >> > > > I opened HBASE-14460 to start in on a fix up. Lets see if
> >>>> > > > >> > > > we can somehow scope mvcc to row or at least shard mvcc so
> >>>> > > > >> > > > not all Region ops are paused.
> >>>> > > > >> > > >
> >>>> > > > >> > > > St.Ack
> >>>> > > > >> > > >
> >>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> >>>> > > > >> > > >
> >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> >>>> > > > >> > > > > > case to help with the illustration. It is as though
> >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> >>>> > > > >> > > > > > against other rows should not hold up my read of my
> >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> >>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
> >>>> > > > >> > > > >
> >>>> > > > >> > > > > > You need to read back the increment and have it be
> >>>> > > > >> > > > > > 'correct' at increment time?
> >>>> > > > >> > > > > Yes, we need it.
> >>>> > > > >> > > > >
> >>>> > > > >> > > > > I would like to help if there is anything I can do.
> >>>> > > > >> > > > >
> >>>> > > > >> > > > > Thanks,
> >>>> > > > >> > > > > Toshihiro Suzuki
> >>>> > > > >> > > > >
> >>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> >>>> > > > >> > > > >
> >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> >>>> > > > >> > > > > > case to help with the illustration. It is as though
> >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> >>>> > > > >> > > > > > against other rows should not hold up my read of my
> >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > > > You need to read back the increment and have it be
> >>>> > > > >> > > > > > 'correct' at increment time?
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > > > (This is a good one)
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > > > Thank you Toshihiro Suzuki
> >>>> > > > >> > > > > > St.Ack
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > > > > St.Ack,
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Thank you for your response.
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Why I make out that "A region lock (not a row lock) seems to occur in
> >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can cause a
> >>>> > > > >> > > > > > > region lock.
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> >>>> > > > >> > > > > > > waits until writeQueue is empty or writeQueue.getFirst() == w.
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > I think when a handler thread is processing between step 2 and step 3,
> >>>> > > > >> > > > > > > the other handler threads can wait at step 1 until the thread
> >>>> > > > >> > > > > > > completes step 3. This is depicted as follows:
> >>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Actually, in the thread dump of our region server, many handler
> >>>> > > > >> > > > > > > threads (RW.default.writeRpcServer.handler) wait at Step 1
> >>>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
> >>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Many handler threads wait at this:
> >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> >>>> > > > >> > > > > > >
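
To make the shape of that concrete, here is a minimal toy model of the
single writeQueue in Java. This is hypothetical illustration code only:
the method names echo MultiVersionConsistencyControl, but the bodies are
stripped down to just the queueing behaviour described above and are not
the actual HBase implementation.

    import java.util.LinkedList;

    public class MvccSketch {
      public static final class WriteEntry {
        final long txnum;
        WriteEntry(long n) { txnum = n; }
      }

      private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();
      private long nextTx = 1;

      // Every transaction, whatever row it touches, is appended to the one
      // region-wide queue (this is what steps 1 and 2 above have in common).
      public synchronized WriteEntry beginMemstoreInsert() {
        WriteEntry e = new WriteEntry(nextTx++);
        writeQueue.add(e);
        return e;
      }

      // Step 1: enqueue a marker entry, then wait for everything ahead of it.
      public void waitForPreviousTransactionsComplete() throws InterruptedException {
        waitForPreviousTransactionsComplete(beginMemstoreInsert());
      }

      // Block until w reaches the head of the shared queue, i.e. until every
      // earlier transaction has completed. Nothing in this wait is scoped to
      // a row, which is why one slow writer can stall the whole region.
      public synchronized void waitForPreviousTransactionsComplete(WriteEntry w)
          throws InterruptedException {
        while (writeQueue.getFirst() != w) {
          wait();
        }
        writeQueue.remove(w);
        notifyAll();
      }

      // Step 3: finish a transaction and wake the waiters so they can
      // re-check the head of the queue.
      public synchronized void completeMemstoreInsert(WriteEntry w) {
        writeQueue.remove(w);
        notifyAll();
      }
    }

If a thread finishes step 2 and then stalls (say on a slow WAL sync)
before reaching step 3, its WriteEntry stays in the queue and every
later caller of waitForPreviousTransactionsComplete() blocks behind it,
which is exactly what the diagram and the thread dump show.
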
> >>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade?  Is it
> >>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row to
> >>>> > > > >> > > > > > > > update it? Could the app behavior have changed?  Or are you thinking
> >>>> > > > >> > > > > > > > increment itself has slowed significantly?
> >>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app behavior. We are
> >>>> > > > >> > > > > > > thinking increment itself has slowed significantly.
> >>>> > > > >> > > > > > > Before upgrading HBase, it was good throughput and latency.
> >>>> > > > >> > > > > > > Currently, to cope with this problem, we split the regions finely.
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Thanks,
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Toshihiro Suzuki
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net>:
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > > Ted,
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > Thank you for your response.
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > I think that increment operation works as follows:
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > 1. get row lock
> >>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> >>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> >>>> > > > >> > > > > > > > > 4. get previous values
> >>>> > > > >> > > > > > > > > 5. create KVs
> >>>> > > > >> > > > > > > > > 6. write to Memstore
> >>>> > > > >> > > > > > > > > 7. write to WAL
> >>>> > > > >> > > > > > > > > 8. release row lock
> >>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> >>>> > > > >> > > > > > > > >
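
Continuing the toy model from the sketch further up (again hypothetical
illustration code, not the real implementation), a tiny driver shows how
a thread sitting between step 3 and step 9 of this list holds up another
handler that is working on a completely different row:

    public class StallDemo {
      public static void main(String[] args) throws Exception {
        final MvccSketch mvcc = new MvccSketch();
        // Thread A: has begun its transaction (step 3) but not completed it.
        MvccSketch.WriteEntry slow = mvcc.beginMemstoreInsert();
        // Thread B: a handler for some other row, entering at step 2.
        Thread b = new Thread(() -> {
          try {
            mvcc.waitForPreviousTransactionsComplete();
            System.out.println("B got past the MVCC wait");
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
        b.start();
        Thread.sleep(500); // stand-in for A's slow WAL sync / memstore work
        System.out.println("B still blocked while A is open: " + b.isAlive());
        mvcc.completeMemstoreInsert(slow); // A finishes (step 9); B unblocks.
        b.join();
      }
    }

Run as-is this prints "B still blocked while A is open: true" before B
gets through, even though A and B never touch the same row.
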
> >>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
> >>>> > > > >> > > > > > > > > writes named writeQueue.
> >>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue
> >>>> > > > >> > > > > > > > > is empty or writeQueue.getFirst() == w.
> >>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> >>>> > > > >> > > > > > > > > WriteEntry from writeQueue.
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > I think that when a handler thread is processing between step 2 and
> >>>> > > > >> > > > > > > > > step 9, the other handler threads can wait until the thread
> >>>> > > > >> > > > > > > > > completes step 9.
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > That is right. We need to read, after all outstanding updates are
> >>>> > > > >> > > > > > > > done... because we need to read the latest update before we go to
> >>>> > > > >> > > > > > > > modify/increment it.
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > How do you make out this?
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
> >>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > In 0.98.x we did this:
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > ... and in 1.0 we do this:
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> >>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> >>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> >>>> > > > >> > > > > > > > +  }
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
> >>>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc
> >>>> > > > >> > > > > > > > and region sequenceid would spin independent of each other. Perhaps
> >>>> > > > >> > > > > > > > this is responsible for some slow down.
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem to be down in the
> >>>> > > > >> > > > > > > > Get. If you do a bunch of thread dumps in a row, where is the
> >>>> > > > >> > > > > > > > lock-holding thread? In Get or writing Increment... or waiting on
> >>>> > > > >> > > > > > > > sequence id?
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade?  Is it
> >>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row
> >>>> > > > >> > > > > > > > to update it? Could the app behavior have changed?  Or are you
> >>>> > > > >> > > > > > > > thinking increment itself has slowed significantly?
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > St.Ack
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > > Thanks,
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > Toshihiro Suzuki
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row
> (not
> >>>> > > region):
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > >     try {
> >>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > > Thanks
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
> >>>> > > > >> brfrn169@gmail.com>
> >>>> > > > >> > > > wrote:
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Hi,
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > We upgraded our cluster from
> >>>> > CDH5.3.1(HBase0.98.6)
> >>>> > > > to
> >>>> > > > >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> >>>> > > > >> > > > > > > > > > > and we experience slowdown in increment
> >>>> > operation.
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Here's an extract from thread dump of the
> >>>> > > > >> RegionServer of
> >>>> > > > >> > > our
> >>>> > > > >> > > > > > > > cluster:
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Thread 68
> >>>> > > > >> > > > > > >
> >>>> > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> >>>> > > > >> > > > > > > > > > >   State: BLOCKED
> >>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
> >>>> > > > >> > > > > > > > > > >   Waited count: 39828360
> >>>> > > > >> > > > > > > > > > >   Blocked on
> java.util.LinkedList@3474e4b2
> >>>> > > > >> > > > > > > > > > >   Blocked by 63
> >>>> > > > >> > > > > > > > >
> >>>> > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> >>>> > > > >> > > > > > > > > > >   Stack:
> >>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > >
> >>>> > > > >>
> >>>> > > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > >
> >>>> > > >
> >>>> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >>>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > There are many similar threads in the
> >>>> thread
> >>>> > dump.
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > I read the source code and I think this
> is
> >>>> > caused
> >>>> > > by
> >>>> > > > >> > > changes
> >>>> > > > >> > > > of
> >>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> >>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to
> >>>> occur in
> >>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Also we wrote performance test code for
> >>>> > increment
> >>>> > > > >> > operation
> >>>> > > > >> > > > > that
> >>>> > > > >> > > > > > > > > included
> >>>> > > > >> > > > > > > > > > > 100 threads and ran it in local mode.
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > The result is shown below:
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> >>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> >>>> > > > >> 7.975072509210629
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> >>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> >>>> > > > 49.11840157868772
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Thanks,
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>
>

> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > Toshihiro Suzuki
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <
> stack@duboce.net
> >>>> >:
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
> >>>> > > brfrn169@gmail.com
> >>>> > > > >
> >>>> > > > >> > > wrote:
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > > Ted,
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > Thank you for your response.
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > I think that increment operation works as
> >>>> follows:
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > 1. get row lock
> >>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete()
> >>>> //
> >>>> > wait
> >>>> > > > for
> >>>> > > > >> all
> >>>> > > > >> > > > prior
> >>>> > > > >> > > > > > > MVCC
> >>>> > > > >> > > > > > > > > transactions to finish
> >>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() //
> >>>> start a
> >>>> > > > >> > transaction
> >>>> > > > >> > > > > > > > > 4. get previous values
> >>>> > > > >> > > > > > > > > 5. create KVs
> >>>> > > > >> > > > > > > > > 6. write to Memstore
> >>>> > > > >> > > > > > > > > 7. write to WAL
> >>>> > > > >> > > > > > > > > 8. release row lock
> >>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() //
> >>>> > complete
> >>>> > > > the
> >>>> > > > >> > > > > > transaction
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > A instance of MultiVersionConsistencyControl
> >>>> has a
> >>>> > > > pending
> >>>> > > > >> > > queue
> >>>> > > > >> > > > of
> >>>> > > > >> > > > > > > > writes
> >>>> > > > >> > > > > > > > > named writeQueue.
> >>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and
> >>>> waits
> >>>> > > until
> >>>> > > > >> > > > writeQueue
> >>>> > > > >> > > > > > is
> >>>> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
> >>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and
> >>>> step 9
> >>>> > > > removes
> >>>> > > > >> the
> >>>> > > > >> > > > > > > WriteEntry
> >>>> > > > >> > > > > > > > > from writeQueue.
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > I think that when a handler thread is
> >>>> processing
> >>>> > > between
> >>>> > > > >> > step 2
> >>>> > > > >> > > > and
> >>>> > > > >> > > > > > > step
> >>>> > > > >> > > > > > > > 9,
> >>>> > > > >> > > > > > > > > the other handler threads can wait until the
> >>>> thread
> >>>> > > > >> completes
> >>>> > > > >> > > > step
> >>>> > > > >> > > > > 9.
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > That is right. We need to read, after all
> >>>> outstanding
> >>>> > > > >> updates
> >>>> > > > >> > are
> >>>> > > > >> > > > > > done...
> >>>> > > > >> > > > > > > > because we need to read the latest update
> before
> >>>> we go
> >>>> > > to
> >>>> > > > >> > > > > > > modify/increment
> >>>> > > > >> > > > > > > > it.
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > How do you make out this?
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur
> in
> >>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > In 0.98.x we did this:
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > ... and in 1.0 we do this:
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete()
> which
> >>>> is
> >>>> > > > this....
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > +  public void
> >>>> waitForPreviousTransactionsComplete() {
> >>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> >>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> >>>> > > > >> > > > > > > > +  }
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in
> >>>> 1.0 (
> >>>> > > > >> > > > > > > >
> https://issues.apache.org/jira/browse/HBASE-8763
> >>>> ).
> >>>> > > > Previous
> >>>> > > > >> > mvcc
> >>>> > > > >> > > > and
> >>>> > > > >> > > > > > > > region
> >>>> > > > >> > > > > > > > sequenceid would spin independent of each
> other.
> >>>> > Perhaps
> >>>> > > > >> this
> >>>> > > > >> > > > > > responsible
> >>>> > > > >> > > > > > > > for some slow down.
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem
> >>>> to be
> >>>> > > down
> >>>> > > > >> in
> >>>> > > > >> > the
> >>>> > > > >> > > > > Get.
> >>>> > > > >> > > > > > If
> >>>> > > > >> > > > > > > > you do a bunch of thread dumps in a row, where
> >>>> is the
> >>>> > > > >> > > lock-holding
> >>>> > > > >> > > > > > > thread?
> >>>> > > > >> > > > > > > > In Get or writing Increment... or waiting on
> >>>> sequence
> >>>> > > id?
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > Is it possible you are contending on a counter
> >>>> > > > post-upgrade?
> >>>> > > > >> > Is
> >>>> > > > >> > > it
> >>>> > > > >> > > > > > > > possible that all these threads are trying to
> >>>> get to
> >>>> > the
> >>>> > > > >> same
> >>>> > > > >> > row
> >>>> > > > >> > > > to
> >>>> > > > >> > > > > > > update
> >>>> > > > >> > > > > > > > it? Could the app behavior have changed?  Or
> are
> >>>> you
> >>>> > > > >> thinking
> >>>> > > > >> > > > > increment
> >>>> > > > >> > > > > > > > itself has slowed significantly?
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > St.Ack
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > > > > Thanks,
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > Toshihiro Suzuki
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <
> >>>> > yuzhihong@gmail.com
> >>>> > > >:
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row
> (not
> >>>> > > region):
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > >     try {
> >>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > > Thanks
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
> >>>> > > > >> brfrn169@gmail.com>
> >>>> > > > >> > > > wrote:
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Hi,
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > We upgraded our cluster from
> >>>> > CDH5.3.1(HBase0.98.6)
> >>>> > > > to
> >>>> > > > >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> >>>> > > > >> > > > > > > > > > > and we experience slowdown in increment
> >>>> > operation.
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Here's an extract from thread dump of the
> >>>> > > > >> RegionServer of
> >>>> > > > >> > > our
> >>>> > > > >> > > > > > > > cluster:
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Thread 68
> >>>> > > > >> > > > > > >
> >>>> > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> >>>> > > > >> > > > > > > > > > >   State: BLOCKED
> >>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
> >>>> > > > >> > > > > > > > > > >   Waited count: 39828360
> >>>> > > > >> > > > > > > > > > >   Blocked on
> java.util.LinkedList@3474e4b2
> >>>> > > > >> > > > > > > > > > >   Blocked by 63
> >>>> > > > >> > > > > > > > >
> >>>> > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> >>>> > > > >> > > > > > > > > > >   Stack:
> >>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > >
> >>>> > > > >>
> >>>> > > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > >
> >>>> > > >
> >>>> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >>>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > There are many similar threads in the
> >>>> thread
> >>>> > dump.
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > I read the source code and I think this
> is
> >>>> > caused
> >>>> > > by
> >>>> > > > >> > > changes
> >>>> > > > >> > > > of
> >>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> >>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to
> >>>> occur in
> >>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Also we wrote performance test code for
> >>>> > increment
> >>>> > > > >> > operation
> >>>> > > > >> > > > > that
> >>>> > > > >> > > > > > > > > included
> >>>> > > > >> > > > > > > > > > > 100 threads and ran it in local mode.
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > The result is shown below:
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> >>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> >>>> > > > >> 7.975072509210629
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> >>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> >>>> > > > 49.11840157868772
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Thanks,
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
> >>>> > > > >> > > > > > > > > > >
> >>>> > > > >> > > > > > > > > >
> >>>> > > > >> > > > > > > > >
> >>>> > > > >> > > > > > > >
> >>>> > > > >> > > > > > >
> >>>> > > > >> > > > > >
> >>>> > > > >> > > > >
> >>>> > > > >> > > >
> >>>> > > > >> > >
> >>>> > > > >> >
> >>>> > > > >>
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>
>
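
In outline, the single-queue behavior described in this exchange reduces to the following self-contained sketch. It is an illustration only, not the actual HBase source: the class and method names are simplified stand-ins for MultiVersionConsistencyControl's beginMemstoreInsert() and waitForPreviousTransactionsComplete().

  import java.util.LinkedList;

  // Sketch of the region-wide MVCC write queue discussed above. Every
  // handler appends a WriteEntry to one shared queue, then blocks until
  // its entry reaches the head, so a slow write on any row stalls writes
  // on every other row in the region -- the handlers parked BLOCKED on a
  // java.util.LinkedList in the thread dump.
  public class SingleQueueMvccSketch {
    static final class WriteEntry {}  // marker for one in-flight write

    private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

    // Roughly step 2 above (beginMemstoreInsert): enqueue a transaction.
    WriteEntry begin() {
      synchronized (writeQueue) {
        WriteEntry e = new WriteEntry();
        writeQueue.addLast(e);
        return e;
      }
    }

    // Roughly step 9 above (completeMemstoreInsertWithSeqNum): wait until
    // every earlier entry, for whatever row, has been retired.
    void complete(WriteEntry e) throws InterruptedException {
      synchronized (writeQueue) {
        while (writeQueue.getFirst() != e) {
          writeQueue.wait();
        }
        writeQueue.removeFirst();
        writeQueue.notifyAll();
      }
    }
  }

Because the queue is shared by the whole region, the wait in complete() depends on every writer that queued earlier, regardless of row: the "region lock (not a row lock)" effect described above.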

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Those log lines have settled down, they may have been related to a
cluster-wide forced restart at the time.

On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <bb...@hubspot.com>
wrote:

> We've been doing more debugging of this and have set up the read vs write
> handlers to try to at least segment this away so reads can work. We have
> pretty beefy servers, and are running with the following settings:
>
> hbase.regionserver.handler.count=1000
> hbase.ipc.server.read.threadpool.size=50
> hbase.ipc.server.callqueue.handler.factor=0.025
> hbase.ipc.server.callqueue.read.ratio=0.6
> hbase.ipc.server.callqueue.scan.ratio=0.5
>
> We are seeing all 400 write handlers taken up by row locks for the most
> part. The read handlers are mostly idle. We're thinking of changing the
> ratio here, but are not sure it will help if they are all blocked on a row
> lock.  We just enabled DEBUG logging on all our servers and notice the
> following:
>
> 2015-12-01 00:56:09,015 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait
> false, activity 00:54:36.240]
> 2015-12-01 00:56:09,015 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait
> false, activity 00:54:36.256]
> 2015-12-01 00:56:09,268 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait
> false, activity 00:55:01.259]
> 2015-12-01 00:56:09,279 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait
> false, activity 00:53:58.151]
> 2015-12-01 00:56:09,279 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait
> false, activity 00:53:58.162]
>
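
Assuming the usual semantics of these callqueue settings (call queues = handler count x handler.factor; queues and handlers split by read.ratio; the read share further split by scan.ratio), the numbers above work out as in the sketch below, matching the 400 write handlers mentioned. The arithmetic is an approximation for illustration; exact rounding inside HBase may differ.

  // Rough arithmetic for the settings quoted above (illustration only).
  public class CallQueueMath {
    public static void main(String[] args) {
      int handlers = 1000;      // hbase.regionserver.handler.count
      double factor = 0.025;    // hbase.ipc.server.callqueue.handler.factor
      double readRatio = 0.6;   // hbase.ipc.server.callqueue.read.ratio
      double scanRatio = 0.5;   // hbase.ipc.server.callqueue.scan.ratio

      int queues = Math.max(1, (int) Math.round(handlers * factor));  // 25
      int readQueues = (int) Math.floor(queues * readRatio);          // 15
      int scanQueues = (int) Math.floor(readQueues * scanRatio);      // 7
      int writeQueues = queues - readQueues;                          // 10
      int readHandlers = (int) Math.floor(handlers * readRatio);      // 600
      int writeHandlers = handlers - readHandlers;                    // 400

      System.out.printf("queues: %d total, %d write, %d read (%d scan)%n",
          queues, writeQueues, readQueues, scanQueues);
      System.out.printf("handlers: %d read, %d write%n",
          readHandlers, writeHandlers);
    }
  }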
>
> On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> Sorry the second link should be
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
>>
>> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
>> bbeaudreault@hubspot.com> wrote:
>>
>>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>>>
>>> An active handler:
>>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
>>> One that is locked:
>>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>>>
>>> The difference between pre-rollback and post is that previously we were
>>> seeing things blocked in mvcc.  Now we are seeing them blocked on the
>>> upsert.
>>>
>>> It always follows the same pattern, of 1 active handler in the upsert
>>> and the rest blocked waiting for it.
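
That pattern is the classic hot-row case: when many clients increment the same row, one handler at a time holds the row lock through the upsert and the rest queue behind it. A toy client that reproduces the shape of it; the table, family, qualifier, and row names here are made up for illustration:

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Increment;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  // Toy reproduction of hot-row increment contention: 100 threads
  // incrementing one row serialize on that row's lock server-side.
  public class HotRowIncrement {
    public static void main(String[] args) throws Exception {
      Connection conn =
          ConnectionFactory.createConnection(HBaseConfiguration.create());
      Table table = conn.getTable(TableName.valueOf("counters"));
      ExecutorService pool = Executors.newFixedThreadPool(100);
      for (int i = 0; i < 100; i++) {
        pool.submit(() -> {
          for (int n = 0; n < 1000; n++) {
            Increment inc = new Increment(Bytes.toBytes("hot-row"));
            inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c"), 1L);
            table.increment(inc);  // all threads contend on one row lock
          }
          return null;
        });
      }
      pool.shutdown();
    }
  }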
>>>
>>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
>>>
>>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
>>>> bbeaudreault@hubspot.com
>>>> > wrote:
>>>>
>>>> > The rollback seems to have mostly solved the issue for one of our
>>>> > clusters, but another one is still seeing long increment times:
>>>> >
>>>> > "slowIncrementCount": 52080,
>>>> > "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
>>>> > "Increment_mean": 465.68678129112396, "Increment_median": 216,
>>>> > "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
>>>> > "Increment_99th_percentile": 1635.2399999999998
>>>> >
>>>> >
>>>> > Any ideas if there are other changes that may be causing a performance
>>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
>>>> >
>>>> >
>>>> >
>>>> No.
>>>>
>>>> Post a thread dump Bryan and it might prompt something.
>>>>
>>>> St.Ack
>>>>
>>>>
>>>>
>>>>
>>>> >
>>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
>>>> >
>>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
>>>> > > bbeaudreault@hubspot.com> wrote:
>>>> > >
>>>> > > > Should this be added as a known issue in the CDH or hbase
>>>> > > > documentation? It was a severe performance hit for us, all of our
>>>> > > > regionservers were sitting at a few thousand queued requests.
>>>> > > >
>>>> > > >
>>>> > > Let me take care of that.
>>>> > > St.Ack
>>>> > >
>>>> > >
>>>> > >
>>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
>>>> > > > bbeaudreault@hubspot.com>
>>>> > > > wrote:
>>>> > > >
>>>> > > > > Yea, they are all over the place and called from client and
>>>> > > > > coprocessor code. We ended up having no other option but to
>>>> > > > > rollback, and aside from a few NoSuchMethodErrors due to API
>>>> > > > > changes (Put#add vs Put#addColumn), it seems to be working and
>>>> > > > > fixing our problem.
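
For reference, the API change behind those NoSuchMethodErrors is the HBase 1.0 move from Put#add to Put#addColumn. A minimal before/after; the family, qualifier, and value names are made up:

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // The rename mentioned above: older clients call Put#add, HBase 1.0
  // clients call Put#addColumn with the same arguments.
  public class PutApiChange {
    public static void main(String[] args) {
      Put put = new Put(Bytes.toBytes("row1"));
      // Pre-1.0 style:
      // put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      // HBase 1.0 style:
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    }
  }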
>>>> > > > >
>>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
>>>> > > > >
>>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on
>>>> > > > >> this now. Where are your counters Bryan? In their own column
>>>> > > > >> family or scattered about in a row with other Cell types?
>>>> > > > >> St.Ack
>>>> > > > >>
>>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
>>>> > > > >> bbeaudreault@hubspot.com> wrote:
>>>> > > > >>
>>>> > > > >> > Is there any update to this? We just upgraded all of our
>>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing this
>>>> > > > >> > JIRA listed in the known issues, did not know about this. Now
>>>> > > > >> > we are seeing performance issues across all clusters, as we
>>>> > > > >> > make heavy use of increments.
>>>> > > > >> >
>>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
>>>> > > > >> > roll back to CDH 5.3.1 (if that is possible)?
>>>> > > > >> >
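
The increment latency figures quoted in this thread (slowIncrementCount and the Increment_* percentiles) come from the region server metrics, which can be polled over the info server's /jmx endpoint. A rough sketch; the host name is made up and the 60030 info port is an assumption (the usual default for this era):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.URL;

  // Rough sketch: fetch the region server's JSON /jmx output and print
  // the increment-related metric lines. Host and port are assumptions.
  public class WatchIncrementMetrics {
    public static void main(String[] args) throws Exception {
      URL url = new URL("http://regionserver.example.com:60030/jmx");
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(url.openStream()))) {
        String line;
        while ((line = in.readLine()) != null) {
          if (line.contains("Increment") || line.contains("slowIncrement")) {
            System.out.println(line.trim());
          }
        }
      }
    }
  }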

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Those log lines have settled down, they may have been related to a
cluster-wide forced restart at the time.

On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <bb...@hubspot.com>
wrote:

> We've been doing more debugging of this and have set up the read vs write
> handlers to try to at least segment this away so reads can work. We have
> pretty beefy servers, and are running wiht the following settings:
>
> hbase.regionserver.handler.count=1000
> hbase.ipc.server.read.threadpool.size=50
> hbase.ipc.server.callqueue.handler.factor=0.025
> hbase.ipc.server.callqueue.read.ratio=0.6
> hbase.ipc.server.callqueue.scan.ratio=0.5
>
> We are seeing all 400 write handlers taken up by row locks for the most
> part. The read handlers are mostly idle. We're thinking of changing the
> ratio here, but are not sure it will help if they are all blocked on a row
> lock.  We just enabled DEBUG logging on all our servers and notice the
> following:
>
> 2015-12-01 00:56:09,015 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait
> false, activity 00:54:36.240]
> 2015-12-01 00:56:09,015 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait
> false, activity 00:54:36.256]
> 2015-12-01 00:56:09,268 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait
> false, activity 00:55:01.259]
> 2015-12-01 00:56:09,279 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait
> false, activity 00:53:58.151]
> 2015-12-01 00:56:09,279 DEBUG
> org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
> by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait
> false, activity 00:53:58.162]
>
>
> On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> Sorry the second link should be
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
>>
>> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
>> bbeaudreault@hubspot.com> wrote:
>>
>>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>>>
>>> An active handler:
>>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
>>> One that is locked:
>>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>>>
>>> The difference between pre-rollback and post is that previously we were
>>> seeing things blocked in mvcc.  Now we are seeing them blocked on the
>>> upsert.
>>>
>>> It always follows the same pattern, of 1 active handler in the upsert
>>> and the rest blocked waiting for it.
>>>
>>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
>>>
>>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
>>>> bbeaudreault@hubspot.com
>>>> > wrote:
>>>>
>>>> > The rollback seems to have mostly solved the issue for one of our
>>>> clusters,
>>>> > but another one is still seeing long increment times:
>>>> >
>>>> > "slowIncrementCount": 52080,
>>>> > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
>>>> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
>>>> > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
>>>> > 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
>>>> >
>>>> >
>>>> > Any ideas if there are other changes that may be causing a performance
>>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
>>>> >
>>>> >
>>>> >
>>>> No.
>>>>
>>>> Post a thread dump Bryan and it might prompt something.
>>>>
>>>> St.Ack
>>>>
>>>>
>>>>
>>>>
>>>> >
>>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
>>>> >
>>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
>>>> > > bbeaudreault@hubspot.com> wrote:
>>>> > >
>>>> > > > Should this be added as a known issue in the CDH or hbase
>>>> > documentation?
>>>> > > It
>>>> > > > was a severe performance hit for us, all of our regionservers were
>>>> > > sitting
>>>> > > > at a few thousand queued requests.
>>>> > > >
>>>> > > >
>>>> > > Let me take care of that.
>>>> > > St.Ack
>>>> > >
>>>> > >
>>>> > >
>>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
>>>> > > > bbeaudreault@hubspot.com>
>>>> > > > wrote:
>>>> > > >
>>>> > > > > Yea, they are all over the place and called from client and
>>>> > coprocessor
>>>> > > > > code. We ended up having no other option but to rollback, and
>>>> aside
>>>> > > from
>>>> > > > a
>>>> > > > > few NoSuchMethodErrors due to API changes (Put#add vs
>>>> Put#addColumn),
>>>> > > it
>>>> > > > > seems to be working and fixing our problem.
>>>> > > > >
>>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
>>>> > > > >
>>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on
>>>> this
>>>> > now.
>>>> > > > >> Where
>>>> > > > >> are your counters Bryan? In their own column family or
>>>> scattered
>>>> > about
>>>> > > > in
>>>> > > > >> a
>>>> > > > >> row with other Cell types?
>>>> > > > >> St.Ack
>>>> > > > >>
>>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
>>>> > > > >> bbeaudreault@hubspot.com> wrote:
>>>> > > > >>
>>>> > > > >> > Is there any update to this? We just upgraded all of our
>>>> > production
>>>> > > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA
>>>> listed in
>>>> > > the
>>>> > > > >> > known issues, did not not about this.  Now we are seeing
>>>> > perfomance
>>>> > > > >> issues
>>>> > > > >> > across all clusters, as we make heavy use of increments.
>>>> > > > >> >
>>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
>>>> roll
>>>> > > back
>>>> > > > >> to
>>>> > > > >> > CDH 5.3.1 (if that is possible)?
>>>> > > > >> >
>>>> > > > >> >
>>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com>
>>>> wrote:
>>>> > > > >> >
>>>> > > > >> > > Thank you St.Ack!
>>>> > > > >> > >
>>>> > > > >> > > I would like to follow the ticket.
>>>> > > > >> > >
>>>> > > > >> > > Toshihiro Suzuki
>>>> > > > >> > >
>>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
>>>> > > > >> > >
>>>> > > > >> > > > Back to this problem. Simple tests confirm that as is,
>>>> the
>>>> > > > >> > > > single-queue-backed MVCC instance can slow Region ops if
>>>> some
>>>> > > > other
>>>> > > > >> row
>>>> > > > >> > > is
>>>> > > > >> > > > slow to complete. In particular Increment, checkAndPut,
>>>> and
>>>> > > batch
>>>> > > > >> > > mutations
>>>> > > > >> > > > are affected. I opened HBASE-14460 to start in on a fix
>>>> up.
>>>> > Lets
>>>> > > > >> see if
>>>> > > > >> > > we
>>>> > > > >> > > > can somehow scope mvcc to row or at least shard mvcc so
>>>> not
>>>> > all
>>>> > > > >> Region
>>>> > > > >> > > ops
>>>> > > > >> > > > are paused.
>>>> > > > >> > > >
>>>> > > > >> > > > St.Ack
>>>> > > > >> > > >
>>>> > > > >> > > >
>>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <
>>>> brfrn169@gmail.com>
>>>> > > wrote:
>>>> > > > >> > > >
>>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
>>>> > helpful
>>>> > > > >> > > diagram).
>>>> > > > >> > > > > > Makes sense. Let me hack up a test case to help with
>>>> the
>>>> > > > >> > > illustration.
>>>> > > > >> > > > It
>>>> > > > >> > > > > > is as though the mvcc should be scoped to a row
>>>> only...
>>>> > > Writes
>>>> > > > >> > > against
>>>> > > > >> > > > > > other rows should not hold up my read of my row. Tag
>>>> an
>>>> > mvcc
>>>> > > > >> with a
>>>> > > > >> > > > 'row'
>>>> > > > >> > > > > > scope so we can see which on-going writes pertain to
>>>> > current
>>>> > > > >> > > operation?
>>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
>>>> > > > >> > > > >
>>>> > > > >> > > > > > You need to read back the increment and have it be
>>>> > 'correct'
>>>> > > > at
>>>> > > > >> > > > increment
>>>> > > > >> > > > > > time?
>>>> > > > >> > > > > Yes, we need it.
>>>> > > > >> > > > >
>>>> > > > >> > > > > I would like to help if there is anything I can do.
>>>> > > > >> > > > >
>>>> > > > >> > > > > Thanks,
>>>> > > > >> > > > > Toshihiro Suzuki
>>>> > > > >> > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
>>>> > > > >> > > > >
>>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
>>>> > helpful
>>>> > > > >> > > diagram).
>>>> > > > >> > > > > > Makes sense. Let me hack up a test case to help with
>>>> the
>>>> > > > >> > > illustration.
>>>> > > > >> > > > It
>>>> > > > >> > > > > > is as though the mvcc should be scoped to a row
>>>> only...
>>>> > > Writes
>>>> > > > >> > > against
>>>> > > > >> > > > > > other rows should not hold up my read of my row. Tag
>>>> an
>>>> > mvcc
>>>> > > > >> with a
>>>> > > > >> > > > 'row'
>>>> > > > >> > > > > > scope so we can see which on-going writes pertain to
>>>> > current
>>>> > > > >> > > operation?
>>>> > > > >> > > > > >
>>>> > > > >> > > > > > You need to read back the increment and have it be
>>>> > 'correct'
>>>> > > > at
>>>> > > > >> > > > increment
>>>> > > > >> > > > > > time?
>>>> > > > >> > > > > >
>>>> > > > >> > > > > > (This is a good one)
>>>> > > > >> > > > > >
>>>> > > > >> > > > > > Thank you Toshihiro Suzuki
>>>> > > > >> > > > > > St.Ack
>>>> > > > >> > > > > >
>>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <
>>>> brfrn169@gmail.com
>>>> > >
>>>> > > > >> wrote:
>>>> > > > >> > > > > >
>>>> > > > >> > > > > > > St.Ack,
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > Thank you for your response.
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > The reason I think that "A region lock (not a row lock)
>>>> > seems
>>>> > > to
>>>> > > > >> > occur
>>>> > > > >> > > in
>>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as
>>>> follows:
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w,
>>>> walKey);
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's
>>>> writeQueue
>>>> > > can
>>>> > > > >> > cause
>>>> > > > >> > > a
>>>> > > > >> > > > > > region
>>>> > > > >> > > > > > > lock.
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
>>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) ->
>>>> > > advanceMemstore(w)
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in
>>>> beginMemstoreInsert() to
>>>> > > > >> writeQueue
>>>> > > > >> > > and
>>>> > > > >> > > > > > waits
>>>> > > > >> > > > > > > until writeQueue is empty or writeQueue.getFirst()
>>>> == w.
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > I think when a handler thread is processing between
>>>> > step 2
>>>> > > > and
>>>> > > > >> > step
>>>> > > > >> > > > 3,
>>>> > > > >> > > > > > the
>>>> > > > >> > > > > > > other handler threads can wait at step 1 until the
>>>> > thread
>>>> > > > >> > completes
>>>> > > > >> > > > > step
>>>> > > > >> > > > > > 3.
>>>> > > > >> > > > > > > This is depicted as follows:
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
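
To restate the diagram as code, here is a stripped-down model of that single
writeQueue (a sketch of the mechanism being described, not the actual
MultiVersionConsistencyControl source):

import java.util.LinkedList;

// Toy model of the one region-wide write queue described above. A slow
// entry at the head stalls every later entry, whichever row it touches.
class MiniMvcc {
  static final class WriteEntry {}

  private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

  // Steps 1/2: enqueue a placeholder entry for this transaction.
  synchronized WriteEntry beginMemstoreInsert() {
    WriteEntry e = new WriteEntry();
    writeQueue.add(e);
    return e;
  }

  // Step 1's wait: park until every earlier entry -- for ANY row in the
  // region -- has been retired and our entry reaches the head.
  synchronized void waitForPreviousTransactionsComplete(WriteEntry e)
      throws InterruptedException {
    while (writeQueue.getFirst() != e) {
      wait();
    }
  }

  // Step 3: retire an entry and wake everything parked behind it.
  synchronized void completeMemstoreInsert(WriteEntry e) {
    writeQueue.remove(e);
    notifyAll();
  }
}
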
>>>> > > > >> > > > > > > Actually, in the thread dump of our region server,
>>>> many
>>>> > > > >> handler
>>>> > > > >> > > > threads
>>>> > > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
>>>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > Many handler threads wait at this:
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > > Is it possible you are contending on a counter
>>>> > > > post-upgrade?
>>>> > > > >> > Is
>>>> > > > >> > > it
>>>> > > > >> > > > > > > > possible that all these threads are trying to
>>>> get to
>>>> > the
>>>> > > > >> same
>>>> > > > >> > row
>>>> > > > >> > > > to
>>>> > > > >> > > > > > > update
>>>> > > > >> > > > > > > > it? Could the app behavior have changed?  Or are
>>>> you
>>>> > > > >> thinking
>>>> > > > >> > > > > increment
>>>> > > > >> > > > > > > > itself has slowed significantly?
>>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app
>>>> > behavior.
>>>> > > > We
>>>> > > > >> are
>>>> > > > >> > > > > > thinking
>>>> > > > >> > > > > > > increment itself has slowed significantly.
>>>> > > > >> > > > > > > Before upgrading HBase, throughput and latency were
>>>> > > good.
>>>> > > > >> > > > > > > Currently, to cope with this problem, we split the
>>>> > regions
>>>> > > > >> > finely.
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > Thanks,
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > Toshihiro Suzuki
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net
>>>> >:
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
>>>> > > brfrn169@gmail.com
>>>> > > > >
>>>> > > > >> > > wrote:
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > > Ted,
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > Thank you for your response.
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > https://gist.github.com/brfrn169/cb4f2c157129330cd932
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > I think that the increment operation works as
>>>> follows:
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > 1. get row lock
>>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete()
>>>> //
>>>> > wait
>>>> > > > for
>>>> > > > >> all
>>>> > > > >> > > > prior
>>>> > > > >> > > > > > > MVCC
>>>> > > > >> > > > > > > > > transactions to finish
>>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() //
>>>> start a
>>>> > > > >> > transaction
>>>> > > > >> > > > > > > > > 4. get previous values
>>>> > > > >> > > > > > > > > 5. create KVs
>>>> > > > >> > > > > > > > > 6. write to Memstore
>>>> > > > >> > > > > > > > > 7. write to WAL
>>>> > > > >> > > > > > > > > 8. release row lock
>>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() //
>>>> > complete
>>>> > > > the
>>>> > > > >> > > > > > transaction
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl
>>>> has a
>>>> > > > pending
>>>> > > > >> > > queue
>>>> > > > >> > > > of
>>>> > > > >> > > > > > > > writes
>>>> > > > >> > > > > > > > > named writeQueue.
>>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and
>>>> waits
>>>> > > until
>>>> > > > >> > > > writeQueue
>>>> > > > >> > > > > > is
>>>> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
>>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and
>>>> step 9
>>>> > > > removes
>>>> > > > >> the
>>>> > > > >> > > > > > > WriteEntry
>>>> > > > >> > > > > > > > > from writeQueue.
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > I think that when a handler thread is
>>>> processing
>>>> > > between
>>>> > > > >> > step 2
>>>> > > > >> > > > and
>>>> > > > >> > > > > > > step
>>>> > > > >> > > > > > > > 9,
>>>> > > > >> > > > > > > > > the other handler threads can wait until the
>>>> thread
>>>> > > > >> completes
>>>> > > > >> > > > step
>>>> > > > >> > > > > 9.
>>>> > > > >> > > > > > > > >
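
To get a feel for what those steps cost, here is a toy benchmark of ours (in
the spirit of the 100-thread test further below; the 1 ms sleep standing in
for per-write sync latency is made up):

import java.util.LinkedList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// 100 threads "increment" 100 DIFFERENT rows, yet all of them serialize
// through one shared queue, mimicking steps 2 and 9 above.
public class SharedQueueDemo {
  private static final LinkedList<Object> queue = new LinkedList<>();

  private static Object begin() {
    synchronized (queue) { Object e = new Object(); queue.add(e); return e; }
  }

  private static void waitForHead(Object e) throws InterruptedException {
    synchronized (queue) { while (queue.getFirst() != e) queue.wait(); }
  }

  private static void complete(Object e) {
    synchronized (queue) { queue.remove(e); queue.notifyAll(); }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(100);
    long start = System.nanoTime();
    for (int row = 0; row < 100; row++) {   // 100 distinct "rows"
      pool.execute(() -> {
        try {
          Object e = begin();               // step 2: join the shared queue
          waitForHead(e);                   // blocked behind ALL earlier writes
          Thread.sleep(1);                  // stand-in for per-write latency
          complete(e);                      // step 9: retire and wake waiters
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    System.out.printf("100 writes to 100 distinct rows: ~%d ms, fully serialized%n",
        (System.nanoTime() - start) / 1_000_000);
  }
}
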
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > That is right. We need to read, after all
>>>> outstanding
>>>> > > > >> updates
>>>> > > > >> > are
>>>> > > > >> > > > > > done...
>>>> > > > >> > > > > > > > because we need to read the latest update before
>>>> we go
>>>> > > to
>>>> > > > >> > > > > > > modify/increment
>>>> > > > >> > > > > > > > it.
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > How do you make this out?
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
>>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > In 0.98.x we did this:
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > ... and in 1.0 we do this:
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which
>>>> is
>>>> > > > this....
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > +  public void
>>>> waitForPreviousTransactionsComplete() {
>>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
>>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
>>>> > > > >> > > > > > > > +  }
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in
>>>> 1.0 (
>>>> > > > >> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763
>>>> ).
>>>> > > > Previous
>>>> > > > >> > mvcc
>>>> > > > >> > > > and
>>>> > > > >> > > > > > > > region
>>>> > > > >> > > > > > > > sequenceid would spin independent of each other.
>>>> > Perhaps
>>>> > > > >> this
>>>> > > > >> > > > > > is responsible
>>>> > > > >> > > > > > > for some slowdown.
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem
>>>> to be
>>>> > > down
>>>> > > > >> in
>>>> > > > >> > the
>>>> > > > >> > > > > Get.
>>>> > > > >> > > > > > If
>>>> > > > >> > > > > > > > you do a bunch of thread dumps in a row, where
>>>> is the
>>>> > > > >> > > lock-holding
>>>> > > > >> > > > > > > thread?
>>>> > > > >> > > > > > > > In Get or writing Increment... or waiting on
>>>> sequence
>>>> > > id?
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > Is it possible you are contending on a counter
>>>> > > > post-upgrade?
>>>> > > > >> > Is
>>>> > > > >> > > it
>>>> > > > >> > > > > > > > possible that all these threads are trying to
>>>> get to
>>>> > the
>>>> > > > >> same
>>>> > > > >> > row
>>>> > > > >> > > > to
>>>> > > > >> > > > > > > update
>>>> > > > >> > > > > > > > it? Could the app behavior have changed?  Or are
>>>> you
>>>> > > > >> thinking
>>>> > > > >> > > > > increment
>>>> > > > >> > > > > > > > itself has slowed significantly?
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > St.Ack
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > > > > Thanks,
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > Toshihiro Suzuki
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <
>>>> > yuzhihong@gmail.com
>>>> > > >:
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not
>>>> > > region):
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > > >     try {
>>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > > > Thanks
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
>>>> > > > >> brfrn169@gmail.com>
>>>> > > > >> > > > wrote:
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > > > > Hi,
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > We upgraded our cluster from
>>>> > CDH5.3.1(HBase0.98.6)
>>>> > > > to
>>>> > > > >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
>>>> > > > >> > > > > > > > > > > and we experience slowdown in increment
>>>> > operation.
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > Here's an extract from thread dump of the
>>>> > > > >> RegionServer of
>>>> > > > >> > > our
>>>> > > > >> > > > > > > > cluster:
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > Thread 68
>>>> > > > >> > > > > > >
>>>> > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>>>> > > > >> > > > > > > > > > >   State: BLOCKED
>>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
>>>> > > > >> > > > > > > > > > >   Waited count: 39828360
>>>> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
>>>> > > > >> > > > > > > > > > >   Blocked by 63
>>>> > > > >> > > > > > > > >
>>>> > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>>>> > > > >> > > > > > > > > > >   Stack:
>>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > >
>>>> > > > >>
>>>> > > >
>>>> >
>>>> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > >
>>>> > > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > >
>>>> > > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > >
>>>> > > >
>>>> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > There are many similar threads in the
>>>> thread
>>>> > dump.
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > I read the source code and I think this is
>>>> > caused
>>>> > > by
>>>> > > > >> > > changes
>>>> > > > >> > > > of
>>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
>>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to
>>>> occur in
>>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > Also we wrote performance test code for
>>>> > increment
>>>> > > > >> > operation
>>>> > > > >> > > > > that
>>>> > > > >> > > > > > > > > included
>>>> > > > >> > > > > > > > > > > 100 threads and ran it in local mode.
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > The result is shown below:
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
>>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
>>>> > > > >> 7.975072509210629
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
>>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
>>>> > > > 49.11840157868772
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > Thanks,
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
>>>> > > > >> > > > > > > > > > >
>>>> > > > >> > > > > > > > > >
>>>> > > > >> > > > > > > > >
>>>> > > > >> > > > > > > >
>>>> > > > >> > > > > > >
>>>> > > > >> > > > > >
>>>> > > > >> > > > >
>>>> > > > >> > > >
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
We've been doing more debugging of this and have set up the read vs write
handlers to try to at least segment this away so reads can work. We have
pretty beefy servers, and are running with the following settings:

hbase.regionserver.handler.count=1000
hbase.ipc.server.read.threadpool.size=50
hbase.ipc.server.callqueue.handler.factor=0.025
hbase.ipc.server.callqueue.read.ratio=0.6
hbase.ipc.server.callqueue.scan.ratio=0.5
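
To make the arithmetic behind those settings explicit (our reading of how the
ratios combine; the derived counts are our assumption, not quoted from the
docs):

class CallQueueMath {
  public static void main(String[] args) {
    int handlers = 1000;                         // hbase.regionserver.handler.count
    int queues = (int) (handlers * 0.025);       // handler.factor -> 25 call queues
    int readQueues = (int) (queues * 0.6);       // read.ratio     -> 15 read queues
    int writeQueues = queues - readQueues;       //                -> 10 write queues
    int scanQueues = (int) (readQueues * 0.5);   // scan.ratio     -> 7 of those for scans
    int readHandlers = (int) (handlers * 0.6);   //                -> 600 read handlers
    int writeHandlers = handlers - readHandlers; //                -> 400 write handlers
    System.out.printf("read=%d write=%d (queues r/w/scan = %d/%d/%d)%n",
        readHandlers, writeHandlers, readQueues, writeQueues, scanQueues);
  }
}

That derivation is where the 400 write handlers mentioned next come from.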

We are seeing all 400 write handlers taken up by row locks for the most
part. The read handlers are mostly idle. We're thinking of changing the
ratio here, but are not sure it will help if they are all blocked on a row
lock.  We just enabled DEBUG logging on all our servers and notice the
following:

2015-12-01 00:56:09,015 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait
false, activity 00:54:36.240]
2015-12-01 00:56:09,015 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait
false, activity 00:54:36.256]
2015-12-01 00:56:09,268 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait
false, activity 00:55:01.259]
2015-12-01 00:56:09,279 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait
false, activity 00:53:58.151]
2015-12-01 00:56:09,279 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait
false, activity 00:53:58.162]
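
For anyone else decoding those lines: increments are non-idempotent, so (as we
understand it) the client tags each attempt with a nonce and the server
refuses to re-run a nonce it is already tracking. A retry that lands while the
original attempt is still stuck behind a row lock shows up as exactly this
kind of conflict, with "activity" being how long the original has been
outstanding. A rough sketch of the idea (not the actual ServerNonceManager
code):

import java.util.concurrent.ConcurrentHashMap;

// Sketch of server-side nonce de-duplication for non-idempotent operations.
class NonceSketch {
  private final ConcurrentHashMap<String, Long> inFlight = new ConcurrentHashMap<>();

  // Returns false on a duplicate, i.e. "Conflict detected by nonce".
  boolean startOperation(long group, long nonce) {
    String key = group + ":" + nonce;
    Long startedAt = inFlight.putIfAbsent(key, System.currentTimeMillis());
    if (startedAt != null) {
      long activityMs = System.currentTimeMillis() - startedAt;
      System.out.println("Conflict detected by nonce: [" + key
          + "], activity " + activityMs + " ms");
      return false;                       // duplicate of an in-flight attempt
    }
    return true;                          // first attempt: run the operation
  }

  void endOperation(long group, long nonce) {
    inFlight.remove(group + ":" + nonce); // retire once the op finishes
  }
}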


On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <bb...@hubspot.com>
wrote:

> Sorry the second link should be
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
>
> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>>
>> An active handler:
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
>> One that is locked:
>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>>
>> The difference between pre-rollback and post is that previously we were
>> seeing things blocked in mvcc.  Now we are seeing them blocked on the
>> upsert.
>>
>> It always follows the same pattern, of 1 active handler in the upsert and
>> the rest blocked waiting for it.
>>
>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
>>
>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
>>> bbeaudreault@hubspot.com
>>> > wrote:
>>>
>>> > The rollback seems to have mostly solved the issue for one of our
>>> clusters,
>>> > but another one is still seeing long increment times:
>>> >
>>> > "slowIncrementCount": 52080,
>>> > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
>>> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
>>> > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
>>> > 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
>>> >
>>> >
>>> > Any ideas if there are other changes that may be causing a performance
>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
>>> >
>>> >
>>> >
>>> No.
>>>
>>> Post a thread dump Bryan and it might prompt something.
>>>
>>> St.Ack
>>>
>>>
>>>
>>>
>>> >
>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
>>> >
>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
>>> > > bbeaudreault@hubspot.com> wrote:
>>> > >
>>> > > > Should this be added as a known issue in the CDH or hbase
>>> > documentation?
>>> > > It
>>> > > > was a severe performance hit for us, all of our regionservers were
>>> > > sitting
>>> > > > at a few thousand queued requests.
>>> > > >
>>> > > >
>>> > > Let me take care of that.
>>> > > St.Ack
>>> > >
>>> > >
>>> > >
>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
>>> > > > bbeaudreault@hubspot.com>
>>> > > > wrote:
>>> > > >
>>> > > > > Yea, they are all over the place and called from client and
>>> > coprocessor
>>> > > > > code. We ended up having no other option but to rollback, and
>>> aside
>>> > > from
>>> > > > a
>>> > > > > few NoSuchMethodErrors due to API changes (Put#add vs
>>> Put#addColumn),
>>> > > it
>>> > > > > seems to be working and fixing our problem.
>>> > > > >
>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
>>> > > > >
>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this
>>> > now.
>>> > > > >> Where
>>> > > > >> are your counters Bryan? In their own column family or scattered
>>> > about
>>> > > > in
>>> > > > >> a
>>> > > > >> row with other Cell types?
>>> > > > >> St.Ack
>>> > > > >>
>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
>>> > > > >> bbeaudreault@hubspot.com> wrote:
>>> > > > >>
>>> > > > >> > Is there any update to this? We just upgraded all of our
>>> > production
>>> > > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA
>>> listed in
>>> > > the
>>> > > > >> > known issues, did not know about this.  Now we are seeing
>>> > performance
>>> > > > >> issues
>>> > > > >> > across all clusters, as we make heavy use of increments.
>>> > > > >> >
>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
>>> roll
>>> > > back
>>> > > > >> to
>>> > > > >> > CDH 5.3.1 (if that is possible)?
>>> > > > >> >
>>> > > > >> >
>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com>
>>> wrote:
>>> > > > >> >
>>> > > > >> > > Thank you St.Ack!
>>> > > > >> > >
>>> > > > >> > > I would like to follow the ticket.
>>> > > > >> > >
>>> > > > >> > > Toshihiro Suzuki
>>> > > > >> > >
>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
>>> > > > >> > >
>>> > > > >> > > > Back to this problem. Simple tests confirm that as is, the
>>> > > > >> > > > single-queue-backed MVCC instance can slow Region ops if
>>> some
>>> > > > other
>>> > > > >> row
>>> > > > >> > > is
>>> > > > >> > > > slow to complete. In particular Increment, checkAndPut,
>>> and
>>> > > batch
>>> > > > >> > > mutations
>>> > > > >> > > > are affected. I opened HBASE-14460 to start in on a fix
>>> up.
>>> > Lets
>>> > > > >> see if
>>> > > > >> > > we
>>> > > > >> > > > can somehow scope mvcc to row or at least shard mvcc so
>>> not
>>> > all
>>> > > > >> Region
>>> > > > >> > > ops
>>> > > > >> > > > are paused.
>>> > > > >> > > >
>>> > > > >> > > > St.Ack
>>> > > > >> > > >
>>> > > > >> > > >
>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com
>>> >
>>> > > wrote:
>>> > > > >> > > >
>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
>>> > helpful
>>> > > > >> > > diagram).
>>> > > > >> > > > > > Makes sense. Let me hack up a test case to help with
>>> the
>>> > > > >> > > illustration.
>>> > > > >> > > > It
>>> > > > >> > > > > > is as though the mvcc should be scoped to a row
>>> only...
>>> > > Writes
>>> > > > >> > > against
>>> > > > >> > > > > > other rows should not hold up my read of my row. Tag
>>> an
>>> > mvcc
>>> > > > >> with a
>>> > > > >> > > > 'row'
>>> > > > >> > > > > > scope so we can see which on-going writes pertain to
>>> > current
>>> > > > >> > > operation?
>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
>>> > > > >> > > > >
>>> > > > >> > > > > > You need to read back the increment and have it be
>>> > 'correct'
>>> > > > at
>>> > > > >> > > > increment
>>> > > > >> > > > > > time?
>>> > > > >> > > > > Yes, we need it.
>>> > > > >> > > > >
>>> > > > >> > > > > I would like to help if there is anything I can do.
>>> > > > >> > > > >
>>> > > > >> > > > > Thanks,
>>> > > > >> > > > > Toshihiro Suzuki
>>> > > > >> > > > >
>>> > > > >> > > > >
>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
>>> > > > >> > > > >
>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
>>> > helpful
>>> > > > >> > > diagram).
>>> > > > >> > > > > > Makes sense. Let me hack up a test case to help with
>>> the
>>> > > > >> > > illustration.
>>> > > > >> > > > It
>>> > > > >> > > > > > is as though the mvcc should be scoped to a row
>>> only...
>>> > > Writes
>>> > > > >> > > against
>>> > > > >> > > > > > other rows should not hold up my read of my row. Tag
>>> an
>>> > mvcc
>>> > > > >> with a
>>> > > > >> > > > 'row'
>>> > > > >> > > > > > scope so we can see which on-going writes pertain to
>>> > current
>>> > > > >> > > operation?
>>> > > > >> > > > > >
>>> > > > >> > > > > > You need to read back the increment and have it be
>>> > 'correct'
>>> > > > at
>>> > > > >> > > > increment
>>> > > > >> > > > > > time?
>>> > > > >> > > > > >
>>> > > > >> > > > > > (This is a good one)
>>> > > > >> > > > > >
>>> > > > >> > > > > > Thank you Toshihiro Suzuki
>>> > > > >> > > > > > St.Ack
>>> > > > >> > > > > >
>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <
>>> brfrn169@gmail.com
>>> > >
>>> > > > >> wrote:
>>> > > > >> > > > > >
>>> > > > >> > > > > > > St.Ack,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thank you for your response.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > The reason I think that "A region lock (not a row lock)
>>> > seems
>>> > > to
>>> > > > >> > occur
>>> > > > >> > > in
>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as
>>> follows:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's
>>> writeQueue
>>> > > can
>>> > > > >> > cause
>>> > > > >> > > a
>>> > > > >> > > > > > region
>>> > > > >> > > > > > > lock.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) ->
>>> > > advanceMemstore(w)
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert()
>>> to
>>> > > > >> writeQueue
>>> > > > >> > > and
>>> > > > >> > > > > > waits
>>> > > > >> > > > > > > until writeQueue is empty or writeQueue.getFirst()
>>> == w.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think when a handler thread is processing between
>>> > step 2
>>> > > > and
>>> > > > >> > step
>>> > > > >> > > > 3,
>>> > > > >> > > > > > the
>>> > > > >> > > > > > > other handler threads can wait at step 1 until the
>>> > thread
>>> > > > >> > completes
>>> > > > >> > > > > step
>>> > > > >> > > > > > 3.
>>> > > > >> > > > > > > This is depicted as follows:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Actually, in the thread dump of our region server,
>>> many
>>> > > > >> handler
>>> > > > >> > > > threads
>>> > > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
>>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Many handler threads wait at this:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > Is it possible you are contending on a counter
>>> > > > post-upgrade?
>>> > > > >> > Is
>>> > > > >> > > it
>>> > > > >> > > > > > > > possible that all these threads are trying to get
>>> to
>>> > the
>>> > > > >> same
>>> > > > >> > row
>>> > > > >> > > > to
>>> > > > >> > > > > > > update
>>> > > > >> > > > > > > > it? Could the app behavior have changed?  Or are
>>> you
>>> > > > >> thinking
>>> > > > >> > > > > increment
>>> > > > >> > > > > > > > itself has slowed significantly?
>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app
>>> > behavior.
>>> > > > We
>>> > > > >> are
>>> > > > >> > > > > > thinking
>>> > > > >> > > > > > > increment itself has slowed significantly.
>>> > > > >> > > > > > > Before upgrading HBase, throughput and latency were
>>> > > good.
>>> > > > >> > > > > > > Currently, to cope with this problem, we split the
>>> > regions
>>> > > > >> > finely.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thanks,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Toshihiro Suzuki
>>> > > > >> > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net
>>> >:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
>>> > > brfrn169@gmail.com
>>> > > > >
>>> > > > >> > > wrote:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > > Ted,
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > Thank you for your response.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > https://gist.github.com/brfrn169/cb4f2c157129330cd932
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I think that the increment operation works as
>>> follows:
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > 1. get row lock
>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() //
>>> > wait
>>> > > > for
>>> > > > >> all
>>> > > > >> > > > prior
>>> > > > >> > > > > > > MVCC
>>> > > > >> > > > > > > > > transactions to finish
>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() //
>>> start a
>>> > > > >> > transaction
>>> > > > >> > > > > > > > > 4. get previous values
>>> > > > >> > > > > > > > > 5. create KVs
>>> > > > >> > > > > > > > > 6. write to Memstore
>>> > > > >> > > > > > > > > 7. write to WAL
>>> > > > >> > > > > > > > > 8. release row lock
>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() //
>>> > complete
>>> > > > the
>>> > > > >> > > > > > transaction
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl
>>> has a
>>> > > > pending
>>> > > > >> > > queue
>>> > > > >> > > > of
>>> > > > >> > > > > > > > writes
>>> > > > >> > > > > > > > > named writeQueue.
>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and
>>> waits
>>> > > until
>>> > > > >> > > > writeQueue
>>> > > > >> > > > > > is
>>> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step
>>> 9
>>> > > > removes
>>> > > > >> the
>>> > > > >> > > > > > > WriteEntry
>>> > > > >> > > > > > > > > from writeQueue.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I think that when a handler thread is processing
>>> > > between
>>> > > > >> > step 2
>>> > > > >> > > > and
>>> > > > >> > > > > > > step
>>> > > > >> > > > > > > > 9,
>>> > > > >> > > > > > > > > the other handler threads can wait until the
>>> thread
>>> > > > >> completes
>>> > > > >> > > > step
>>> > > > >> > > > > 9.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > That is right. We need to read, after all
>>> outstanding
>>> > > > >> updates
>>> > > > >> > are
>>> > > > >> > > > > > done...
>>> > > > >> > > > > > > > because we need to read the latest update before
>>> we go
>>> > > to
>>> > > > >> > > > > > > modify/increment
>>> > > > >> > > > > > > > it.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > How do you make this out?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > In 0.98.x we did this:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > >
>>> > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > ... and in 1.0 we do this:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which
>>> is
>>> > > > this....
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > +  public void
>>> waitForPreviousTransactionsComplete() {
>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
>>> > > > >> > > > > > > > +  }
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
>>> (
>>> > > > >> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763
>>> ).
>>> > > > Previous
>>> > > > >> > mvcc
>>> > > > >> > > > and
>>> > > > >> > > > > > > > region
>>> > > > >> > > > > > > > sequenceid would spin independent of each other.
>>> > Perhaps
>>> > > > >> this
>>> > > > >> > > > > > is responsible
>>> > > > >> > > > > > > > for some slowdown.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem
>>> to be
>>> > > down
>>> > > > >> in
>>> > > > >> > the
>>> > > > >> > > > > Get.
>>> > > > >> > > > > > If
>>> > > > >> > > > > > > > you do a bunch of thread dumps in a row, where is
>>> the
>>> > > > >> > > lock-holding
>>> > > > >> > > > > > > thread?
>>> > > > >> > > > > > > > In Get or writing Increment... or waiting on
>>> sequence
>>> > > id?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Is it possible you are contending on a counter
>>> > > > post-upgrade?
>>> > > > >> > Is
>>> > > > >> > > it
>>> > > > >> > > > > > > > possible that all these threads are trying to get
>>> to
>>> > the
>>> > > > >> same
>>> > > > >> > row
>>> > > > >> > > > to
>>> > > > >> > > > > > > update
>>> > > > >> > > > > > > > it? Could the app behavior have changed?  Or are
>>> you
>>> > > > >> thinking
>>> > > > >> > > > > increment
>>> > > > >> > > > > > > > itself has slowed significantly?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > St.Ack
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > > Thanks,
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > Toshihiro Suzuki
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <
>>> > yuzhihong@gmail.com
>>> > > >:
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not
>>> > > region):
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > >     try {
>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > Thanks
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
>>> > > > >> brfrn169@gmail.com>
>>> > > > >> > > > wrote:
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > > Hi,
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > We upgraded our cluster from
>>> > CDH5.3.1(HBase0.98.6)
>>> > > > to
>>> > > > >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
>>> > > > >> > > > > > > > > > > and we experience slowdown in increment
>>> > operation.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Here's an extract from thread dump of the
>>> > > > >> RegionServer of
>>> > > > >> > > our
>>> > > > >> > > > > > > > cluster:
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Thread 68
>>> > > > >> > > > > > >
>>> > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>>> > > > >> > > > > > > > > > >   State: BLOCKED
>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
>>> > > > >> > > > > > > > > > >   Waited count: 39828360
>>> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
>>> > > > >> > > > > > > > > > >   Blocked by 63
>>> > > > >> > > > > > > > >
>>> > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>>> > > > >> > > > > > > > > > >   Stack:
>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > >
>>> > > > >> > >
>>> > > > >>
>>> > > >
>>> >
>>> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > >
>>> > > > >> > > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > There are many similar threads in the thread dump.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > I read the source code and I think this is caused by changes of
>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Also we wrote performance test code for increment operation that
>>> > > > >> > > > > > > > > > > included 100 threads and ran it in local mode.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > The result is shown below:
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Thanks,
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Toshihiro Suzuki

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
We've been doing more debugging of this and have set up the read vs write
handlers to try to at least segment this away so reads can work. We have
pretty beefy servers, and are running with the following settings:

hbase.regionserver.handler.count=1000
hbase.ipc.server.read.threadpool.size=50
hbase.ipc.server.callqueue.handler.factor=0.025
hbase.ipc.server.callqueue.read.ratio=0.6
hbase.ipc.server.callqueue.scan.ratio=0.5
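
Back-of-the-envelope, as we understand these knobs (a rough sketch, not
authoritative HBase code), that carves the pool up roughly like this:

public class HandlerPartitionSketch {
  public static void main(String[] args) {
    int handlers = 1000;          // hbase.regionserver.handler.count
    double queueFactor = 0.025;   // hbase.ipc.server.callqueue.handler.factor
    double readRatio = 0.6;       // hbase.ipc.server.callqueue.read.ratio
    double scanRatio = 0.5;       // hbase.ipc.server.callqueue.scan.ratio

    int queues = (int) Math.max(1, Math.round(handlers * queueFactor));  // 25 call queues
    int readQueues = (int) Math.max(1, Math.round(queues * readRatio));  // 15 queues for reads
    int scanQueues = (int) Math.round(readQueues * scanRatio);           // ~8 of those for long scans
    int writeHandlers = (int) (handlers * (1 - readRatio));              // 400 handlers left for writes
    System.out.println("queues=" + queues + " readQueues=" + readQueues
        + " scanQueues=" + scanQueues + " writeHandlers=" + writeHandlers);
  }
}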

We are seeing all 400 write handlers taken up by row locks for the most
part. The read handlers are mostly idle. We're thinking of changing the
ratio here, but are not sure it will help if they are all blocked on a row
lock. We just enabled DEBUG logging on all our servers and noticed the
following:

2015-12-01 00:56:09,015 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait
false, activity 00:54:36.240]
2015-12-01 00:56:09,015 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait
false, activity 00:54:36.256]
2015-12-01 00:56:09,268 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait
false, activity 00:55:01.259]
2015-12-01 00:56:09,279 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait
false, activity 00:53:58.151]
2015-12-01 00:56:09,279 DEBUG
org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected
by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait
false, activity 00:53:58.162]
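
Those nonce conflicts look like client-side retries of the same increments:
the RPC stalls behind the blocked handlers, the client times out and resends,
and ServerNonceManager detects the duplicate. A minimal sketch of the client
pattern that would produce them (table, family and qualifier names here are
made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementRetrySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.rpc.timeout", 2000); // short timeout, so slow increments get retried
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("counters"))) {
      Increment inc = new Increment(Bytes.toBytes("row-1"));
      inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("hits"), 1L);
      // If this call times out, the server-side work may still complete; the
      // retry carries the same nonce, which is what shows up above as
      // "Conflict detected by nonce".
      table.increment(inc);
    }
  }
}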


On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <bb...@hubspot.com> wrote:

> Sorry the second link should be
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
>
> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>>
>> An active handler:
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
>> One that is locked:
>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>>
>> The difference between pre-rollback and post is that previously we were
>> seeing things blocked in mvcc.  Now we are seeing them blocked on the
>> upsert.
>>
>> It always follows the same pattern, of 1 active handler in the upsert and
>> the rest blocked waiting for it.
>>
>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
>>
>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>>>
>>> > The rollback seems to have mostly solved the issue for one of our clusters,
>>> > but another one is still seeing long increment times:
>>> >
>>> > "slowIncrementCount": 52080,
>>> > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
>>> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
>>> > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
>>> > 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
>>> >
>>> > Any ideas if there are other changes that may be causing a performance
>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
>>>
>>> No.
>>>
>>> Post a thread dump Bryan and it might prompt something.
>>>
>>> St.Ack
>>>
>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
>>> >
>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>>> > >
>>> > > > Should this be added as a known issue in the CDH or hbase documentation?
>>> > > > It was a severe performance hit for us, all of our regionservers were
>>> > > > sitting at a few thousand queued requests.
>>> > >
>>> > > Let me take care of that.
>>> > > St.Ack
>>> > >
>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>>> > > >
>>> > > > > Yea, they are all over the place and called from client and coprocessor
>>> > > > > code. We ended up having no other option but to rollback, and aside from
>>> > > > > a few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn),
>>> > > > > it seems to be working and fixing our problem.
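
For anyone hitting the same NoSuchMethodErrors: as we read the API change,
code compiled against the 1.0 client uses Put#addColumn, which the 0.98 jars
do not have. A tiny sketch, with made-up family/qualifier/value names:

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutApiChangeSketch {
  public static void main(String[] args) {
    Put p = new Put(Bytes.toBytes("row-1"));
    // HBase 0.98 / CDH5.3 style call, deprecated in 1.0:
    // p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    // HBase 1.0 / CDH5.4 replacement; running bytecode that calls this against
    // the 0.98 client jars is what throws NoSuchMethodError:
    p.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
  }
}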
>>> > > > >
>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
>>> > > > >
>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this now.
>>> > > > >> Where are your counters Bryan? In their own column family or scattered
>>> > > > >> about in a row with other Cell types?
>>> > > > >> St.Ack
>>> > > > >>
>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>>> > > > >>
>>> > > > >> > Is there any update to this? We just upgraded all of our production
>>> > > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
>>> > > > >> > known issues, did not know about this. Now we are seeing performance
>>> > > > >> > issues across all clusters, as we make heavy use of increments.
>>> > > > >> >
>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back
>>> > > > >> > to CDH 5.3.1 (if that is possible)?
>>> > > > >> >
>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
>>> > > > >> >
>>> > > > >> > > Thank you St.Ack!
>>> > > > >> > >
>>> > > > >> > > I would like to follow the ticket.
>>> > > > >> > >
>>> > > > >> > > Toshihiro Suzuki
>>> > > > >> > >
>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
>>> > > > >> > >
>>> > > > >> > > > Back to this problem. Simple tests confirm that as is, the
>>> > > > >> > > > single-queue-backed MVCC instance can slow Region ops if some other
>>> > > > >> > > > row is slow to complete. In particular Increment, checkAndPut, and
>>> > > > >> > > > batch mutations are affected. I opened HBASE-14460 to start in on a
>>> > > > >> > > > fix up. Let's see if we can somehow scope mvcc to row or at least
>>> > > > >> > > > shard mvcc so not all Region ops are paused.
>>> > > > >> > > >
>>> > > > >> > > > St.Ack
>>> > > > >> > > >
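
To make "shard mvcc so not all Region ops are paused" concrete, here is a toy
sketch (ours, not the HBASE-14460 patch): hash each row to one of N
independent write queues so a stalled transaction only blocks rows in its own
shard. Handing out globally consistent read points across shards is the hard
part and is not attempted here.

import java.util.Arrays;
import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicLong;

public class ShardedMvccSketch {
  private final LinkedList<Long>[] queues;  // one writeQueue per shard
  private final AtomicLong seqGen = new AtomicLong();

  @SuppressWarnings("unchecked")
  ShardedMvccSketch(int shards) {
    queues = new LinkedList[shards];
    Arrays.setAll(queues, i -> new LinkedList<Long>());
  }

  private LinkedList<Long> shard(byte[] row) {
    return queues[(Arrays.hashCode(row) & 0x7fffffff) % queues.length];
  }

  long begin(byte[] row) {                  // like beginMemstoreInsert, but per shard
    long seq = seqGen.incrementAndGet();
    LinkedList<Long> q = shard(row);
    synchronized (q) { q.addLast(seq); }
    return seq;
  }

  void complete(byte[] row, long seq) {     // wakes waiters on this shard only
    LinkedList<Long> q = shard(row);
    synchronized (q) { q.remove(seq); q.notifyAll(); }
  }

  void waitForPrevious(byte[] row, long seq) throws InterruptedException {
    LinkedList<Long> q = shard(row);
    synchronized (q) {                      // waits only for earlier writes in the same shard
      while (!q.isEmpty() && q.peekFirst() < seq) q.wait();
    }
  }
}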
>>> > > > >> > > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > >
>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful
>>> > > > >> > > > > > diagram). Makes sense. Let me hack up a test case to help with the
>>> > > > >> > > > > > illustration. It is as though the mvcc should be scoped to a row
>>> > > > >> > > > > > only... Writes against other rows should not hold up my read of my
>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see which on-going
>>> > > > >> > > > > > writes pertain to current operation?
>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
>>> > > > >> > > > >
>>> > > > >> > > > > > You need to read back the increment and have it be 'correct' at
>>> > > > >> > > > > > increment time?
>>> > > > >> > > > > Yes, we need it.
>>> > > > >> > > > >
>>> > > > >> > > > > I would like to help if there is anything I can do.
>>> > > > >> > > > >
>>> > > > >> > > > > Thanks,
>>> > > > >> > > > > Toshihiro Suzuki
>>> > > > >> > > > >
>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
>>> > > > >> > > > >
>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful
>>> > > > >> > > > > > diagram). Makes sense. Let me hack up a test case to help with the
>>> > > > >> > > > > > illustration. It is as though the mvcc should be scoped to a row
>>> > > > >> > > > > > only... Writes against other rows should not hold up my read of my
>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see which on-going
>>> > > > >> > > > > > writes pertain to current operation?
>>> > > > >> > > > > >
>>> > > > >> > > > > > You need to read back the increment and have it be 'correct' at
>>> > > > >> > > > > > increment time?
>>> > > > >> > > > > >
>>> > > > >> > > > > > (This is a good one)
>>> > > > >> > > > > >
>>> > > > >> > > > > > Thank you Toshihiro Suzuki
>>> > > > >> > > > > > St.Ack
>>> > > > >> > > > > >
>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > >
>>> > > > >> > > > > > > St.Ack,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thank you for your response.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Why I make out that "A region lock (not a row lock) seems to occur
>>> > > > >> > > > > > > in waitForPreviousTransactionsComplete()" is as follows:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can cause a
>>> > > > >> > > > > > > region lock.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
>>> > > > >> > > > > > > waits until writeQueue is empty or writeQueue.getFirst() == w.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think when a handler thread is processing between step 2 and step
>>> > > > >> > > > > > > 3, the other handler threads can wait at step 1 until the thread
>>> > > > >> > > > > > > completes step 3. This is depicted as follows:
>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Actually, in the thread dump of our region server, many handler
>>> > > > >> > > > > > > threads (RW.default.writeRpcServer.handler) wait at Step 1
>>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Many handler threads wait at this:
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row
>>> > > > >> > > > > > > > to update it? Could the app behavior have changed? Or are you
>>> > > > >> > > > > > > > thinking increment itself has slowed significantly?
>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app behavior. We are
>>> > > > >> > > > > > > thinking increment itself has slowed significantly.
>>> > > > >> > > > > > > Before upgrading HBase, it was good throughput and latency.
>>> > > > >> > > > > > > Currently, to cope with this problem, we split the regions finely.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thanks,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Toshihiro Suzuki
>>> > > > >> > > > > > >
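
The single region-wide writeQueue behavior described above is easy to
reproduce outside HBase. A self-contained simulation (our code, modelled on
the description, not the actual MultiVersionConsistencyControl): thread B
works on a different row than thread A, yet still blocks until A completes:

import java.util.LinkedList;

public class WriteQueueStallDemo {
  // entries are [sequence, completedFlag]
  private static final LinkedList<long[]> writeQueue = new LinkedList<>();
  private static long nextSeq = 0;

  static long begin() {
    synchronized (writeQueue) {
      long seq = ++nextSeq;
      writeQueue.addLast(new long[] { seq, 0 });
      return seq;
    }
  }

  static void complete(long seq) {
    synchronized (writeQueue) {
      for (long[] e : writeQueue) if (e[0] == seq) e[1] = 1;
      // pop completed entries off the head, like advanceMemstore
      while (!writeQueue.isEmpty() && writeQueue.peekFirst()[1] == 1) {
        writeQueue.removeFirst();
      }
      writeQueue.notifyAll();
    }
  }

  static void waitForPreviousTransactionsComplete(long mySeq) throws InterruptedException {
    synchronized (writeQueue) {
      // wait until no older transaction is ahead of us in the shared queue
      while (!writeQueue.isEmpty() && writeQueue.peekFirst()[0] < mySeq) {
        writeQueue.wait();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    long a = begin();                            // handler A: between step 2 and step 3
    Thread b = new Thread(() -> {
      try {
        long mine = begin();                     // handler B, a DIFFERENT row
        long t0 = System.currentTimeMillis();
        waitForPreviousTransactionsComplete(mine);
        System.out.println("B waited " + (System.currentTimeMillis() - t0) + " ms");
        complete(mine);
      } catch (InterruptedException ignored) { }
    });
    b.start();
    Thread.sleep(2000);                          // A is slow (say, a long WAL sync)
    complete(a);                                 // only now can B make progress
    b.join();
  }
}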
>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net>:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > > Ted,
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > Thank you for your response.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I think that increment operation works as follows:
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > 1. get row lock
>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior
>>> > > > >> > > > > > > > > MVCC transactions to finish
>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
>>> > > > >> > > > > > > > > 4. get previous values
>>> > > > >> > > > > > > > > 5. create KVs
>>> > > > >> > > > > > > > > 6. write to Memstore
>>> > > > >> > > > > > > > > 7. write to WAL
>>> > > > >> > > > > > > > > 8. release row lock
>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
>>> > > > >> > > > > > > > > writes named writeQueue.
>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue
>>> > > > >> > > > > > > > > is empty or writeQueue.getFirst() == w.
>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
>>> > > > >> > > > > > > > > WriteEntry from writeQueue.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I think that when a handler thread is processing between step 2 and
>>> > > > >> > > > > > > > > step 9, the other handler threads can wait until the thread
>>> > > > >> > > > > > > > > completes step 9.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > That is right. We need to read, after all outstanding updates are
>>> > > > >> > > > > > > > done... because we need to read the latest update before we go to
>>> > > > >> > > > > > > > modify/increment it.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > How do you make out this?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > In 0.98.x we did this:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > ... and in 1.0 we do this:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
>>> > > > >> > > > > > > > +  }
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
>>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc and
>>> > > > >> > > > > > > > region sequenceid would spin independent of each other. Perhaps this
>>> > > > >> > > > > > > > is responsible for some slow down.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem to be down in the
>>> > > > >> > > > > > > > Get. If you do a bunch of thread dumps in a row, where is the
>>> > > > >> > > > > > > > lock-holding thread? In Get or writing Increment... or waiting on
>>> > > > >> > > > > > > > sequence id?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row to
>>> > > > >> > > > > > > > update it? Could the app behavior have changed? Or are you thinking
>>> > > > >> > > > > > > > increment itself has slowed significantly?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > St.Ack
>>> > > > >> > > > > > > >
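
One way to answer "where is the lock-holding thread?" across a series of
dumps: a small helper on the standard java.lang.management API (a sketch; it
inspects its own JVM, so for a RegionServer you would attach via JMX or just
diff repeated jstack output):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreadReport {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // true, true: include locked monitors and ownable synchronizers
    for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
      if (ti.getThreadState() == Thread.State.BLOCKED) {
        System.out.println(ti.getThreadName() + " blocked on " + ti.getLockName()
            + " held by " + ti.getLockOwnerName());
      }
    }
  }
}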
>>> > > > >> > > > > > > > > Thanks,
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > Toshihiro Suzuki
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > >     try {
>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > Thanks
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > > Hi,
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0) and we experience slowdown in increment
>>> > > > >> > > > > > > > > > > operation.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Here's an extract from thread dump of the RegionServer of our
>>> > > > >> > > > > > > > > > > cluster:
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>>> > > > >> > > > > > > > > > >   State: BLOCKED
>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
>>> > > > >> > > > > > > > > > >   Waited count: 39828360
>>> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
>>> > > > >> > > > > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>>> > > > >> > > > > > > > > > >   Stack:
>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > There are many similar threads in the thread dump.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > I read the source code and I think this is caused by changes of
>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Also we wrote performance test code for increment operation that
>>> > > > >> > > > > > > > > > > included 100 threads and ran it in local mode.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > The result is shown below:
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Thanks,
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
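
The 100-thread increment test quoted above was never posted; for anyone who
wants to reproduce the numbers, a guess at its shape (table, family and
qualifier names are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementBenchSketch {
  public static void main(String[] args) throws Exception {
    final int threads = 100, opsPerThread = 1000;
    Configuration conf = HBaseConfiguration.create();
    final AtomicLong latencyMs = new AtomicLong();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      long start = System.currentTimeMillis();
      for (int t = 0; t < threads; t++) {
        final int id = t;
        pool.execute(() -> {
          try (Table table = conn.getTable(TableName.valueOf("bench"))) {
            for (int i = 0; i < opsPerThread; i++) {
              Increment inc = new Increment(Bytes.toBytes("row-" + id));
              inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), 1L);
              long t0 = System.currentTimeMillis();
              table.increment(inc);
              latencyMs.addAndGet(System.currentTimeMillis() - t0);
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
      long elapsedMs = System.currentTimeMillis() - start;
      long ops = (long) threads * opsPerThread;
      System.out.println("Throughput(op/s): " + (ops * 1000 / elapsedMs)
          + ", Latency(ms): " + (latencyMs.doubleValue() / ops));
    }
  }
}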

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Sorry the second link should be
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579

On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <bb...@hubspot.com> wrote:

> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>
> An active handler:
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> One that is locked:
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>
> The difference between pre-rollback and post is that previously we were
> seeing things blocked in mvcc.  Now we are seeing them blocked on the
> upsert.
>
> It always follows the same pattern, of 1 active handler in the upsert and
> the rest blocked waiting for it.
>
> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
>
>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>>
>> > The rollback seems to have mostly solved the issue for one of our clusters,
>> > but another one is still seeing long increment times:
>> >
>> > "slowIncrementCount": 52080,
>> > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
>> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
>> > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
>> > 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
>> >
>> > Any ideas if there are other changes that may be causing a performance
>> > regression for increments between CDH4.7.1 and CDH5.3.8?
>>
>> No.
>>
>> Post a thread dump Bryan and it might prompt something.
>>
>> St.Ack
>>
>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
>> >
>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>> > >
>> > > > Should this be added as a known issue in the CDH or hbase documentation?
>> > > > It was a severe performance hit for us, all of our regionservers were
>> > > > sitting at a few thousand queued requests.
>> > >
>> > > Let me take care of that.
>> > > St.Ack
>> > >
>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>> > > >
>> > > > > Yea, they are all over the place and called from client and coprocessor
>> > > > > code. We ended up having no other option but to rollback, and aside from
>> > > > > a few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn),
>> > > > > it seems to be working and fixing our problem.
>> > > > >
>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
>> > > > >
>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this now.
>> > > > >> Where are your counters Bryan? In their own column family or scattered
>> > > > >> about in a row with other Cell types?
>> > > > >> St.Ack
>> > > > >>
>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>> > > > >>
>> > > > >> > Is there any update to this? We just upgraded all of our production
>> > > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
>> > > > >> > known issues, did not know about this. Now we are seeing performance
>> > > > >> > issues across all clusters, as we make heavy use of increments.
>> > > > >> >
>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back
>> > > > >> > to CDH 5.3.1 (if that is possible)?
>> > > > >> >
>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
>> > > > >> >
>> > > > >> > > Thank you St.Ack!
>> > > > >> > >
>> > > > >> > > I would like to follow the ticket.
>> > > > >> > >
>> > > > >> > > Toshihiro Suzuki
>> > > > >> > >
>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
>> > > > >> > >
>> > > > >> > > > Back to this problem. Simple tests confirm that as is, the
>> > > > >> > > > single-queue-backed MVCC instance can slow Region ops if some other
>> > > > >> > > > row is slow to complete. In particular Increment, checkAndPut, and
>> > > > >> > > > batch mutations are affected. I opened HBASE-14460 to start in on a
>> > > > >> > > > fix up. Let's see if we can somehow scope mvcc to row or at least
>> > > > >> > > > shard mvcc so not all Region ops are paused.
>> > > > >> > > >
>> > > > >> > > > St.Ack
>> > > > >> > > >
>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>> > > > >> > > >
>> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful
>> > > > >> > > > > > diagram). Makes sense. Let me hack up a test case to help with
>> > > > >> > > > > > the illustration. It is as though the mvcc should be scoped to a
>> > > > >> > > > > > row only... Writes against other rows should not hold up my read
>> > > > >> > > > > > of my row. Tag an mvcc with a 'row' scope so we can see which
>> > > > >> > > > > > on-going writes pertain to current operation?
>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
>> > > > >> > > > >
>> > > > >> > > > > > You need to read back the increment and have it be 'correct' at
>> > > > >> > > > > > increment time?
>> > > > >> > > > > Yes, we need it.
>> > > > >> > > > >
>> > > > >> > > > > I would like to help if there is anything I can do.
>> > > > >> > > > >
>> > > > >> > > > > Thanks,
>> > > > >> > > > > Toshihiro Suzuki
>> > > > >> > > > >
>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
>> > > > >> > > > >
>> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful
>> > > > >> > > > > > diagram). Makes sense. Let me hack up a test case to help with
>> > > > >> > > > > > the illustration. It is as though the mvcc should be scoped to a
>> > > > >> > > > > > row only... Writes against other rows should not hold up my read
>> > > > >> > > > > > of my row. Tag an mvcc with a 'row' scope so we can see which
>> > > > >> > > > > > on-going writes pertain to current operation?
>> > > > >> > > > > >
>> > > > >> > > > > > You need to read back the increment and have it be 'correct' at
>> > > > >> > > > > > increment time?
>> > > > >> > > > > >
>> > > > >> > > > > > (This is a good one)
>> > > > >> > > > > >
>> > > > >> > > > > > Thank you Toshihiro Suzuki
>> > > > >> > > > > > St.Ack
>> > > > >> > > > > >
>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>> > > > >> > > > > >
>> > > > >> > > > > > > St.Ack,
>> > > > >> > > > > > >
>> > > > >> > > > > > > Thank you for your response.
>> > > > >> > > > > > >
>> > > > >> > > > > > > Why I make out that "A region lock (not a row lock) seems to
>> > > > >> > > > > > > occur in waitForPreviousTransactionsComplete()" is as follows:
>> > > > >> > > > > > >
>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
>> > > > >> > > > > > >
>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>> > > > >> > > > > > >
>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>> > > > >> > > > > > >
>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>> > > > >> > > > > > >
>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can
>> > > > >> > > > > > > cause a region lock.
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>> > > > >> > > > > > >
>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>> > > > >> > > > > > >
>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
>> > > > >> > > > > > >
>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>> > > > >> > > > > > >
>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue
>> > > > >> > > > > > > and waits until writeQueue is empty or writeQueue.getFirst() == w.
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>> > > > >> > > > > > >
>> > > > >> > > > > > > I think when a handler thread is processing between step 2 and
>> > > > >> > > > > > > step 3, the other handler threads can wait at step 1 until the
>> > > > >> > > > > > > thread completes step 3. This is depicted as follows:
>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>> > > > >> > > > > > >
>> > > > >> > > > > > > Actually, in the thread dump of our region server, many handler
>> > > > >> > > > > > > threads (RW.default.writeRpcServer.handler) wait at Step 1
>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>> > > > >> > > > > > >
>> > > > >> > > > > > > Many handler threads wait at this:
>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>> > > > >> > > > > > >
>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is
>> > > > >> > > > > > > > it possible that all these threads are trying to get to the
>> > > > >> > > > > > > > same row to update it? Could the app behavior have changed? Or
>> > > > >> > > > > > > > are you thinking increment itself has slowed significantly?
>> > > > >> > > > > > > We have just upgraded HBase, not changed the app behavior. We are
>> > > > >> > > > > > > thinking increment itself has slowed significantly.
>> > > > >> > > > > > > Before upgrading HBase, it was good throughput and latency.
>> > > > >> > > > > > > Currently, to cope with this problem, we split the regions finely.
>> > > > >> > > > > > >
>> > > > >> > > > > > > Thanks,
>> > > > >> > > > > > >
>> > > > >> > > > > > > Toshihiro Suzuki
>> > > > >> > > > > > >
>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
>> > > > >> > > > > > >
>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > > Ted,
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > Thank you for your response.
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > I think that increment operation works as follows:
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > 1. get row lock
>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all
>> > > > >> > > > > > > > > prior MVCC transactions to finish
>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
>> > > > >> > > > > > > > > 4. get previous values
>> > > > >> > > > > > > > > 5. create KVs
>> > > > >> > > > > > > > > 6. write to Memstore
>> > > > >> > > > > > > > > 7. write to WAL
>> > > > >> > > > > > > > > 8. release row lock
>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
>> > > > >> > > > > > > > > transaction
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending
>> > > > >> > > > > > > > > queue of writes named writeQueue.
>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
>> > > > >> > > > > > > > > writeQueue is empty or writeQueue.getFirst() == w.
>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
>> > > > >> > > > > > > > > WriteEntry from writeQueue.
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > I think that when a handler thread is processing between step 2
>> > > > >> > > > > > > > > and step 9, the other handler threads can wait until the thread
>> > > > >> > > > > > > > > completes step 9.
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > That is right. We need to read, after all outstanding updates are
>> > > > >> > > > > > > > done... because we need to read the latest update before we go to
>> > > > >> > > > > > > > modify/increment it.
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > How do you make out this?
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > In 0.98.x we did this:
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > ... and in 1.0 we do this:
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
>> > > > >> > > > > > > > +  }
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc
>> > > > >> > > > > > > > and region sequenceid would spin independent of each other.
>> > > > >> > > > > > > > Perhaps this is responsible for some slow down.
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > That said, looking in your thread dump, we seem to be down in the
>> > > > >> > > > > > > > Get. If you do a bunch of thread dumps in a row, where is the
>> > > > >> > > > > > > > lock-holding thread? In Get or writing Increment... or waiting on
>> > > > >> > > > > > > > sequence id?
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is
>> > > > >> > > > > > > > it possible that all these threads are trying to get to the same
>> > > > >> > > > > > > > row to update it? Could the app behavior have changed? Or are you
>> > > > >> > > > > > > > thinking increment itself has slowed significantly?
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > St.Ack
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > > Thanks,
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > Toshihiro Suzuki
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > >     try {
>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > > Thanks
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > > > Hi,
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0) and we experience slowdown in increment
>> > > > >> > > > > > > > > > > operation.
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Here's an extract from thread dump of the RegionServer of
>> > > > >> > > > > > > > > > > our cluster:
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>> > > > >> > > > > > > > > > >   State: BLOCKED
>> > > > >> > > > > > > > > > >   Blocked count: 21689888
>> > > > >> > > > > > > > > > >   Waited count: 39828360
>> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
>> > > > >> > > > > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>> > > > >> > > > > > > > > > >   Stack:
>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>> > > > >> > > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > There are many similar threads in the thread dump.
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > I read the source code and I think this is caused by changes
>> > > > >> > > > > > > > > > > of MultiVersionConsistencyControl.
>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Also we wrote performance test code for increment operation
>> > > > >> > > > > > > > > > > that included 100 threads and ran it in local mode.
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > The result is shown below:
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Thanks,
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Toshihiro Suzuki

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Sorry, the second link should be
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579

On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <bb...@hubspot.com>
wrote:

> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>
> An active handler:
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> One that is locked:
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>
> The difference between pre-rollback and post is that previously we were
> seeing things blocked in mvcc.  Now we are seeing them blocked on the
> upsert.
>
> It always follows the same pattern: one active handler in the upsert and
> the rest blocked waiting for it.
>
> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
>
>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
>> bbeaudreault@hubspot.com
>> > wrote:
>>
>> > The rollback seems to have mostly solved the issue for one of our
>> > clusters, but another one is still seeing long increment times:
>> >
>> > "slowIncrementCount": 52080,
>> > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
>> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
>> > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
>> > 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
>> >
>> >
>> > Any ideas if there are other changes that may be causing a performance
>> > regression for increments between CDH4.7.1 and CDH5.3.8?
>> >
>> >
>> >
>> No.
>>
>> Post a thread dump Bryan and it might prompt something.
>>
>> St.Ack
>>
>>
>>
>>
>> >
>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
>> >
>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
>> > > bbeaudreault@hubspot.com> wrote:
>> > >
>> > > > Should this be added as a known issue in the CDH or HBase documentation?
>> > > > It was a severe performance hit for us; all of our regionservers were
>> > > > sitting at a few thousand queued requests.
>> > > >
>> > > >
>> > > Let me take care of that.
>> > > St.Ack
>> > >
>> > >
>> > >
>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
>> > > > bbeaudreault@hubspot.com>
>> > > > wrote:
>> > > >
>> > > > > Yea, they are all over the place and called from client and coprocessor
>> > > > > code. We ended up having no other option but to roll back, and aside from
>> > > > > a few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn),
>> > > > > it seems to be working and fixing our problem.
>> > > > >
>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
>> > > > >
>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this now.
>> > > > >> Where are your counters, Bryan? In their own column family or scattered
>> > > > >> about in a row with other Cell types?
>> > > > >> St.Ack
>> > > > >>
>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
>> > > > >> bbeaudreault@hubspot.com> wrote:
>> > > > >>
>> > > > >> > Is there any update to this? We just upgraded all of our production
>> > > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
>> > > > >> > known issues, did not know about this. Now we are seeing performance
>> > > > >> > issues across all clusters, as we make heavy use of increments.
>> > > > >> >
>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back
>> > > > >> > to CDH 5.3.1 (if that is possible)?
>> > > > >> >
>> > > > >> >
>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com>
>> wrote:
>> > > > >> >
>> > > > >> > > Thank you St.Ack!
>> > > > >> > >
>> > > > >> > > I would like to follow the ticket.
>> > > > >> > >
>> > > > >> > > Toshihiro Suzuki
>> > > > >> > >
>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
>> > > > >> > >
>> > > > >> > > > Back to this problem. Simple tests confirm that as is, the
>> > > > >> > > > single-queue-backed MVCC instance can slow Region ops if some other
>> > > > >> > > > row is slow to complete. In particular Increment, checkAndPut, and
>> > > > >> > > > batch mutations are affected. I opened HBASE-14460 to start in on a
>> > > > >> > > > fix up. Let's see if we can somehow scope mvcc to row or at least
>> > > > >> > > > shard mvcc so not all Region ops are paused.
>> > > > >> > > >
>> > > > >> > > > St.Ack
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com>
>> > > wrote:
>> > > > >> > > >
>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
>> > helpful
>> > > > >> > > diagram).
>> > > > >> > > > > > Makes sense. Let me hack up a test case to help with
>> the
>> > > > >> > > illustration.
>> > > > >> > > > It
>> > > > >> > > > > > is as though the mvcc should be scoped to a row only...
>> > > Writes
>> > > > >> > > against
>> > > > >> > > > > > other rows should not hold up my read of my row. Tag an
>> > mvcc
>> > > > >> with a
>> > > > >> > > > 'row'
>> > > > >> > > > > > scope so we can see which on-going writes pertain to
>> > current
>> > > > >> > > operation?
>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
>> > > > >> > > > >
>> > > > >> > > > > > You need to read back the increment and have it be
>> > 'correct'
>> > > > at
>> > > > >> > > > increment
>> > > > >> > > > > > time?
>> > > > >> > > > > Yes, we need it.
>> > > > >> > > > >
>> > > > >> > > > > I would like to help if there is anything I can do.
>> > > > >> > > > >
>> > > > >> > > > > Thanks,
>> > > > >> > > > > Toshihiro Suzuki
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
>> > > > >> > > > >
>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
>> > helpful
>> > > > >> > > diagram).
>> > > > >> > > > > > Makes sense. Let me hack up a test case to help with
>> the
>> > > > >> > > illustration.
>> > > > >> > > > It
>> > > > >> > > > > > is as though the mvcc should be scoped to a row only...
>> > > Writes
>> > > > >> > > against
>> > > > >> > > > > > other rows should not hold up my read of my row. Tag an
>> > mvcc
>> > > > >> with a
>> > > > >> > > > 'row'
>> > > > >> > > > > > scope so we can see which on-going writes pertain to
>> > current
>> > > > >> > > operation?
>> > > > >> > > > > >
>> > > > >> > > > > > You need to read back the increment and have it be
>> > 'correct'
>> > > > at
>> > > > >> > > > increment
>> > > > >> > > > > > time?
>> > > > >> > > > > >
>> > > > >> > > > > > (This is a good one)
>> > > > >> > > > > >
>> > > > >> > > > > > Thank you Toshihiro Suzuki
>> > > > >> > > > > > St.Ack
>> > > > >> > > > > >
>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <
>> brfrn169@gmail.com
>> > >
>> > > > >> wrote:
>> > > > >> > > > > >
>> > > > >> > > > > > > St.Ack,
>> > > > >> > > > > > >
>> > > > >> > > > > > > Thank you for your response.
>> > > > >> > > > > > >
>> > > > >> > > > > > > The reason I think that "A region lock (not a row lock) seems
>> > > > >> > > > > > > to occur in waitForPreviousTransactionsComplete()" is as follows:
>> > > > >> > > > > > >
>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
>> > > > >> > > > > > >
>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>> > > > >> > > > > > >
>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>> > > > >> > > > > > >
>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>> > > > >> > > > > > >
>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can
>> > > > >> > > > > > > cause a region lock.
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>> > > > >> > > > > > >
>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>> > > > >> > > > > > >
>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
>> > > > >> > > > > > >
>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>> > > > >> > > > > > >
>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue
>> > > > >> > > > > > > and waits until writeQueue is empty or writeQueue.getFirst() == w.
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>> > > > >> > > > > > >
>> > > > >> > > > > > > I think that when a handler thread is processing between step 2
>> > > > >> > > > > > > and step 3, the other handler threads can wait at step 1 until
>> > > > >> > > > > > > the thread completes step 3. This is depicted as follows:
>> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>> > > > >> > > > > > >
>> > > > >> > > > > > > Actually, in the thread dump of our region server, many handler
>> > > > >> > > > > > > threads (RW.default.writeRpcServer.handler) wait at Step 1
>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
>> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>> > > > >> > > > > > >
>> > > > >> > > > > > > Many handler threads wait at this:
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
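>> > > > >> > > > > > >
>> > > > >> > > > > > > To illustrate the serialization, here is a minimal sketch (my
>> > > > >> > > > > > > own simplification, not the actual HBase code; the names are
>> > > > >> > > > > > > placeholders) of a single FIFO writeQueue shared by all handlers:
>> > > > >> > > > > > >
>> > > > >> > > > > > > import java.util.LinkedList;
>> > > > >> > > > > > >
>> > > > >> > > > > > > class SimplifiedMvcc {
>> > > > >> > > > > > >   private final LinkedList<Object> writeQueue = new LinkedList<>();
>> > > > >> > > > > > >
>> > > > >> > > > > > >   // Roughly step 2: every in-flight write registers an entry in
>> > > > >> > > > > > >   // the single queue shared by the whole region.
>> > > > >> > > > > > >   public synchronized Object beginMemstoreInsert() {
>> > > > >> > > > > > >     Object w = new Object();
>> > > > >> > > > > > >     writeQueue.add(w);
>> > > > >> > > > > > >     return w;
>> > > > >> > > > > > >   }
>> > > > >> > > > > > >
>> > > > >> > > > > > >   // Roughly step 1: a new operation enqueues its own entry and
>> > > > >> > > > > > >   // then waits until everything queued before it -- including
>> > > > >> > > > > > >   // writes to unrelated rows -- has completed.
>> > > > >> > > > > > >   public synchronized void waitForPreviousTransactionsComplete()
>> > > > >> > > > > > >       throws InterruptedException {
>> > > > >> > > > > > >     Object w = beginMemstoreInsert();
>> > > > >> > > > > > >     while (writeQueue.getFirst() != w) {
>> > > > >> > > > > > >       wait(); // handlers block behind the oldest in-flight write
>> > > > >> > > > > > >     }
>> > > > >> > > > > > >     writeQueue.remove(w);
>> > > > >> > > > > > >     notifyAll();
>> > > > >> > > > > > >   }
>> > > > >> > > > > > >
>> > > > >> > > > > > >   // Roughly step 3: completing a write wakes the waiters.
>> > > > >> > > > > > >   public synchronized void completeMemstoreInsert(Object w) {
>> > > > >> > > > > > >     writeQueue.remove(w);
>> > > > >> > > > > > >     notifyAll();
>> > > > >> > > > > > >   }
>> > > > >> > > > > > > }
>> > > > >> > > > > > >
>> > > > >> > > > > > > With a queue like this, one handler that is slow between step 2
>> > > > >> > > > > > > and step 3 stalls every other handler in the region, whatever
>> > > > >> > > > > > > row they are writing to.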
>> > > > >> > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade?
>> > > > >> > > > > > > > Is it possible that all these threads are trying to get to the
>> > > > >> > > > > > > > same row to update it? Could the app behavior have changed?
>> > > > >> > > > > > > > Or are you thinking increment itself has slowed significantly?
>> > > > >> > > > > > > We have just upgraded HBase, not changed the app behavior. We
>> > > > >> > > > > > > are thinking increment itself has slowed significantly.
>> > > > >> > > > > > > Before upgrading HBase, throughput and latency were good.
>> > > > >> > > > > > > Currently, to cope with this problem, we split the regions finely.
>> > > > >> > > > > > >
>> > > > >> > > > > > > Thanks,
>> > > > >> > > > > > >
>> > > > >> > > > > > > Toshihiro Suzuki
>> > > > >> > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
>> > > > >> > > > > > >
>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
>> > > brfrn169@gmail.com
>> > > > >
>> > > > >> > > wrote:
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > > Ted,
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > Thank you for your response.
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > >
>> > https://gist.github.com/brfrn169/cb4f2c157129330cd932
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > I think that the increment operation works as follows
>> > > > >> > > > > > > > > (sketched in code below):
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > 1. get row lock
>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
>> > > > >> > > > > > > > > 4. get previous values
>> > > > >> > > > > > > > > 5. create KVs
>> > > > >> > > > > > > > > 6. write to Memstore
>> > > > >> > > > > > > > > 7. write to WAL
>> > > > >> > > > > > > > > 8. release row lock
>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
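>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > As a sketch in code (my paraphrase of the flow above, with
>> > > > >> > > > > > > > > stubbed-out placeholder types so it compiles; the real logic
>> > > > >> > > > > > > > > is in HRegion.increment and MultiVersionConsistencyControl):
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > class IncrementFlowSketch {
>> > > > >> > > > > > > > >   interface Mvcc {
>> > > > >> > > > > > > > >     void waitForPreviousTransactionsComplete();        // step 2
>> > > > >> > > > > > > > >     Object beginMemstoreInsertWithSeqNum(long seqNum); // step 3
>> > > > >> > > > > > > > >     void completeMemstoreInsertWithSeqNum(Object w, Object walKey); // step 9
>> > > > >> > > > > > > > >   }
>> > > > >> > > > > > > > >   interface RowLock { void release(); }
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > >   Mvcc mvcc; // stands in for the region's MVCC instance
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > >   void increment(byte[] row) {
>> > > > >> > > > > > > > >     RowLock rowLock = getRowLock(row);            // step 1
>> > > > >> > > > > > > > >     Object w = null;
>> > > > >> > > > > > > > >     Object walKey = null;
>> > > > >> > > > > > > > >     try {
>> > > > >> > > > > > > > >       // step 2: waits on the region-wide writeQueue, not just this row
>> > > > >> > > > > > > > >       mvcc.waitForPreviousTransactionsComplete();
>> > > > >> > > > > > > > >       w = mvcc.beginMemstoreInsertWithSeqNum(nextSeqNum()); // step 3
>> > > > >> > > > > > > > >       long previous = getCurrentValue(row);       // step 4
>> > > > >> > > > > > > > >       byte[] kv = createKv(row, previous + 1);    // step 5
>> > > > >> > > > > > > > >       writeToMemstore(kv);                        // step 6
>> > > > >> > > > > > > > >       walKey = writeToWal(kv);                    // step 7
>> > > > >> > > > > > > > >     } finally {
>> > > > >> > > > > > > > >       rowLock.release();                          // step 8
>> > > > >> > > > > > > > >       if (w != null) {
>> > > > >> > > > > > > > >         mvcc.completeMemstoreInsertWithSeqNum(w, walKey); // step 9
>> > > > >> > > > > > > > >       }
>> > > > >> > > > > > > > >     }
>> > > > >> > > > > > > > >   }
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > >   // Placeholder stubs, not real HBase calls:
>> > > > >> > > > > > > > >   RowLock getRowLock(byte[] row) { return () -> { }; }
>> > > > >> > > > > > > > >   long nextSeqNum() { return 0L; }
>> > > > >> > > > > > > > >   long getCurrentValue(byte[] row) { return 0L; }
>> > > > >> > > > > > > > >   byte[] createKv(byte[] row, long value) { return new byte[0]; }
>> > > > >> > > > > > > > >   void writeToMemstore(byte[] kv) { }
>> > > > >> > > > > > > > >   Object writeToWal(byte[] kv) { return new Object(); }
>> > > > >> > > > > > > > > }
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > The point is that step 2 waits on region-global state while
>> > > > >> > > > > > > > > holding only a row lock.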
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending
>> > > > >> > > > > > > > > queue of writes named writeQueue.
>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
>> > > > >> > > > > > > > > writeQueue is empty or writeQueue.getFirst() == w.
>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes
>> > > > >> > > > > > > > > the WriteEntry from writeQueue.
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > I think that when a handler thread is processing between
>> > > > >> > > > > > > > > step 2 and step 9, the other handler threads can wait until
>> > > > >> > > > > > > > > the thread completes step 9.
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > That is right. We need to read after all outstanding updates
>> > > > >> > > > > > > > are done... because we need to read the latest update before
>> > > > >> > > > > > > > we go to modify/increment it.
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > How do you make out this?
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > In 0.98.x we did this:
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > ... and in 1.0 we do this:
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
>> > > > >> > > > > > > > +  }
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previously,
>> > > > >> > > > > > > > mvcc and region sequenceid would spin independently of each
>> > > > >> > > > > > > > other. Perhaps this is responsible for some of the slowdown.
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > That said, looking in your thread dump, we seem to be down in
>> > > > >> > > > > > > > the Get. If you do a bunch of thread dumps in a row, where is
>> > > > >> > > > > > > > the lock-holding thread? In Get or writing Increment... or
>> > > > >> > > > > > > > waiting on sequence id?
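>> > > > >> > > > > > > >
>> > > > >> > > > > > > > (A rough sketch of how to spot the lock-holding thread from
>> > > > >> > > > > > > > inside the JVM with the standard ThreadMXBean API, rather than
>> > > > >> > > > > > > > diffing successive jstack outputs -- my illustration only:)
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > import java.lang.management.ManagementFactory;
>> > > > >> > > > > > > > import java.lang.management.ThreadInfo;
>> > > > >> > > > > > > > import java.lang.management.ThreadMXBean;
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > public class LockHolderDump {
>> > > > >> > > > > > > >   public static void main(String[] args) {
>> > > > >> > > > > > > >     ThreadMXBean mx = ManagementFactory.getThreadMXBean();
>> > > > >> > > > > > > >     // Dump all threads with monitor/synchronizer info and print
>> > > > >> > > > > > > >     // who is blocked on what, and which thread holds it.
>> > > > >> > > > > > > >     for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
>> > > > >> > > > > > > >       if (info.getLockOwnerName() != null) {
>> > > > >> > > > > > > >         System.out.println(info.getThreadName() + " waits on "
>> > > > >> > > > > > > >             + info.getLockName() + " held by "
>> > > > >> > > > > > > >             + info.getLockOwnerName());
>> > > > >> > > > > > > >       }
>> > > > >> > > > > > > >     }
>> > > > >> > > > > > > >   }
>> > > > >> > > > > > > > }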
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade?
>> > > > >> > > > > > > > Is it possible that all these threads are trying to get to the
>> > > > >> > > > > > > > same row to update it? Could the app behavior have changed?
>> > > > >> > > > > > > > Or are you thinking increment itself has slowed significantly?
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > St.Ack
>> > > > >> > > > > > > >
>> > > > >> > > > > > > >
>> > > > >> > > > > > > >
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > > Thanks,
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > Toshihiro Suzuki
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <
>> > yuzhihong@gmail.com
>> > > >:
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > >     try {
>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace?
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > > Thanks
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
>> > > > >> brfrn169@gmail.com>
>> > > > >> > > > wrote:
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > > > > Hi,
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0) and we experience a slowdown in
>> > > > >> > > > > > > > > > > increment operation.
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Here's an extract from a thread dump of the RegionServer
>> > > > >> > > > > > > > > > > of our cluster:
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>> > > > >> > > > > > > > > > >   State: BLOCKED
>> > > > >> > > > > > > > > > >   Blocked count: 21689888
>> > > > >> > > > > > > > > > >   Waited count: 39828360
>> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
>> > > > >> > > > > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>> > > > >> > > > > > > > > > >   Stack:
>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
>> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>> > > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>> > > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> > > > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > There are many similar threads in the thread dump.
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > I read the source code and I think this is caused by
>> > > > >> > > > > > > > > > > changes of MultiVersionConsistencyControl.
>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Also we wrote performance test code for increment
>> > > > >> > > > > > > > > > > operation that included 100 threads and ran it in
>> > > > >> > > > > > > > > > > local mode.
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > The result is shown below:
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Thanks,
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > > > Toshihiro Suzuki
>> > > > >> > > > > > > > > > >
>> > > > >> > > > > > > > > >
>> > > > >> > > > > > > > >
>> > > > >> > > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > >
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
I didn't think to use the non-aggregated jstack output, as it has become
second nature for us to use https://github.com/HubSpot/astack/.

It rolls up repeating stacktraces. You can see above each stacktrace the
number of times it occurred and an estimated CPU time spent. Sorry, I will
try to get it without astack next time. Hopefully this clears things up,
though: there are 122 increments blocking in the aggregated jstack.
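
For reference, the rollup it does is roughly the following (my own
simplification in plain Java, not how astack is implemented; astack also
estimates CPU time per stack):

import java.util.HashMap;
import java.util.Map;

public class StackRollup {
  public static void main(String[] args) {
    // Group identical stack traces of the live threads and count them.
    Map<String, Integer> counts = new HashMap<>();
    for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
      StringBuilder key = new StringBuilder();
      for (StackTraceElement frame : stack) {
        key.append("    at ").append(frame).append('\n');
      }
      counts.merge(key.toString(), 1, Integer::sum);
    }
    // Print each distinct stack once, prefixed with its occurrence count.
    counts.forEach((stack, n) -> System.out.println(n + " thread(s):\n" + stack));
  }
}

The same idea applies when aggregating an offline jstack file: key on the
frame list, count the repeats.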

On Tue, Dec 1, 2015 at 12:17 AM Stack <st...@duboce.net> wrote:

> Looking again, the
>
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
> thread
> dump and the https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> thread dump are the same? Only have two increments going on in this thread
> dump:
>
> at org.apache.hadoop.hbase.KeyValue.matchingQualifier(KeyValue.java:1656)
>
> ... and other is doing:
>
> at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:3593)
>
> Not many increments going on.
>
>
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> shows the same two increments in the same places. Is it stuck?
>
> St.Ack
>
>
>
>
>
> On Mon, Nov 30, 2015 at 3:10 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com
> > wrote:
>
> > https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> >
> > An active handler:
> >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > One that is locked:
> >
> >
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> >
> > The difference between pre-rollback and post is that previously we were
> > seeing things blocked in mvcc.  Now we are seeing them blocked on the
> > upsert.
> >
> > It always follows the same pattern, of 1 active handler in the upsert and
> > the rest blocked waiting for it.
> >
> > On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> >
> > > On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > > bbeaudreault@hubspot.com
> > > > wrote:
> > >
> > > > The rollback seems to have mostly solved the issue for one of our
> > > clusters,
> > > > but another one is still seeing long increment times:
> > > >
> > > > "slowIncrementCount": 52080,
> > > > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max":
> 6162,"
> > > > Increment_mean": 465.68678129112396,"Increment_median": 216,"
> > > > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
> > > > 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
> > > >
> > > >
> > > > Any ideas if there are other changes that may be causing a
> performance
> > > > regression for increments between CDH4.7.1 and CDH5.3.8?
> > > >
> > > >
> > > >
> > > No.
> > >
> > > Post a thread dump Bryan and it might prompt something.
> > >
> > > St.Ack
> > >
> > >
> > >
> > >
> > > >
> > > > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> > > >
> > > > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > > > bbeaudreault@hubspot.com> wrote:
> > > > >
> > > > > > Should this be added as a known issue in the CDH or hbase
> > > > documentation?
> > > > > It
> > > > > > was a severe performance hit for us, all of our regionservers
> were
> > > > > sitting
> > > > > > at a few thousand queued requests.
> > > > > >
> > > > > >
> > > > > Let me take care of that.
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > > > > bbeaudreault@hubspot.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Yea, they are all over the place and called from client and
> > > > coprocessor
> > > > > > > code. We ended up having no other option but to rollback, and
> > aside
> > > > > from
> > > > > > a
> > > > > > > few NoSuchMethodErrors due to API changes (Put#add vs
> > > Put#addColumn),
> > > > > it
> > > > > > > seems to be working and fixing our problem.
> > > > > > >
> > > > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net>
> wrote:
> > > > > > >
> > > > > > >> Rollback is untested. No fix in 5.5. I was going to work on
> this
> > > > now.
> > > > > > >> Where
> > > > > > >> are your counters Bryan? In their own column family or
> scattered
> > > > about
> > > > > > in
> > > > > > >> a
> > > > > > >> row with other Cell types?
> > > > > > >> St.Ack
> > > > > > >>
> > > > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > > > > >> bbeaudreault@hubspot.com> wrote:
> > > > > > >>
> > > > > > >> > Is there any update to this? We just upgraded all of our production
> > > > > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> > > > > > >> > known issues, did not know about this. Now we are seeing performance
> > > > > > >> > issues across all clusters, as we make heavy use of increments.
> > > > > > >> >
> > > > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
> > > roll
> > > > > back
> > > > > > >> to
> > > > > > >> > CDH 5.3.1 (if that is possible)?
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com>
> > > wrote:
> > > > > > >> >
> > > > > > >> > > Thank you St.Ack!
> > > > > > >> > >
> > > > > > >> > > I would like to follow the ticket.
> > > > > > >> > >
> > > > > > >> > > Toshihiro Suzuki
> > > > > > >> > >
> > > > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > > > > > >> > >
> > > > > > >> > > > Back to this problem. Simple tests confirm that as is, the
> > > > > > >> > > > single-queue-backed MVCC instance can slow Region ops if some other
> > > > > > >> > > > row is slow to complete. In particular Increment, checkAndPut, and
> > > > > > >> > > > batch mutations are affected. I opened HBASE-14460 to start in on a
> > > > > > >> > > > fix up. Let's see if we can somehow scope mvcc to row or at least
> > > > > > >> > > > shard mvcc so not all Region ops are paused.
> > > > > > >> > > >
> > > > > > >> > > > St.Ack
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <
> brfrn169@gmail.com
> > >
> > > > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > > > helpful
> > > > > > >> > > diagram).
> > > > > > >> > > > > > Makes sense. Let me hack up a test case to help with
> > the
> > > > > > >> > > illustration.
> > > > > > >> > > > It
> > > > > > >> > > > > > is as though the mvcc should be scoped to a row
> > only...
> > > > > Writes
> > > > > > >> > > against
> > > > > > >> > > > > > other rows should not hold up my read of my row. Tag
> > an
> > > > mvcc
> > > > > > >> with a
> > > > > > >> > > > 'row'
> > > > > > >> > > > > > scope so we can see which on-going writes pertain to
> > > > current
> > > > > > >> > > operation?
> > > > > > >> > > > > Thank you St.Ack! I think this approach would work.
> > > > > > >> > > > >
> > > > > > >> > > > > > You need to read back the increment and have it be
> > > > 'correct'
> > > > > > at
> > > > > > >> > > > increment
> > > > > > >> > > > > > time?
> > > > > > >> > > > > Yes, we need it.
> > > > > > >> > > > >
> > > > > > >> > > > > I would like to help if there is anything I can do.
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks,
> > > > > > >> > > > > Toshihiro Suzuki
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > > > > >> > > > >
> > > > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > > > helpful
> > > > > > >> > > diagram).
> > > > > > >> > > > > > Makes sense. Let me hack up a test case to help with
> > the
> > > > > > >> > > illustration.
> > > > > > >> > > > It
> > > > > > >> > > > > > is as though the mvcc should be scoped to a row
> > only...
> > > > > Writes
> > > > > > >> > > against
> > > > > > >> > > > > > other rows should not hold up my read of my row. Tag
> > an
> > > > mvcc
> > > > > > >> with a
> > > > > > >> > > > 'row'
> > > > > > >> > > > > > scope so we can see which on-going writes pertain to
> > > > current
> > > > > > >> > > operation?
> > > > > > >> > > > > >
> > > > > > >> > > > > > You need to read back the increment and have it be
> > > > 'correct'
> > > > > > at
> > > > > > >> > > > increment
> > > > > > >> > > > > > time?
> > > > > > >> > > > > >
> > > > > > >> > > > > > (This is a good one)
> > > > > > >> > > > > >
> > > > > > >> > > > > > Thank you Toshihiro Suzuki
> > > > > > >> > > > > > St.Ack
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <
> > > brfrn169@gmail.com
> > > > >
> > > > > > >> wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > > > St.Ack,
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Thank you for your response.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > The reason I think that "A region lock (not a row lock) seems
> > > > > > >> > > > > > > to occur in waitForPreviousTransactionsComplete()" is as follows:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > 2. w =
> mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w,
> walKey);
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > I think that MultiVersionConsistencyControl's
> > > writeQueue
> > > > > can
> > > > > > >> > cause
> > > > > > >> > > a
> > > > > > >> > > > > > region
> > > > > > >> > > > > > > lock.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > > >> > > > > > > waitForPreviousTransactionsComplete(e) ->
> > > > > advanceMemstore(w)
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Step 1 adds a WriteEntry w in
> beginMemstoreInsert()
> > to
> > > > > > >> writeQueue
> > > > > > >> > > and
> > > > > > >> > > > > > waits
> > > > > > >> > > > > > > until writeQueue is empty or writeQueue.getFirst()
> > ==
> > > w.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > I think when a handler thread is processing
> between
> > > > step 2
> > > > > > and
> > > > > > >> > step
> > > > > > >> > > > 3,
> > > > > > >> > > > > > the
> > > > > > >> > > > > > > other handler threads can wait at step 1 until the
> > > > thread
> > > > > > >> > completes
> > > > > > >> > > > > step
> > > > > > >> > > > > > 3
> > > > > > >> > > > > > > This is depicted as follows:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Actually, in the thread dump of our region server,
> > > many
> > > > > > >> handler
> > > > > > >> > > > threads
> > > > > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Many handler threads wait at this:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > Is it possible you are contending on a counter
> > > > > > post-upgrade?
> > > > > > >> > Is
> > > > > > >> > > it
> > > > > > >> > > > > > > > possible that all these threads are trying to
> get
> > to
> > > > the
> > > > > > >> same
> > > > > > >> > row
> > > > > > >> > > > to
> > > > > > >> > > > > > > update
> > > > > > >> > > > > > > > it? Could the app behavior have changed?  Or are
> > you
> > > > > > >> thinking
> > > > > > >> > > > > increment
> > > > > > >> > > > > > > > itself has slowed significantly?
> > > > > > >> > > > > > > We have just upgraded HBase, not changed the app
> > > > behavior.
> > > > > > We
> > > > > > >> are
> > > > > > >> > > > > > thinking
> > > > > > >> > > > > > > increment itself has slowed significantly.
> > > > > > >> > > > > > > Before upgrading HBase, it was good throughput and
> > > > > latency.
> > > > > > >> > > > > > > Currently, to cope with this problem, we split the
> > > > regions
> > > > > > >> > finely.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Thanks,
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Toshihiro Suzuki
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <
> stack@duboce.net
> > >:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
> > > > > brfrn169@gmail.com
> > > > > > >
> > > > > > >> > > wrote:
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > Ted,
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Thank you for your response.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > I think that the increment operation works as follows:
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > 1. get row lock
> > > > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete()
> //
> > > > wait
> > > > > > for
> > > > > > >> all
> > > > > > >> > > > prior
> > > > > > >> > > > > > > MVCC
> > > > > > >> > > > > > > > > transactions to finish
> > > > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() //
> > start a
> > > > > > >> > transaction
> > > > > > >> > > > > > > > > 4. get previous values
> > > > > > >> > > > > > > > > 5. create KVs
> > > > > > >> > > > > > > > > 6. write to Memstore
> > > > > > >> > > > > > > > > 7. write to WAL
> > > > > > >> > > > > > > > > 8. release row lock
> > > > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() //
> > > > complete
> > > > > > the
> > > > > > >> > > > > > transaction
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a
> > > > > > >> > > > > > > > > pending queue of writes named writeQueue.
> > > > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and
> > waits
> > > > > until
> > > > > > >> > > > writeQueue
> > > > > > >> > > > > > is
> > > > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
> > > > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and
> step
> > 9
> > > > > > removes
> > > > > > >> the
> > > > > > >> > > > > > > WriteEntry
> > > > > > >> > > > > > > > > from writeQueue.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > I think that when a handler thread is
> processing
> > > > > between
> > > > > > >> > step 2
> > > > > > >> > > > and
> > > > > > >> > > > > > > step
> > > > > > >> > > > > > > > 9,
> > > > > > >> > > > > > > > > the other handler threads can wait until the
> > > thread
> > > > > > >> completes
> > > > > > >> > > > step
> > > > > > >> > > > > 9.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > That is right. We need to read, after all
> > > outstanding
> > > > > > >> updates
> > > > > > >> > are
> > > > > > >> > > > > > done...
> > > > > > >> > > > > > > > because we need to read the latest update before
> > we
> > > go
> > > > > to
> > > > > > >> > > > > > > modify/increment
> > > > > > >> > > > > > > > it.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > How do you make out this?
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > "A region lock (not a row lock) seems to occur
> in
> > > > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > In 0.98.x we did this:
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > ... and in 1.0 we do this:
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which
> > is
> > > > > > this....
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > +  public void
> > > waitForPreviousTransactionsComplete() {
> > > > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > > >> > > > > > > > +  }
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
> > > > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previously,
> > > > > > >> > > > > > > > mvcc and region sequenceid would spin independently of each
> > > > > > >> > > > > > > > other. Perhaps this is responsible for some of the slowdown.

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
I didn't think to use the non-aggregated jstack output as it has become
second nature for us to use https://github.com/HubSpot/astack/.

It rolls up repeating stacktraces. You can see above each stacktrace the
number of times it occurred and an estimated cpu time spent. Sorry, will
try to get it without astack next time. Hopefully this clears things up
though: there are 122 increments blocking in the aggregated jstack.
On Tue, Dec 1, 2015 at 12:17 AM Stack <st...@duboce.net> wrote:

> Looking again, the
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
> thread dump and the https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> thread dump are the same? Only two increments are going on in this thread
> dump. One is at:
>
> at org.apache.hadoop.hbase.KeyValue.matchingQualifier(KeyValue.java:1656)
>
> ... and the other is doing:
>
> at
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:3593)
>
> Not many increments going on.
>
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> shows the same two increments in the same places. Is it stuck?
> St.Ack

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Looking again, the
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
thread dump and the https://gist.github.com/bbeaudreault/2994a748da83d9f75085
thread dump are the same? Only two increments are going on in this thread
dump. One is at:

at org.apache.hadoop.hbase.KeyValue.matchingQualifier(KeyValue.java:1656)

... and the other is doing:

at
org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:3593)

Not many increments going on.

https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
shows the same two increments in the same places. Is it stuck?
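
If all these handlers are after the same row, they will simply serialize
on that row's lock. A toy illustration of that effect (a sketch, not the
HBase code) where every handler increments one hot row:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of per-row locking: 100 handler threads incrementing the
// same row all queue on one lock, so progress is single-threaded.
public class RowLockToy {
  static final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
  static long counter = 0;

  static void increment(String row) {
    ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
    lock.lock();            // analogous to HRegion#getRowLock(row)
    try {
      counter++;            // read-modify-write under the row lock
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] handlers = new Thread[100];
    for (int i = 0; i < handlers.length; i++) {
      // Every thread hits the same hot row, so they fully serialize.
      handlers[i] = new Thread(() -> {
        for (int n = 0; n < 10000; n++) {
          increment("hot-row");
        }
      });
      handlers[i].start();
    }
    for (Thread t : handlers) {
      t.join();
    }
    System.out.println("counter = " + counter);
  }
}

If the profile really is a single hot row like this, the row lock, not
mvcc, sets the ceiling.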

St.Ack





On Mon, Nov 30, 2015 at 3:10 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>
> An active handler:
>
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
>
> One that is locked:
>
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>
> The difference between pre-rollback and post is that previously we were
> seeing things blocked in mvcc. Now we are seeing them blocked on the
> upsert.
>
> It always follows the same pattern: one active handler in the upsert and
> the rest blocked waiting for it.
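>
> For context, the mvcc blocking we saw before rolling back comes down to a
> single shared write queue per region. Below is a toy model of that
> mechanism (a sketch, not the actual HBase code) showing how an in-flight
> write on one row stalls an increment on an unrelated row:
>
> import java.util.LinkedList;
>
> // Toy model of a single-queue MVCC: every transaction, regardless of
> // row, waits behind every earlier transaction in one shared queue.
> public class SingleQueueMvcc {
>   static class WriteEntry { boolean completed; }
>   static final LinkedList<WriteEntry> writeQueue = new LinkedList<>();
>
>   static WriteEntry begin() {
>     synchronized (writeQueue) {
>       WriteEntry e = new WriteEntry();
>       writeQueue.add(e);
>       return e;
>     }
>   }
>
>   static void complete(WriteEntry e) {
>     synchronized (writeQueue) {
>       e.completed = true;
>       while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
>         writeQueue.removeFirst();   // advance past finished transactions
>       }
>       writeQueue.notifyAll();
>     }
>   }
>
>   static void waitForPrevious(WriteEntry e) throws InterruptedException {
>     synchronized (writeQueue) {
>       while (writeQueue.getFirst() != e) {
>         writeQueue.wait();          // blocked by writes to *other* rows too
>       }
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     WriteEntry slow = begin();                // T1: write to row A, in flight
>     Thread t2 = new Thread(() -> {
>       try {
>         WriteEntry mine = begin();            // T2: increment of row B
>         waitForPrevious(mine);                // stuck until T1 completes
>         complete(mine);
>         System.out.println("row B increment done");
>       } catch (InterruptedException ignored) {
>       }
>     });
>     t2.start();
>     Thread.sleep(2000);                       // T1 is slow (say, WAL sync)
>     complete(slow);                           // only now can T2 proceed
>     t2.join();
>   }
> }
>
> The wait in waitForPrevious() is keyed to queue position, not to the row,
> which is why one slow row shows up in everyone's increment latency.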
>
> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
>
> > On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > bbeaudreault@hubspot.com> wrote:
> >
> > > The rollback seems to have mostly solved the issue for one of our
> > > clusters, but another one is still seeing long increment times:
> > >
> > > "slowIncrementCount": 52080,
> > > "Increment_num_ops": 325236,
> > > "Increment_min": 1,
> > > "Increment_max": 6162,
> > > "Increment_mean": 465.68678129112396,
> > > "Increment_median": 216,
> > > "Increment_75th_percentile": 450.25,
> > > "Increment_95th_percentile": 1052.6499999999999,
> > > "Increment_99th_percentile": 1635.2399999999998
> > >
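> > > Those numbers come off the regionserver's /jmx servlet; roughly the
> > > sketch below pulls them (an illustration, assuming the default info
> > > port of 60030 and a regionserver on localhost):
> > >
> > > import java.io.BufferedReader;
> > > import java.io.InputStreamReader;
> > > import java.net.URL;
> > >
> > > // Fetch the /jmx servlet and grep out the increment metrics.
> > > // Host and port here are assumptions, not our actual setup.
> > > public class IncrementMetrics {
> > >   public static void main(String[] args) throws Exception {
> > >     URL url = new URL("http://localhost:60030/jmx");
> > >     try (BufferedReader in = new BufferedReader(
> > >         new InputStreamReader(url.openStream()))) {
> > >       String line;
> > >       while ((line = in.readLine()) != null) {
> > >         if (line.contains("Increment")) {
> > >           System.out.println(line.trim());
> > >         }
> > >       }
> > >     }
> > >   }
> > > }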
> > >
> > > Any ideas if there are other changes that may be causing a performance
> > > regression for increments between CDH4.7.1 and CDH5.3.8?
> >
> > No.
> >
> > Post a thread dump Bryan and it might prompt something.
> >
> > St.Ack
> >
> > > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> > >
> > > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > > bbeaudreault@hubspot.com> wrote:
> > > >
> > > > > Should this be added as a known issue in the CDH or hbase
> > > > > documentation? It was a severe performance hit for us, all of our
> > > > > regionservers were sitting at a few thousand queued requests.
> > > >
> > > > Let me take care of that.
> > > > St.Ack
> > > >
> > > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > > > bbeaudreault@hubspot.com> wrote:
> > > > >
> > > > > > Yea, they are all over the place and called from client and
> > > > > > coprocessor code. We ended up having no other option but to
> > > > > > rollback, and aside from a few NoSuchMethodErrors due to API
> > > > > > changes (Put#add vs Put#addColumn), it seems to be working and
> > > > > > fixing our problem.
> > > > > >
> > > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> > > > > >
> > > > > > > Rollback is untested. No fix in 5.5. I was going to work on
> > > > > > > this now. Where are your counters Bryan? In their own column
> > > > > > > family or scattered about in a row with other Cell types?
> > > > > > > St.Ack
> > > > > > >
> > > > > > > On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > > > > > bbeaudreault@hubspot.com> wrote:
> > > > > > >
> > > > > > > > Is there any update to this? We just upgraded all of our
> > > > > > > > production clusters from CDH4 to CDH5.4.7 and, not seeing
> > > > > > > > this JIRA listed in the known issues, did not know about it.
> > > > > > > > Now we are seeing performance issues across all clusters, as
> > > > > > > > we make heavy use of increments.
> > > > > > > >
> > > > > > > > Can we roll forward to CDH5.5 to fix? Or is our only hope to
> > > > > > > > roll back to CDH 5.3.1 (if that is possible)?
> > > > > > > >
> > > > > > > > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thank you St.Ack!
> > > > > > > > >
> > > > > > > > > I would like to follow the ticket.
> > > > > > > > >
> > > > > > > > > Toshihiro Suzuki
> > > > > > > > >
> > > > > > > > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > > > > > > > >
> > > > > > > > > > Back to this problem. Simple tests confirm that as is, the
> > > > > > > > > > single-queue-backed MVCC instance can slow Region ops if
> > > > > > > > > > some other row is slow to complete. In particular
> > > > > > > > > > Increment, checkAndPut, and batch mutations are affected.
> > > > > > > > > > I opened HBASE-14460 to start in on a fix up. Let's see if
> > > > > > > > > > we can somehow scope mvcc to row or at least shard mvcc so
> > > > > > > > > > not all Region ops are paused.
> > > > > > > > > >
> > > > > > > > > > St.Ack
> > > > > > > > > >
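> > > > > > > > > > Roughly what I mean by sharding: hash each row into one of
> > > > > > > > > > N small queues so a slow transaction only stalls writes
> > > > > > > > > > that land in its own shard. A sketch only -- it glosses
> > > > > > > > > > over producing a region-wide read point:
> > > > > > > > > >
> > > > > > > > > > import java.util.Arrays;
> > > > > > > > > > import java.util.LinkedList;
> > > > > > > > > >
> > > > > > > > > > // Sketch of a row-sharded mvcc write queue. A slow
> > > > > > > > > > // transaction only stalls writes hashing to its own
> > > > > > > > > > // shard, not all Region ops.
> > > > > > > > > > public class ShardedMvcc {
> > > > > > > > > >   static class WriteEntry { boolean completed; }
> > > > > > > > > >
> > > > > > > > > >   private final LinkedList<WriteEntry>[] shards;
> > > > > > > > > >
> > > > > > > > > >   @SuppressWarnings("unchecked")
> > > > > > > > > >   ShardedMvcc(int numShards) {
> > > > > > > > > >     shards = (LinkedList<WriteEntry>[]) new LinkedList[numShards];
> > > > > > > > > >     for (int i = 0; i < numShards; i++) {
> > > > > > > > > >       shards[i] = new LinkedList<WriteEntry>();
> > > > > > > > > >     }
> > > > > > > > > >   }
> > > > > > > > > >
> > > > > > > > > >   private LinkedList<WriteEntry> shardFor(byte[] row) {
> > > > > > > > > >     return shards[(Arrays.hashCode(row) & 0x7fffffff) % shards.length];
> > > > > > > > > >   }
> > > > > > > > > >
> > > > > > > > > >   WriteEntry begin(byte[] row) {
> > > > > > > > > >     LinkedList<WriteEntry> q = shardFor(row);
> > > > > > > > > >     synchronized (q) {
> > > > > > > > > >       WriteEntry e = new WriteEntry();
> > > > > > > > > >       q.add(e);
> > > > > > > > > >       return e;
> > > > > > > > > >     }
> > > > > > > > > >   }
> > > > > > > > > >
> > > > > > > > > >   // Wait only for earlier transactions in the same shard.
> > > > > > > > > >   void waitForPrevious(byte[] row, WriteEntry e) throws InterruptedException {
> > > > > > > > > >     LinkedList<WriteEntry> q = shardFor(row);
> > > > > > > > > >     synchronized (q) {
> > > > > > > > > >       while (q.getFirst() != e) {
> > > > > > > > > >         q.wait();
> > > > > > > > > >       }
> > > > > > > > > >     }
> > > > > > > > > >   }
> > > > > > > > > >
> > > > > > > > > >   void complete(byte[] row, WriteEntry e) {
> > > > > > > > > >     LinkedList<WriteEntry> q = shardFor(row);
> > > > > > > > > >     synchronized (q) {
> > > > > > > > > >       e.completed = true;
> > > > > > > > > >       while (!q.isEmpty() && q.getFirst().completed) {
> > > > > > > > > >         q.removeFirst();
> > > > > > > > > >       }
> > > > > > > > > >       q.notifyAll();
> > > > > > > > > >     }
> > > > > > > > > >   }
> > > > > > > > > > }
> > > > > > > > > >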
> > > > > > > > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > Thank you for the below reasoning (with accompanying
> > > > > > > > > > > > helpful diagram). Makes sense. Let me hack up a test
> > > > > > > > > > > > case to help with the illustration. It is as though
> > > > > > > > > > > > the mvcc should be scoped to a row only... Writes
> > > > > > > > > > > > against other rows should not hold up my read of my
> > > > > > > > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > > > > > > > > > > > which on-going writes pertain to the current operation?
> > > > > > > > > > >
> > > > > > > > > > > Thank you St.Ack! I think this approach would work.
> > > > > > > > > > >
> > > > > > > > > > > > You need to read back the increment and have it be
> > > > > > > > > > > > 'correct' at increment time?
> > > > > > > > > > >
> > > > > > > > > > > Yes, we need it.
> > > > > > > > > > >
> > > > > > > > > > > I would like to help if there is anything I can do.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > > >
> > > > > > > > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > > > > > > > > >
> > > > > > > > > > > > Thank you for the below reasoning (with accompanying
> > > > > > > > > > > > helpful diagram). Makes sense. Let me hack up a test
> > > > > > > > > > > > case to help with the illustration. It is as though
> > > > > > > > > > > > the mvcc should be scoped to a row only... Writes
> > > > > > > > > > > > against other rows should not hold up my read of my
> > > > > > > > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > > > > > > > > > > > which on-going writes pertain to the current operation?
> > > > > > > > > > > >
> > > > > > > > > > > > You need to read back the increment and have it be
> > > > > > > > > > > > 'correct' at increment time?
> > > > > > > > > > > >
> > > > > > > > > > > > (This is a good one)
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you Toshihiro Suzuki
> > > > > > > > > > > > St.Ack
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > St.Ack,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you for your response.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The reason I think that "A region lock (not a row
> > > > > > > > > > > > > lock) seems to occur in
> > > > > > > > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > > > > > > > > > > >
> > > > > > > > > > > > > An increment operation has 3 procedures for MVCC.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > > > > > > > > > >
> > > > > > > > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think that MultiVersionConsistencyControl's
> > > > > > > > > > > > > writeQueue can cause a region lock.
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > > > > > > > > > >
> > > > > > > > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > > > > > > > > > >
> > > > > > > > > > > > > Step 3 removes the WriteEntry from writeQueue:
> > > > > > > > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > > > > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > > > > > > > > > >
> > > > > > > > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert()
> > > > > > > > > > > > > to writeQueue and waits until writeQueue is empty or
> > > > > > > > > > > > > writeQueue.getFirst() == w.
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think when a handler thread is processing between
> > > > > > > > > > > > > step 2 and step 3, the other handler threads can wait
> > > > > > > > > > > > > at step 1 until the thread completes step 3. This is
> > > > > > > > > > > > > depicted as follows:
> > > > > > > > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > > > > > > > > > >
> > > > > > > > > > > > > Actually, in the thread dump of our region server,
> > > > > > > > > > > > > many handler threads
> > > > > > > > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > > > > > > > > > > > (waitForPreviousTransactionsComplete()).
> > > > > > > > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > > > > > > > > > >
> > > > > > > > > > > > > Many handler threads wait at this:
> > > > > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Is it possible you are contending on a counter
> > > > > > > > > > > > > > post-upgrade? Is it possible that all these threads
> > > > > > > > > > > > > > are trying to get to the same row to update it?
> > > > > > > > > > > > > > Could the app behavior have changed? Or are you
> > > > > > > > > > > > > > thinking increment itself has slowed significantly?
> > > > > > > > > > > > >
> > > > > > > > > > > > > We have just upgraded HBase, not changed the app
> > > > > > > > > > > > > behavior. We are thinking increment itself has slowed
> > > > > > > > > > > > > significantly. Before upgrading HBase, it was good
> > > > > > > > > > > > > throughput and latency. Currently, to cope with this
> > > > > > > > > > > > > problem, we split the regions finely.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net>:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Ted,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you for your response.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think that increment operation works as follows:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. get row lock
> > > > > > > > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> > > > > > > > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > > > > > > > > > > 4. get previous values
> > > > > > > > > > > > > > > 5. create KVs
> > > > > > > > > > > > > > > 6. write to Memstore
> > > > > > > > > > > > > > > 7. write to WAL
> > > > > > > > > > > > > > > 8. release row lock
> > > > > > > > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > An instance of MultiVersionConsistencyControl has
> > > > > > > > > > > > > > > a pending queue of writes named writeQueue.
> > > > > > > > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and
> > > > > > > > > > > > > > > waits until writeQueue is empty or
> > > > > > > > > > > > > > > writeQueue.getFirst() == w.
> > > > > > > > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9
> > > > > > > > > > > > > > > removes the WriteEntry from writeQueue.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think that when a handler thread is processing
> > > > > > > > > > > > > > > between step 2 and step 9, the other handler
> > > > > > > > > > > > > > > threads can wait until the thread completes step 9.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > That is right. We need to read, after all
> > > > > > > > > > > > > > outstanding updates are done... because we need to
> > > > > > > > > > > > > > read the latest update before we go to
> > > > > > > > > > > > > > modify/increment it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > How do you make that out?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > "A region lock (not a row lock) seems to occur in
> > > > > > > > > > > > > > waitForPreviousTransactionsComplete()."
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In 0.98.x we did this:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ... and in 1.0 we do this:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > > > > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > > > > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > > > > > > > > > > +  }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The mvcc and region sequenceid were merged in 1.0
> > > > > > > > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763).
> > > > > > > > > > > > > > Previously, mvcc and region sequenceid would spin
> > > > > > > > > > > > > > independent of each other. Perhaps this is
> > > > > > > > > > > > > > responsible for some slowdown.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > That said, looking in your thread dump, we seem to
> > > > > > > > > > > > > > be down in the Get. If you do a bunch of thread
> > > > > > > > > > > > > > dumps in a row, where is the lock-holding thread?
> > > > > > > > > > > > > > In Get or writing Increment... or waiting on
> > > > > > > > > > > > > > sequence id?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is it possible you are contending on a counter
> > > > > > > > > > > > > > post-upgrade? Is it possible that all these threads
> > > > > > > > > > > > > > are trying to get to the same row to update it?
> > > > > > > > > > > > > > Could the app behavior have changed? Or are you
> > > > > > > > > > > > > > thinking increment itself has slowed significantly?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > St.Ack
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > >> > > > > > > > > >     try {
> > > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Thanks
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
> > > > > >> brfrn169@gmail.com>
> > > > > >> > > > wrote:
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > > Hi,
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > We upgraded our cluster from
> > > CDH5.3.1(HBase0.98.6)
> > > > > to
> > > > > >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > >> > > > > > > > > > > and we experience slowdown in increment
> > > operation.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Here's an extract from thread dump of the
> > > > > >> RegionServer of
> > > > > >> > > our
> > > > > >> > > > > > > > cluster:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Thread 68
> > > > > >> > > > > > >
> > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > >> > > > > > > > > > >   State: BLOCKED
> > > > > >> > > > > > > > > > >   Blocked count: 21689888
> > > > > >> > > > > > > > > > >   Waited count: 39828360
> > > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > >> > > > > > > > > > >   Blocked by 63
> > > > > >> > > > > > > > >
> > > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > >> > > > > > > > > > >   Stack:
> > > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > >
> > > > > >> > >
> > > > > >>
> > > > >
> > >
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >
> > > > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > >
> > > > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > >
> > > > > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > There are many similar threads in the thread
> > > dump.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > I read the source code and I think this is
> > > caused
> > > > by
> > > > > >> > > changes
> > > > > >> > > > of
> > > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > > > > >> > > > > > > > > > > A region lock (not a row lock) seems to
> occur
> > in
> > > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Also we wrote performance test code for
> > > increment
> > > > > >> > operation
> > > > > >> > > > > that
> > > > > >> > > > > > > > > included
> > > > > >> > > > > > > > > > > 100 threads and ran it in local mode.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > The result is shown below:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> > > > > >> 7.975072509210629
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> > > > > 49.11840157868772
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Thanks,
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Toshihiro Suzuki
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Looking again, the
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
thread dump and the https://gist.github.com/bbeaudreault/2994a748da83d9f75085
thread dump are the same? There are only two increments going on in this
thread dump:

at org.apache.hadoop.hbase.KeyValue.matchingQualifier(KeyValue.java:1656)

... and the other is doing:

at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:3593)

Not many increments going on.

https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
is two increments too in same places. Is it stuck?

St.Ack

On Mon, Nov 30, 2015 at 3:10 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>
> An active handler:
>
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> One that is locked:
>
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>
> The difference between pre-rollback and post is that previously we were
> seeing things blocked in mvcc.  Now we are seeing them blocked on the
> upsert.
>
> It always follows the same pattern: one active handler in the upsert and
> the rest blocked waiting for it.
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
https://gist.github.com/bbeaudreault/2994a748da83d9f75085

An active handler:
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
One that is locked:
https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579

The difference between pre-rollback and post is that previously we were
seeing things blocked in mvcc.  Now we are seeing them blocked on the
upsert.

It always follows the same pattern: one active handler in the upsert and
the rest blocked waiting for it.

On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:

> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>
> > The rollback seems to have mostly solved the issue for one of our
> > clusters, but another one is still seeing long increment times:
> >
> > "slowIncrementCount": 52080,
> > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
> > Increment_mean": 465.68678129112396,"Increment_median": 216,"
> > Increment_75th_percentile": 450.25,"Increment_95th_percentile":
> > 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
> >
> >
> > Any ideas if there are other changes that may be causing a performance
> > regression for increments between CDH4.7.1 and CDH5.3.8?
> >
> >
> >
> No.
>
> Post a thread dump Bryan and it might prompt something.
>
> St.Ack
>
>
>
>
> >
> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
> >
> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > >
> > > > Should this be added as a known issue in the CDH or hbase
> > > > documentation? It was a severe performance hit for us; all of our
> > > > regionservers were sitting at a few thousand queued requests.
> > > >
> > > >
> > > Let me take care of that.
> > > St.Ack
> > >
> > >
> > >
> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > > >
> > > > > Yea, they are all over the place and called from client and coprocessor
> > > > > code. We ended up having no other option but to roll back, and aside from
> > > > > a few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn),
> > > > > it seems to be working and fixing our problem.
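For anyone hitting the same NoSuchMethodErrors, this is roughly the shape of the Put API difference being referenced (a hedged illustration; which method is available depends on the HBase client jar actually on the classpath):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutApiChange {
      public static Put example() {
        Put put = new Put(Bytes.toBytes("row1"));
        // Older (0.94/0.98-era) client code compiled against:
        //   put.add(Bytes.toBytes("cf"), Bytes.toBytes("c"), Bytes.toBytes(1L));
        // Newer (1.0-era) clients use addColumn instead:
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c"), Bytes.toBytes(1L));
        return put;
      }
    }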
> > > > >
> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> > > > >
> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this now.
> > > > >> Where are your counters Bryan? In their own column family or scattered
> > > > >> about in a row with other Cell types?
> > > > >> St.Ack
> > > > >>
> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > > > >>
> > > > >> > Is there any update to this? We just upgraded all of our production
> > > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> > > > >> > known issues, did not know about this. Now we are seeing performance
> > > > >> > issues across all clusters, as we make heavy use of increments.
> > > > >> >
> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back
> > > > >> > to CDH 5.3.1 (if that is possible)?
> > > > >> >
> > > > >> >
> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> > > > >> >
> > > > >> > > Thank you St.Ack!
> > > > >> > >
> > > > >> > > I would like to follow the ticket.
> > > > >> > >
> > > > >> > > Toshihiro Suzuki
> > > > >> > >
> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > > > >> > >
> > > > >> > > > Back to this problem. Simple tests confirm that as is, the
> > > > >> > > > single-queue-backed MVCC instance can slow Region ops if some other
> > > > >> > > > row is slow to complete. In particular Increment, checkAndPut, and
> > > > >> > > > batch mutations are affected. I opened HBASE-14460 to start in on a
> > > > >> > > > fix up. Lets see if we can somehow scope mvcc to row or at least
> > > > >> > > > shard mvcc so not all Region ops are paused.
> > > > >> > > >
> > > > >> > > > St.Ack
> > > > >> > > >
> > > > >> > > >
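One hedged reading of the sharding idea raised here (illustrative only; not what HBASE-14460 actually implemented): stripe writers across several independent MVCC instances keyed by row hash, so a slow transaction stalls only the rows that map to its stripe.

    import java.util.Arrays;

    // Sketch of striped MVCC. Mvcc is a stand-in interface, not the real
    // MultiVersionConsistencyControl; everything here is hypothetical.
    class StripedMvcc {
      interface Mvcc { /* begin/wait/complete elided */ }

      private final Mvcc[] stripes;

      StripedMvcc(Mvcc[] stripes) {
        this.stripes = stripes;
      }

      Mvcc forRow(byte[] row) {
        // A non-negative row-key hash picks the stripe, so writes to
        // different rows usually queue independently of each other.
        return stripes[(Arrays.hashCode(row) & 0x7fffffff) % stripes.length];
      }
    }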
> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > > >> > > > > > Makes sense. Let me hack up a test case to help with the illustration.
> > > > >> > > > > > It is as though the mvcc should be scoped to a row only... Writes
> > > > >> > > > > > against other rows should not hold up my read of my row. Tag an mvcc
> > > > >> > > > > > with a 'row' scope so we can see which on-going writes pertain to
> > > > >> > > > > > current operation?
> > > > >> > > > > Thank you St.Ack! I think this approach would work.
> > > > >> > > > >
> > > > >> > > > > > You need to read back the increment and have it be 'correct' at
> > > > >> > > > > > increment time?
> > > > >> > > > > Yes, we need it.
> > > > >> > > > >
> > > > >> > > > > I would like to help if there is anything I can do.
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > Toshihiro Suzuki
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > > >> > > > >
> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > > >> > > > > > Makes sense. Let me hack up a test case to help with the illustration.
> > > > >> > > > > > It is as though the mvcc should be scoped to a row only... Writes
> > > > >> > > > > > against other rows should not hold up my read of my row. Tag an mvcc
> > > > >> > > > > > with a 'row' scope so we can see which on-going writes pertain to
> > > > >> > > > > > current operation?
> > > > >> > > > > >
> > > > >> > > > > > You need to read back the increment and have it be 'correct' at
> > > > >> > > > > > increment time?
> > > > >> > > > > >
> > > > >> > > > > > (This is a good one)
> > > > >> > > > > >
> > > > >> > > > > > Thank you Toshihiro Suzuki
> > > > >> > > > > > St.Ack
> > > > >> > > > > >
> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > St.Ack,
> > > > >> > > > > > >
> > > > >> > > > > > > Thank you for your response.
> > > > >> > > > > > >
> > > > >> > > > > > > The reason I think "A region lock (not a row lock) seems to occur in
> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > > >> > > > > > >
> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
> > > > >> > > > > > >
> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > >> > > > > > >
> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > >> > > > > > >
> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > >> > > > > > >
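A minimal, self-contained model of the queue mechanics behind these three calls (an illustrative sketch only; the real MultiVersionConsistencyControl at the links above is more involved, and the method names here are shortened stand-ins):

    import java.util.LinkedList;

    // Toy model: one shared queue of write entries, like MVCC's writeQueue.
    // begin() ~ beginMemstoreInsertWithSeqNum, complete() ~ completeMemstoreInsertWithSeqNum.
    class ToyMvcc {
      private final LinkedList<Object> writeQueue = new LinkedList<>();

      Object begin() {
        synchronized (writeQueue) {
          Object w = new Object();
          writeQueue.add(w); // every writer, regardless of row, lands in the same queue
          return w;
        }
      }

      void waitForPreviousTransactionsComplete(Object w) throws InterruptedException {
        synchronized (writeQueue) {
          // Blocks until every entry queued before w has been completed.
          while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
            writeQueue.wait();
          }
        }
      }

      void complete(Object w) {
        synchronized (writeQueue) {
          writeQueue.remove(w);
          writeQueue.notifyAll(); // wake anyone waiting to reach the head
        }
      }
    }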
> > > > >> > > > > > >
> > > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can cause a
> > > > >> > > > > > > region lock.
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > >> > > > > > >
> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > >> > > > > > >
> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > >> > > > > > >
> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> > > > >> > > > > > > waits until writeQueue is empty or writeQueue.getFirst() == w.
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > >> > > > > > >
> > > > >> > > > > > > I think when a handler thread is processing between step 2 and step 3,
> > > > >> > > > > > > the other handler threads can wait at step 1 until the thread completes
> > > > >> > > > > > > step 3. This is depicted as follows:
> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > >> > > > > > >
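The stall in that diagram can be reproduced against the ToyMvcc sketch above (again illustrative only): handler B's wait at step 1 cannot return while handler A's entry is still queued, regardless of which rows the two are touching.

    // Handler A opens a transaction and dawdles; handler B, on a different
    // row, is stuck in waitForPreviousTransactionsComplete until A completes.
    public class ToyMvccStall {
      public static void main(String[] args) throws Exception {
        ToyMvcc mvcc = new ToyMvcc();
        Object a = mvcc.begin();                         // A: begun, not yet completed
        Thread b = new Thread(() -> {
          try {
            Object w = mvcc.begin();
            mvcc.waitForPreviousTransactionsComplete(w); // B blocks behind A here
            System.out.println("B proceeded");
            mvcc.complete(w);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
        b.start();
        Thread.sleep(500);                               // B stays blocked this whole time
        mvcc.complete(a);                                // A finishes; B can now proceed
        b.join();
      }
    }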
> > > > >> > > > > > >
> > > > >> > > > > > > Actually, in the thread dump of our region server, many handler threads
> > > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > >> > > > > > >
> > > > >> > > > > > > Many handler threads wait at this:
> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > >> > > > > > >
> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
> > > > >> > > > > > > > possible that all these threads are trying to get to the same row to
> > > > >> > > > > > > > update it? Could the app behavior have changed? Or are you thinking
> > > > >> > > > > > > > increment itself has slowed significantly?
> > > > >> > > > > > > We have just upgraded HBase, not changed the app behavior. We are
> > > > >> > > > > > > thinking increment itself has slowed significantly.
> > > > >> > > > > > > Before upgrading HBase, throughput and latency were good.
> > > > >> > > > > > > Currently, to cope with this problem, we split the regions finely.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > >
> > > > >> > > > > > > Toshihiro Suzuki
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > > >> > > > > > >
> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Ted,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thank you for your response.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I think that the increment operation works as follows:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 1. get row lock
> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > >> > > > > > > > > 4. get previous values
> > > > >> > > > > > > > > 5. create KVs
> > > > >> > > > > > > > > 6. write to Memstore
> > > > >> > > > > > > > > 7. write to WAL
> > > > >> > > > > > > > > 8. release row lock
> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
> > > > >> > > > > > > > > writes named writeQueue.
> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > > >> > > > > > > > > WriteEntry from writeQueue.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I think that when a handler thread is processing
> > > between
> > > > >> > step 2
> > > > >> > > > and
> > > > >> > > > > > > step
> > > > >> > > > > > > > 9,
> > > > >> > > > > > > > > the other handler threads can wait until the
> thread
> > > > >> completes
> > > > >> > > > step
> > > > >> > > > > 9.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > That is right. We need to read, after all
> outstanding
> > > > >> updates
> > > > >> > are
> > > > >> > > > > > done...
> > > > >> > > > > > > > because we need to read the latest update before we
> go
> > > to
> > > > >> > > > > > > modify/increment
> > > > >> > > > > > > > it.
> > > > >> > > > > > > >
> > > > >> > > > > > > > How do you make out this?
> > > > >> > > > > > > >
> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> > > > >> > > > > > > >
> > > > >> > > > > > > > In 0.98.x we did this:
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > >> > > > > > > >
> > > > >> > > > > > > > ... and in 1.0 we do this:
> > > > >> > > > > > > >
> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is
> > > > this....
> > > > >> > > > > > > >
> > > > >> > > > > > > > +  public void
> waitForPreviousTransactionsComplete() {
> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > >> > > > > > > > +  }
> > > > >> > > > > > > >
> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > >> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763).
> > > > Previous
> > > > >> > mvcc
> > > > >> > > > and
> > > > >> > > > > > > > region
> > > > >> > > > > > > > sequenceid would spin independent of each other.
> > Perhaps
> > > > >> this
> > > > >> > > > > > responsible
> > > > >> > > > > > > > for some slow down.
> > > > >> > > > > > > >
> > > > >> > > > > > > > That said, looking in your thread dump, we seem to
> be
> > > down
> > > > >> in
> > > > >> > the
> > > > >> > > > > Get.
> > > > >> > > > > > If
> > > > >> > > > > > > > you do a bunch of thread dumps in a row, where is
> the
> > > > >> > > lock-holding
> > > > >> > > > > > > thread?
> > > > >> > > > > > > > In Get or writing Increment... or waiting on
> sequence
> > > id?
> > > > >> > > > > > > >
> > > > >> > > > > > > > Is it possible you are contending on a counter
> > > > post-upgrade?
> > > > >> > Is
> > > > >> > > it
> > > > >> > > > > > > > possible that all these threads are trying to get to
> > the
> > > > >> same
> > > > >> > row
> > > > >> > > > to
> > > > >> > > > > > > update
> > > > >> > > > > > > > it? Could the app behavior have changed?  Or are you
> > > > >> thinking
> > > > >> > > > > increment
> > > > >> > > > > > > > itself has slowed significantly?
> > > > >> > > > > > > >
> > > > >> > > > > > > > St.Ack
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Thanks,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Toshihiro Suzuki
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <
> > yuzhihong@gmail.com
> > > >:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not
> > > region):
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >     try {
> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Thanks
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
> > > > >> brfrn169@gmail.com>
> > > > >> > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > Hi,
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > We upgraded our cluster from
> > CDH5.3.1(HBase0.98.6)
> > > > to
> > > > >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > >> > > > > > > > > > > and we experience slowdown in increment
> > operation.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Here's an extract from thread dump of the
> > > > >> RegionServer of
> > > > >> > > our
> > > > >> > > > > > > > cluster:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Thread 68
> > > > >> > > > > > >
> > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > >> > > > > > > > > > >   State: BLOCKED
> > > > >> > > > > > > > > > >   Blocked count: 21689888
> > > > >> > > > > > > > > > >   Waited count: 39828360
> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > >> > > > > > > > > > >   Blocked by 63
> > > > >> > > > > > > > >
> > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > >> > > > > > > > > > >   Stack:
> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > >
> > > > >> > >
> > > > >>
> > > >
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > >> > > > > > > > > > >
> > > > >> > > > > >
> > > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > >> > > > > > > > > > >
> > > > >> > > > > >
> > > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > >
> > > > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > There are many similar threads in the thread
> > dump.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > I read the source code and I think this is
> > caused
> > > by
> > > > >> > > changes
> > > > >> > > > of
> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur
> in
> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Also we wrote performance test code for
> > increment
> > > > >> > operation
> > > > >> > > > > that
> > > > >> > > > > > > > > included
> > > > >> > > > > > > > > > > 100 threads and ran it in local mode.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > The result is shown below:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> > > > >> 7.975072509210629
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> > > > 49.11840157868772
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Toshihiro Suzuki
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
https://gist.github.com/bbeaudreault/2994a748da83d9f75085

An active handler:
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
One that is locked:
https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579

The difference between pre-rollback and post-rollback is that previously we
were seeing things blocked in mvcc. Now we are seeing them blocked on the
upsert.

It always follows the same pattern: one active handler in the upsert and the
rest blocked waiting for it.

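That pattern (one active handler, the rest blocked on its lock) is easy to
summarize from inside a JVM with plain JDK APIs. A minimal sketch, not
HBase-specific and illustrative only; thread and lock-owner names will match
whatever server it runs in:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.HashMap;
    import java.util.Map;

    public class BlockedHandlerSummary {
      public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        // Count BLOCKED threads grouped by the thread holding their lock.
        Map<String, Integer> blockedByOwner = new HashMap<>();
        for (ThreadInfo ti : infos) {
          if (ti == null || ti.getThreadState() != Thread.State.BLOCKED) continue;
          String owner = ti.getLockOwnerName();  // the one "active" handler
          blockedByOwner.merge(owner == null ? "<unknown>" : owner, 1, Integer::sum);
        }
        // Expected shape: RW.default.writeRpcServer.handler=10,... -> 29 blocked
        blockedByOwner.forEach((owner, n) ->
            System.out.println(owner + " -> " + n + " blocked"));
      }
    }
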
On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:

> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>
> > The rollback seems to have mostly solved the issue for one of our clusters,
> > but another one is still seeing long increment times:
> >
> > "slowIncrementCount": 52080,
> > "Increment_num_ops": 325236,
> > "Increment_min": 1,
> > "Increment_max": 6162,
> > "Increment_mean": 465.68678129112396,
> > "Increment_median": 216,
> > "Increment_75th_percentile": 450.25,
> > "Increment_95th_percentile": 1052.6499999999999,
> > "Increment_99th_percentile": 1635.2399999999998
> >
> >
> > Any ideas if there are other changes that may be causing a performance
> > regression for increments between CDH4.7.1 and CDH5.3.8?
> >
> >
> >
> No.
>
> Post a thread dump Bryan and it might prompt something.
>
> St.Ack

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> The rollback seems to have mostly solved the issue for one of our clusters,
> but another one is still seeing long increment times:
>
> "slowIncrementCount": 52080,
> "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
> Increment_mean": 465.68678129112396,"Increment_median": 216,"
> Increment_75th_percentile": 450.25,"Increment_95th_percentile":
> 1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998
>
>
> Any ideas if there are other changes that may be causing a performance
> regression for increments between CDH4.7.1 and CDH5.3.8?
>
>
>
No.

Post a thread dump Bryan and it might prompt something.

St.Ack


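To correlate server-side numbers like the percentiles quoted above with what
clients see, a minimal client-side timing sketch against the HBase 1.0 API.
The table name, column family, qualifier, and the 100 ms threshold are
invented for illustration; the threshold only loosely mirrors what a slow-op
counter such as slowIncrementCount tracks:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IncrementLatencyProbe {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("counters"))) {
          // Vary the row per process (args[0]) to compare contended vs
          // uncontended rows.
          byte[] row = Bytes.toBytes("row-" + args[0]);
          for (int i = 0; i < 1000; i++) {
            long t0 = System.nanoTime();
            table.incrementColumnValue(row, Bytes.toBytes("c"), Bytes.toBytes("q"), 1L);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            if (ms > 100) {
              System.out.println("slow increment: " + ms + " ms");
            }
          }
        }
      }
    }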


>
> On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:
>
> > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> >
> > > Should this be added as a known issue in the CDH or hbase documentation?
> > > It was a severe performance hit for us, all of our regionservers were
> > > sitting at a few thousand queued requests.
> > >
> > Let me take care of that.
> > St.Ack
> >
> > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > >
> > > > Yea, they are all over the place and called from client and coprocessor
> > > > code. We ended up having no other option but to rollback, and aside from
> > > > a few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn),
> > > > it seems to be working and fixing our problem.
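
For anyone hitting the same NoSuchMethodErrors during an upgrade or rollback,
the call-site change Bryan mentions looks roughly like this; the table, row,
and column names here are invented, and this is a sketch rather than a full
migration guide:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    class PutMigrationExample {
      // CDH4-era code typically wrote a cell like this:
      //   put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      // Against the CDH5 / HBase 1.0 client, addColumn takes the same arguments:
      static void writeCell(Table table) throws IOException {
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(put);
      }
    }
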
> > > >
> > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> > > >
> > > >> Rollback is untested. No fix in 5.5. I was going to work on this now.
> > > >> Where are your counters Bryan? In their own column family or scattered
> > > >> about in a row with other Cell types?
> > > >> St.Ack
> > > >>
> > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > > >>
> > > >> > Is there any update to this? We just upgraded all of our production
> > > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> > > >> > known issues, did not know about this. Now we are seeing performance
> > > >> > issues across all clusters, as we make heavy use of increments.
> > > >> >
> > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back
> > > >> > to CDH 5.3.1 (if that is possible)?
> > > >> >
> > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> > > >> >
> > > >> > > Thank you St.Ack!
> > > >> > >
> > > >> > > I would like to follow the ticket.
> > > >> > >
> > > >> > > Toshihiro Suzuki
> > > >> > >
> > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > > >> > >
> > > >> > > > Back to this problem. Simple tests confirm that as is, the
> > > >> > > > single-queue-backed MVCC instance can slow Region ops if some other
> > > >> > > > row is slow to complete. In particular Increment, checkAndPut, and
> > > >> > > > batch mutations are affected. I opened HBASE-14460 to start in on a
> > > >> > > > fix up. Let's see if we can somehow scope mvcc to row or at least
> > > >> > > > shard mvcc so not all Region ops are paused.
> > > >> > > >
> > > >> > > > St.Ack
> > > >> > > >
> > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > > Thank you for the below reasoning (with accompanying helpful
> > > >> > > > > > diagram). Makes sense. Let me hack up a test case to help with
> > > >> > > > > > the illustration. It is as though the mvcc should be scoped to a
> > > >> > > > > > row only... Writes against other rows should not hold up my read
> > > >> > > > > > of my row. Tag an mvcc with a 'row' scope so we can see which
> > > >> > > > > > on-going writes pertain to current operation?
> > > >> > > > > Thank you St.Ack! I think this approach would work.
> > > >> > > > >
> > > >> > > > > > You need to read back the increment and have it be 'correct' at
> > > >> > > > > > increment time?
> > > >> > > > > Yes, we need it.
> > > >> > > > >
> > > >> > > > > I would like to help if there is anything I can do.
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Toshihiro Suzuki
> > > >> > > > >
> > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > >> > > > >
> > > >> > > > > > Thank you for the below reasoning (with accompanying helpful
> > > >> > > > > > diagram). Makes sense. Let me hack up a test case to help with
> > > >> > > > > > the illustration. It is as though the mvcc should be scoped to a
> > > >> > > > > > row only... Writes against other rows should not hold up my read
> > > >> > > > > > of my row. Tag an mvcc with a 'row' scope so we can see which
> > > >> > > > > > on-going writes pertain to current operation?
> > > >> > > > > >
> > > >> > > > > > You need to read back the increment and have it be 'correct' at
> > > >> > > > > > increment time?
> > > >> > > > > >
> > > >> > > > > > (This is a good one)
> > > >> > > > > >
> > > >> > > > > > Thank you Toshihiro Suzuki
> > > >> > > > > > St.Ack
> > > >> > > > > >
> > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > >> > > > > >
> > > >> > > > > > > St.Ack,
> > > >> > > > > > >
> > > >> > > > > > > Thank you for your response.
> > > >> > > > > > >
> > > >> > > > > > > Why I make out that "A region lock (not a row lock) seems to occur in
> > > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > >> > > > > > >
> > > >> > > > > > > An increment operation has 3 procedures for MVCC.
> > > >> > > > > > >
> > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > >> > > > > > >
> > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > >> > > > > > >
> > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > >> > > > > > >
> > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can cause a
> > > >> > > > > > > region lock.
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > >> > > > > > >
> > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > >> > > > > > >
> > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > >> > > > > > >
> > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > >> > > > > > >
> > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> > > >> > > > > > > waits until writeQueue is empty or writeQueue.getFirst() == w.
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > >> > > > > > >
> > > >> > > > > > > I think when a handler thread is processing between step 2 and step 3,
> > > >> > > > > > > the other handler threads can wait at step 1 until the thread
> > > >> > > > > > > completes step 3. This is depicted as follows:
> > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > >> > > > > > >
> > > >> > > > > > > Actually, in the thread dump of our region server, many handler threads
> > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > >> > > > > > > (waitForPreviousTransactionsComplete()).
> > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > >> > > > > > >
> > > >> > > > > > > Many handler threads wait at this:
> > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > >> > > > > > >
> > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
> > > >> > > > > > > > possible that all these threads are trying to get to the same row
> > > >> > > > > > > > to update it? Could the app behavior have changed? Or are you
> > > >> > > > > > > > thinking increment itself has slowed significantly?
> > > >> > > > > > > We have just upgraded HBase, not changed the app behavior. We are
> > > >> > > > > > > thinking increment itself has slowed significantly.
> > > >> > > > > > > Before upgrading HBase, throughput and latency were good.
> > > >> > > > > > > Currently, to cope with this problem, we split the regions finely.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > >
> > > >> > > > > > > Toshihiro Suzuki
> > > >> > > > > > >
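
To make the queueing described above concrete, here is a minimal,
self-contained Java model of the single pending-write queue. It is an
illustration only, not the actual MultiVersionConsistencyControl code (which
also tracks write numbers, read points, and WAL state); it only mimics the
queueing that makes the wait region-wide rather than row-wide:

    import java.util.LinkedList;

    // Simplified model of the shared pending-write queue.
    class SimpleMvcc {
      static final class WriteEntry {
        boolean completed = false;
      }

      private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

      // Corresponds to procedure 2 above: append a transaction marker.
      synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry();
        writeQueue.add(e);
        return e;
      }

      // Corresponds to procedure 3 above: mark the transaction done and trim
      // every completed entry from the head of the queue, waking any waiters.
      synchronized void complete(WriteEntry e) {
        e.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
          writeQueue.removeFirst();
        }
        notifyAll();
      }

      // Corresponds to procedure 1 above: enqueue a marker and wait until
      // every transaction queued before it has completed. The queue is shared
      // by the whole region, so an open transaction on ANY row blocks here.
      synchronized void waitForPreviousTransactionsComplete() throws InterruptedException {
        WriteEntry w = begin();
        while (writeQueue.getFirst() != w) {
          wait();
        }
        complete(w);
      }
    }

In this model, a thread that has called begin() (procedure 2) but not yet
complete() (procedure 3) stalls every other thread entering
waitForPreviousTransactionsComplete() (procedure 1), no matter which row each
thread is serving. That is the region-wide wait visible in the thread dump
above.
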
> > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > >> > > > > > >
> > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Ted,
> > > >> > > > > > > > >
> > > >> > > > > > > > > Thank you for your response.
> > > >> > > > > > > > >
> > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> > > >> > > > > > > > >
> > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > >> > > > > > > > >
> > > >> > > > > > > > > I think that increment operation works as follows:
> > > >> > > > > > > > >
> > > >> > > > > > > > > 1. get row lock
> > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all
> > > >> > > > > > > > > prior MVCC transactions to finish
> > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > >> > > > > > > > > 4. get previous values
> > > >> > > > > > > > > 5. create KVs
> > > >> > > > > > > > > 6. write to Memstore
> > > >> > > > > > > > > 7. write to WAL
> > > >> > > > > > > > > 8. release row lock
> > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > > >> > > > > > > > > transaction
> > > >> > > > > > > > >
> > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending queue
> > > >> > > > > > > > > of writes named writeQueue.
> > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
> > > >> > > > > > > > > writeQueue is empty or writeQueue.getFirst() == w.
> > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > >> > > > > > > > > WriteEntry from writeQueue.
> > > >> > > > > > > > >
> > > >> > > > > > > > > I think that when a handler thread is processing between step 2
> > > >> > > > > > > > > and step 9, the other handler threads can wait until the thread
> > > >> > > > > > > > > completes step 9.
> > > >> > > > > > > > >
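
A tiny runnable demo of the stall those steps describe, reusing the
SimpleMvcc sketch from earlier in this message (same package or file):
handler A is "slow" between step 3 and step 9, and handler B, incrementing a
different row, blocks at step 2 until A finishes. The 2-second sleep stands
in for A's steps 4 through 8:

    public class MvccStallDemo {
      public static void main(String[] args) throws Exception {
        SimpleMvcc mvcc = new SimpleMvcc();
        SimpleMvcc.WriteEntry a = mvcc.begin();        // A: step 3 done, now "slow"
        Thread b = new Thread(() -> {
          try {
            long t0 = System.nanoTime();
            mvcc.waitForPreviousTransactionsComplete(); // B: stuck at step 2
            System.out.printf("B waited %.0f ms%n", (System.nanoTime() - t0) / 1e6);
          } catch (InterruptedException ignored) {
          }
        });
        b.start();
        Thread.sleep(2000);                            // A busy in steps 4-8
        mvcc.complete(a);                              // A: step 9; B unblocks
        b.join();                                      // prints roughly 2000 ms
      }
    }
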
> > > >> > > > > > > > That is right. We need to read, after all outstanding updates are
> > > >> > > > > > > > done... because we need to read the latest update before we go to
> > > >> > > > > > > > modify/increment it.
> > > >> > > > > > > >
> > > >> > > > > > > > How do you make out this?
> > > >> > > > > > > >
> > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
> > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> > > >> > > > > > > >
> > > >> > > > > > > > In 0.98.x we did this:
> > > >> > > > > > > >
> > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > >> > > > > > > >
> > > >> > > > > > > > ... and in 1.0 we do this:
> > > >> > > > > > > >
> > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > >> > > > > > > >
> > > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > >> > > > > > > > +  }
> > > >> > > > > > > >
> > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > >> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc
> > > >> > > > > > > > and region sequenceid would spin independent of each other.
> > > >> > > > > > > > Perhaps this is responsible for some slow down.
> > > >> > > > > > > >
> > > >> > > > > > > > That said, looking in your thread dump, we seem to be down in the
> > > >> > > > > > > > Get. If you do a bunch of thread dumps in a row, where is the
> > > >> > > > > > > > lock-holding thread? In Get or writing Increment... or waiting on
> > > >> > > > > > > > sequence id?
> > > >> > > > > > > >
> > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
> > > >> > > > > > > > possible that all these threads are trying to get to the same row
> > > >> > > > > > > > to update it? Could the app behavior have changed? Or are you
> > > >> > > > > > > > thinking increment itself has slowed significantly?
> > > >> > > > > > > >
> > > >> > > > > > > > St.Ack
> > > >> > > > > > > >
> > > >> > > > > > > > > Thanks,
> > > >> > > > > > > > >
> > > >> > > > > > > > > Toshihiro Suzuki
> > > >> > > > > > > > >
> > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > >> > > > > > > > > >
> > > >> > > > > > > > > >     try {
> > > >> > > > > > > > > >       rowLock = getRowLock(row);
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Can you pastebin the complete stack trace ?
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > Thanks
> > > >> > > > > > > > > > >
> > > >> > > > > >
> > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > >> > > > > > > > > > >
> > > >> > > > > >
> > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > >> > > > > > > > > > >
> > > >> > > > > > > >
> > > >> > > >
> > > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > There are many similar threads in the thread
> dump.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > I read the source code and I think this is
> caused
> > by
> > > >> > > changes
> > > >> > > > of
> > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
> > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Also we wrote performance test code for
> increment
> > > >> > operation
> > > >> > > > > that
> > > >> > > > > > > > > included
> > > >> > > > > > > > > > > 100 threads and ran it in local mode.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > The result is shown below:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> > > >> 7.975072509210629
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> > > 49.11840157868772
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Thanks,
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Toshihiro Suzuki
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
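
To make the waitForPreviousTransactionsComplete() behavior quoted above
concrete, here is a stripped-down model of the single writeQueue. This is an
illustrative sketch, not the actual HBase implementation; names loosely
follow the CDH 5.4.5 source discussed in this thread.

    import java.util.LinkedList;

    // Minimal model of MultiVersionConsistencyControl's single writeQueue.
    class MiniMvcc {
      static class WriteEntry { boolean completed; }

      // One queue per region -- the same java.util.LinkedList that the
      // BLOCKED handlers in the thread dumps are waiting on.
      private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

      // Every transaction (read-point advance or write) enqueues an entry.
      synchronized WriteEntry beginMemstoreInsert() {
        WriteEntry e = new WriteEntry();
        writeQueue.add(e);
        return e;
      }

      // Completion marks the entry done, drops finished entries from the
      // head, and wakes every waiter.
      synchronized void completeMemstoreInsert(WriteEntry e) {
        e.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
          writeQueue.removeFirst();
        }
        notifyAll();
      }

      // Blocks until this entry reaches the head of the queue, i.e. until
      // EVERY earlier transaction in the region has completed -- regardless
      // of which row that transaction touched. This is the region-wide wait.
      synchronized void waitForPreviousTransactionsComplete(WriteEntry w)
          throws InterruptedException {
        while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
          wait();
        }
        completeMemstoreInsert(w);
      }

      // Mirrors the 1.0 snippet quoted above.
      synchronized void waitForPreviousTransactionsComplete()
          throws InterruptedException {
        waitForPreviousTransactionsComplete(beginMemstoreInsert());
      }
    }

A transaction that is slow between begin and complete sits at the head of
the list, so every other handler -- whatever row it is writing -- parks in
wait(), which matches the BLOCKED-on-java.util.LinkedList pattern in the
thread dumps posted earlier.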

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> The rollback seems to have mostly solved the issue for one of our
> clusters, but another one is still seeing long increment times:
>
> "slowIncrementCount": 52080,
> "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
> "Increment_mean": 465.68678129112396, "Increment_median": 216,
> "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
> "Increment_99th_percentile": 1635.2399999999998
>
> Any ideas if there are other changes that may be causing a performance
> regression for increments between CDH4.7.1 and CDH5.3.8?
>
No.

Post a thread dump Bryan and it might prompt something.
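
A few passes with jstack is the usual way; if it helps, the same view can
also be pulled in-process with the stock JDK ThreadMXBean API. A minimal
sketch -- nothing HBase-specific, and it has to run inside the RegionServer
JVM (e.g. from a coprocessor or a debug hook):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public final class BlockedThreadReport {
      private BlockedThreadReport() {}

      /** Print every BLOCKED thread plus the thread holding its monitor. */
      public static void dumpBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers)
        for (ThreadInfo t : mx.dumpAllThreads(true, true)) {
          if (t != null && t.getThreadState() == Thread.State.BLOCKED) {
            System.out.println(t.getThreadName()
                + " blocked on " + t.getLockName()
                + " held by " + t.getLockOwnerName()
                + " (tid " + t.getLockOwnerId() + ")");
          }
        }
      }
    }

Running it a few times in a row should show whether the lock owner is
sitting in the Get, writing the Increment, or waiting on the sequence id.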

St.Ack





Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
The rollback seems to have mostly solved the issue for one of our clusters,
but another one is still seeing long increment times:

"slowIncrementCount": 52080,
"Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
"Increment_mean": 465.68678129112396, "Increment_median": 216,
"Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
"Increment_99th_percentile": 1635.2399999999998


Any ideas if there are other changes that may be causing a performance
regression for increments between CDH4.7.1 and CDH5.3.8?
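
For reference, a stripped-down version of the 100-thread increment test
described at the top of this thread might look like the sketch below. The
table name "inc_test", family "f", qualifier "c", the counts, and the
one-row-per-thread scheme are illustrative assumptions, not the actual test
code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IncrementLoadTest {
      public static void main(String[] args) throws Exception {
        final int threads = 100;
        final int opsPerThread = 1000;
        Configuration conf = HBaseConfiguration.create();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.currentTimeMillis();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
          for (int t = 0; t < threads; t++) {
            final byte[] row = Bytes.toBytes("row-" + t); // one counter row per thread
            pool.submit(() -> {
              try (Table table = conn.getTable(TableName.valueOf("inc_test"))) {
                for (int i = 0; i < opsPerThread; i++) {
                  table.incrementColumnValue(row, Bytes.toBytes("f"),
                      Bytes.toBytes("c"), 1L);
                }
              } catch (Exception e) {
                e.printStackTrace();
              }
            });
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS);
        }
        long elapsedMs = Math.max(1, System.currentTimeMillis() - start);
        long totalOps = (long) threads * opsPerThread;
        System.out.println("Throughput(op/s): " + (totalOps * 1000 / elapsedMs));
        System.out.println("Avg latency(ms): " + ((double) elapsedMs / opsPerThread));
      }
    }

Pointing every thread at the same fixed row key instead makes contention on
a single hot counter easy to compare across versions.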



On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:

> On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>
> > Should this be added as a known issue in the CDH or hbase documentation?
> > It was a severe performance hit for us, all of our regionservers were
> > sitting at a few thousand queued requests.
>
> Let me take care of that.
> St.Ack
>
> > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> >
> > > Yea, they are all over the place and called from client and
> > > coprocessor code. We ended up having no other option but to rollback,
> > > and aside from a few NoSuchMethodErrors due to API changes (Put#add vs
> > > Put#addColumn), it seems to be working and fixing our problem.
> > >
> > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> > >
> > > > Rollback is untested. No fix in 5.5. I was going to work on this
> > > > now. Where are your counters Bryan? In their own column family or
> > > > scattered about in a row with other Cell types?
> > > > St.Ack
> > > >
> > > > On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > > >
> > > > > Is there any update to this? We just upgraded all of our
> > > > > production clusters from CDH4 to CDH5.4.7 and, not seeing this
> > > > > JIRA listed in the known issues, did not know about this. Now we
> > > > > are seeing performance issues across all clusters, as we make
> > > > > heavy use of increments.
> > > > >
> > > > > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll
> > > > > back to CDH 5.3.1 (if that is possible)?
> > > > >
> > > > > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> > > > >
> > > > > > Thank you St.Ack!
> > > > > >
> > > > > > I would like to follow the ticket.
> > > > > >
> > > > > > Toshihiro Suzuki
> > > > > >
> > > > > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > > > > >
> > > > > > > Back to this problem. Simple tests confirm that as is, the
> > > > > > > single-queue-backed MVCC instance can slow Region ops if some
> > > > > > > other row is slow to complete. In particular Increment,
> > > > > > > checkAndPut, and batch mutations are affected. I opened
> > > > > > > HBASE-14460 to start in on a fix up. Let's see if we can
> > > > > > > somehow scope mvcc to row or at least shard mvcc so not all
> > > > > > > Region ops are paused.
> > > > > > >
> > > > > > > St.Ack
> > > > > > >
> > > > > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > Thank you for the below reasoning (with accompanying
> > > > > > > > > helpful diagram). Makes sense. Let me hack up a test case
> > > > > > > > > to help with the illustration. It is as though the mvcc
> > > > > > > > > should be scoped to a row only... Writes against other
> > > > > > > > > rows should not hold up my read of my row. Tag an mvcc
> > > > > > > > > with a 'row' scope so we can see which on-going writes
> > > > > > > > > pertain to current operation?
> > > > > > > > Thank you St.Ack! I think this approach would work.
> > > > > > > >
> > > > > > > > > You need to read back the increment and have it be
> > > > > > > > > 'correct' at increment time?
> > > > > > > > Yes, we need it.
> > > > > > > >
> > > > > > > > I would like to help if there is anything I can do.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Toshihiro Suzuki
> > > > > > > >
> > > > > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > > > > > >
> > > > > > > > > Thank you for the below reasoning (with accompanying
> > > > > > > > > helpful diagram). Makes sense. Let me hack up a test case
> > > > > > > > > to help with the illustration. It is as though the mvcc
> > > > > > > > > should be scoped to a row only... Writes against other
> > > > > > > > > rows should not hold up my read of my row. Tag an mvcc
> > > > > > > > > with a 'row' scope so we can see which on-going writes
> > > > > > > > > pertain to current operation?
> > > > > > > > >
> > > > > > > > > You need to read back the increment and have it be
> > > > > > > > > 'correct' at increment time?
> > > > > > > > >
> > > > > > > > > (This is a good one)
> > > > > > > > >
> > > > > > > > > Thank you Toshihiro Suzuki
> > > > > > > > > St.Ack
> > > > > > > > >
> > > > > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > St.Ack,
> > > > > > > > > >
> > > > > > > > > > Thank you for your response.
> > > > > > > > > >
> > > > > > > > > > Why I make out that "A region lock (not a row lock)
> > > > > > > > > > seems to occur in waitForPreviousTransactionsComplete()"
> > > > > > > > > > is as follows:
> > > > > > > > > >
> > > > > > > > > > An increment operation has 3 procedures for MVCC.
> > > > > > > > > >
> > > > > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > > > > > > >
> > > > > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > > > > > > >
> > > > > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > > > > > > >
> > > > > > > > > > I think that MultiVersionConsistencyControl's writeQueue
> > > > > > > > > > can cause a region lock.
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > > > > > > >
> > > > > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > > > > > > >
> > > > > > > > > > Step 3 removes the WriteEntry from writeQueue:
> > > > > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > > > > > > >
> > > > > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to
> > > > > > > > > > writeQueue and waits until writeQueue is empty or
> > > > > > > > > > writeQueue.getFirst() == w.
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > > > > > > >
> > > > > > > > > > I think when a handler thread is processing between
> > > > > > > > > > step 2 and step 3, the other handler threads can wait at
> > > > > > > > > > step 1 until the thread completes step 3. This is
> > > > > > > > > > depicted as follows:
> > > > > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > > > > > > >
> > > > > > > > > > Actually, in the thread dump of our region server, many
> > > > > > > > > > handler threads (RW.default.writeRpcServer.handler) wait
> > > > > > > > > > at Step 1 (waitForPreviousTransactionsComplete()).
> > > > > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > > > > > > >
> > > > > > > > > > Many handler threads wait at this:
> > > > > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > > > > > > >
> > > > > > > > > > > Is it possible you are contending on a counter
> > > > > > > > > > > post-upgrade? Is it possible that all these threads
> > > > > > > > > > > are trying to get to the same row to update it? Could
> > > > > > > > > > > the app behavior have changed?  Or are you thinking
> > > > > > > > > > > increment itself has slowed significantly?
> > > > > > > > > > We have just upgraded HBase, not changed the app
> > > > > > > > > > behavior. We are thinking increment itself has slowed
> > > > > > > > > > significantly.
> > > > > > > > > > Before upgrading HBase, it was good throughput and latency.
> > > > > > > > > > Currently, to cope with this problem, we split the
> > > > > > > > > > regions finely.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > >
> > > > > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Ted,
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you for your response.
> > > > > > > > > > > >
> > > > > > > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > > > > > > >
> > > > > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > > > > > > >
> > > > > > > > > > > > I think that increment operation works as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. get row lock
> > > > > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> > > > > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > > > > > > > 4. get previous values
> > > > > > > > > > > > 5. create KVs
> > > > > > > > > > > > 6. write to Memstore
> > > > > > > > > > > > 7. write to WAL
> > > > > > > > > > > > 8. release row lock
> > > > > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > > > > > > > > > >
> > > > > > > > > > > > An instance of MultiVersionConsistencyControl has a
> > > > > > > > > > > > pending queue of writes named writeQueue.
> > > > > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits
> > > > > > > > > > > > until writeQueue is empty or writeQueue.getFirst() == w.
> > > > > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9
> > > > > > > > > > > > removes the WriteEntry from writeQueue.
> > > > > > > > > > > >
> > > > > > > > > > > > I think that when a handler thread is processing
> > > > > > > > > > > > between step 2 and step 9, the other handler threads
> > > > > > > > > > > > can wait until the thread completes step 9.
> > > > > > > > > > >
> > > > > > > > > > > That is right. We need to read, after all outstanding
> > > > > > > > > > > updates are done... because we need to read the latest
> > > > > > > > > > > update before we go to modify/increment it.
> > > > > > > > > > >
> > > > > > > > > > > How do you make out this?
> > > > > > > > > > >
> > > > > > > > > > > "A region lock (not a row lock) seems to occur in
> > > > > > > > > > > waitForPreviousTransactionsComplete()."
> > > > > > > > > > >
> > > > > > > > > > > In 0.98.x we did this:
> > > > > > > > > > >
> > > > > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > > > > > > >
> > > > > > > > > > > ... and in 1.0 we do this:
> > > > > > > > > > >
> > > > > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > > > > > > > >
> > > > > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > > > > > > > +  }
> > > > > > > > > > >
> > > > > > > > > > > The mvcc and region sequenceid were merged in 1.0
> > > > > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763).
> > > > > > > > > > > Previously, mvcc and region sequenceid would spin
> > > > > > > > > > > independent of each other. Perhaps this is responsible
> > > > > > > > > > > for some slow down.
> > > > > > > > > > >
> > > > > > > > > > > That said, looking in your thread dump, we seem to be
> > > > > > > > > > > down in the Get. If you do a bunch of thread dumps in
> > > > > > > > > > > a row, where is the lock-holding thread? In Get or
> > > > > > > > > > > writing Increment... or waiting on sequence id?
> > > > > > > > > > >
> > > > > > > > > > > Is it possible you are contending on a counter
> > > > > > > > > > > post-upgrade? Is it possible that all these threads
> > > > > > > > > > > are trying to get to the same row to update it? Could
> > > > > > > > > > > the app behavior have changed?  Or are you thinking
> > > > > > > > > > > increment itself has slowed significantly?
> > > > > > > > > > >
> > > > > > > > > > > St.Ack
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > > > >
> > > > > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
> > > > > > > > > > > >
> > > > > > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > > > > > > > >
> > > > > > > > > > > > >     try {
> > > > > > > > > > > > >       rowLock = getRowLock(row);
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you pastebin the complete stack trace ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6)
> > > > > > > > > > > > > > to CDH5.4.5(HBase1.0.0) and we experience
> > > > > > > > > > > > > > slowdown in increment operation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Here's an extract from thread dump of the
> > > > > > > > > > > > > > RegionServer of our cluster:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > > > > > > > >   State: BLOCKED
> > > > > > > > > > > > > >   Blocked count: 21689888
> > > > > > > > > > > > > >   Waited count: 39828360
> > > > > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > > > > > > > >   Stack:
> > > > > > > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > There are many similar threads in the thread dump.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I read the source code and I think this is caused
> > > > > > > > > > > > > > by changes of MultiVersionConsistencyControl.
> > > > > > > > > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > > > > > > > > waitForPreviousTransactionsComplete().
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Also we wrote performance test code for increment
> > > > > > > > > > > > > > operation that included 100 threads and ran it in
> > > > > > > > > > > > > > local mode.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The result is shown below:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Toshihiro Suzuki

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
The rollback seems to have mostly solved the issue for one of our clusters,
but another one is still seeing long increment times:

"slowIncrementCount": 52080,
"Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
Increment_mean": 465.68678129112396,"Increment_median": 216,"
Increment_75th_percentile": 450.25,"Increment_95th_percentile":
1052.6499999999999,"Increment_99th_percentile": 1635.2399999999998


Any ideas if there are other changes that may be causing a performance
regression for increments between CDH4.7.1 and CDH5.3.8?



On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:

> On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
> > Should this be added as a known issue in the CDH or hbase documentation?
> It
> > was a severe performance hit for us, all of our regionservers were
> sitting
> > at a few thousand queued requests.
> >
> >
> Let me take care of that.
> St.Ack
>
>
>
> > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > bbeaudreault@hubspot.com>
> > wrote:
> >
> > > Yea, they are all over the place and called from client and coprocessor
> > > code. We ended up having no other option but to rollback, and aside
> from
> > a
> > > few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn),
> it
> > > seems to be working and fixing our problem.
> > >
> > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> > >
> > >> Rollback is untested. No fix in 5.5. I was going to work on this now.
> > >> Where
> > >> are your counters Bryan? In their own column family or scattered about
> > in
> > >> a
> > >> row with other Cell types?
> > >> St.Ack
> > >>
> > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > >> bbeaudreault@hubspot.com> wrote:
> > >>
> > >> > Is there any update to this? We just upgraded all of our production
> > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in
> the
> > >> > known issues, did not not about this.  Now we are seeing perfomance
> > >> issues
> > >> > across all clusters, as we make heavy use of increments.
> > >> >
> > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll
> back
> > >> to
> > >> > CDH 5.3.1 (if that is possible)?
> > >> >
> > >> >
> > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> > >> >
> > >> > > Thank you St.Ack!
> > >> > >
> > >> > > I would like to follow the ticket.
> > >> > >
> > >> > > Toshihiro Suzuki
> > >> > >
> > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > >> > >
> > >> > > > Back to this problem. Simple tests confirm that as is, the
> > >> > > > single-queue-backed MVCC instance can slow Region ops if some
> > other
> > >> row
> > >> > > is
> > >> > > > slow to complete. In particular Increment, checkAndPut, and
> batch
> > >> > > mutations
> > >> > > > are effected. I opened HBASE-14460 to start in on a fix up. Lets
> > >> see if
> > >> > > we
> > >> > > > can somehow scope mvcc to row or at least shard mvcc so not all
> > >> Region
> > >> > > ops
> > >> > > > are paused.
> > >> > > >
> > >> > > > St.Ack
> > >> > > >
> > >> > > >
> > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com>
> wrote:
> > >> > > >
> > >> > > > > > Thank you for the below reasoning (with accompanying helpful
> > >> > > diagram).
> > >> > > > > > Makes sense. Let me hack up a test case to help with the
> > >> > > illustration.
> > >> > > > It
> > >> > > > > > is as though the mvcc should be scoped to a row only...
> Writes
> > >> > > against
> > >> > > > > > other rows should not hold up my read of my row. Tag an mvcc
> > >> with a
> > >> > > > 'row'
> > >> > > > > > scope so we can see which on-going writes pertain to current
> > >> > > operation?
> > >> > > > > Thank you St.Ack! I think this approach would work.
> > >> > > > >
> > >> > > > > > You need to read back the increment and have it be 'correct'
> > at
> > >> > > > increment
> > >> > > > > > time?
> > >> > > > > Yes, we need it.
> > >> > > > >
> > >> > > > > I would like to help if there is anything I can do.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Toshihiro Suzuki
> > >> > > > >
> > >> > > > >
> > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > >> > > > >
> > >> > > > > > Thank you for the below reasoning (with accompanying helpful
> > >> > > diagram).
> > >> > > > > > Makes sense. Let me hack up a test case to help with the
> > >> > > illustration.
> > >> > > > It
> > >> > > > > > is as though the mvcc should be scoped to a row only...
> Writes
> > >> > > against
> > >> > > > > > other rows should not hold up my read of my row. Tag an mvcc
> > >> with a
> > >> > > > 'row'
> > >> > > > > > scope so we can see which on-going writes pertain to current
> > >> > > operation?
> > >> > > > > >
> > >> > > > > > You need to read back the increment and have it be 'correct'
> > at
> > >> > > > increment
> > >> > > > > > time?
> > >> > > > > >
> > >> > > > > > (This is a good one)
> > >> > > > > >
> > >> > > > > > Thank you Toshihiro Suzuki
> > >> > > > > > St.Ack
> > >> > > > > >
> > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com>
> > >> wrote:
> > >> > > > > >
> > >> > > > > > > St.Ack,
> > >> > > > > > >
> > >> > > > > > > Thank you for your response.
> > >> > > > > > >
> > >> > > > > > > Why I make out that "A region lock (not a row lock) seems
> to
> > >> > occur
> > >> > > in
> > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > >> > > > > > >
> > >> > > > > > > A increment operation has 3 procedures for MVCC.
> > >> > > > > > >
> > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > >> > > > > > >
> > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > >> > > > > > >
> > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue
> can
> > >> > cause
> > >> > > a
> > >> > > > > > region
> > >> > > > > > > lock.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Step 2 adds to a WriteEntry to writeQueue.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > >> > > > > > >
> > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > >> > > > > > >
> > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > >> > > > > > > waitForPreviousTransactionsComplete(e) ->
> advanceMemstore(w)
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > >> > > > > > >
> > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to
> > >> writeQueue
> > >> > > and
> > >> > > > > > waits
> > >> > > > > > > until writeQueue is empty or writeQueue.getFirst() == w.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > I think when a handler thread is processing between step 2
> > and
> > >> > step
> > >> > > > 3,
> > >> > > > > > the
> > >> > > > > > > other handler threads can wait at step 1 until the thread
> > >> > completes
> > >> > > > > step
> > >> > > > > > 3
> > >> > > > > > > This is depicted as follows:
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Actually, in the thread dump of our region server, many
> > >> handler
> > >> > > > threads
> > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > >> > > > > > > (waitForPreviousTransactionsComplete()).
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > >> > > > > > >
> > >> > > > > > > Many handler threads wait at this:
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > > Is it possible you are contending on a counter
> > post-upgrade?
> > >> > Is
> > >> > > it
> > >> > > > > > > > possible that all these threads are trying to get to the
> > >> same
> > >> > row
> > >> > > > to
> > >> > > > > > > update
> > >> > > > > > > > it? Could the app behavior have changed?  Or are you
> > >> thinking
> > >> > > > > increment
> > >> > > > > > > > itself has slowed significantly?
> > >> > > > > > > We have just upgraded HBase, not changed the app behavior.
> > We
> > >> are
> > >> > > > > > thinking
> > >> > > > > > > increment itself has slowed significantly.
> > >> > > > > > > Before upgrading HBase, it was good throughput and
> latency.
> > >> > > > > > > Currently, to cope with this problem, we split the regions
> > >> > finely.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > >
> > >> > > > > > > Toshihiro Suzuki
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > >> > > > > > >
> > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
> brfrn169@gmail.com
> > >
> > >> > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Ted,
> > >> > > > > > > > >
> > >> > > > > > > > > Thank you for your response.
> > >> > > > > > > > >
> > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> > >> > > > > > > > >
> > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > I think that increment operation works as follows:
> > >> > > > > > > > >
> > >> > > > > > > > > 1. get row lock
> > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait
> > for
> > >> all
> > >> > > > prior
> > >> > > > > > > MVCC
> > >> > > > > > > > > transactions to finish
> > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a
> > >> > transaction
> > >> > > > > > > > > 4. get previous values
> > >> > > > > > > > > 5. create KVs
> > >> > > > > > > > > 6. write to Memstore
> > >> > > > > > > > > 7. write to WAL
> > >> > > > > > > > > 8. release row lock
> > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete
> > the
> > >> > > > > > transaction
> > >> > > > > > > > >
> > >> > > > > > > > > A instance of MultiVersionConsistencyControl has a
> > pending
> > >> > > queue
> > >> > > > of
> > >> > > > > > > > writes
> > >> > > > > > > > > named writeQueue.
> > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits
> until
> > >> > > > writeQueue
> > >> > > > > > is
> > >> > > > > > > > > empty or writeQueue.getFirst() == w.
> > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9
> > removes
> > >> the
> > >> > > > > > > WriteEntry
> > >> > > > > > > > > from writeQueue.
> > >> > > > > > > > >
> > >> > > > > > > > > I think that when a handler thread is processing
> between
> > >> > step 2
> > >> > > > and
> > >> > > > > > > step
> > >> > > > > > > > 9,
> > >> > > > > > > > > the other handler threads can wait until the thread
> > >> completes
> > >> > > > step
> > >> > > > > 9.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > That is right. We need to read, after all outstanding
> > >> updates
> > >> > are
> > >> > > > > > done...
> > >> > > > > > > > because we need to read the latest update before we go
> to
> > >> > > > > > > modify/increment
> > >> > > > > > > > it.
> > >> > > > > > > >
> > >> > > > > > > > How do you make out this?
> > >> > > > > > > >
> > >> > > > > > > > "A region lock (not a row lock) seems to occur in
> > >> > > > > > > > waitForPreviousTransactionsComplete()."
> > >> > > > > > > >
> > >> > > > > > > > In 0.98.x we did this:
> > >> > > > > > > >
> > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > >> > > > > > > >
> > >> > > > > > > > ... and in 1.0 we do this:
> > >> > > > > > > >
> > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is
> > this....
> > >> > > > > > > >
> > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > >> > > > > > > > +  }
> > >> > > > > > > >
> > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > >> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763).
> > Previous
> > >> > mvcc
> > >> > > > and
> > >> > > > > > > > region
> > >> > > > > > > > sequenceid would spin independent of each other. Perhaps
> > >> this
> > >> > > > > > responsible
> > >> > > > > > > > for some slow down.
> > >> > > > > > > >
> > >> > > > > > > > That said, looking in your thread dump, we seem to be
> down
> > >> in
> > >> > the
> > >> > > > > Get.
> > >> > > > > > If
> > >> > > > > > > > you do a bunch of thread dumps in a row, where is the
> > >> > > lock-holding
> > >> > > > > > > thread?
> > >> > > > > > > > In Get or writing Increment... or waiting on sequence
> id?
> > >> > > > > > > >
> > >> > > > > > > > Is it possible you are contending on a counter
> > post-upgrade?
> > >> > Is
> > >> > > it
> > >> > > > > > > > possible that all these threads are trying to get to the
> > >> same
> > >> > row
> > >> > > > to
> > >> > > > > > > update
> > >> > > > > > > > it? Could the app behavior have changed?  Or are you
> > >> thinking
> > >> > > > > increment
> > >> > > > > > > > itself has slowed significantly?
> > >> > > > > > > >
> > >> > > > > > > > St.Ack
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > > Thanks,
> > >> > > > > > > > >
> > >> > > > > > > > > Toshihiro Suzuki
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com
> >:
> > >> > > > > > > > >
> > >> > > > > > > > > > In HRegion#increment(), we lock the row (not
> region):
> > >> > > > > > > > > >
> > >> > > > > > > > > >     try {
> > >> > > > > > > > > >       rowLock = getRowLock(row);
> > >> > > > > > > > > >
> > >> > > > > > > > > > Can you pastebin the complete stack trace ?
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thanks
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
> > >> brfrn169@gmail.com>
> > >> > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Hi,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6)
> > to
> > >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > >> > > > > > > > > > > and we experience slowdown in increment operation.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Here's an extract from thread dump of the
> > >> RegionServer of
> > >> > > our
> > >> > > > > > > > cluster:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Thread 68
> > >> > > > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > >> > > > > > > > > > >   State: BLOCKED
> > >> > > > > > > > > > >   Blocked count: 21689888
> > >> > > > > > > > > > >   Waited count: 39828360
> > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > >> > > > > > > > > > >   Blocked by 63
> > >> > > > > > > > >
> > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > >> > > > > > > > > > >   Stack:
> > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > >
> > >> > > > >
> > >> > >
> > >>
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > >> > > > > > > > > > >
> > >> > > > > >
> > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > >> > > > > > > > > > >
> > >> > > > > >
> > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > >> > > > > > > > > > >
> > >> > > > > > > >
> > >> > > >
> > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > There are many similar threads in the thread dump.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I read the source code and I think this is caused
> by
> > >> > > changes
> > >> > > > of
> > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
> > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Also we wrote performance test code for increment
> > >> > operation
> > >> > > > > that
> > >> > > > > > > > > included
> > >> > > > > > > > > > > 100 threads and ran it in local mode.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > The result is shown below:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> > >> 7.975072509210629
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> > 49.11840157868772
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Thanks,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Toshihiro Suzuki
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> Should this be added as a known issue in the CDH or hbase documentation? It
> was a severe performance hit for us, all of our regionservers were sitting
> at a few thousand queued requests.
>
>
Let me take care of that.
St.Ack



> On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> bbeaudreault@hubspot.com>
> wrote:
>
> > Yea, they are all over the place and called from client and coprocessor
> > code. We ended up having no other option but to rollback, and aside from
> a
> > few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it
> > seems to be working and fixing our problem.
> >
> > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
> >
> >> Rollback is untested. No fix in 5.5. I was going to work on this now.
> >> Where
> >> are your counters Bryan? In their own column family or scattered about
> in
> >> a
> >> row with other Cell types?
> >> St.Ack
> >>
> >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> >> bbeaudreault@hubspot.com> wrote:
> >>
> >> > Is there any update to this? We just upgraded all of our production
> >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> >> > known issues, did not not about this.  Now we are seeing perfomance
> >> issues
> >> > across all clusters, as we make heavy use of increments.
> >> >
> >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back
> >> to
> >> > CDH 5.3.1 (if that is possible)?
> >> >
> >> >
> >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> >> >
> >> > > Thank you St.Ack!
> >> > >
> >> > > I would like to follow the ticket.
> >> > >
> >> > > Toshihiro Suzuki
> >> > >
> >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> >> > >
> >> > > > Back to this problem. Simple tests confirm that as is, the
> >> > > > single-queue-backed MVCC instance can slow Region ops if some
> other
> >> row
> >> > > is
> >> > > > slow to complete. In particular Increment, checkAndPut, and batch
> >> > > mutations
> >> > > > are effected. I opened HBASE-14460 to start in on a fix up. Lets
> >> see if
> >> > > we
> >> > > > can somehow scope mvcc to row or at least shard mvcc so not all
> >> Region
> >> > > ops
> >> > > > are paused.
> >> > > >
> >> > > > St.Ack
> >> > > >
> >> > > >
> >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> >> > > >
> >> > > > > > Thank you for the below reasoning (with accompanying helpful
> >> > > diagram).
> >> > > > > > Makes sense. Let me hack up a test case to help with the
> >> > > illustration.
> >> > > > It
> >> > > > > > is as though the mvcc should be scoped to a row only... Writes
> >> > > against
> >> > > > > > other rows should not hold up my read of my row. Tag an mvcc
> >> with a
> >> > > > 'row'
> >> > > > > > scope so we can see which on-going writes pertain to current
> >> > > operation?
> >> > > > > Thank you St.Ack! I think this approach would work.
> >> > > > >
> >> > > > > > You need to read back the increment and have it be 'correct'
> at
> >> > > > increment
> >> > > > > > time?
> >> > > > > Yes, we need it.
> >> > > > >
> >> > > > > I would like to help if there is anything I can do.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Toshihiro Suzuki
> >> > > > >
> >> > > > >
> >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> >> > > > >
> >> > > > > > Thank you for the below reasoning (with accompanying helpful
> >> > > diagram).
> >> > > > > > Makes sense. Let me hack up a test case to help with the
> >> > > illustration.
> >> > > > It
> >> > > > > > is as though the mvcc should be scoped to a row only... Writes
> >> > > against
> >> > > > > > other rows should not hold up my read of my row. Tag an mvcc
> >> with a
> >> > > > 'row'
> >> > > > > > scope so we can see which on-going writes pertain to current
> >> > > operation?
> >> > > > > >
> >> > > > > > You need to read back the increment and have it be 'correct'
> at
> >> > > > increment
> >> > > > > > time?
> >> > > > > >
> >> > > > > > (This is a good one)
> >> > > > > >
> >> > > > > > Thank you Toshihiro Suzuki
> >> > > > > > St.Ack
> >> > > > > >
> >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com>
> >> wrote:
> >> > > > > >
> >> > > > > > > St.Ack,
> >> > > > > > >
> >> > > > > > > Thank you for your response.
> >> > > > > > >
> >> > > > > > > Why I make out that "A region lock (not a row lock) seems to
> >> > occur
> >> > > in
> >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> >> > > > > > >
> >> > > > > > > A increment operation has 3 procedures for MVCC.
> >> > > > > > >
> >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> >> > > > > > >
> >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> >> > > > > > >
> >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can
> >> > cause
> >> > > a
> >> > > > > > region
> >> > > > > > > lock.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Step 2 adds to a WriteEntry to writeQueue.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> >> > > > > > >
> >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> >> > > > > > >
> >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> >> > > > > > >
> >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to
> >> writeQueue
> >> > > and
> >> > > > > > waits
> >> > > > > > > until writeQueue is empty or writeQueue.getFirst() == w.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > I think when a handler thread is processing between step 2
> and
> >> > step
> >> > > > 3,
> >> > > > > > the
> >> > > > > > > other handler threads can wait at step 1 until the thread
> >> > completes
> >> > > > > step
> >> > > > > > 3
> >> > > > > > > This is depicted as follows:
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Actually, in the thread dump of our region server, many
> >> handler
> >> > > > threads
> >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> >> > > > > > > (waitForPreviousTransactionsComplete()).
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> >> > > > > > >
> >> > > > > > > Many handler threads wait at this:
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > > Is it possible you are contending on a counter
> post-upgrade?
> >> > Is
> >> > > it
> >> > > > > > > > possible that all these threads are trying to get to the
> >> same
> >> > row
> >> > > > to
> >> > > > > > > update
> >> > > > > > > > it? Could the app behavior have changed?  Or are you
> >> thinking
> >> > > > > increment
> >> > > > > > > > itself has slowed significantly?
> >> > > > > > > We have just upgraded HBase, not changed the app behavior.
> We
> >> are
> >> > > > > > thinking
> >> > > > > > > increment itself has slowed significantly.
> >> > > > > > > Before upgrading HBase, it was good throughput and latency.
> >> > > > > > > Currently, to cope with this problem, we split the regions
> >> > finely.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > >
> >> > > > > > > Toshihiro Suzuki
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> >> > > > > > >
> >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com
> >
> >> > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Ted,
> >> > > > > > > > >
> >> > > > > > > > > Thank you for your response.
> >> > > > > > > > >
> >> > > > > > > > > I uploaded the complete stack trace to Gist.
> >> > > > > > > > >
> >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I think that increment operation works as follows:
> >> > > > > > > > >
> >> > > > > > > > > 1. get row lock
> >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait
> for
> >> all
> >> > > > prior
> >> > > > > > > MVCC
> >> > > > > > > > > transactions to finish
> >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a
> >> > transaction
> >> > > > > > > > > 4. get previous values
> >> > > > > > > > > 5. create KVs
> >> > > > > > > > > 6. write to Memstore
> >> > > > > > > > > 7. write to WAL
> >> > > > > > > > > 8. release row lock
> >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete
> the
> >> > > > > > transaction
> >> > > > > > > > >
> >> > > > > > > > > A instance of MultiVersionConsistencyControl has a
> pending
> >> > > queue
> >> > > > of
> >> > > > > > > > writes
> >> > > > > > > > > named writeQueue.
> >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
> >> > > > writeQueue
> >> > > > > > is
> >> > > > > > > > > empty or writeQueue.getFirst() == w.
> >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9
> removes
> >> the
> >> > > > > > > WriteEntry
> >> > > > > > > > > from writeQueue.
> >> > > > > > > > >
> >> > > > > > > > > I think that when a handler thread is processing between
> >> > step 2
> >> > > > and
> >> > > > > > > step
> >> > > > > > > > 9,
> >> > > > > > > > > the other handler threads can wait until the thread
> >> completes
> >> > > > step
> >> > > > > 9.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > That is right. We need to read, after all outstanding
> >> updates
> >> > are
> >> > > > > > done...
> >> > > > > > > > because we need to read the latest update before we go to
> >> > > > > > > modify/increment
> >> > > > > > > > it.
> >> > > > > > > >
> >> > > > > > > > How do you make out this?
> >> > > > > > > >
> >> > > > > > > > "A region lock (not a row lock) seems to occur in
> >> > > > > > > > waitForPreviousTransactionsComplete()."
> >> > > > > > > >
> >> > > > > > > > In 0.98.x we did this:
> >> > > > > > > >
> >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> >> > > > > > > >
> >> > > > > > > > ... and in 1.0 we do this:
> >> > > > > > > >
> >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is
> this....
> >> > > > > > > >
> >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> >> > > > > > > > +  }
> >> > > > > > > >
> >> > > > > > > > The mvcc and region sequenceid were merged in 1.0 (
> >> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763).
> Previous
> >> > mvcc
> >> > > > and
> >> > > > > > > > region
> >> > > > > > > > sequenceid would spin independent of each other. Perhaps
> >> this
> >> > > > > > responsible
> >> > > > > > > > for some slow down.
> >> > > > > > > >
> >> > > > > > > > That said, looking in your thread dump, we seem to be down
> >> in
> >> > the
> >> > > > > Get.
> >> > > > > > If
> >> > > > > > > > you do a bunch of thread dumps in a row, where is the
> >> > > lock-holding
> >> > > > > > > thread?
> >> > > > > > > > In Get or writing Increment... or waiting on sequence id?
> >> > > > > > > >
> >> > > > > > > > Is it possible you are contending on a counter
> post-upgrade?
> >> > Is
> >> > > it
> >> > > > > > > > possible that all these threads are trying to get to the
> >> same
> >> > row
> >> > > > to
> >> > > > > > > update
> >> > > > > > > > it? Could the app behavior have changed?  Or are you
> >> thinking
> >> > > > > increment
> >> > > > > > > > itself has slowed significantly?
> >> > > > > > > >
> >> > > > > > > > St.Ack
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > > Thanks,
> >> > > > > > > > >
> >> > > > > > > > > Toshihiro Suzuki
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> >> > > > > > > > >
> >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
> >> > > > > > > > > >
> >> > > > > > > > > >     try {
> >> > > > > > > > > >       rowLock = getRowLock(row);
> >> > > > > > > > > >
> >> > > > > > > > > > Can you pastebin the complete stack trace ?
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks
> >> > > > > > > > > >
> >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
> >> brfrn169@gmail.com>
> >> > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hi,
> >> > > > > > > > > > >
> >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6)
> to
> >> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> >> > > > > > > > > > > and we experience slowdown in increment operation.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Here's an extract from thread dump of the
> >> RegionServer of
> >> > > our
> >> > > > > > > > cluster:
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thread 68
> >> > > > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> >> > > > > > > > > > >   State: BLOCKED
> >> > > > > > > > > > >   Blocked count: 21689888
> >> > > > > > > > > > >   Waited count: 39828360
> >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> >> > > > > > > > > > >   Blocked by 63
> >> > > > > > > > >
> (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> >> > > > > > > > > > >   Stack:
> >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > >
> >> > > > >
> >> > >
> >>
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> >> > > > > > > > > > >
> >> > > > > >
> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> >> > > > > > > > > > >
> >> > > > > >
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> >> > > > > > > > > > >
> >> > > > > > > >
> >> > > >
> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> >> > > > > > > > > > >
> >> > > > > > > > > > > There are many similar threads in the thread dump.
> >> > > > > > > > > > >
> >> > > > > > > > > > > I read the source code and I think this is caused by
> >> > > changes
> >> > > > of
> >> > > > > > > > > > > MultiVersionConsistencyControl.
> >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
> >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Also we wrote performance test code for increment
> >> > operation
> >> > > > > that
> >> > > > > > > > > included
> >> > > > > > > > > > > 100 threads and ran it in local mode.
> >> > > > > > > > > > >
> >> > > > > > > > > > > The result is shown below:
> >> > > > > > > > > > >
> >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> >> 7.975072509210629
> >> > > > > > > > > > >
> >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> 49.11840157868772
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thanks,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Toshihiro Suzuki
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> >> > > > > > > > > > >
> >> > > > > > > >
> >> > > >
> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> >> > > > > > > > > > >
> >> > > > > > > > > > > There are many similar threads in the thread dump.
> >> > > > > > > > > > >
> >> > > > > > > > > > > I read the source code and I think this is caused by
> >> > > changes
> >> > > > of
> >> > > > > > > > > > > MultiVersionConsistencyControl.
> >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
> >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Also we wrote performance test code for increment
> >> > operation
> >> > > > > that
> >> > > > > > > > > included
> >> > > > > > > > > > > 100 threads and ran it in local mode.
> >> > > > > > > > > > >
> >> > > > > > > > > > > The result is shown below:
> >> > > > > > > > > > >
> >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
> >> 7.975072509210629
> >> > > > > > > > > > >
> >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms):
> 49.11840157868772
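
(For scale: 12757 / 2027 is roughly a 6.3x drop in throughput, and 49.12 / 7.98 is roughly a 6.2x rise in latency, for the same workload.)
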
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thanks,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Toshihiro Suzuki
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Should this be added as a known issue in the CDH or HBase documentation? It
was a severe performance hit for us; all of our regionservers were sitting
at a few thousand queued requests.

On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <bb...@hubspot.com>
wrote:

> Yea, they are all over the place and called from client and coprocessor
> code. We ended up having no other option but to roll back, and aside from a
> few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it
> seems to be working and fixing our problem.
>
> On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:
>
>> Rollback is untested. No fix in 5.5. I was going to work on this now.
>> Where
>> are your counters, Bryan? In their own column family or scattered about in
>> a
>> row with other Cell types?
>> St.Ack
>>
>> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
>> bbeaudreault@hubspot.com> wrote:
>>
>> > Is there any update to this? We just upgraded all of our production
>> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
>> > known issues, did not know about this.  Now we are seeing performance
>> issues
>> > across all clusters, as we make heavy use of increments.
>> >
>> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back
>> to
>> > CDH 5.3.1 (if that is possible)?
>> >
>> >
>> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
>> >
>> > > Thank you St.Ack!
>> > >
>> > > I would like to follow the ticket.
>> > >
>> > > Toshihiro Suzuki
>> > >
>> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
>> > >
>> > > > Back to this problem. Simple tests confirm that as is, the
>> > > > single-queue-backed MVCC instance can slow Region ops if some other
>> row
>> > > is
>> > > > slow to complete. In particular Increment, checkAndPut, and batch
>> > > mutations
>> > > > are affected. I opened HBASE-14460 to start in on a fix up. Let's
>> see if
>> > > we
>> > > > can somehow scope mvcc to row or at least shard mvcc so not all
>> Region
>> > > ops
>> > > > are paused.
>> > > >
>> > > > St.Ack
>> > > >
>> > > >
>> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>> > > >
>> > > > > > Thank you for the below reasoning (with accompanying helpful
>> > > diagram).
>> > > > > > Makes sense. Let me hack up a test case to help with the
>> > > illustration.
>> > > > It
>> > > > > > is as though the mvcc should be scoped to a row only... Writes
>> > > against
>> > > > > > other rows should not hold up my read of my row. Tag an mvcc
>> with a
>> > > > 'row'
>> > > > > > scope so we can see which on-going writes pertain to current
>> > > operation?
>> > > > > Thank you St.Ack! I think this approach would work.
>> > > > >
>> > > > > > You need to read back the increment and have it be 'correct' at
>> > > > increment
>> > > > > > time?
>> > > > > Yes, we need it.
>> > > > >
>> > > > > I would like to help if there is anything I can do.
>> > > > >
>> > > > > Thanks,
>> > > > > Toshihiro Suzuki
>> > > > >
>> > > > >
>> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
>> > > > >
>> > > > > > Thank you for the below reasoning (with accompanying helpful
>> > > diagram).
>> > > > > > Makes sense. Let me hack up a test case to help with the
>> > > illustration.
>> > > > It
>> > > > > > is as though the mvcc should be scoped to a row only... Writes
>> > > against
>> > > > > > other rows should not hold up my read of my row. Tag an mvcc
>> with a
>> > > > 'row'
>> > > > > > scope so we can see which on-going writes pertain to current
>> > > operation?
>> > > > > >
>> > > > > > You need to read back the increment and have it be 'correct' at
>> > > > increment
>> > > > > > time?
>> > > > > >
>> > > > > > (This is a good one)
>> > > > > >
>> > > > > > Thank you Toshihiro Suzuki
>> > > > > > St.Ack
>> > > > > >
>> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com>
>> wrote:
>> > > > > >
>> > > > > > > St.Ack,
>> > > > > > >
>> > > > > > > Thank you for your response.
>> > > > > > >
>> > > > > > > The reason I concluded that "A region lock (not a row lock) seems to
>> > occur
>> > > in
>> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
>> > > > > > >
>> > > > > > > An increment operation has 3 procedures for MVCC.
>> > > > > > >
>> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>> > > > > > >
>> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>> > > > > > >
>> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>> > > > > > >
>> > > > > > >
>> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can
>> > cause
>> > > a
>> > > > > > region
>> > > > > > > lock.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
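
(Note that this matches the thread dump above: the handlers are reported as "Blocked on java.util.LinkedList@3474e4b2", and per the lines linked here, writeQueue is exactly such a LinkedList.)
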
>> > > > > > >
>> > > > > > >
>> > > > > > > Step 2 adds a WriteEntry to writeQueue.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>> > > > > > >
>> > > > > > > Step 3 removes the WriteEntry from writeQueue.
>> > > > > > >
>> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
>> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>> > > > > > >
>> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to
>> writeQueue
>> > > and
>> > > > > > waits
>> > > > > > > until writeQueue is empty or writeQueue.getFirst() == w.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>> > > > > > >
>> > > > > > >
>> > > > > > > I think when a handler thread is processing between step 2 and
>> > step
>> > > > 3,
>> > > > > > the
>> > > > > > > other handler threads can wait at step 1 until the thread
>> > completes
>> > > > > step
>> > > > > > 3.
>> > > > > > > This is depicted as follows:
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
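
To see the depicted blocking in action, here is a small driver against the SimpleMvcc sketch added earlier in this thread (hypothetical scenario: handler-1 stalls mid-transaction while handler-2 works on a different row):

public class MvccDemo {
  public static void main(String[] args) throws Exception {
    SimpleMvcc mvcc = new SimpleMvcc();

    // handler-1 begins a transaction (steps 2-3) and then stalls,
    // e.g. on a slow WAL sync for some unrelated row.
    SimpleMvcc.WriteEntry slow = mvcc.begin();

    // handler-2 increments a DIFFERENT row, but its step-1 wait still
    // blocks behind handler-1's entry in the shared queue.
    Thread handler2 = new Thread(() -> {
      try {
        long start = System.nanoTime();
        mvcc.waitForPreviousTransactionsComplete();
        System.out.printf("handler-2 waited %d ms%n",
            (System.nanoTime() - start) / 1_000_000);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    });
    handler2.start();

    Thread.sleep(2000);   // handler-1 is slow for two seconds
    mvcc.complete(slow);  // step 9: only now can handler-2 proceed
    handler2.join();      // prints roughly "handler-2 waited 2000 ms"
  }
}
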
>> > > > > > >
>> > > > > > >
>> > > > > > > Actually, in the thread dump of our region server, many
>> handler
>> > > > threads
>> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
>> > > > > > > (waitForPreviousTransactionsComplete()).
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>> > > > > > >
>> > > > > > > Many handler threads wait at this:
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>> > > > > > >
>> > > > > > >
>> > > > > > > > Is it possible you are contending on a counter post-upgrade?
>> > Is
>> > > it
>> > > > > > > > possible that all these threads are trying to get to the
>> same
>> > row
>> > > > to
>> > > > > > > update
>> > > > > > > > it? Could the app behavior have changed?  Or are you
>> thinking
>> > > > > increment
>> > > > > > > > itself has slowed significantly?
>> > > > > > > We have just upgraded HBase, not changed the app behavior. We
>> are
>> > > > > > thinking
>> > > > > > > increment itself has slowed significantly.
>> > > > > > > Before upgrading HBase, throughput and latency were good.
>> > > > > > > Currently, to cope with this problem, we split the regions
>> > finely.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Toshihiro Suzuki
>> > > > > > >
>> > > > > > >
>> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
>> > > > > > >
>> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com>
>> > > wrote:
>> > > > > > > >
>> > > > > > > > > Ted,
>> > > > > > > > >
>> > > > > > > > > Thank you for your response.
>> > > > > > > > >
>> > > > > > > > > I uploaded the complete stack trace to Gist.
>> > > > > > > > >
>> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I think that the increment operation works as follows:
>> > > > > > > > >
>> > > > > > > > > 1. get row lock
>> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for
>> all
>> > > > prior
>> > > > > > > MVCC
>> > > > > > > > > transactions to finish
>> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a
>> > transaction
>> > > > > > > > > 4. get previous values
>> > > > > > > > > 5. create KVs
>> > > > > > > > > 6. write to Memstore
>> > > > > > > > > 7. write to WAL
>> > > > > > > > > 8. release row lock
>> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
>> > > > > > transaction
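
Condensed into code, the nine steps above look roughly like this (a sketch reusing the SimpleMvcc model from earlier in this thread; the per-row lock is stubbed with a single ReentrantLock, and steps 4-7 are elided):

import java.util.concurrent.locks.ReentrantLock;

class IncrementSketch {
  private final SimpleMvcc mvcc = new SimpleMvcc();
  private final ReentrantLock rowLock = new ReentrantLock(); // stand-in for getRowLock(row)

  void increment() throws InterruptedException {
    rowLock.lock();                               // 1. get row lock
    SimpleMvcc.WriteEntry w;
    try {
      mvcc.waitForPreviousTransactionsComplete(); // 2. wait on ALL prior region writes
      w = mvcc.begin();                           // 3. start a transaction
      // 4-7. read the current value, build the new KVs,
      //      write to the memstore and the WAL
    } finally {
      rowLock.unlock();                           // 8. release row lock
    }
    mvcc.complete(w);                             // 9. complete the transaction
  }
}

Note that the region-wide wait in step 2 happens while the row lock from step 1 is already held.
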
>> > > > > > > > >
>> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending
>> > > queue
>> > > > of
>> > > > > > > > writes
>> > > > > > > > > named writeQueue.
>> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
>> > > > writeQueue
>> > > > > > is
>> > > > > > > > > empty or writeQueue.getFirst() == w.
>> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes
>> the
>> > > > > > > WriteEntry
>> > > > > > > > > from writeQueue.
>> > > > > > > > >
>> > > > > > > > > I think that when a handler thread is processing between
>> > step 2
>> > > > and
>> > > > > > > step
>> > > > > > > > 9,
>> > > > > > > > > the other handler threads can wait until the thread
>> completes
>> > > > step
>> > > > > 9.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > That is right. We need to read, after all outstanding
>> updates
>> > are
>> > > > > > done...
>> > > > > > > > because we need to read the latest update before we go to
>> > > > > > > modify/increment
>> > > > > > > > it.
>> > > > > > > >
>> > > > > > > > How did you arrive at this?
>> > > > > > > >
>> > > > > > > > "A region lock (not a row lock) seems to occur in
>> > > > > > > > waitForPreviousTransactionsComplete()."
>> > > > > > > >
>> > > > > > > > In 0.98.x we did this:
>> > > > > > > >
>> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>> > > > > > > >
>> > > > > > > > ... and in 1.0 we do this:
>> > > > > > > >
>> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
>> > > > > > > >
>> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
>> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
>> > > > > > > > +    waitForPreviousTransactionsComplete(w);
>> > > > > > > > +  }
>> > > > > > > >
>> > > > > > > > The mvcc and region sequenceid were merged in 1.0 (
>> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previous
>> > mvcc
>> > > > and
>> > > > > > > > region
>> > > > > > > > sequenceid would spin independently of each other. Perhaps
>> > > > > > > > this is responsible for some slowdown.
>> > > > > > > >
>> > > > > > > > That said, looking in your thread dump, we seem to be down
>> in
>> > the
>> > > > > Get.
>> > > > > > If
>> > > > > > > > you do a bunch of thread dumps in a row, where is the
>> > > lock-holding
>> > > > > > > thread?
>> > > > > > > > In Get or writing Increment... or waiting on sequence id?
>> > > > > > > >
>> > > > > > > > Is it possible you are contending on a counter post-upgrade?
>> > Is
>> > > it
>> > > > > > > > possible that all these threads are trying to get to the
>> same
>> > row
>> > > > to
>> > > > > > > update
>> > > > > > > > it? Could the app behavior have changed?  Or are you
>> thinking
>> > > > > increment
>> > > > > > > > itself has slowed significantly?
>> > > > > > > >
>> > > > > > > > St.Ack
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > Toshihiro Suzuki
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
>> > > > > > > > >
>> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
>> > > > > > > > > >
>> > > > > > > > > >     try {
>> > > > > > > > > >       rowLock = getRowLock(row);
>> > > > > > > > > >
>> > > > > > > > > > Can you pastebin the complete stack trace ?
>> > > > > > > > > >
>> > > > > > > > > > Thanks
>> > > > > > > > > >
>> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <
>> brfrn169@gmail.com>
>> > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi,
>> > > > > > > > > > >
>> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
>> > > > > > > > > > CDH5.4.5(HBase1.0.0)
>> > > > > > > > > > > and we experience slowdown in increment operation.
>> > > > > > > > > > >
>> > > > > > > > > > > Here's an extract from thread dump of the
>> RegionServer of
>> > > our
>> > > > > > > > cluster:
>> > > > > > > > > > >
>> > > > > > > > > > > Thread 68
>> > > > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>> > > > > > > > > > >   State: BLOCKED
>> > > > > > > > > > >   Blocked count: 21689888
>> > > > > > > > > > >   Waited count: 39828360
>> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
>> > > > > > > > > > >   Blocked by 63
>> > > > > > > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>> > > > > > > > > > >   Stack:
>> > > > > > > > > > >     java.lang.Object.wait(Native Method)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > >
>> > > > > > >
>> > > > >
>> > >
>> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>> > > > > > > > > > >
>> > > > > >  org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>> > > > > > > > > > >
>> > > > > >  org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> > > > > > > > > > >
>> > > > > > > >
>> > > > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
>> > > > > > > > > > >
>> > > > > > > > > > > There are many similar threads in the thread dump.
>> > > > > > > > > > >
>> > > > > > > > > > > I read the source code and I think this is caused by
>> > > changes
>> > > > of
>> > > > > > > > > > > MultiVersionConsistencyControl.
>> > > > > > > > > > > A region lock (not a row lock) seems to occur in
>> > > > > > > > > > > waitForPreviousTransactionsComplete().
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Also we wrote performance test code for increment
>> > operation
>> > > > > that
>> > > > > > > > > included
>> > > > > > > > > > > 100 threads and ran it in local mode.
>> > > > > > > > > > >
>> > > > > > > > > > > The result is shown below:
>> > > > > > > > > > >
>> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
>> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms):
>> 7.975072509210629
>> > > > > > > > > > >
>> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
>> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > >
>> > > > > > > > > > > Toshihiro Suzuki
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Yea, they are all over the place and called from client and coprocessor
code. We ended up having no other option but to roll back, and aside from a
few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it
seems to be working and fixing our problem.
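
For reference, the API change behind those NoSuchMethodErrors looks like this (a minimal sketch; the row, column family, qualifier, and value names are made up):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

class PutApiChange {
  static Put buildPut() {
    Put p = new Put(Bytes.toBytes("row1"));
    // HBase 0.98 client code:
    //   p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    // HBase 1.0+ replacement; code written against this and then run on
    // the older client is what trips the NoSuchMethodError:
    p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    return p;
  }
}
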

On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:

> Rollback is untested. No fix in 5.5. I was going to work on this now. Where
> are your counters, Bryan? In their own column family or scattered about in a
> row with other Cell types?
> St.Ack
>
> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
> > Is there any update to this? We just upgraded all of our production
> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> > known issues, did not know about this.  Now we are seeing performance
> issues
> > across all clusters, as we make heavy use of increments.
> >
> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back to
> > CDH 5.3.1 (if that is possible)?
> >
> >
> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> >
> > > Thank you St.Ack!
> > >
> > > I would like to follow the ticket.
> > >
> > > Toshihiro Suzuki
> > >
> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > >
> > > > Back to this problem. Simple tests confirm that as is, the
> > > > single-queue-backed MVCC instance can slow Region ops if some other
> row
> > > is
> > > > slow to complete. In particular Increment, checkAndPut, and batch
> > > mutations
> > > > are affected. I opened HBASE-14460 to start in on a fix up. Let's see
> if
> > > we
> > > > can somehow scope mvcc to row or at least shard mvcc so not all
> Region
> > > ops
> > > > are paused.
> > > >
> > > > St.Ack
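
Purely as an illustration of the "shard mvcc" idea (this is not HBASE-14460's actual design), rows could be hashed across several independent queues so that writes to unrelated rows rarely contend:

import java.util.Arrays;

class ShardedMvcc {
  private final SimpleMvcc[] shards;  // SimpleMvcc sketch from earlier in this thread

  ShardedMvcc(int n) {
    shards = new SimpleMvcc[n];
    for (int i = 0; i < n; i++) {
      shards[i] = new SimpleMvcc();
    }
  }

  // Pick the shard for a row; only writes that hash to the same shard
  // can now block each other.
  SimpleMvcc forRow(byte[] row) {
    return shards[(Arrays.hashCode(row) & 0x7fffffff) % shards.length];
  }
}

The trade-off, as noted above for highly contended single rows, is that writes hashing to the same shard still queue behind each other.
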
> > > >
> > > >
> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > >
> > > > > > Thank you for the below reasoning (with accompanying helpful
> > > diagram).
> > > > > > Makes sense. Let me hack up a test case to help with the
> > > illustration.
> > > > It
> > > > > > is as though the mvcc should be scoped to a row only... Writes
> > > against
> > > > > > other rows should not hold up my read of my row. Tag an mvcc
> with a
> > > > 'row'
> > > > > > scope so we can see which on-going writes pertain to current
> > > operation?
> > > > > Thank you St.Ack! I think this approach would work.
> > > > >
> > > > > > You need to read back the increment and have it be 'correct' at
> > > > increment
> > > > > > time?
> > > > > Yes, we need it.
> > > > >
> > > > > I would like to help if there is anything I can do.
> > > > >
> > > > > Thanks,
> > > > > Toshihiro Suzuki
> > > > >
> > > > >
> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > > >
> > > > > > Thank you for the below reasoning (with accompanying helpful
> > > diagram).
> > > > > > Makes sense. Let me hack up a test case to help with the
> > > illustration.
> > > > It
> > > > > > is as though the mvcc should be scoped to a row only... Writes
> > > against
> > > > > > other rows should not hold up my read of my row. Tag an mvcc
> with a
> > > > 'row'
> > > > > > scope so we can see which on-going writes pertain to current
> > > operation?
> > > > > >
> > > > > > You need to read back the increment and have it be 'correct' at
> > > > increment
> > > > > > time?
> > > > > >
> > > > > > (This is a good one)
> > > > > >
> > > > > > Thank you Toshihiro Suzuki
> > > > > > St.Ack
> > > > > >
> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com>
> wrote:
> > > > > >
> > > > > > > St.Ack,
> > > > > > >
> > > > > > > Thank you for your response.
> > > > > > >
> > > > > > > The reason I concluded that "A region lock (not a row lock) seems to
> > occur
> > > in
> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > > > > >
> > > > > > > An increment operation has 3 procedures for MVCC.
> > > > > > >
> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > > > >
> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > > > >
> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > > > >
> > > > > > >
> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can
> > cause
> > > a
> > > > > > region
> > > > > > > lock.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > > > >
> > > > > > >
> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > > > >
> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > > > >
> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > > > >
> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to
> writeQueue
> > > and
> > > > > > waits
> > > > > > > until writeQueue is empty or writeQueue.getFirst() == w.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > > > >
> > > > > > >
> > > > > > > I think when a handler thread is processing between step 2 and
> > step
> > > > 3,
> > > > > > the
> > > > > > > other handler threads can wait at step 1 until the thread
> > completes
> > > > > step
> > > > > > 3.
> > > > > > > This is depicted as follows:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > > > >
> > > > > > >
> > > > > > > Actually, in the thread dump of our region server, many handler
> > > > threads
> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > > > > > (waitForPreviousTransactionsComplete()).
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > > > >
> > > > > > > Many handler threads wait at this:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > > > >
> > > > > > >
> > > > > > > > Is it possible you are contending on a counter post-upgrade?
> > Is
> > > it
> > > > > > > > possible that all these threads are trying to get to the same
> > row
> > > > to
> > > > > > > update
> > > > > > > > it? Could the app behavior have changed?  Or are you thinking
> > > > > increment
> > > > > > > > itself has slowed significantly?
> > > > > > > We have just upgraded HBase, not changed the app behavior. We
> are
> > > > > > thinking
> > > > > > > increment itself has slowed significantly.
> > > > > > > Before upgrading HBase, throughput and latency were good.
> > > > > > > Currently, to cope with this problem, we split the regions
> > finely.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Toshihiro Suzuki
> > > > > > >
> > > > > > >
> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > > > > >
> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com>
> > > wrote:
> > > > > > > >
> > > > > > > > > Ted,
> > > > > > > > >
> > > > > > > > > Thank you for your response.
> > > > > > > > >
> > > > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > > > >
> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think that the increment operation works as follows:
> > > > > > > > >
> > > > > > > > > 1. get row lock
> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for
> all
> > > > prior
> > > > > > > MVCC
> > > > > > > > > transactions to finish
> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a
> > transaction
> > > > > > > > > 4. get previous values
> > > > > > > > > 5. create KVs
> > > > > > > > > 6. write to Memstore
> > > > > > > > > 7. write to WAL
> > > > > > > > > 8. release row lock
> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > > > > > transaction
> > > > > > > > >
> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending
> > > queue
> > > > of
> > > > > > > > writes
> > > > > > > > > named writeQueue.
> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
> > > > writeQueue
> > > > > > is
> > > > > > > > > empty or writeQueue.getFirst() == w.
> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes
> the
> > > > > > > WriteEntry
> > > > > > > > > from writeQueue.
> > > > > > > > >
> > > > > > > > > I think that when a handler thread is processing between
> > step 2
> > > > and
> > > > > > > step
> > > > > > > > 9,
> > > > > > > > > the other handler threads can wait until the thread
> completes
> > > > step
> > > > > 9.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > That is right. We need to read, after all outstanding updates
> > are
> > > > > > done...
> > > > > > > > because we need to read the latest update before we go to
> > > > > > > modify/increment
> > > > > > > > it.
> > > > > > > >
> > > > > > > > How did you arrive at this?
> > > > > > > >
> > > > > > > > "A region lock (not a row lock) seems to occur in
> > > > > > > > waitForPreviousTransactionsComplete()."
> > > > > > > >
> > > > > > > > In 0.98.x we did this:
> > > > > > > >
> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > > > >
> > > > > > > > ... and in 1.0 we do this:
> > > > > > > >
> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > > > > >
> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > > > > +  }
> > > > > > > >
> > > > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previous
> > mvcc
> > > > and
> > > > > > > > region
> > > > > > > > sequenceid would spin independently of each other. Perhaps this
> > > > > > > > is responsible for some slowdown.
> > > > > > > >
> > > > > > > > That said, looking in your thread dump, we seem to be down in
> > the
> > > > > Get.
> > > > > > If
> > > > > > > > you do a bunch of thread dumps in a row, where is the
> > > lock-holding
> > > > > > > thread?
> > > > > > > > In Get or writing Increment... or waiting on sequence id?
> > > > > > > >
> > > > > > > > Is it possible you are contending on a counter post-upgrade?
> > Is
> > > it
> > > > > > > > possible that all these threads are trying to get to the same
> > row
> > > > to
> > > > > > > update
> > > > > > > > it? Could the app behavior have changed?  Or are you thinking
> > > > > increment
> > > > > > > > itself has slowed significantly?
> > > > > > > >
> > > > > > > > St.Ack
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Toshihiro Suzuki
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > > > > > > >
> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > > > > >
> > > > > > > > > >     try {
> > > > > > > > > >       rowLock = getRowLock(row);
> > > > > > > > > >
> > > > > > > > > > Can you pastebin the complete stack trace ?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn169@gmail.com
> >
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > > and we experience slowdown in increment operation.
> > > > > > > > > > >
> > > > > > > > > > > Here's an extract from thread dump of the RegionServer of
> > > > > > > > > > > our cluster:
> > > > > > > > > > >
> > > > > > > > > > > Thread 68
> > > > > > > > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > > > > >   State: BLOCKED
> > > > > > > > > > >   Blocked count: 21689888
> > > > > > > > > > >   Waited count: 39828360
> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > > > > >   Blocked by 63
> > > > > > > > > > >   (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > > > > >   Stack:
> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > > > > >
> > > > > > > > > > > There are many similar threads in the thread dump.
> > > > > > > > > > >
> > > > > > > > > > > I read the source code and I think this is caused by
> > > changes
> > > > of
> > > > > > > > > > > MultiVersionConsistencyControl.
> > > > > > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Also we wrote performance test code for increment operation
> > > > > > > > > > > that included 100 threads and ran it in local mode.
> > > > > > > > > > >
> > > > > > > > > > > The result is shown below:
> > > > > > > > > > >
> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > > > > >
> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Yea, they are all over the place and called from client and coprocessor
code. We ended up having no other option but to roll back, and aside from a
few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it
seems to be working and fixing our problem.
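
For anyone attempting the same rollback: the NoSuchMethodErrors show up
because code compiled against the 1.0 client references Put#addColumn, which,
as far as the 0.98 client API goes, does not exist there. A minimal sketch
(the family/qualifier names are made up):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutApiExample {
      public static void main(String[] args) {
        Put put = new Put(Bytes.toBytes("row-1"));

        // 1.0-style name; a compiled-in reference to this fails with
        // NoSuchMethodError when run against a 0.98 classpath:
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));

        // 0.98-style equivalent of the same cell:
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      }
    }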

On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:

> Rollback is untested. No fix in 5.5. I was going to work on this now. Where
> are your counters Bryan? In their own column family or scattered about in a
> row with other Cell types?
> St.Ack
>
> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
> > Is there any update to this? We just upgraded all of our production
> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> > known issues, did not know about this.  Now we are seeing performance
> > issues across all clusters, as we make heavy use of increments.
> >
> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back to
> > CDH 5.3.1 (if that is possible)?
> >
> >
> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
> >
> > > Thank you St.Ack!
> > >
> > > I would like to follow the ticket.
> > >
> > > Toshihiro Suzuki
> > >
> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> > >
> > > > Back to this problem. Simple tests confirm that as is, the
> > > > single-queue-backed MVCC instance can slow Region ops if some other
> > > > row is slow to complete. In particular Increment, checkAndPut, and
> > > > batch mutations are affected. I opened HBASE-14460 to start in on a
> > > > fix up. Let's see if we can somehow scope mvcc to row or at least
> > > > shard mvcc so not all Region ops are paused.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > >
> > > > > > Thank you for the below reasoning (with accompanying helpful
> > > diagram).
> > > > > > Makes sense. Let me hack up a test case to help with the
> > > illustration.
> > > > It
> > > > > > is as though the mvcc should be scoped to a row only... Writes
> > > against
> > > > > > other rows should not hold up my read of my row. Tag an mvcc
> with a
> > > > 'row'
> > > > > > scope so we can see which on-going writes pertain to current
> > > operation?
> > > > > Thank you St.Ack! I think this approach would work.
> > > > >
> > > > > > You need to read back the increment and have it be 'correct' at
> > > > increment
> > > > > > time?
> > > > > Yes, we need it.
> > > > >
> > > > > I would like to help if there is anything I can do.
> > > > >
> > > > > Thanks,
> > > > > Toshihiro Suzuki
> > > > >
> > > > >
> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > > >
> > > > > > Thank you for the below reasoning (with accompanying helpful
> > > diagram).
> > > > > > Makes sense. Let me hack up a test case to help with the
> > > illustration.
> > > > It
> > > > > > is as though the mvcc should be scoped to a row only... Writes
> > > against
> > > > > > other rows should not hold up my read of my row. Tag an mvcc
> with a
> > > > 'row'
> > > > > > scope so we can see which on-going writes pertain to current
> > > operation?
> > > > > >
> > > > > > You need to read back the increment and have it be 'correct' at
> > > > increment
> > > > > > time?
> > > > > >
> > > > > > (This is a good one)
> > > > > >
> > > > > > Thank you Toshihiro Suzuki
> > > > > > St.Ack
> > > > > >
> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com>
> wrote:
> > > > > >
> > > > > > > St.Ack,
> > > > > > >
> > > > > > > Thank you for your response.
> > > > > > >
> > > > > > > Why I make out that "A region lock (not a row lock) seems to
> > > > > > > occur in waitForPreviousTransactionsComplete()" is as follows:
> > > > > > >
> > > > > > > An increment operation has 3 procedures for MVCC.
> > > > > > >
> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > > > >
> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > > > >
> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > > > >
> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can
> > > > > > > cause a region lock.
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > > > >
> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > > > >
> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > > > >
> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > > > >
> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue
> > > > > > > and waits until writeQueue is empty or writeQueue.getFirst() == w.
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > > > >
> > > > > > > I think when a handler thread is processing between step 2 and
> > > > > > > step 3, the other handler threads can wait at step 1 until the
> > > > > > > thread completes step 3. This is depicted as follows:
> > > > > > >
> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > > > >
> > > > > > > Actually, in the thread dump of our region server, many handler
> > > > > > > threads (RW.default.writeRpcServer.handler) wait at Step 1
> > > > > > > (waitForPreviousTransactionsComplete()).
> > > > > > >
> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > > > >
> > > > > > > Many handler threads wait at this:
> > > > > > >
> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > > > >
> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is
> > > > > > > > it possible that all these threads are trying to get to the same
> > > > > > > > row to update it? Could the app behavior have changed?  Or are
> > > > > > > > you thinking increment itself has slowed significantly?
> > > > > > > We have just upgraded HBase, not changed the app behavior. We are
> > > > > > > thinking increment itself has slowed significantly.
> > > > > > > Before upgrading HBase, it was good throughput and latency.
> > > > > > > Currently, to cope with this problem, we split the regions finely.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Toshihiro Suzuki
> > > > > > >
> > > > > > >
> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > > > > >
> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com>
> > > wrote:
> > > > > > > >
> > > > > > > > > Ted,
> > > > > > > > >
> > > > > > > > > Thank you for your response.
> > > > > > > > >
> > > > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > > > >
> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think that increment operation works as follows:
> > > > > > > > >
> > > > > > > > > 1. get row lock
> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all
> > > > > > > > > prior MVCC transactions to finish
> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > > > > 4. get previous values
> > > > > > > > > 5. create KVs
> > > > > > > > > 6. write to Memstore
> > > > > > > > > 7. write to WAL
> > > > > > > > > 8. release row lock
> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > > > > > > > > transaction
> > > > > > > > >
> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending
> > > > > > > > > queue of writes named writeQueue.
> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
> > > > > > > > > writeQueue is empty or writeQueue.getFirst() == w.
> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > > > > > > > WriteEntry from writeQueue.
> > > > > > > > >
> > > > > > > > > I think that when a handler thread is processing between step 2
> > > > > > > > > and step 9, the other handler threads can wait until the thread
> > > > > > > > > completes step 9.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > That is right. We need to read, after all outstanding updates
> > > > > > > > are done... because we need to read the latest update before we
> > > > > > > > go to modify/increment it.
> > > > > > > >
> > > > > > > > How do you make out this?
> > > > > > > >
> > > > > > > > "A region lock (not a row lock) seems to occur in
> > > > > > > > waitForPreviousTransactionsComplete()."
> > > > > > > >
> > > > > > > > In 0.98.x we did this:
> > > > > > > >
> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > > > >
> > > > > > > > ... and in 1.0 we do this:
> > > > > > > >
> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > > > > >
> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > > > > +  }
> > > > > > > >
> > > > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previously,
> > > > > > > > mvcc and region sequenceid would spin independently of each
> > > > > > > > other. Perhaps this is responsible for some slowdown.
> > > > > > > >
> > > > > > > > That said, looking in your thread dump, we seem to be down in
> > > > > > > > the Get. If you do a bunch of thread dumps in a row, where is
> > > > > > > > the lock-holding thread? In Get or writing Increment... or
> > > > > > > > waiting on sequence id?
> > > > > > > >
> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is
> > > > > > > > it possible that all these threads are trying to get to the
> > > > > > > > same row to update it? Could the app behavior have changed?  Or
> > > > > > > > are you thinking increment itself has slowed significantly?
> > > > > > > >
> > > > > > > > St.Ack
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Toshihiro Suzuki
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > > > > > > >
> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > > > > >
> > > > > > > > > >     try {
> > > > > > > > > >       rowLock = getRowLock(row);
> > > > > > > > > >
> > > > > > > > > > Can you pastebin the complete stack trace ?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn169@gmail.com
> >
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > > and we experience slowdown in increment operation.
> > > > > > > > > > >
> > > > > > > > > > > Here's an extract from thread dump of the RegionServer of
> > > > > > > > > > > our cluster:
> > > > > > > > > > >
> > > > > > > > > > > Thread 68
> > > > > > > > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > > > > >   State: BLOCKED
> > > > > > > > > > >   Blocked count: 21689888
> > > > > > > > > > >   Waited count: 39828360
> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > > > > >   Blocked by 63
> > > > > > > > > > >   (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > > > > >   Stack:
> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > > > > >
> > > > > > > > > > > There are many similar threads in the thread dump.
> > > > > > > > > > >
> > > > > > > > > > > I read the source code and I think this is caused by
> > > changes
> > > > of
> > > > > > > > > > > MultiVersionConsistencyControl.
> > > > > > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Also we wrote performance test code for increment operation
> > > > > > > > > > > that included 100 threads and ran it in local mode.
> > > > > > > > > > >
> > > > > > > > > > > The result is shown below:
> > > > > > > > > > >
> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > > > > >
> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
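
On the "shard mvcc" direction mentioned in the quoted HBASE-14460 discussion:
one way to picture it (purely a sketch of the idea, not the eventual patch) is
to hash each row onto its own small queue, so a waiter only convoys with
writers whose rows land in the same shard. Reusing the ToyMvcc sketch from
above:

    import java.util.Arrays;

    // Sketch: N independent toy queues instead of one region-wide queue.
    class ShardedToyMvcc {
      private final ToyMvcc[] shards;

      ShardedToyMvcc(int numShards) {
        shards = new ToyMvcc[numShards];
        Arrays.setAll(shards, i -> new ToyMvcc());
      }

      // An operation on 'row' only waits behind writes in the same shard.
      ToyMvcc forRow(byte[] row) {
        return shards[(Arrays.hashCode(row) & 0x7fffffff) % shards.length];
      }
    }

Increments hammering one hot row would still serialize with each other; the
sharding only removes the cross-row convoy.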

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Rollback is untested. No fix in 5.5. I was going to work on this now. Where
are your counters Bryan? In their own column family or scattered about in a
row with other Cell types?
St.Ack
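
For readers following along, the schema question above distinguishes a layout
where counter cells are isolated in a dedicated column family from one where
they sit next to other Cell types in the same row. An increment against a
dedicated counter family looks like this (a minimal sketch; the table, family,
and qualifier names are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DedicatedCounterFamily {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("metrics"))) {
          // All counter cells live in their own family "c"; nothing else is
          // written to that family.
          Increment inc = new Increment(Bytes.toBytes("user-123"));
          inc.addColumn(Bytes.toBytes("c"), Bytes.toBytes("pageviews"), 1L);
          table.increment(inc);
        }
      }
    }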

On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> Is there any update to this? We just upgraded all of our production
> clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> known issues, did not know about this.  Now we are seeing performance issues
> across all clusters, as we make heavy use of increments.
>
> Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back to
> CDH 5.3.1 (if that is possible)?
>
>
> On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
>
> > Thank you St.Ack!
> >
> > I would like to follow the ticket.
> >
> > Toshihiro Suzuki
> >
> > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> >
> > > Back to this problem. Simple tests confirm that as is, the
> > > single-queue-backed MVCC instance can slow Region ops if some other row
> > > is slow to complete. In particular Increment, checkAndPut, and batch
> > > mutations are affected. I opened HBASE-14460 to start in on a fix up.
> > > Let's see if we can somehow scope mvcc to row or at least shard mvcc so
> > > not all Region ops are paused.
> > >
> > > St.Ack
> > >
> > >
> > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > >
> > > > > Thank you for the below reasoning (with accompanying helpful
> > diagram).
> > > > > Makes sense. Let me hack up a test case to help with the
> > illustration.
> > > It
> > > > > is as though the mvcc should be scoped to a row only... Writes
> > against
> > > > > other rows should not hold up my read of my row. Tag an mvcc with a
> > > 'row'
> > > > > scope so we can see which on-going writes pertain to current
> > operation?
> > > > Thank you St.Ack! I think this approach would work.
> > > >
> > > > > You need to read back the increment and have it be 'correct' at
> > > increment
> > > > > time?
> > > > Yes, we need it.
> > > >
> > > > I would like to help if there is anything I can do.
> > > >
> > > > Thanks,
> > > > Toshihiro Suzuki
> > > >
> > > >
> > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > >
> > > > > Thank you for the below reasoning (with accompanying helpful
> > diagram).
> > > > > Makes sense. Let me hack up a test case to help with the
> > illustration.
> > > It
> > > > > is as though the mvcc should be scoped to a row only... Writes
> > against
> > > > > other rows should not hold up my read of my row. Tag an mvcc with a
> > > 'row'
> > > > > scope so we can see which on-going writes pertain to current
> > operation?
> > > > >
> > > > > You need to read back the increment and have it be 'correct' at
> > > increment
> > > > > time?
> > > > >
> > > > > (This is a good one)
> > > > >
> > > > > Thank you Toshihiro Suzuki
> > > > > St.Ack
> > > > >
> > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > >
> > > > > > St.Ack,
> > > > > >
> > > > > > Thank you for your response.
> > > > > >
> > > > > > Why I make out that "A region lock (not a row lock) seems to
> > > > > > occur in waitForPreviousTransactionsComplete()" is as follows:
> > > > > >
> > > > > > An increment operation has 3 procedures for MVCC.
> > > > > >
> > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > > >
> > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > > >
> > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > > >
> > > > > > I think that MultiVersionConsistencyControl's writeQueue can
> > > > > > cause a region lock.
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > > >
> > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > > >
> > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > > >
> > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > > >
> > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue
> > > > > > and waits until writeQueue is empty or writeQueue.getFirst() == w.
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > > >
> > > > > > I think when a handler thread is processing between step 2 and
> > > > > > step 3, the other handler threads can wait at step 1 until the
> > > > > > thread completes step 3. This is depicted as follows:
> > > > > >
> > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > > >
> > > > > > Actually, in the thread dump of our region server, many handler
> > > > > > threads (RW.default.writeRpcServer.handler) wait at Step 1
> > > > > > (waitForPreviousTransactionsComplete()).
> > > > > >
> > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > > >
> > > > > > Many handler threads wait at this:
> > > > > >
> > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > > >
> > > > > > > Is it possible you are contending on a counter post-upgrade? Is
> > > > > > > it possible that all these threads are trying to get to the same
> > > > > > > row to update it? Could the app behavior have changed?  Or are
> > > > > > > you thinking increment itself has slowed significantly?
> > > > > > We have just upgraded HBase, not changed the app behavior. We are
> > > > > > thinking increment itself has slowed significantly.
> > > > > > Before upgrading HBase, it was good throughput and latency.
> > > > > > Currently, to cope with this problem, we split the regions finely.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Toshihiro Suzuki
> > > > > >
> > > > > >
> > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > > > >
> > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > > Ted,
> > > > > > > >
> > > > > > > > Thank you for your response.
> > > > > > > >
> > > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > > >
> > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > > >
> > > > > > > >
> > > > > > > > I think that increment operation works as follows:
> > > > > > > >
> > > > > > > > 1. get row lock
> > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all
> > > > > > > > prior MVCC transactions to finish
> > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > > > 4. get previous values
> > > > > > > > 5. create KVs
> > > > > > > > 6. write to Memstore
> > > > > > > > 7. write to WAL
> > > > > > > > 8. release row lock
> > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > > > > > > > transaction
> > > > > > > >
> > > > > > > > An instance of MultiVersionConsistencyControl has a pending
> > > > > > > > queue of writes named writeQueue.
> > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
> > > > > > > > writeQueue is empty or writeQueue.getFirst() == w.
> > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > > > > > > WriteEntry from writeQueue.
> > > > > > > >
> > > > > > > > I think that when a handler thread is processing between step 2
> > > > > > > > and step 9, the other handler threads can wait until the thread
> > > > > > > > completes step 9.
> > > > > > > >
> > > > > > > >
> > > > > > > That is right. We need to read, after all outstanding updates
> > > > > > > are done... because we need to read the latest update before we
> > > > > > > go to modify/increment it.
> > > > > > >
> > > > > > > How do you make out this?
> > > > > > >
> > > > > > > "A region lock (not a row lock) seems to occur in
> > > > > > > waitForPreviousTransactionsComplete()."
> > > > > > >
> > > > > > > In 0.98.x we did this:
> > > > > > >
> > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > > >
> > > > > > > ... and in 1.0 we do this:
> > > > > > >
> > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > > > >
> > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > > > +  }
> > > > > > >
> > > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previously,
> > > > > > > mvcc and region sequenceid would spin independently of each
> > > > > > > other. Perhaps this is responsible for some slowdown.
> > > > > > >
> > > > > > > That said, looking in your thread dump, we seem to be down in
> > > > > > > the Get. If you do a bunch of thread dumps in a row, where is
> > > > > > > the lock-holding thread? In Get or writing Increment... or
> > > > > > > waiting on sequence id?
> > > > > > >
> > > > > > > Is it possible you are contending on a counter post-upgrade? Is
> > > > > > > it possible that all these threads are trying to get to the
> > > > > > > same row to update it? Could the app behavior have changed?  Or
> > > > > > > are you thinking increment itself has slowed significantly?
> > > > > > >
> > > > > > > St.Ack
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Toshihiro Suzuki
> > > > > > > >
> > > > > > > >
> > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > > > > > >
> > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > > > >
> > > > > > > > >     try {
> > > > > > > > >       rowLock = getRowLock(row);
> > > > > > > > >
> > > > > > > > > Can you pastebin the complete stack trace ?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com>
> > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > and we experience slowdown in increment operation.
> > > > > > > > > >
> > > > > > > > > > Here's an extract from thread dump of the RegionServer of
> > > > > > > > > > our cluster:
> > > > > > > > > >
> > > > > > > > > > Thread 68
> > > > > > > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > > > >   State: BLOCKED
> > > > > > > > > >   Blocked count: 21689888
> > > > > > > > > >   Waited count: 39828360
> > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > > > >   Blocked by 63
> > > > > > > > > >   (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > > > >   Stack:
> > > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > > > > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > > > >
> > > > > > > > > > There are many similar threads in the thread dump.
> > > > > > > > > >
> > > > > > > > > > I read the source code and I think this is caused by
> > changes
> > > of
> > > > > > > > > > MultiVersionConsistencyControl.
> > > > > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > > > > waitForPreviousTransactionsComplete().
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Also we wrote performance test code for increment operation
> > > > > > > > > > that included 100 threads and ran it in local mode.
> > > > > > > > > >
> > > > > > > > > > The result is shown below:
> > > > > > > > > >
> > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > > > >
> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
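
The 100-thread benchmark behind the throughput/latency numbers quoted in this
thread was not posted; a harness along these lines reproduces the shape of the
test (the table and column names here are hypothetical):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IncrementBench {
      public static void main(String[] args) throws Exception {
        final int threads = 100;
        final int opsPerThread = 1000;
        final AtomicLong latencyNanos = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create())) {
          long start = System.nanoTime();
          for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
              // Each thread increments its own row of table "bench".
              try (Table table = conn.getTable(TableName.valueOf("bench"))) {
                for (int i = 0; i < opsPerThread; i++) {
                  long s = System.nanoTime();
                  table.incrementColumnValue(Bytes.toBytes("row-" + id),
                      Bytes.toBytes("c"), Bytes.toBytes("q"), 1L);
                  latencyNanos.addAndGet(System.nanoTime() - s);
                }
              } catch (Exception e) {
                throw new RuntimeException(e);
              }
            });
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS);
          long ops = (long) threads * opsPerThread;
          double seconds = (System.nanoTime() - start) / 1e9;
          System.out.printf("Throughput(op/s): %.0f, Latency(ms): %f%n",
              ops / seconds, latencyNanos.get() / 1e6 / ops);
        }
      }
    }

Even with each thread on its own row, all of the increments funnel through a
region's single MVCC write queue on 1.0, which is consistent with the roughly
6x throughput drop reported above.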

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Rollback is untested. No fix in 5.5. I was going to work on this now. Where
are your counters Bryan? In their own column family or scattered about in a
row with other Cell types?
St.Ack

On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> Is there any update to this? We just upgraded all of our production
> clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> known issues, did not not about this.  Now we are seeing perfomance issues
> across all clusters, as we make heavy use of increments.
>
> Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back to
> CDH 5.3.1 (if that is possible)?
>
>
> On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:
>
> > Thank you St.Ack!
> >
> > I would like to follow the ticket.
> >
> > Toshihiro Suzuki
> >
> > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
> >
> > > Back to this problem. Simple tests confirm that as is, the
> > > single-queue-backed MVCC instance can slow Region ops if some other row
> > is
> > > slow to complete. In particular Increment, checkAndPut, and batch
> > mutations
> > > are effected. I opened HBASE-14460 to start in on a fix up. Lets see if
> > we
> > > can somehow scope mvcc to row or at least shard mvcc so not all Region
> > ops
> > > are paused.
> > >
> > > St.Ack
> > >
> > >
> > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > >
> > > > > Thank you for the below reasoning (with accompanying helpful
> > diagram).
> > > > > Makes sense. Let me hack up a test case to help with the
> > illustration.
> > > It
> > > > > is as though the mvcc should be scoped to a row only... Writes
> > against
> > > > > other rows should not hold up my read of my row. Tag an mvcc with a
> > > 'row'
> > > > > scope so we can see which on-going writes pertain to current
> > operation?
> > > > Thank you St.Ack! I think this approach would work.
> > > >
> > > > > You need to read back the increment and have it be 'correct' at
> > > increment
> > > > > time?
> > > > Yes, we need it.
> > > >
> > > > I would like to help if there is anything I can do.
> > > >
> > > > Thanks,
> > > > Toshihiro Suzuki
> > > >
> > > >
> > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > > >
> > > > > Thank you for the below reasoning (with accompanying helpful
> > diagram).
> > > > > Makes sense. Let me hack up a test case to help with the
> > illustration.
> > > It
> > > > > is as though the mvcc should be scoped to a row only... Writes
> > against
> > > > > other rows should not hold up my read of my row. Tag an mvcc with a
> > > 'row'
> > > > > scope so we can see which on-going writes pertain to current
> > operation?
> > > > >
> > > > > You need to read back the increment and have it be 'correct' at
> > > increment
> > > > > time?
> > > > >
> > > > > (This is a good one)
> > > > >
> > > > > Thank you Toshihiro Suzuki
> > > > > St.Ack
> > > > >
> > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > >
> > > > > > St.Ack,
> > > > > >
> > > > > > Thank you for your response.
> > > > > >
> > > > > > Why I make out that "A region lock (not a row lock) seems to
> occur
> > in
> > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > > > >
> > > > > > A increment operation has 3 procedures for MVCC.
> > > > > >
> > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > > >
> > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > > >
> > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > > >
> > > > > >
> > > > > > I think that MultiVersionConsistencyControl's writeQueue can
> cause
> > a
> > > > > region
> > > > > > lock.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > > >
> > > > > >
> > > > > > Step 2 adds to a WriteEntry to writeQueue.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > > >
> > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > > >
> > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > > >
> > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue
> > and
> > > > > waits
> > > > > > until writeQueue is empty or writeQueue.getFirst() == w.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > > >
> > > > > >
> > > > > > I think when a handler thread is processing between step 2 and
> step
> > > 3,
> > > > > the
> > > > > > other handler threads can wait at step 1 until the thread
> completes
> > > > step
> > > > > 3
> > > > > > This is depicted as follows:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > > >
> > > > > >
> > > > > > Actually, in the thread dump of our region server, many handler
> > > threads
> > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > > > > (waitForPreviousTransactionsComplete()).
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > > >
> > > > > > Many handler threads wait at this:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > > >
> > > > > >
> > > > > > > Is it possible you are contending on a counter post-upgrade?
> Is
> > it
> > > > > > > possible that all these threads are trying to get to the same
> row
> > > to
> > > > > > update
> > > > > > > it? Could the app behavior have changed?  Or are you thinking
> > > > increment
> > > > > > > itself has slowed significantly?
> > > > > > We have just upgraded HBase, not changed the app behavior. We are
> > > > > thinking
> > > > > > increment itself has slowed significantly.
> > > > > > Before upgrading HBase, it was good throughput and latency.
> > > > > > Currently, to cope with this problem, we split the regions
> finely.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Toshihiro Suzuki
> > > > > >
> > > > > >
> > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > > > >
> > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > > Ted,
> > > > > > > >
> > > > > > > > Thank you for your response.
> > > > > > > >
> > > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > > >
> > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > > >
> > > > > > > >
> > > > > > > > I think that increment operation works as follows:
> > > > > > > >
> > > > > > > > 1. get row lock
> > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all
> > > prior
> > > > > > MVCC
> > > > > > > > transactions to finish
> > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a
> transaction
> > > > > > > > 4. get previous values
> > > > > > > > 5. create KVs
> > > > > > > > 6. write to Memstore
> > > > > > > > 7. write to WAL
> > > > > > > > 8. release row lock
> > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > > > > transaction
> > > > > > > >
> > > > > > > > A instance of MultiVersionConsistencyControl has a pending
> > queue
> > > of
> > > > > > > writes
> > > > > > > > named writeQueue.
> > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
> > > writeQueue
> > > > > is
> > > > > > > > empty or writeQueue.getFirst() == w.
> > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > > > > WriteEntry
> > > > > > > > from writeQueue.
> > > > > > > >
> > > > > > > > I think that when a handler thread is processing between
> step 2
> > > and
> > > > > > step
> > > > > > > 9,
> > > > > > > > the other handler threads can wait until the thread completes
> > > step
> > > > 9.
> > > > > > > >
> > > > > > > >
> > > > > > > That is right. We need to read, after all outstanding updates
> are
> > > > > done...
> > > > > > > because we need to read the latest update before we go to
> > > > > > modify/increment
> > > > > > > it.
> > > > > > >
> > > > > > > How do you make out this?
> > > > > > >
> > > > > > > "A region lock (not a row lock) seems to occur in
> > > > > > > waitForPreviousTransactionsComplete()."
> > > > > > >
> > > > > > > In 0.98.x we did this:
> > > > > > >
> > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > > >
> > > > > > > ... and in 1.0 we do this:
> > > > > > >
> > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > > > >
> > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > > > +  }
> > > > > > >
> > > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previous
> mvcc
> > > and
> > > > > > > region
> > > > > > > sequenceid would spin independent of each other. Perhaps this
> > > > > responsible
> > > > > > > for some slow down.
> > > > > > >
> > > > > > > That said, looking in your thread dump, we seem to be down in
> the
> > > > Get.
> > > > > If
> > > > > > > you do a bunch of thread dumps in a row, where is the
> > lock-holding
> > > > > > thread?
> > > > > > > In Get or writing Increment... or waiting on sequence id?
> > > > > > >
> > > > > > > Is it possible you are contending on a counter post-upgrade?
> Is
> > it
> > > > > > > possible that all these threads are trying to get to the same
> row
> > > to
> > > > > > update
> > > > > > > it? Could the app behavior have changed?  Or are you thinking
> > > > increment
> > > > > > > itself has slowed significantly?
> > > > > > >
> > > > > > > St.Ack
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Toshihiro Suzuki
> > > > > > > >
> > > > > > > >
> > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > > > > > >
> > > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > > > >
> > > > > > > > >     try {
> > > > > > > > >       rowLock = getRowLock(row);
> > > > > > > > >
> > > > > > > > > Can you pastebin the complete stack trace ?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com>
> > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > and we experience slowdown in increment operation.
> > > > > > > > > >
> > > > > > > > > > Here's an extract from thread dump of the RegionServer of
> > our
> > > > > > > cluster:
> > > > > > > > > >
> > > > > > > > > > Thread 68
> > > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > > > >   State: BLOCKED
> > > > > > > > > >   Blocked count: 21689888
> > > > > > > > > >   Waited count: 39828360
> > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > > > >   Blocked by 63
> > > > > > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > > > >   Stack:
> > > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > > > >
> > > > >  org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > > > >
> > > > >  org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > > > >
> > > > > > >
> > > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > > > >
> > > > > > > > > > There are many similar threads in the thread dump.
> > > > > > > > > >
> > > > > > > > > > I read the source code and I think this is caused by
> > changes
> > > of
> > > > > > > > > > MultiVersionConsistencyControl.
> > > > > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > > > > waitForPreviousTransactionsComplete().
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Also we wrote performance test code for increment
> operation
> > > > that
> > > > > > > > included
> > > > > > > > > > 100 threads and ran it in local mode.
> > > > > > > > > >
> > > > > > > > > > The result is shown below:
> > > > > > > > > >
> > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > > > >
> > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
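
A minimal sketch of the kind of harness described above, written against the
1.0 client API (the table name "bench", column "f:q", and the 10-row key
space are assumptions, since the actual test code was not posted; on 0.98 the
client side would use HTable rather than ConnectionFactory/Table):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementBench {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final int threads = 100;
    final int opsPerThread = 1000;
    final AtomicLong latencyNanos = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    long start = System.nanoTime();
    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.submit(new Callable<Void>() {
        public Void call() throws Exception {
          // One connection per thread; spread threads over a few rows so
          // several of them contend on the same counter.
          Connection conn = ConnectionFactory.createConnection(conf);
          try {
            Table table = conn.getTable(TableName.valueOf("bench"));
            byte[] row = Bytes.toBytes("row-" + (id % 10));
            for (int i = 0; i < opsPerThread; i++) {
              long t0 = System.nanoTime();
              table.incrementColumnValue(row, Bytes.toBytes("f"),
                  Bytes.toBytes("q"), 1L);
              latencyNanos.addAndGet(System.nanoTime() - t0);
            }
          } finally {
            conn.close();
          }
          return null;
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    long ops = (long) threads * opsPerThread;
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("Throughput(op/s): %.0f, Latency(ms): %s%n",
        ops / secs, latencyNanos.get() / 1e6 / ops);
  }
}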
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Toshihiro Suzuki
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Is there any update to this? We just upgraded all of our production
clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
known issues, did not know about this. Now we are seeing performance issues
across all clusters, as we make heavy use of increments.

Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back to
CDH 5.3.1 (if that is possible)?


On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <br...@gmail.com> wrote:

> Thank you St.Ack!
>
> I would like to follow the ticket.
>
> Toshihiro Suzuki
>
> 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:
>
> > Back to this problem. Simple tests confirm that as is, the
> > single-queue-backed MVCC instance can slow Region ops if some other row
> is
> > slow to complete. In particular Increment, checkAndPut, and batch
> mutations
> > are affected. I opened HBASE-14460 to start in on a fix up. Let's see if
> we
> > can somehow scope mvcc to row or at least shard mvcc so not all Region
> ops
> > are paused.
> >
> > St.Ack
> >
> >
> > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> >
> > > > Thank you for the below reasoning (with accompanying helpful
> diagram).
> > > > Makes sense. Let me hack up a test case to help with the
> illustration.
> > It
> > > > is as though the mvcc should be scoped to a row only... Writes
> against
> > > > other rows should not hold up my read of my row. Tag an mvcc with a
> > 'row'
> > > > scope so we can see which on-going writes pertain to current
> operation?
> > > Thank you St.Ack! I think this approach would work.
> > >
> > > > You need to read back the increment and have it be 'correct' at
> > increment
> > > > time?
> > > Yes, we need it.
> > >
> > > I would like to help if there is anything I can do.
> > >
> > > Thanks,
> > > Toshihiro Suzuki
> > >
> > >
> > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> > >
> > > > Thank you for the below reasoning (with accompanying helpful
> diagram).
> > > > Makes sense. Let me hack up a test case to help with the
> illustration.
> > It
> > > > is as though the mvcc should be scoped to a row only... Writes
> against
> > > > other rows should not hold up my read of my row. Tag an mvcc with a
> > 'row'
> > > > scope so we can see which on-going writes pertain to current
> operation?
> > > >
> > > > You need to read back the increment and have it be 'correct' at
> > increment
> > > > time?
> > > >
> > > > (This is a good one)
> > > >
> > > > Thank you Toshihiro Suzuki
> > > > St.Ack
> > > >
> > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > >
> > > > > St.Ack,
> > > > >
> > > > > Thank you for your response.
> > > > >
> > > > > Why I make out that "A region lock (not a row lock) seems to occur
> in
> > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > > >
> > > > > An increment operation has 3 procedures for MVCC.
> > > > >
> > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > >
> > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > >
> > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > >
> > > > >
> > > > > I think that MultiVersionConsistencyControl's writeQueue can cause
> a
> > > > region
> > > > > lock.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > >
> > > > >
> > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > >
> > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > >
> > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > >
> > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue
> and
> > > > waits
> > > > > until writeQueue is empty or writeQueue.getFirst() == w.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > >
> > > > >
> > > > > I think when a handler thread is processing between step 2 and step
> > 3,
> > > > the
> > > > > other handler threads can wait at step 1 until the thread completes
> > > step
> > > > 3
> > > > > This is depicted as follows:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
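
The queue mechanics can also be sketched in a few lines. This is a
stripped-down model of the pattern described above, not the real HBase class;
it keeps only the single FIFO and the head-of-queue wait:

import java.util.LinkedList;

// Toy model of the writeQueue above, NOT the real HBase class: one FIFO for
// the whole region, and a transaction can only retire once everything queued
// ahead of it, for any row, has retired first.
public class MiniMvcc {
  public static final class WriteEntry {
    boolean completed = false;
  }

  private final LinkedList<WriteEntry> writeQueue = new LinkedList<WriteEntry>();

  // Step 2 of the increment flow: enqueue an entry for this transaction.
  public WriteEntry beginMemstoreInsert() {
    synchronized (writeQueue) {
      WriteEntry e = new WriteEntry();
      writeQueue.add(e);
      return e;
    }
  }

  // Step 3: mark our entry done, pop every completed entry at the head, then
  // block while our entry is still queued behind an older, incomplete
  // transaction. The row being written plays no part in this wait.
  public void completeMemstoreInsert(WriteEntry e) throws InterruptedException {
    synchronized (writeQueue) {
      e.completed = true;
      while (writeQueue.contains(e)) {
        while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
          writeQueue.removeFirst();
          writeQueue.notifyAll();
        }
        if (writeQueue.contains(e)) {
          writeQueue.wait();
        }
      }
    }
  }

  // Step 1: queue a throwaway entry and wait for everything ahead of it,
  // mirroring waitForPreviousTransactionsComplete() in the 1.0 code.
  public void waitForPreviousTransactionsComplete() throws InterruptedException {
    completeMemstoreInsert(beginMemstoreInsert());
  }
}

Nothing in this model knows which row an entry belongs to; that is exactly
the region-wide scope at issue.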
> > > > >
> > > > >
> > > > > Actually, in the thread dump of our region server, many handler
> > threads
> > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > > > (waitForPreviousTransactionsComplete()).
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > >
> > > > > Many handler threads wait at this:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > >
> > > > >
> > > > > > Is it possible you are contending on a counter post-upgrade?  Is
> it
> > > > > > possible that all these threads are trying to get to the same row
> > to
> > > > > update
> > > > > > it? Could the app behavior have changed?  Or are you thinking
> > > increment
> > > > > > itself has slowed significantly?
> > > > > We have just upgraded HBase, not changed the app behavior. We are
> > > > thinking
> > > > > increment itself has slowed significantly.
> > > > > Before upgrading HBase, we had good throughput and latency.
> > > > > Currently, to cope with this problem, we split the regions finely.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Toshihiro Suzuki
> > > > >
> > > > >
> > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > > >
> > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com>
> wrote:
> > > > > >
> > > > > > > Ted,
> > > > > > >
> > > > > > > Thank you for your response.
> > > > > > >
> > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > >
> > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > >
> > > > > > >
> > > > > > > I think that increment operation works as follows:
> > > > > > >
> > > > > > > 1. get row lock
> > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all
> > prior
> > > > > MVCC
> > > > > > > transactions to finish
> > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > > 4. get previous values
> > > > > > > 5. create KVs
> > > > > > > 6. write to Memstore
> > > > > > > 7. write to WAL
> > > > > > > 8. release row lock
> > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > > > transaction
> > > > > > >
> > > > > > > An instance of MultiVersionConsistencyControl has a pending
> queue
> > of
> > > > > > writes
> > > > > > > named writeQueue.
> > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
> > writeQueue
> > > > is
> > > > > > > empty or writeQueue.getFirst() == w.
> > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > > > WriteEntry
> > > > > > > from writeQueue.
> > > > > > >
> > > > > > > I think that when a handler thread is processing between step 2
> > and
> > > > > step
> > > > > > 9,
> > > > > > > the other handler threads can wait until the thread completes
> > step
> > > 9.
> > > > > > >
> > > > > > >
> > > > > > That is right. We need to read, after all outstanding updates are
> > > > done...
> > > > > > because we need to read the latest update before we go to
> > > > > modify/increment
> > > > > > it.
> > > > > >
> > > > > > How do you make out this?
> > > > > >
> > > > > > "A region lock (not a row lock) seems to occur in
> > > > > > waitForPreviousTransactionsComplete()."
> > > > > >
> > > > > > In 0.98.x we did this:
> > > > > >
> > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > >
> > > > > > ... and in 1.0 we do this:
> > > > > >
> > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > > >
> > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > > +  }
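
With the toy MiniMvcc sketched earlier in this thread, the effect of that
pairing is easy to demonstrate: a transaction that is slow to complete on one
row holds up the pre-read wait of an increment on a different row (the 500 ms
figure is invented for illustration):

public class MiniMvccDemo {
  public static void main(String[] args) throws Exception {
    final MiniMvcc mvcc = new MiniMvcc();
    // A writer on row A begins, then is slow to finish (say, stuck in WAL sync).
    final MiniMvcc.WriteEntry slow = mvcc.beginMemstoreInsert();
    Thread incrementOnRowB = new Thread(new Runnable() {
      public void run() {
        try {
          long t0 = System.nanoTime();
          // Step 2 of an increment on a DIFFERENT row: still queues behind A.
          mvcc.waitForPreviousTransactionsComplete();
          System.out.printf("blocked for %.0f ms%n",
              (System.nanoTime() - t0) / 1e6);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        }
      }
    });
    incrementOnRowB.start();
    Thread.sleep(500);                 // the slow writer takes 500 ms
    mvcc.completeMemstoreInsert(slow); // only now can the increment proceed
    incrementOnRowB.join();            // prints roughly 500 ms
  }
}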
> > > > > >
> > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc
> > and
> > > > > > region
> > > > > > sequenceid would spin independent of each other. Perhaps this is
> > > > responsible
> > > > > > for some slow down.
> > > > > >
> > > > > > That said, looking in your thread dump, we seem to be down in the
> > > Get.
> > > > If
> > > > > > you do a bunch of thread dumps in a row, where is the
> lock-holding
> > > > > thread?
> > > > > > In Get or writing Increment... or waiting on sequence id?
> > > > > >
> > > > > > Is it possible you are contending on a counter post-upgrade?  Is
> it
> > > > > > possible that all these threads are trying to get to the same row
> > to
> > > > > update
> > > > > > it? Could the app behavior have changed?  Or are you thinking
> > > increment
> > > > > > itself has slowed significantly?
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Toshihiro Suzuki
> > > > > > >
> > > > > > >
> > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > > > > >
> > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > > >
> > > > > > > >     try {
> > > > > > > >       rowLock = getRowLock(row);
> > > > > > > >
> > > > > > > > Can you pastebin the complete stack trace ?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com>
> > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > and we experience slowdown in increment operation.
> > > > > > > > >
> > > > > > > > > Here's an extract from thread dump of the RegionServer of
> our
> > > > > > cluster:
> > > > > > > > >
> > > > > > > > > Thread 68
> > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > > >   State: BLOCKED
> > > > > > > > >   Blocked count: 21689888
> > > > > > > > >   Waited count: 39828360
> > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > > >   Blocked by 63
> > > > > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > > >   Stack:
> > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > > >
> > > >  org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > > >
> > > >  org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > > >
> > > > > >
> > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > > >
> > > > > > > > > There are many similar threads in the thread dump.
> > > > > > > > >
> > > > > > > > > I read the source code and I think this is caused by
> changes
> > of
> > > > > > > > > MultiVersionConsistencyControl.
> > > > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > > > waitForPreviousTransactionsComplete().
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Also we wrote performance test code for increment operation
> > > that
> > > > > > > included
> > > > > > > > > 100 threads and ran it in local mode.
> > > > > > > > >
> > > > > > > > > The result is shown below:
> > > > > > > > >
> > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > > >
> > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Toshihiro Suzuki
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by 鈴木俊裕 <br...@gmail.com>.
Thank you St.Ack!

I would like to follow the ticket.

Toshihiro Suzuki

2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:

> Back to this problem. Simple tests confirm that as is, the
> single-queue-backed MVCC instance can slow Region ops if some other row is
> slow to complete. In particular Increment, checkAndPut, and batch mutations
> are affected. I opened HBASE-14460 to start in on a fix up. Let's see if we
> can somehow scope mvcc to row or at least shard mvcc so not all Region ops
> are paused.
>
> St.Ack
>
>
> On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>
> > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > Makes sense. Let me hack up a test case to help with the illustration.
> It
> > > is as though the mvcc should be scoped to a row only... Writes against
> > > other rows should not hold up my read of my row. Tag an mvcc with a
> 'row'
> > > scope so we can see which on-going writes pertain to current operation?
> > Thank you St.Ack! I think this approach would work.
> >
> > > You need to read back the increment and have it be 'correct' at
> increment
> > > time?
> > Yes, we need it.
> >
> > I would like to help if there is anything I can do.
> >
> > Thanks,
> > Toshihiro Suzuki
> >
> >
> > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
> >
> > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > Makes sense. Let me hack up a test case to help with the illustration.
> It
> > > is as though the mvcc should be scoped to a row only... Writes against
> > > other rows should not hold up my read of my row. Tag an mvcc with a
> 'row'
> > > scope so we can see which on-going writes pertain to current operation?
> > >
> > > You need to read back the increment and have it be 'correct' at
> increment
> > > time?
> > >
> > > (This is a good one)
> > >
> > > Thank you Toshihiro Suzuki
> > > St.Ack
> > >
> > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > >
> > > > St.Ack,
> > > >
> > > > Thank you for your response.
> > > >
> > > > Why I make out that "A region lock (not a row lock) seems to occur in
> > > > waitForPreviousTransactionsComplete()" is as follows:
> > > >
> > > > An increment operation has 3 procedures for MVCC.
> > > >
> > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > >
> > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > >
> > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > >
> > > >
> > > > I think that MultiVersionConsistencyControl's writeQueue can cause a
> > > region
> > > > lock.
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > >
> > > >
> > > > Step 2 adds a WriteEntry to writeQueue.
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > >
> > > > Step 3 removes the WriteEntry from writeQueue.
> > > >
> > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > >
> > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> > > waits
> > > > until writeQueue is empty or writeQueue.getFirst() == w.
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > >
> > > >
> > > > I think when a handler thread is processing between step 2 and step
> 3,
> > > the
> > > > other handler threads can wait at step 1 until the thread completes
> > step
> > > 3
> > > > This is depicted as follows:
> > > >
> > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > >
> > > >
> > > > Actually, in the thread dump of our region server, many handler
> threads
> > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > > (waitForPreviousTransactionsComplete()).
> > > >
> > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > >
> > > > Many handler threads wait at this:
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > >
> > > >
> > > > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > > > possible that all these threads are trying to get to the same row
> to
> > > > update
> > > > > it? Could the app behavior have changed?  Or are you thinking
> > increment
> > > > > itself has slowed significantly?
> > > > We have just upgraded HBase, not changed the app behavior. We are
> > > thinking
> > > > increment itself has slowed significantly.
> > > > Before upgrading HBase, we had good throughput and latency.
> > > > Currently, to cope with this problem, we split the regions finely.
> > > >
> > > > Thanks,
> > > >
> > > > Toshihiro Suzuki
> > > >
> > > >
> > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > > >
> > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > >
> > > > > > Ted,
> > > > > >
> > > > > > Thank you for your response.
> > > > > >
> > > > > > I uploaded the complete stack trace to Gist.
> > > > > >
> > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > >
> > > > > >
> > > > > > I think that increment operation works as follows:
> > > > > >
> > > > > > 1. get row lock
> > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all
> prior
> > > > MVCC
> > > > > > transactions to finish
> > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > 4. get previous values
> > > > > > 5. create KVs
> > > > > > 6. write to Memstore
> > > > > > 7. write to WAL
> > > > > > 8. release row lock
> > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > > transaction
> > > > > >
> > > > > > An instance of MultiVersionConsistencyControl has a pending queue
> of
> > > > > writes
> > > > > > named writeQueue.
> > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until
> writeQueue
> > > is
> > > > > > empty or writeQueue.getFirst() == w.
> > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > > WriteEntry
> > > > > > from writeQueue.
> > > > > >
> > > > > > I think that when a handler thread is processing between step 2
> and
> > > > step
> > > > > 9,
> > > > > > the other handler threads can wait until the thread completes
> step
> > 9.
> > > > > >
> > > > > >
> > > > > That is right. We need to read, after all outstanding updates are
> > > done...
> > > > > because we need to read the latest update before we go to
> > > > modify/increment
> > > > > it.
> > > > >
> > > > > How do you make out this?
> > > > >
> > > > > "A region lock (not a row lock) seems to occur in
> > > > > waitForPreviousTransactionsComplete()."
> > > > >
> > > > > In 0.98.x we did this:
> > > > >
> > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > >
> > > > > ... and in 1.0 we do this:
> > > > >
> > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > >
> > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > > +    waitForPreviousTransactionsComplete(w);
> > > > > +  }
> > > > >
> > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc
> and
> > > > > region
> > > > > sequenceid would spin independent of each other. Perhaps this is
> > > responsible
> > > > > for some slow down.
> > > > >
> > > > > That said, looking in your thread dump, we seem to be down in the
> > Get.
> > > If
> > > > > you do a bunch of thread dumps in a row, where is the lock-holding
> > > > thread?
> > > > > In Get or writing Increment... or waiting on sequence id?
> > > > >
> > > > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > > > possible that all these threads are trying to get to the same row
> to
> > > > update
> > > > > it? Could the app behavior have changed?  Or are you thinking
> > increment
> > > > > itself has slowed significantly?
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Toshihiro Suzuki
> > > > > >
> > > > > >
> > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > > > >
> > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > >
> > > > > > >     try {
> > > > > > >       rowLock = getRowLock(row);
> > > > > > >
> > > > > > > Can you pastebin the complete stack trace ?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com>
> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > and we experience slowdown in increment operation.
> > > > > > > >
> > > > > > > > Here's an extract from thread dump of the RegionServer of our
> > > > > cluster:
> > > > > > > >
> > > > > > > > Thread 68
> > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > >   State: BLOCKED
> > > > > > > >   Blocked count: 21689888
> > > > > > > >   Waited count: 39828360
> > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > >   Blocked by 63
> > > > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > >   Stack:
> > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > >
> > >  org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > >
> > >  org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > >
> > > > >
> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > >
> > > > > > > > There are many similar threads in the thread dump.
> > > > > > > >
> > > > > > > > I read the source code and I think this is caused by changes
> of
> > > > > > > > MultiVersionConsistencyControl.
> > > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > > waitForPreviousTransactionsComplete().
> > > > > > > >
> > > > > > > >
> > > > > > > > Also we wrote performance test code for increment operation
> > that
> > > > > > included
> > > > > > > > 100 threads and ran it in local mode.
> > > > > > > >
> > > > > > > > The result is shown below:
> > > > > > > >
> > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > >
> > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Toshihiro Suzuki
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Back to this problem. Simple tests confirm that as is, the
single-queue-backed MVCC instance can slow Region ops if some other row is
slow to complete. In particular Increment, checkAndPut, and batch mutations
are affected. I opened HBASE-14460 to start in on a fix up. Let's see if we
can somehow scope mvcc to row or at least shard mvcc so not all Region ops
are paused.

St.Ack
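
Purely as an illustration of the "shard mvcc" direction, and not the
HBASE-14460 design: one could key N independent wait queues by row hash,
reusing the toy MiniMvcc sketched earlier in this thread, so a slow
transaction only stalls writers that hash into its shard. The hard part this
toy ignores is that a region must still hand out a single, monotonically
advancing read point.

import java.util.Arrays;

// Toy sharded variant: per-bucket wait queues instead of one per region.
public class ShardedMiniMvcc {
  private final MiniMvcc[] shards;

  public ShardedMiniMvcc(int n) {
    shards = new MiniMvcc[n];
    for (int i = 0; i < n; i++) {
      shards[i] = new MiniMvcc();
    }
  }

  // Stable bucket from the row key; writers to other buckets never queue here.
  private MiniMvcc shardFor(byte[] row) {
    int h = Arrays.hashCode(row);
    return shards[(h & Integer.MAX_VALUE) % shards.length];
  }

  public void waitForPreviousTransactionsComplete(byte[] row)
      throws InterruptedException {
    shardFor(row).waitForPreviousTransactionsComplete();
  }

  public MiniMvcc.WriteEntry beginMemstoreInsert(byte[] row) {
    return shardFor(row).beginMemstoreInsert();
  }

  public void completeMemstoreInsert(byte[] row, MiniMvcc.WriteEntry e)
      throws InterruptedException {
    shardFor(row).completeMemstoreInsert(e);
  }
}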


On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <br...@gmail.com> wrote:

> > Thank you for the below reasoning (with accompanying helpful diagram).
> > Makes sense. Let me hack up a test case to help with the illustration. It
> > is as though the mvcc should be scoped to a row only... Writes against
> > other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> > scope so we can see which on-going writes pertain to current operation?
> Thank you St.Ack! I think this approach would work.
>
> > You need to read back the increment and have it be 'correct' at increment
> > time?
> Yes, we need it.
>
> I would like to help if there is anything I can do.
>
> Thanks,
> Toshihiro Suzuki
>
>
> 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:
>
> > Thank you for the below reasoning (with accompanying helpful diagram).
> > Makes sense. Let me hack up a test case to help with the illustration. It
> > is as though the mvcc should be scoped to a row only... Writes against
> > other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> > scope so we can see which on-going writes pertain to current operation?
> >
> > You need to read back the increment and have it be 'correct' at increment
> > time?
> >
> > (This is a good one)
> >
> > Thank you Toshihiro Suzuki
> > St.Ack
> >
> > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> >
> > > St.Ack,
> > >
> > > Thank you for your response.
> > >
> > > Why I make out that "A region lock (not a row lock) seems to occur in
> > > waitForPreviousTransactionsComplete()" is as follows:
> > >
> > > An increment operation has 3 procedures for MVCC.
> > >
> > > 1. mvcc.waitForPreviousTransactionsComplete();
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > >
> > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > >
> > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > >
> > >
> > > I think that MultiVersionConsistencyControl's writeQueue can cause a
> > region
> > > lock.
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > >
> > >
> > > Step 2 adds a WriteEntry to writeQueue.
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > >
> > > Step 3 removes the WriteEntry from writeQueue.
> > >
> > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > >
> > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> > waits
> > > until writeQueue is empty or writeQueue.getFirst() == w.
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > >
> > >
> > > I think when a handler thread is processing between step 2 and step 3,
> > the
> > > other handler threads can wait at step 1 until the thread completes
> step
> > 3
> > > This is depicted as follows:
> > >
> > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > >
> > >
> > > Actually, in the thread dump of our region server, many handler threads
> > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > (waitForPreviousTransactionsComplete()).
> > >
> > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > >
> > > Many handler threads wait at this:
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > >
> > >
> > > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > > possible that all these threads are trying to get to the same row to
> > > update
> > > > it? Could the app behavior have changed?  Or are you thinking
> increment
> > > > itself has slowed significantly?
> > > We have just upgraded HBase, not changed the app behavior. We are
> > thinking
> > > increment itself has slowed significantly.
> > > Before upgrading HBase, we had good throughput and latency.
> > > Currently, to cope with this problem, we split the regions finely.
> > >
> > > Thanks,
> > >
> > > Toshihiro Suzuki
> > >
> > >
> > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> > >
> > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > >
> > > > > Ted,
> > > > >
> > > > > Thank you for your response.
> > > > >
> > > > > I uploaded the complete stack trace to Gist.
> > > > >
> > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > >
> > > > >
> > > > > I think that increment operation works as follows:
> > > > >
> > > > > 1. get row lock
> > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior
> > > MVCC
> > > > > transactions to finish
> > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > 4. get previous values
> > > > > 5. create KVs
> > > > > 6. write to Memstore
> > > > > 7. write to WAL
> > > > > 8. release row lock
> > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > transaction
> > > > >
> > > > > An instance of MultiVersionConsistencyControl has a pending queue of
> > > > writes
> > > > > named writeQueue.
> > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue
> > is
> > > > > empty or writeQueue.getFirst() == w.
> > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > WriteEntry
> > > > > from writeQueue.
> > > > >
> > > > > I think that when a handler thread is processing between step 2 and
> > > step
> > > > 9,
> > > > > the other handler threads can wait until the thread completes step
> 9.
> > > > >
> > > > >
> > > > That is right. We need to read, after all outstanding updates are
> > done...
> > > > because we need to read the latest update before we go to
> > > modify/increment
> > > > it.
> > > >
> > > > How do you make out this?
> > > >
> > > > "A region lock (not a row lock) seems to occur in
> > > > waitForPreviousTransactionsComplete()."
> > > >
> > > > In 0.98.x we did this:
> > > >
> > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > >
> > > > ... and in 1.0 we do this:
> > > >
> > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > >
> > > > +  public void waitForPreviousTransactionsComplete() {
> > > > +    WriteEntry w = beginMemstoreInsert();
> > > > +    waitForPreviousTransactionsComplete(w);
> > > > +  }
> > > >
> > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc and
> > > > region
> > > > sequenceid would spin independent of each other. Perhaps this is
> > responsible
> > > > for some slow down.
> > > >
> > > > That said, looking in your thread dump, we seem to be down in the
> Get.
> > If
> > > > you do a bunch of thread dumps in a row, where is the lock-holding
> > > thread?
> > > > In Get or writing Increment... or waiting on sequence id?
> > > >
> > > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > > possible that all these threads are trying to get to the same row to
> > > update
> > > > it? Could the app behavior have changed?  Or are you thinking
> increment
> > > > itself has slowed significantly?
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > > > Thanks,
> > > > >
> > > > > Toshihiro Suzuki
> > > > >
> > > > >
> > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > > >
> > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > >
> > > > > >     try {
> > > > > >       rowLock = getRowLock(row);
> > > > > >
> > > > > > Can you pastebin the complete stack trace ?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > and we experience slowdown in increment operation.
> > > > > > >
> > > > > > > Here's an extract from thread dump of the RegionServer of our
> > > > cluster:
> > > > > > >
> > > > > > > Thread 68
> > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > >   State: BLOCKED
> > > > > > >   Blocked count: 21689888
> > > > > > >   Waited count: 39828360
> > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > >   Blocked by 63
> > > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > >   Stack:
> > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > >
> >  org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > >
> >  org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > >
> > > >  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > >
> > > > > > > There are many similar threads in the thread dump.
> > > > > > >
> > > > > > > I read the source code and I think this is caused by changes of
> > > > > > > MultiVersionConsistencyControl.
> > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > waitForPreviousTransactionsComplete().
> > > > > > >
> > > > > > >
> > > > > > > Also we wrote performance test code for increment operation
> that
> > > > > included
> > > > > > > 100 threads and ran it in local mode.
> > > > > > >
> > > > > > > The result is shown below:
> > > > > > >
> > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > >
> > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Toshihiro Suzuki
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by 鈴木俊裕 <br...@gmail.com>.
> Thank you for the below reasoning (with accompanying helpful diagram).
> Makes sense. Let me hack up a test case to help with the illustration. It
> is as though the mvcc should be scoped to a row only... Writes against
> other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> scope so we can see which on-going writes pertain to the current operation?
Thank you St.Ack! I think this approach would work.

> You need to read back the increment and have it be 'correct' at increment
> time?
Yes, we need it.

I would like to help if there is anything I can do.

Thanks,
Toshihiro Suzuki


2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:

> Thank you for the below reasoning (with accompanying helpful diagram).
> Makes sense. Let me hack up a test case to help with the illustration. It
> is as though the mvcc should be scoped to a row only... Writes against
> other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> scope so we can see which on-going writes pertain to the current operation?
>
> You need to read back the increment and have it be 'correct' at increment
> time?
>
> (This is a good one)
>
> Thank you Toshihiro Suzuki
> St.Ack
>
> On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>
> > St.Ack,
> >
> > Thank you for your response.
> >
> > The reason I say "A region lock (not a row lock) seems to occur in
> > waitForPreviousTransactionsComplete()" is as follows:
> >
> > An increment operation involves 3 MVCC steps.
> >
> > 1. mvcc.waitForPreviousTransactionsComplete();
> >
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> >
> > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> >
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> >
> > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> >
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> >
> >
> > I think that MultiVersionConsistencyControl's writeQueue can cause a
> region
> > lock.
> >
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> >
> >
> > Step 2 adds a WriteEntry to writeQueue.
> >
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> >
> > Step 3 removes the WriteEntry from writeQueue.
> >
> > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> >
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> >
> > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> waits
> > until writeQueue is empty or writeQueue.getFirst() == w.
> >
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> >
> >
> > I think when a handler thread is processing between step 2 and step 3,
> the
> > other handler threads can wait at step 1 until the thread completes step
> 3.
> > This is depicted as follows:
> >
> >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> >
> >
> > Actually, in the thread dump of our region server, many handler threads
> > (RW.default.writeRpcServer.handler) wait at Step 1
> > (waitForPreviousTransactionsComplete()).
> >
> >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> >
> > Many handler threads wait at this:
> >
> >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> >
> >
> > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > possible that all these threads are trying to get to the same row to
> > update
> > > it? Could the app behavior have changed?  Or are you thinking increment
> > > itself has slowed significantly?
> > We have just upgraded HBase, not changed the app behavior. We are
> thinking
> > increment itself has slowed significantly.
> > Before upgrading HBase, throughput and latency were good.
> > Currently, to cope with this problem, we split the regions finely.
> >
> > Thanks,
> >
> > Toshihiro Suzuki
> >
> >
> > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
> >
> > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com> wrote:
> > >
> > > > Ted,
> > > >
> > > > Thank you for your response.
> > > >
> > > > I uploaded the complete stack trace to Gist.
> > > >
> > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > >
> > > >
> > > > I think that the increment operation works as follows:
> > > >
> > > > 1. get row lock
> > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior
> > MVCC
> > > > transactions to finish
> > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > 4. get previous values
> > > > 5. create KVs
> > > > 6. write to Memstore
> > > > 7. write to WAL
> > > > 8. release row lock
> > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> transaction
> > > >
> > > > An instance of MultiVersionConsistencyControl has a pending queue of
> > > writes
> > > > named writeQueue.
> > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue
> is
> > > > empty or writeQueue.getFirst() == w.
> > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > WriteEntry
> > > > from writeQueue.
> > > >
> > > > I think that when a handler thread is processing between step 2 and
> > step
> > > 9,
> > > > the other handler threads can wait until the thread completes step 9.
> > > >
> > > >
> > > That is right. We need to read, after all outstanding updates are
> done...
> > > because we need to read the latest update before we go to
> > modify/increment
> > > it.
> > >
> > > How do you make that out?
> > >
> > > "A region lock (not a row lock) seems to occur in
> > > waitForPreviousTransactionsComplete()."
> > >
> > > In 0.98.x we did this:
> > >
> > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > >
> > > ... and in 1.0 we do this:
> > >
> > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > >
> > > +  public void waitForPreviousTransactionsComplete() {
> > > +    WriteEntry w = beginMemstoreInsert();
> > > +    waitForPreviousTransactionsComplete(w);
> > > +  }
> > >
> > > The mvcc and region sequenceid were merged in 1.0 (
> > > https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and
> > > region
> > > sequenceid would spin independently of each other. Perhaps this is
> responsible
> > > for some of the slowdown.
> > >
> > > That said, looking in your thread dump, we seem to be down in the Get.
> If
> > > you do a bunch of thread dumps in a row, where is the lock-holding
> > thread?
> > > In Get or writing Increment... or waiting on sequence id?
> > >
> > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > possible that all these threads are trying to get to the same row to
> > update
> > > it? Could the app behavior have changed?  Or are you thinking increment
> > > itself has slowed significantly?
> > >
> > > St.Ack
> > >
> > >
> > >
> > >
> > > > Thanks,
> > > >
> > > > Toshihiro Suzuki
> > > >
> > > >
> > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > > >
> > > > > In HRegion#increment(), we lock the row (not region):
> > > > >
> > > > >     try {
> > > > >       rowLock = getRowLock(row);
> > > > >
> > > > > Can you pastebin the complete stack trace ?
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > CDH5.4.5(HBase1.0.0)
> > > > > > and we experience slowdown in increment operation.
> > > > > >
> > > > > > Here's an extract from thread dump of the RegionServer of our
> > > cluster:
> > > > > >
> > > > > > Thread 68
> > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > >   State: BLOCKED
> > > > > >   Blocked count: 21689888
> > > > > >   Waited count: 39828360
> > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > >   Blocked by 63
> > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > >   Stack:
> > > > > >     java.lang.Object.wait(Native Method)
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > >
> > > > > >
> > > >
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > >
>  org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > >
>  org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > >
> > >  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > >
> > > > > > There are many similar threads in the thread dump.
> > > > > >
> > > > > > I read the source code and I think this is caused by changes of
> > > > > > MultiVersionConsistencyControl.
> > > > > > A region lock (not a row lock) seems to occur in
> > > > > > waitForPreviousTransactionsComplete().
> > > > > >
> > > > > >
> > > > > > Also we wrote performance test code for increment operation that
> > > > included
> > > > > > 100 threads and ran it in local mode.
> > > > > >
> > > > > > The result is shown below:
> > > > > >
> > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > >
> > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Toshihiro Suzuki
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Thank you for the below reasoning (with accompanying helpful diagram).
Makes sense. Let me hack up a test case to help with the illustration. It
is as though the mvcc should be scoped to a row only... Writes against
other rows should not hold up my read of my row. Tag an mvcc with a 'row'
scope so we can see which on-going writes pertain to the current operation?
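
For what it's worth, a purely illustrative sketch of that row-scoped idea
follows -- the class, field, and skip logic are invented for the example,
not a real HBase API:

import java.util.Arrays;
import java.util.LinkedList;

class RowScopedMvccSketch {
  static final class WriteEntry {
    final byte[] row; // the 'row' scope tag
    WriteEntry(byte[] row) { this.row = row; }
  }

  private final LinkedList<WriteEntry> writeQueue = new LinkedList<WriteEntry>();

  // Wait only for earlier pending writes that touch the same row.
  // (A complete version would also have begin/complete methods that
  // add/remove entries and notifyAll, as the current code does.)
  synchronized void waitForPreviousTransactionsComplete(byte[] row)
      throws InterruptedException {
    WriteEntry w = new WriteEntry(row);
    writeQueue.add(w);
    while (hasEarlierEntryForSameRow(w)) {
      wait();
    }
    writeQueue.remove(w);
    notifyAll();
  }

  private boolean hasEarlierEntryForSameRow(WriteEntry w) {
    for (WriteEntry e : writeQueue) {
      if (e == w) {
        return false; // reached our own entry; no earlier write on this row
      }
      if (Arrays.equals(e.row, w.row)) {
        return true;  // an earlier write to the same row is still pending
      }
    }
    return false;
  }
}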

You need to read back the increment and have it be 'correct' at increment
time?

(This is a good one)

Thank you Toshihiro Suzuki
St.Ack

On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <br...@gmail.com> wrote:

> St.Ack,
>
> Thank you for your response.
>
> The reason I say "A region lock (not a row lock) seems to occur in
> waitForPreviousTransactionsComplete()" is as follows:
>
> An increment operation involves 3 MVCC steps.
>
> 1. mvcc.waitForPreviousTransactionsComplete();
>
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>
> 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>
> 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
>
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>
>
> I think that MultiVersionConsistencyControl's writeQueue can cause a region
> lock.
>
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>
>
> Step 2 adds a WriteEntry to writeQueue.
>
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>
> Step 3 removes the WriteEntry from writeQueue.
>
> completeMemstoreInsertWithSeqNum(w, walKey) ->
> waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
>
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>
> Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and waits
> until writeQueue is empty or writeQueue.getFirst() == w.
>
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>
>
> I think when a handler thread is processing between step 2 and step 3, the
> other handler threads can wait at step 1 until the thread completes step 3.
> This is depicted as follows:
>
>
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>
>
> Actually, in the thread dump of our region server, many handler threads
> (RW.default.writeRpcServer.handler) wait at Step 1
> (waitForPreviousTransactionsComplete()).
>
>
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>
> Many handler threads wait at this:
>
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>
>
> > Is it possible you are contending on a counter post-upgrade?  Is it
> > possible that all these threads are trying to get to the same row to
> update
> > it? Could the app behavior have changed?  Or are you thinking increment
> > itself has slowed significantly?
> We have just upgraded HBase, not changed the app behavior. We are thinking
> increment itself has slowed significantly.
> Before upgrading HBase, throughput and latency were good.
> Currently, to cope with this problem, we split the regions finely.
>
> Thanks,
>
> Toshihiro Suzuki
>
>
> 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:
>
> > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com> wrote:
> >
> > > Ted,
> > >
> > > Thank you for your response.
> > >
> > > I uploaded the complete stack trace to Gist.
> > >
> > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > >
> > >
> > > I think that the increment operation works as follows:
> > >
> > > 1. get row lock
> > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior
> MVCC
> > > transactions to finish
> > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > 4. get previous values
> > > 5. create KVs
> > > 6. write to Memstore
> > > 7. write to WAL
> > > 8. release row lock
> > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > >
> > > An instance of MultiVersionConsistencyControl has a pending queue of
> > writes
> > > named writeQueue.
> > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
> > > empty or writeQueue.getFirst() == w.
> > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> WriteEntry
> > > from writeQueue.
> > >
> > > I think that when a handler thread is processing between step 2 and
> step
> > 9,
> > > the other handler threads can wait until the thread completes step 9.
> > >
> > >
> > That is right. We need to read, after all outstanding updates are done...
> > because we need to read the latest update before we go to
> modify/increment
> > it.
> >
> > How do you make that out?
> >
> > "A region lock (not a row lock) seems to occur in
> > waitForPreviousTransactionsComplete()."
> >
> > In 0.98.x we did this:
> >
> > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> >
> > ... and in 1.0 we do this:
> >
> > mvcc.waitForPreviousTransactionsComplete() which is this....
> >
> > +  public void waitForPreviousTransactionsComplete() {
> > +    WriteEntry w = beginMemstoreInsert();
> > +    waitForPreviousTransactionsComplete(w);
> > +  }
> >
> > The mvcc and region sequenceid were merged in 1.0 (
> > https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and
> > region
> > sequenceid would spin independently of each other. Perhaps this is
> > responsible for some of the slowdown.
> >
> > That said, looking in your thread dump, we seem to be down in the Get. If
> > you do a bunch of thread dumps in a row, where is the lock-holding
> thread?
> > In Get or writing Increment... or waiting on sequence id?
> >
> > Is it possible you are contending on a counter post-upgrade?  Is it
> > possible that all these threads are trying to get to the same row to
> update
> > it? Could the app behavior have changed?  Or are you thinking increment
> > itself has slowed significantly?
> >
> > St.Ack
> >
> >
> >
> >
> > > Thanks,
> > >
> > > Toshihiro Suzuki
> > >
> > >
> > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> > >
> > > > In HRegion#increment(), we lock the row (not region):
> > > >
> > > >     try {
> > > >       rowLock = getRowLock(row);
> > > >
> > > > Can you pastebin the complete stack trace ?
> > > >
> > > > Thanks
> > > >
> > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > CDH5.4.5(HBase1.0.0)
> > > > > and we experience slowdown in increment operation.
> > > > >
> > > > > Here's an extract from thread dump of the RegionServer of our
> > cluster:
> > > > >
> > > > > Thread 68
> (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > >   State: BLOCKED
> > > > >   Blocked count: 21689888
> > > > >   Waited count: 39828360
> > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > >   Blocked by 63
> > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > >   Stack:
> > > > >     java.lang.Object.wait(Native Method)
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > >
> > > > >
> > >
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > >
> >  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > >     java.lang.Thread.run(Thread.java:745)
> > > > >
> > > > > There are many similar threads in the thread dump.
> > > > >
> > > > > I read the source code and I think this is caused by changes of
> > > > > MultiVersionConsistencyControl.
> > > > > A region lock (not a row lock) seems to occur in
> > > > > waitForPreviousTransactionsComplete().
> > > > >
> > > > >
> > > > > Also we wrote performance test code for increment operation that
> > > included
> > > > > 100 threads and ran it in local mode.
> > > > >
> > > > > The result is shown below:
> > > > >
> > > > > CDH5.3.1(HBase0.98.6)
> > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > >
> > > > > CDH5.4.5(HBase1.0.0)
> > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Toshihiro Suzuki
> > > > >
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by 鈴木俊裕 <br...@gmail.com>.
St.Ack,

Thank you for your response.

The reason I say "A region lock (not a row lock) seems to occur in
waitForPreviousTransactionsComplete()" is as follows:

An increment operation involves 3 MVCC steps.

1. mvcc.waitForPreviousTransactionsComplete();

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712

2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721

3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893


I think that MultiVersionConsistencyControl's writeQueue can cause a region
lock.

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43


Step 2 adds a WriteEntry to writeQueue.

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108

Step 3 removes the WriteEntry from writeQueue.

completeMemstoreInsertWithSeqNum(w, walKey) ->
waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160

Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and waits
until writeQueue is empty or writeQueue.getFirst() == w.

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241


I think when a handler thread is processing between step 2 and step 3, the
other handler threads can wait at step 1 until the thread completes step 3.
This is depicted as follows:

https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
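
In code form, the behavior boils down to something like this (a simplified,
illustrative re-implementation, not the actual MultiVersionConsistencyControl
source):

import java.util.LinkedList;

public class MvccQueueSketch {
  static final class WriteEntry { }

  private final LinkedList<WriteEntry> writeQueue = new LinkedList<WriteEntry>();

  // Step 2: enqueue a pending write (for any row in the region).
  public synchronized WriteEntry beginMemstoreInsert() {
    WriteEntry e = new WriteEntry();
    writeQueue.add(e);
    return e;
  }

  // Step 3: retire a completed write and wake up waiters.
  public synchronized void completeMemstoreInsert(WriteEntry e) {
    writeQueue.remove(e);
    notifyAll();
  }

  // Step 1: enqueue a marker and block until every entry queued before it
  // has been retired. There is one queue per region, so a slow write to
  // ANY row stalls waiters on every other row.
  public synchronized void waitForPreviousTransactionsComplete()
      throws InterruptedException {
    WriteEntry w = beginMemstoreInsert();
    while (!(writeQueue.isEmpty() || writeQueue.getFirst() == w)) {
      wait();
    }
    writeQueue.remove(w);
    notifyAll();
  }
}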


Actually, in the thread dump of our region server, many handler threads
(RW.default.writeRpcServer.handler) wait at Step 1
(waitForPreviousTransactionsComplete()).

https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt

Many handler threads wait at this:

https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224


> Is it possible you are contending on a counter post-upgrade?  Is it
> possible that all these threads are trying to get to the same row to
update
> it? Could the app behavior have changed?  Or are you thinking increment
> itself has slowed significantly?
We have just upgraded HBase, not changed the app behavior. We are thinking
increment itself has slowed significantly.
Before upgrading HBase, throughput and latency were good.
Currently, to cope with this problem, we split the regions finely.

Thanks,

Toshihiro Suzuki


2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:

> On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com> wrote:
>
> > Ted,
> >
> > Thank you for your response.
> >
> > I uploaded the complete stack trace to Gist.
> >
> > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> >
> >
> > I think that the increment operation works as follows:
> >
> > 1. get row lock
> > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC
> > transactions to finish
> > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > 4. get previous values
> > 5. create KVs
> > 6. write to Memstore
> > 7. write to WAL
> > 8. release row lock
> > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> >
> > An instance of MultiVersionConsistencyControl has a pending queue of
> writes
> > named writeQueue.
> > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
> > empty or writeQueue.getFirst() == w.
> > Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry
> > from writeQueue.
> >
> > I think that when a handler thread is processing between step 2 and step
> 9,
> > the other handler threads can wait until the thread completes step 9.
> >
> >
> That is right. We need to read, after all outstanding updates are done...
> because we need to read the latest update before we go to modify/increment
> it.
>
> How do you make that out?
>
> "A region lock (not a row lock) seems to occur in
> waitForPreviousTransactionsComplete()."
>
> In 0.98.x we did this:
>
> mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>
> ... and in 1.0 we do this:
>
> mvcc.waitForPreviousTransactionsComplete() which is this....
>
> +  public void waitForPreviousTransactionsComplete() {
> +    WriteEntry w = beginMemstoreInsert();
> +    waitForPreviousTransactionsComplete(w);
> +  }
>
> The mvcc and region sequenceid were merged in 1.0 (
> https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and
> region
> sequenceid would spin independently of each other. Perhaps this is
> responsible for some of the slowdown.
>
> That said, looking in your thread dump, we seem to be down in the Get. If
> you do a bunch of thread dumps in a row, where is the lock-holding thread?
> In Get or writing Increment... or waiting on sequence id?
>
> Is it possible you are contending on a counter post-upgrade?  Is it
> possible that all these threads are trying to get to the same row to update
> it? Could the app behavior have changed?  Or are you thinking increment
> itself has slowed significantly?
>
> St.Ack
>
>
>
>
> > Thanks,
> >
> > Toshihiro Suzuki
> >
> >
> > 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
> >
> > > In HRegion#increment(), we lock the row (not region):
> > >
> > >     try {
> > >       rowLock = getRowLock(row);
> > >
> > > Can you pastebin the complete stack trace ?
> > >
> > > Thanks
> > >
> > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > CDH5.4.5(HBase1.0.0)
> > > > and we experience slowdown in increment operation.
> > > >
> > > > Here's an extract from thread dump of the RegionServer of our
> cluster:
> > > >
> > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > >   State: BLOCKED
> > > >   Blocked count: 21689888
> > > >   Waited count: 39828360
> > > >   Blocked on java.util.LinkedList@3474e4b2
> > > >   Blocked by 63
> > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > >   Stack:
> > > >     java.lang.Object.wait(Native Method)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > >
> > > >
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > >
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > >
>  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > >     java.lang.Thread.run(Thread.java:745)
> > > >
> > > > There are many similar threads in the thread dump.
> > > >
> > > > I read the source code and I think this is caused by changes of
> > > > MultiVersionConsistencyControl.
> > > > A region lock (not a row lock) seems to occur in
> > > > waitForPreviousTransactionsComplete().
> > > >
> > > >
> > > > Also we wrote performance test code for increment operation that
> > included
> > > > 100 threads and ran it in local mode.
> > > >
> > > > The result is shown below:
> > > >
> > > > CDH5.3.1(HBase0.98.6)
> > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > >
> > > > CDH5.4.5(HBase1.0.0)
> > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Toshihiro Suzuki
> > > >
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <br...@gmail.com> wrote:

> Ted,
>
> Thank you for your response.
>
> I uploaded the complete stack trace to Gist.
>
> https://gist.github.com/brfrn169/cb4f2c157129330cd932
>
>
> I think that the increment operation works as follows:
>
> 1. get row lock
> 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC
> transactions to finish
> 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> 4. get previous values
> 5. create KVs
> 6. write to Memstore
> 7. write to WAL
> 8. release row lock
> 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
>
> An instance of MultiVersionConsistencyControl has a pending queue of writes
> named writeQueue.
> Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
> empty or writeQueue.getFirst() == w.
> Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry
> from writeQueue.
>
> I think that when a handler thread is processing between step 2 and step 9,
> the other handler threads can wait until the thread completes step 9.
>
>
That is right. We need to read, after all outstanding updates are done...
because we need to read the latest update before we go to modify/increment
it.

How do you make that out?

"A region lock (not a row lock) seems to occur in
waitForPreviousTransactionsComplete()."

In 0.98.x we did this:

mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());

... and in 1.0 we do this:

mvcc.waitForPreviousTransactionsComplete() which is this....

+  public void waitForPreviousTransactionsComplete() {
+    WriteEntry w = beginMemstoreInsert();
+    waitForPreviousTransactionsComplete(w);
+  }

The mvcc and region sequenceid were merged in 1.0 (
https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and region
sequenceid would spin independently of each other. Perhaps this is responsible
for some of the slowdown.

That said, looking in your thread dump, we seem to be down in the Get. If
you do a bunch of thread dumps in a row, where is the lock-holding thread?
In Get or writing Increment... or waiting on sequence id?

Is it possible you are contending on a counter post-upgrade?  Is it
possible that all these threads are trying to get to the same row to update
it? Could the app behavior have changed?  Or are you thinking increment
itself has slowed significantly?

St.Ack




> Thanks,
>
> Toshihiro Suzuki
>
>
> 2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:
>
> > In HRegion#increment(), we lock the row (not region):
> >
> >     try {
> >       rowLock = getRowLock(row);
> >
> > Can you pastebin the complete stack trace ?
> >
> > Thanks
> >
> > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > CDH5.4.5(HBase1.0.0)
> > > and we experience slowdown in increment operation.
> > >
> > > Here's an extract from thread dump of the RegionServer of our cluster:
> > >
> > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > >   State: BLOCKED
> > >   Blocked count: 21689888
> > >   Waited count: 39828360
> > >   Blocked on java.util.LinkedList@3474e4b2
> > >   Blocked by 63
> (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > >   Stack:
> > >     java.lang.Object.wait(Native Method)
> > >
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > >
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > >
> > >
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > >
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > >
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > >
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > >
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > >     java.lang.Thread.run(Thread.java:745)
> > >
> > > There are many similar threads in the thread dump.
> > >
> > > I read the source code and I think this is caused by changes of
> > > MultiVersionConsistencyControl.
> > > A region lock (not a row lock) seems to occur in
> > > waitForPreviousTransactionsComplete().
> > >
> > >
> > > Also we wrote performance test code for increment operation that
> included
> > > 100 threads and ran it in local mode.
> > >
> > > The result is shown below:
> > >
> > > CDH5.3.1(HBase0.98.6)
> > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > >
> > > CDH5.4.5(HBase1.0.0)
> > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > >
> > >
> > > Thanks,
> > >
> > > Toshihiro Suzuki
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by 鈴木俊裕 <br...@gmail.com>.
Ted,

Thank you for your response.

I uploaded the complete stack trace to Gist.

https://gist.github.com/brfrn169/cb4f2c157129330cd932


I think that the increment operation works as follows:

1. get row lock
2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC
transactions to finish
3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
4. get previous values
5. create KVs
6. write to Memstore
7. write to WAL
8. release row lock
9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction

An instance of MultiVersionConsistencyControl has a pending queue of writes
named writeQueue.
Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
empty or writeQueue.getFirst() == w.
Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry
from writeQueue.

I think that when a handler thread is processing between step 2 and step 9,
the other handler threads can wait until the thread completes step 9.
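
For illustration, here is the same flow sketched as code -- a hedged outline
only; every type and helper below is a stand-in, not the real HBase
implementation:

abstract class IncrementOutlineSketch {
  interface RowLock { void release(); }
  interface WriteEntry { }

  abstract RowLock getRowLock(byte[] row);                      // step 1
  abstract void waitForPreviousTransactionsComplete();          // step 2
  abstract WriteEntry beginMemstoreInsertWithSeqNum(long n);    // step 3
  abstract void readBuildAndWrite(byte[] row);                  // steps 4-7
  abstract void completeMemstoreInsertWithSeqNum(WriteEntry w); // step 9

  void increment(byte[] row, long mvccNum) {
    RowLock rowLock = getRowLock(row);            // 1: per-row lock
    WriteEntry w;
    try {
      waitForPreviousTransactionsComplete();      // 2: per-REGION wait
      w = beginMemstoreInsertWithSeqNum(mvccNum); // 3: start transaction
      readBuildAndWrite(row);                     // 4-7: get, KVs, memstore, WAL
    } finally {
      rowLock.release();                          // 8: release row lock
    }
    completeMemstoreInsertWithSeqNum(w);          // 9: complete transaction
  }
}

Note the scope mismatch: the lock taken in step 1 is per-row, but the wait in
step 2 is per-region.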

Thanks,

Toshihiro Suzuki


2015-09-09 0:05 GMT+09:00 Ted Yu <yu...@gmail.com>:

> In HRegion#increment(), we lock the row (not region):
>
>     try {
>       rowLock = getRowLock(row);
>
> Can you pastebin the complete stack trace ?
>
> Thanks
>
> On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>
> > Hi,
> >
> > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> CDH5.4.5(HBase1.0.0)
> > and we experience slowdown in increment operation.
> >
> > Here's an extract from thread dump of the RegionServer of our cluster:
> >
> > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> >   State: BLOCKED
> >   Blocked count: 21689888
> >   Waited count: 39828360
> >   Blocked on java.util.LinkedList@3474e4b2
> >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> >   Stack:
> >     java.lang.Object.wait(Native Method)
> >
> >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> >
> >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> >
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> >
> >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> >
> >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> >
> >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> >
> >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >     java.lang.Thread.run(Thread.java:745)
> >
> > There are many similar threads in the thread dump.
> >
> > I read the source code and I think this is caused by changes of
> > MultiVersionConsistencyControl.
> > A region lock (not a row lock) seems to occur in
> > waitForPreviousTransactionsComplete().
> >
> >
> > Also we wrote performance test code for increment operation that included
> > 100 threads and ran it in local mode.
> >
> > The result is shown below:
> >
> > CDH5.3.1(HBase0.98.6)
> > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> >
> > CDH5.4.5(HBase1.0.0)
> > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> >
> >
> > Thanks,
> >
> > Toshihiro Suzuki
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Ted Yu <yu...@gmail.com>.
In HRegion#increment(), we lock the row (not region):

    try {
      rowLock = getRowLock(row);

Can you pastebin the complete stack trace ?

Thanks

On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:

> Hi,
>
> We upgraded our cluster from CDH5.3.1(HBase0.98.6) to CDH5.4.5(HBase1.0.0)
> and we experience slowdown in increment operation.
>
> Here's an extract from thread dump of the RegionServer of our cluster:
>
> Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>   State: BLOCKED
>   Blocked count: 21689888
>   Waited count: 39828360
>   Blocked on java.util.LinkedList@3474e4b2
>   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>   Stack:
>     java.lang.Object.wait(Native Method)
>
>
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>
>
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>
>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     java.lang.Thread.run(Thread.java:745)
>
> There are many similar threads in the thread dump.
>
> I read the source code and I think this is caused by changes of
> MultiVersionConsistencyControl.
> A region lock (not a row lock) seems to occur in
> waitForPreviousTransactionsComplete().
>
>
> Also we wrote performance test code for increment operation that included
> 100 threads and ran it in local mode.
>
> The result is shown below:
>
> CDH5.3.1(HBase0.98.6)
> Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>
> CDH5.4.5(HBase1.0.0)
> Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>
>
> Thanks,
>
> Toshihiro Suzuki
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Bryan Beaudreault <bb...@hubspot.com>.
This is great news! Thanks for all of the hard work here; we're excited to
put this issue behind us and are happy to see the lesson around improving
perf testing.

Cheers,

Bryan

On Mon, Feb 8, 2016 at 1:55 PM Stack <st...@duboce.net> wrote:

> Let me close out this thread.
>
> Below is the release note from the HBASE-14460 umbrella increments
> regression issue and then some.
>
> Increments, appends, checkAnd* have been slow since hbase-1.0.0. The
> unification of mvcc and sequence id done by HBASE-8763 was responsible.
>
> A ‘fast-path’ workaround was added by HBASE-15031 “Fix merge of MVCC and
> SequenceID performance regression in branch-1.0 for Increments”. It became
> available in 1.0.3 and 1.1.3. To enable the fast path, set
> "hbase.increment.fast.but.narrow.consistency" and then rolling restart. The
> workaround was for increments only (appends, checkAndPut, etc., were not
> addressed. See HBASE-15031 release note for more detail).
>
> Subsequently, the regression was properly identified and fixed in
> HBASE-15213 and the fix applied to branch-1.0 and branch-1.1. As it
> happens, hbase-1.2.0 does not suffer from the performance regression
> (though the thought was that it did -- and so it got the fast-path patch
> too via HBASE-15092). Nor does the master branch. HBASE-15213 identified
> that HBASE-12751 (as a side effect) had cured the regression.
>
> hbase-1.0.4 (if it is ever released -- 1.0 has been end-of-lifed) and
> hbase-1.1.4 will have the HBASE-15213 fix.  If you are suffering from the
> increment regression and you are on 1.0.3 or 1.1.3, you can enable the
> workaround to get back your increment performance, but you should upgrade to
> get the proper fix.
>
>
> There are a couple of lessons here:
>
>    - Open source rules. See Junegunn Choi ‘scratch an itch’ in HBASE-15213
>    - Never make assumptions, even between versions. The presumption was that
>    the slowdown came from added friction brought on by the unification of
>    mvcc and sequenceid, and cursory testing seemed to bear out this hunch,
>    but we should have dug in more (see bullet point above). A load of work
>    was then done to make the work-around patch fit different versions of the
>    write-path (it has
>    evolved across 1.0=>2.0) but at least for 1.2.0, this work was not
> needed.
>    - We need to perf test all paths. The increment/append/checkAnd* code
>    paths had been neglected and were missing perf tooling (since
> addressed).
>
> Hopefully we live and learn. Thanks to all who helped along the fix. It
> took an army.
>
> St.Ack
>
>
> On Mon, Dec 21, 2015 at 12:35 PM, Stack <st...@duboce.net> wrote:
>
> > On Mon, Dec 21, 2015 at 2:31 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> >
> >> St.Ack
> >>
> >> I am sorry for the late reply.
> >>
> >>
> > Thank you for the reply.
> >
> >
> >
> >> This is the test code:
> >> https://github.com/brfrn169/hbase-test
> >
> >
> > This helps. The test has a different character to others we currently
> have
> > (a thread keeps writing its row rather than a thread writing all rows in
> > region or all threads writing a single row). Let me try it.
> >
> >
> >
> >>
> >>
> >> We applied the patch you can find below to HBase-1.0.0 to resolve the
> >> performance degradation:
> >> https://gist.github.com/brfrn169/15a874594be2fb9d6ea0
> >>
> >> It showed good performance.
> >>
> >>
> > The patch is attractive because it keeps the change in the MVCC class only.
> >
> > It is a pretty radical change, though, with some changes that look like
> they
> > could go upstream. There are some questions below but no problem if you
> are
> > busy; I can go do my own study.
> >
> > It looks like advanceMemstore had a "blind spot": i.e. we could complete
> a
> > WriteEntry but it might not yet have had its write numbers assigned... Is
> > that why you added the new advancedNoWriteNumberWriteEntries Set?
> >
> > + if (queueFirst.getWriteNumber() == NO_WRITE_NUMBER) {
> > + advancedNoWriteNumberWriteEntries.add(queueFirst);
> > Are we sure that we'll move up through the write mvcc sequence numbers in
> > order?  We seem to be able to take from the
> > advancedNoWriteNumberWriteEntries Set w/o concern for order.
> >
> > You remove the writeQueue notify and seem to flip instead to wait/notify
> > on readWaiters. Fewer synchronization points seems good. We should do this
> > upstream too (and your change of > to >= looking at nextReadValue?)
> >
> > Do you have a notion of how much more throughput you got with this
> change?
> >
> > Thank you 鈴木俊裕,
> > St.Ack
> >
> >
> >
> >
> >> I think the direct cause of the performance degradation is not a region
> >> level lock because HBase-0.98.6 also has a region level lock.
> >> I think lock contention is caused by HBASE-8763.
> >>
> >> The patch mitigates that lock contention.
> >>
> >> Thanks,
> >> Toshihiro Suzuki
> >>
> >>
> >> 2015-12-14 15:12 GMT+09:00 Stack <st...@duboce.net>:
> >>
> >> > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> >> >
> >> > > ...
> >> > >
> >> > > Also we wrote performance test code for increment operation that
> >> included
> >> > > 100 threads and ran it in local mode.
> >> > >
> >> > > The result is shown below:
> >> > >
> >> > > CDH5.3.1(HBase0.98.6)
> >> > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> >> > >
> >> > > CDH5.4.5(HBase1.0.0)
> >> > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> >> > >
> >> > >
> >> > Do you still have this program, Toshihiro Suzuki? May I see it? I am
> >> > interested in seeing what the 100 threads are doing, if they are all
> >> > updating the same Cell or if they are spread over many rows (I see
> >> master
> >> > branch is more than 2x slower than 0.94 if all threads are contending
> >> on a
> >> > single Cell but if not contending, master seems faster -- I must be
> >> doing
> >> > something wrong over in my experiments on HBASE-14460). I would also
> be
> >> > interested in what you loading is like in production if you can
> >> describe it
> >> > at all (send me offlist if you'd rather talk about it in public). In
> >> > production, can you tell how much slowdown you are seeing? Is it 2x or
> >> 7x
> >> > as per your test program?
> >> >
> >> > Thank you,
> >> > St.Ack
> >> >
> >> >
> >> >
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Toshihiro Suzuki
> >> > >
> >> >
> >>
> >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
Let me close out this thread.

Below is the release note from the HBASE-14460 umbrella increments
regression issue and then some.

Increments, appends, checkAnd* have been slow since hbase-1.0.0. The
unification of mvcc and sequence id done by HBASE-8763 was responsible.

A ‘fast-path’ workaround was added by HBASE-15031 “Fix merge of MVCC and
SequenceID performance regression in branch-1.0 for Increments”. It became
available in 1.0.3 and 1.1.3. To enable the fast path, set
"hbase.increment.fast.but.narrow.consistency" and then rolling restart. The
workaround was for increments only (appends, checkAndPut, etc., were not
addressed. See HBASE-15031 release note for more detail).
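
Concretely, enabling the workaround means adding the following to
hbase-site.xml on each RegionServer and then doing a rolling restart (the
property name comes from HBASE-15031; the XML wrapper is the standard form):

<property>
  <name>hbase.increment.fast.but.narrow.consistency</name>
  <value>true</value>
</property>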

Subsequently, the regression was properly identified and fixed in
HBASE-15213 and the fix applied to branch-1.0 and branch-1.1. As it
happens, hbase-1.2.0 does not suffer from the performance regression
(though the thought was that it did -- and so it got the fast-path patch
too via HBASE-15092). Nor does the master branch. HBASE-15213 identified
that HBASE-12751 (as a side effect) had cured the regression.

hbase-1.0.4 (if it is ever released -- 1.0 has been end-of-lifed) and
hbase-1.1.4 will have the HBASE-15213 fix.  If you are suffering from the
increment regression and you are on 1.0.3 or 1.1.3, you can enable the
workaround to get back your increment performance, but you should upgrade to
get the proper fix.


There are a couple of lessons here:

   - Open source rules. See Junegunn Choi ‘scratch an itch’ in HBASE-15213
   - Never make assumptions, even between versions. The presumption was that
   the slowdown came from added friction brought on by the unification of mvcc
   and sequenceid, and cursory testing seemed to bear out this hunch, but we
   should have dug in more (see bullet point above). A load of work was then
   done to make the work-around patch fit different versions of the write-path
   (it has evolved across 1.0=>2.0) but at least for 1.2.0, this work was not
   needed.
   - We need to perf test all paths. The increment/append/checkAnd* code
   paths had been neglected and were missing perf tooling (since addressed).

Hopefully we live and learn. Thanks to all who helped along the fix. It
took an army.

St.Ack


On Mon, Dec 21, 2015 at 12:35 PM, Stack <st...@duboce.net> wrote:

> On Mon, Dec 21, 2015 at 2:31 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>
>> St.Ack
>>
>> I am sorry for the late reply.
>>
>>
> Thank you for the reply.
>
>
>
>> This is the test code:
>> https://github.com/brfrn169/hbase-test
>
>
> This helps. The test has a different character to others we currently have
> (a thread keeps writing its row rather than a thread writing all rows in
> region or all threads writing a single row). Let me try it.
>
>
>
>>
>>
>> We applied the patch you can find below to HBase-1.0.0 to resolve the
>> performance degradation:
>> https://gist.github.com/brfrn169/15a874594be2fb9d6ea0
>>
>> It showed a good performance.
>>
>>
> The patch is attractive because it keeps the change in MVCC class only.
>
> It is a pretty radical change though with some changes that look like they
> could go upstream. There are some questions below but no problem if you are
> busy, I can go do my own study.
>
> It looks like advanceMemstore had a "blind spot": i.e. we could complete a
> WriteEntry but it might not yet have had its write numbers assigned... Is
> that why you added the new advancedNoWriteNumberWriteEntries Set?
>
> + if (queueFirst.getWriteNumber() == NO_WRITE_NUMBER) {
> + advancedNoWriteNumberWriteEntries.add(queueFirst);
> Are we sure that we'll move up through the write mvcc sequence numbers in
> order?  We seem to be able to take from the
> advancedNoWriteNumberWriteEntries Set w/o concern for order.
>
> You remove the writeQueue notify and seem to flip instead to wait/notify
> on readWaiters. Having fewer synchronization points seems good. We should do this
> upstream too (and your change of > to >= looking at nextReadValue?)
>
> Do you have a notion of how much more throughput you got with this change?
>
> Thank you 鈴木俊裕,
> St.Ack
>
>
>
>
>> I think the direct cause of the performance degradation is not a region
>> level lock because HBase-0.98.6 also has a region level lock.
>> I think lock contention is caused by HBASE-8763.
>>
>> The patch mitigates that lock contention.
>>
>> Thanks,
>> Toshihiro Suzuki
>>
>>
>> 2015-12-14 15:12 GMT+09:00 Stack <st...@duboce.net>:
>>
>> > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>> >
>> > > ...
>> > >
>> > > Also we wrote performance test code for increment operation that
>> included
>> > > 100 threads and ran it in local mode.
>> > >
>> > > The result is shown below:
>> > >
>> > > CDH5.3.1(HBase0.98.6)
>> > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>> > >
>> > > CDH5.4.5(HBase1.0.0)
>> > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>> > >
>> > >
>> > Do you have this program Toshihiro Suzuki still? May I see it? I am
>> > interested in seeing what the 100 threads are doing, if they are all
>> > updating the same Cell or if they are spread over many rows (I see
>> master
>> > branch is more than 2x slower than 0.94 if all threads are contending
>> on a
>> > single Cell but if not contending, master seems faster -- I must be
>> doing
>> > something wrong over in my experiments on HBASE-14460). I would also be
>> > interested in what your loading is like in production if you can
>> describe it
>> > at all (send me offlist if you'd rather talk about it in public). In
>> > production, can you tell how much slowdown you are seeing? Is it 2x or
>> 7x
>> > as per your test program?
>> >
>> > Thank you,
>> > St.Ack
>> >
>> >
>> >
>> > >
>> > > Thanks,
>> > >
>> > > Toshihiro Suzuki
>> > >
>> >
>>
>
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
On Mon, Dec 21, 2015 at 2:31 AM, 鈴木俊裕 <br...@gmail.com> wrote:

> St.Ack
>
> I am sorry for the late reply.
>
>
Thank you for the reply.



> This is the test code:
> https://github.com/brfrn169/hbase-test


This helps. The test has a different character to others we currently have
(a thread keeps writing its row rather than a thread writing all rows in
region or all threads writing a single row). Let me try it.
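
For anyone skimming, a rough sketch of that third pattern -- each of N
threads repeatedly incrementing its own row -- could look like the
following (the table name, column family, and counts here are
hypothetical; the real harness is in the repo linked above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementOwnRowTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    final int numThreads = 100;            // as in the original report
    final int incrementsPerThread = 10000; // arbitrary for this sketch
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      Thread[] threads = new Thread[numThreads];
      for (int i = 0; i < numThreads; i++) {
        final byte[] row = Bytes.toBytes("row-" + i); // one row per thread
        threads[i] = new Thread(() -> {
          // Table instances are not thread-safe; use one per thread.
          try (Table table = conn.getTable(TableName.valueOf("test"))) {
            for (int n = 0; n < incrementsPerThread; n++) {
              table.incrementColumnValue(row, Bytes.toBytes("f"),
                  Bytes.toBytes("q"), 1L);
            }
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        });
        threads[i].start();
      }
      for (Thread t : threads) {
        t.join();
      }
    }
  }
}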



>
>
> We applied the patch you can find below to HBase-1.0.0 to resolve the
> performance degradation:
> https://gist.github.com/brfrn169/15a874594be2fb9d6ea0
>
> It showed a good performance.
>
>
The patch is attractive because it keeps the change in MVCC class only.

It is a pretty radical change though with some changes that look like they
could go upstream. There are some questions below but no problem if you are
busy, I can go do my own study.

It looks like advanceMemstore had a "blind spot": i.e. we could complete a
WriteEntry but it might not yet have had its write numbers assigned... Is
that why you added the new advancedNoWriteNumberWriteEntries Set?

+ if (queueFirst.getWriteNumber() == NO_WRITE_NUMBER) {
+   advancedNoWriteNumberWriteEntries.add(queueFirst);
Are we sure that we'll move up through the write mvcc sequence numbers in
order?  We seem to be able to take from the
advancedNoWriteNumberWriteEntries Set w/o concern for order.
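
To make that concrete, here is a toy model of how I read the patched
advance loop (an illustrative sketch only, not the actual HBase or gist
code; the field names follow the patch, everything else is assumed):

import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

// Toy model of the patched advance logic; not the real MVCC class.
class MvccAdvanceSketch {
  static final long NO_WRITE_NUMBER = -1;

  static class WriteEntry {
    volatile long writeNumber = NO_WRITE_NUMBER;
    volatile boolean completed;
  }

  private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();
  // Entries completed before a write number was assigned (the "blind
  // spot"): set them aside instead of advancing the read point.
  private final Set<WriteEntry> advancedNoWriteNumberWriteEntries =
      new HashSet<>();
  private long readPoint;

  synchronized void advanceMemstore() {
    while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
      WriteEntry queueFirst = writeQueue.removeFirst();
      if (queueFirst.writeNumber == NO_WRITE_NUMBER) {
        advancedNoWriteNumberWriteEntries.add(queueFirst);
      } else {
        readPoint = queueFirst.writeNumber; // readers can now see it
      }
    }
    // Per the patch: notify readers waiting on the read point rather
    // than notifying on the writeQueue itself.
    notifyAll();
  }
}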

You remove the writeQueue notify and seem to flip instead to wait/notify on
readWaiters. Having fewer synchronization points seems good. We should do this
upstream too (and your change of > to >= looking at nextReadValue?)

Do you have a notion of how much more throughput you got with this change?

Thank you 鈴木俊裕,
St.Ack




> I think the direct cause of the performance degradation is not a region
> level lock because HBase-0.98.6 also has a region level lock.
> I think lock contention is caused by HBASE-8763.
>
> The patch mitigates that lock contention.
>
> Thanks,
> Toshihiro Suzuki
>
>
> 2015-12-14 15:12 GMT+09:00 Stack <st...@duboce.net>:
>
> > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
> >
> > > ...
> > >
> > > Also we wrote performance test code for increment operation that
> included
> > > 100 threads and ran it in local mode.
> > >
> > > The result is shown below:
> > >
> > > CDH5.3.1(HBase0.98.6)
> > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > >
> > > CDH5.4.5(HBase1.0.0)
> > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > >
> > >
> > Do you have this program Toshihiro Suzuki still? May I see it? I am
> > interested in seeing what the 100 threads are doing, if they are all
> > updating the same Cell or if they are spread over many rows (I see master
> > branch is more than 2x slower than 0.94 if all threads are contending on
> a
> > single Cell but if not contending, master seems faster -- I must be doing
> > something wrong over in my experiments on HBASE-14460). I would also be
> > interested in what your loading is like in production if you can describe
> it
> > at all (send me offlist if you'd rather talk about it in public). In
> > production, can you tell how much slowdown you are seeing? Is it 2x or 7x
> > as per your test program?
> >
> > Thank you,
> > St.Ack
> >
> >
> >
> > >
> > > Thanks,
> > >
> > > Toshihiro Suzuki
> > >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by 鈴木俊裕 <br...@gmail.com>.
St.Ack

I am sorry for the late reply.

This is the test code:
https://github.com/brfrn169/hbase-test

We applied the patch you can find below to HBase-1.0.0 to resolve the
performance degradation:
https://gist.github.com/brfrn169/15a874594be2fb9d6ea0

It showed a good performance.

I think the direct cause of the performance degradation is not a region
level lock because HBase-0.98.6 also has a region level lock.
I think lock contention is caused by HBASE-8763.

The patch mitigates that lock contention.

Thanks,
Toshihiro Suzuki


2015-12-14 15:12 GMT+09:00 Stack <st...@duboce.net>:

> On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:
>
> > ...
> >
> > Also we wrote performance test code for increment operation that included
> > 100 threads and ran it in local mode.
> >
> > The result is shown below:
> >
> > CDH5.3.1(HBase0.98.6)
> > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> >
> > CDH5.4.5(HBase1.0.0)
> > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> >
> >
> Do you have this program Toshihiro Suzuki still? May I see it? I am
> interested in seeing what the 100 threads are doing, if they are all
> updating the same Cell or if they are spread over many rows (I see master
> branch is more than 2x slower than 0.94 if all threads are contending on a
> single Cell but if not contending, master seems faster -- I must be doing
> something wrong over in my experiments on HBASE-14460). I would also be
> interested in what your loading is like in production if you can describe it
> at all (send me offlist if you'd rather talk about it in public). In
> production, can you tell how much slowdown you are seeing? Is it 2x or 7x
> as per your test program?
>
> Thank you,
> St.Ack
>
>
>
> >
> > Thanks,
> >
> > Toshihiro Suzuki
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Posted by Stack <st...@duboce.net>.
On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <br...@gmail.com> wrote:

> ...
>
> Also we wrote performance test code for increment operation that included
> 100 threads and ran it in local mode.
>
> The result is shown below:
>
> CDH5.3.1(HBase0.98.6)
> Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>
> CDH5.4.5(HBase1.0.0)
> Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>
>
Do you have this program Toshihiro Suzuki still? May I see it? I am
interested in seeing what the 100 threads are doing, if they are all
updating the same Cell or if they are spread over many rows (I see master
branch is more than 2x slower than 0.94 if all threads are contending on a
single Cell but if not contending, master seems faster -- I must be doing
something wrong over in my experiments on HBASE-14460). I would also be
interested in what your loading is like in production if you can describe it
at all (send me offlist if you'd rather talk about it in public). In
production, can you tell how much slowdown you are seeing? Is it 2x or 7x
as per your test program?

Thank you,
St.Ack



>
> Thanks,
>
> Toshihiro Suzuki
>