Posted to user@hbase.apache.org by "Buckley,Ron" <bu...@oclc.org> on 2012/08/06 23:37:38 UTC

Extremely long flush times

We ran into this situation today on our production instance, I'm
wondering if anyone has seen or knows a way around it.

 

We were repeatedly updating the same 60,000-column-wide row. With each
update, we were updating nearly every column of the row. Each update
generated nearly 9 MB in the WAL.
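
(Rough arithmetic, using an assumed per-cell size: every updated cell goes
into the WAL as a full KeyValue - row key, family, qualifier, timestamp and
value plus some length fields - so 60,000 cells at on the order of 150 bytes
each comes to roughly 9 MB per update, consistent with what we saw.)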

This worked OK most of the time. We saw frequent messages like this in
the region server log, indicating a 2.6 second flush time:

2012-08-06 11:42:52,316 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162931400, currentsize=2.7k/2760 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
2610ms, sequenceid=1792263953, compaction requested=false

 

However, sometimes we would see flush times of around 8-10 minutes.

2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

 

During these long flushes, we would end up consuming all of the IPC handler
slots we had configured for that region server (50). To our application, the
region server looked unavailable.

 

After the long delay, the region server would continue running fine, but
then do it again a few minutes later.

 

I thought it was most likely related to garbage collection, so I got a stack
trace during one of the events. The trace doesn't look like GC was running.

 

 

http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was
unreachable

http://pastebin.com/naTYQxZR  - Region Server Log during the same
interval

 

 

Versions:

HBase 0.92.1, r1298924

Hadoop 0.20.2-cdh3u3

 

 

----------------------------------------------------------------------

Ron Buckley

x6365

 


RE: Extremely long flush times

Posted by "Espinoza,Carlos" <es...@oclc.org>.
Lars,

Glad I could help. It was cool to see how you approached the problem and came to a solution. Thanks for being so quick in addressing this!

Carlos

-----Original Message-----
From: lars hofhansl [mailto:lhofhansl@yahoo.com] 
Sent: Thursday, August 16, 2012 2:40 PM
To: user@hbase.apache.org; dev@hbase.apache.org; lars hofhansl
Subject: Re: Extremely long flush times

This is now committed to 0.94 (i.e. it will be in 0.94.2) and 0.96. The fix turned out to be pretty simple (but in an intricate part of HBase).

Thanks for the program demonstrating the problem, Carlos; that was extremely helpful!

-- Lars



________________________________
 From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>
Sent: Sunday, August 12, 2012 2:41 PM
Subject: Re: Extremely long flush times
 
I filed HBASE-6561 for this (Jira is back).



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:42 AM
Subject: Re: Extremely long flush times

A possible solution is to have the MemStoreScanner reseek eagerly (i.e. just iterate forward) for a bit (say 100 KVs or so). If that is not fruitful, then issue the expensive reseek. I'll try that tomorrow.



(In this case the tailset created by the reseek often has 300,000 or more entries in it. That is not necessarily a problem, since it is not recreated.)
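
To make the idea concrete, here is a rough sketch of that "iterate a bit
before paying for a reseek" heuristic. This is not the actual HBase code: the
class name, the 100-step threshold, and the use of String instead of KeyValue
are all made up for illustration.

import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch only: try cheap forward iteration before falling back to an
// expensive reseek into the backing skip list.
class EagerReseekScanner {
    private static final int MAX_FORWARD_STEPS = 100;  // "say 100 KVs or so"

    private final ConcurrentSkipListSet<String> kvs;   // stand-in for the memstore KV set
    private Iterator<String> it;
    private String current;

    EagerReseekScanner(ConcurrentSkipListSet<String> kvs) {
        this.kvs = kvs;
        this.it = kvs.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    // Position the scanner at the first KV >= key.
    void reseek(String key) {
        // First just iterate forward a bounded number of steps.
        for (int i = 0; i < MAX_FORWARD_STEPS && current != null; i++) {
            if (current.compareTo(key) >= 0) {
                return;                                 // cheap path was enough
            }
            current = it.hasNext() ? it.next() : null;
        }
        // Not fruitful: fall back to the expensive reseek via a tail set.
        it = kvs.tailSet(key).iterator();
        current = it.hasNext() ? it.next() : null;
    }

    String peek() {
        return current;
    }
}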

-- Lars


----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:26 AM
Subject: Re: Extremely long flush times

It turns out the problem is not the loop in MemStoreScanner, but excessive calls to StoreScanner.reseek.

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached), and I do not see this behavior anymore (the test, which previously would not get past 15 or so updates, happily runs until I stop the client).

Not sure what the conclusion would be. Seeking the memstore seems to be expensive, so it should only be done when many KVs can be skipped with a single seek; otherwise we should just iterate.
It is not clear how to determine this ahead of time.
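
To make the trade-off concrete, a purely illustrative sketch (the class, the
method, and the 100 threshold are invented; only the two MatchCode names come
from the real ScanQueryMatcher):

// Hypothetical illustration only - this is not the actual
// ScanWildcardColumnTracker code. Once the maximum number of versions for a
// column has been seen, the matcher can either skip KVs one at a time or ask
// the scanner to seek past the column.
final class VersionOverflowPolicy {
    enum MatchCode { SKIP, SEEK_NEXT_COL }   // mirrors the two values discussed above

    static MatchCode onExtraVersion(int expectedExtraVersions) {
        // SEEK_NEXT_COL pays for an expensive reseek but can jump over many KVs
        // at once; SKIP just advances to the next KV. The catch is that the
        // number of KVs a seek would actually skip is not known ahead of time,
        // so the 100 threshold here is an arbitrary placeholder.
        return expectedExtraVersions > 100 ? MatchCode.SEEK_NEXT_COL : MatchCode.SKIP;
    }
}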

I'm open to suggestions.

-- Lars



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Friday, August 10, 2012 11:43 PM
Subject: Re: Extremely long flush times

Ran your test code (thanks Carlos).

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...).

Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)



At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)


Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.


So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution, and this always shows up.
I need to dig a bit more; I do not see a good way to deal with this yet.
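
In other words (a simplified sketch, not the real HBase classes): the get
holds the StoreScanner's monitor while it grinds through the memstore, and the
flush cannot swap in the new reader until that monitor is released.

// Simplified model of the stall in the two stack traces above: an IPC handler
// thread holds the scanner's monitor for the whole slow memstore scan, so the
// flusher blocks in updateReaders() until the get finishes.
class StoreScannerModel {
    synchronized boolean next() {
        // In the real trace this is StoreScanner.next -> KeyValueHeap.reseek ->
        // MemStoreScanner.reseek/getNext, which may walk a huge number of
        // memstore entries per call.
        slowMemstoreScan();
        return true;
    }

    synchronized void updateReaders() {
        // Called from the flush path (Store.notifyChangedReadersObservers);
        // it cannot run until the handler thread leaves next().
    }

    private void slowMemstoreScan() { /* ... */ }
}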

-- Lars


----- Original Message -----
From: "Espinoza,Carlos" <es...@oclc.org>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed HBase
(hbase-0.94.1) environment. We wrote a test program that creates a test table "MyTestTable" and populates it with random rows, then creates a row with 60,000 columns and repeatedly updates it. Each column has an 18-byte qualifier and a 50-byte value. In our tests, we usually never got beyond 15 updates before the region would flush for a really long time. The rows being updated are about 4 MB each (minus any HBase metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the region server log with debug on during the same flush: http://pastebin.com/Fh5213mg

This is the test program we wrote. 
http://pastebin.com/aZ0k5tx2

You should be able to just compile it, and run it against a running HBase cluster.
$ java TestTable
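
For reference, a minimal sketch along the lines described above (this is not
the pastebin program; the column family name, the update count, and the
qualifier format are assumptions, and it uses the 0.94-era client API). The
raw cell data per row works out to roughly 60,000 x (18 + 50) bytes, i.e.
about 4 MB, which matches the row size mentioned above.

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of a wide-row updater in the spirit of the test program described above.
public class WideRowRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    if (!admin.tableExists("MyTestTable")) {
      HTableDescriptor desc = new HTableDescriptor("MyTestTable");
      desc.addFamily(new HColumnDescriptor("f"));   // family name is an assumption
      admin.createTable(desc);
    }

    HTable table = new HTable(conf, "MyTestTable");
    byte[] family = Bytes.toBytes("f");
    byte[] row = Bytes.toBytes("wide-row");
    Random rnd = new Random();

    // Repeatedly rewrite nearly every column of one 60,000-column row.
    for (int update = 0; update < 100; update++) {
      Put put = new Put(row);
      for (int col = 0; col < 60000; col++) {
        byte[] qualifier = Bytes.toBytes(String.format("qualifier-%08d", col)); // 18 bytes
        byte[] value = new byte[50];
        rnd.nextBytes(value);
        put.add(family, qualifier, value);
      }
      table.put(put);
      System.out.println("update " + update + " done");
    }
    table.close();
  }
}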

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC; for example, there are 7 minutes between the last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~155.4m/162905016, currentsize=116.5m/122172880 for region Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm 
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With 
> each update, we were updating nearly every column of the row. Each 
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in

> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162931400, currentsize=2.7k/2760 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see a flush times around 8 - 10 Minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler 
> slots into that region server that we had configured (50).  It looked 
> to our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, 
> but then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I 
> got a stack trace during one of the events.  The trace doesn't look 
> like GC was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was

> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same 
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>


Re: Extremely long flush times

Posted by lars hofhansl <lh...@yahoo.com>.
This is now committed to 0.94 (i.e. it will be in 0.94.2) and 0.96. The fix turned out to be pretty simple (but in an intricate part of HBase).

Thanks for the program demonstrating the problem, Carlos; that was extremely helpful!

-- Lars



________________________________
 From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com> 
Sent: Sunday, August 12, 2012 2:41 PM
Subject: Re: Extremely long flush times
 
I filed HBASE-6561 for this (Jira is back).



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:42 AM
Subject: Re: Extremely long flush times

A possible solution is to have the MemStoreScanner reseek eagerly (i.e. just iterate forward) for a bit (say 100 KVs or so). If that is not fruitful, then issue the expensive reseek. I'll try that tomorrow.



(In this case the tailset created by the reseek often has 300,000 or more entries in it. That is not necessarily a problem, since it is not recreated.)

-- Lars


----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:26 AM
Subject: Re: Extremely long flush times

It turns out the problem is not the loop in MemStoreScanner, but excessive calls to StoreScanner.reseek.

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached), and I do not see this behavior anymore (the test, which previously would not get past 15 or so updates, happily runs until I stop the client).

Not sure what the conclusion would be. Seeking the memstore seems to be expensive, so it should only be done when many KVs can be skipped with a single seek; otherwise we should just iterate.
It is not clear how to determine this ahead of time.

I'm open to suggestions.

-- Lars



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Friday, August 10, 2012 11:43 PM
Subject: Re: Extremely long flush times

Ran your test code (thanks Carlos).

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...).

Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)



At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)


Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.


So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution, and this always shows up.
I need to dig a bit more; I do not see a good way to deal with this yet.

-- Lars


----- Original Message -----
From: "Espinoza,Carlos" <es...@oclc.org>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed HBase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then creates a
row with 60,000 columns and repeatedly updates it. Each column has an
18-byte qualifier and a 50-byte value. In our tests, we usually never
got beyond 15 updates before the region would flush for a really long
time. The rows being updated are about 4 MB each (minus any HBase
metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and
didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush
http://pastebin.com/Fh5213mg

This is the test program we wrote. 
http://pastebin.com/aZ0k5tx2

You should be able to just compile it, and run it against a running
HBase cluster.
$ java TestTable

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC; for example, there are 7 minutes between the
last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried
to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session
timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm 
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With 
> each update, we were updating nearly every column of the row. Each 
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in

> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162931400, currentsize=2.7k/2760 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see a flush times around 8 - 10 Minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler 
> slots into that region server that we had configured (50).  It looked 
> to our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, 
> but then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I 
> got a stack trace during one of the events.  The trace doesn't look 
> like GC was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was

> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same 
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>

Re: Extremely long flush times

Posted by lars hofhansl <lh...@yahoo.com>.
I filed HBASE-6561 for this (Jira is back).



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:42 AM
Subject: Re: Extremely long flush times

A possible solution is to have the MemStoreScanner reseek eagerly (i.e. just iterate forward) for a bit (say 100 KVs or so). If that is not fruitful, then issue the expensive reseek. I'll try that tomorrow.



(In this case the tailset created by the reseek often has 300,000 or more entries in it. That is not necessarily a problem, since it is not recreated.)

-- Lars


----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:26 AM
Subject: Re: Extremely long flush times

It turns out the problem is not the loop in MemStoreScanner, but excessive calls to StoreScanner.reseek.

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached), and I do not see this behavior anymore (the test, which previously would not get past 15 or so updates, happily runs until I stop the client).

Not sure what the conclusion would be. Seeking the memstore seems to be expensive, so it should only be done when many KVs can be skipped with a single seek; otherwise we should just iterate.
It is not clear how to determine this ahead of time.

I'm open to suggestions.

-- Lars



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Friday, August 10, 2012 11:43 PM
Subject: Re: Extremely long flush times

Ran your test code (thanks Carlos).

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...).

Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)



At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)


Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.


So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution, and this always shows up.
I need to dig a bit more; I do not see a good way to deal with this yet.

-- Lars


----- Original Message -----
From: "Espinoza,Carlos" <es...@oclc.org>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed HBase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then creates a
row with 60,000 columns and repeatedly updates it. Each column has an
18-byte qualifier and a 50-byte value. In our tests, we usually never
got beyond 15 updates before the region would flush for a really long
time. The rows being updated are about 4 MB each (minus any HBase
metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and
didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush
http://pastebin.com/Fh5213mg

This is the test program we wrote. 
http://pastebin.com/aZ0k5tx2

You should be able to just compile it, and run it against a running
HBase cluster.
$ java TestTable

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC; for example, there are 7 minutes between the
last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried
to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session
timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm 
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With 
> each update, we were updating nearly every column of the row. Each 
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in

> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162931400, currentsize=2.7k/2760 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see flush times of around 8 - 10 minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler 
> slots into that region server that we had configured (50).  It looked 
> to our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, 
> but then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I 
> got a stack trace during one of the events.  The trace doesn't look 
> like GC was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was

> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same 
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>


Re: Extremely long flush times

Posted by lars hofhansl <lh...@yahoo.com>.
I filed HBASE-6561 for this (Jira is back).



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:42 AM
Subject: Re: Extremely long flush times

A possible solution is to have the MemStoreScanner reseek eagerly (i.e. just iterate forward) for a bit (say 100 KVs or so). If that is not fruitful, then issue the expensive reseek. I'll try that tomorrow.


(In this case the tailset created by the reseek often has 300,000 or more entries in it. That is not necessarily a problem since it is not recreated.)

-- Lars


----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:26 AM
Subject: Re: Extremely long flush times

It turns out the problem is not the loop in MemStoreScanner, but excessive calls to StoreScanner.reseek.

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached), and I no longer see this behavior (the loop that previously would not get past 15 or so updates now happily goes on until I stop the client).

Not sure what the conclusion would be. Seeking the memstore seems to be expensive, so it should only be done when many KVs can be skipped with a seek; otherwise we should just iterate.
It is not clear how to find this out ahead of time.

I'm open to suggestions.

-- Lars



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Friday, August 10, 2012 11:43 PM
Subject: Re: Extremely long flush times

Ran your test code (thanks Carlos).

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...)

Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)



At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)


Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.


So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution, and this always shows up.
Need to dig a bit more; I do not see a good way to deal with this yet.

-- Lars


----- Original Message -----
From: "Espinoza,Carlos" <es...@oclc.org>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed hbase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then it creates a
row with 60,000 columns and repeatedly updates it. Each column has an
18-byte qualifier and a 50-byte value. In our tests, when we ran the
program, we usually never got beyond 15 updates before it would flush
for a really long time. The rows that are being updated are about 4 MB
each (minus any HBase metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and
didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush
http://pastebin.com/Fh5213mg

This is the test program we wrote. 
http://pastebin.com/aZ0k5tx2

You should be able to just compile it, and run it against a running
HBase cluster.
$ java TestTable

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC; for example, there's 7 minutes between the
last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried
to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session
timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm 
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With 
> each update, we were updating nearly every column of the row. Each 
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in

> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162931400, currentsize=2.7k/2760 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see flush times of around 8 - 10 minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler 
> slots into that region server that we had configured (50).  It looked 
> to our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, 
> but then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I 
> got a stack trace during one of the events.  The trace doesn't look 
> like GC was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was

> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same 
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>


Re: Extremely long flush times

Posted by lars hofhansl <lh...@yahoo.com>.
A possible solution is to have the MemStoreScanner reseek eagerly (i.e. just iterate forward) for a bit (say 100 KVs or so). If that is not fruitful, then issue the expensive reseek. I'll try that tomorrow.


(In this case the tailset created by the reseek often has 300,000 or more entries in it. That is not necessarily a problem since it is not recreated.)
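
As a rough, self-contained illustration of that heuristic (toy code, not a patch: a skip list of Longs stands in for the memstore's kvset, and the 100-step bound is the "for a bit" above):

import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

public class EagerReseekSketch {
    private static final int MAX_FORWARD_STEPS = 100;

    private final ConcurrentSkipListSet<Long> kvset = new ConcurrentSkipListSet<Long>();
    private Iterator<Long> it;
    private Long current;

    EagerReseekSketch() {
        for (long i = 0; i < 1000000; i++) kvset.add(i);
        it = kvset.iterator();
        current = it.next();
    }

    // "Reseek" here means positioning the scanner at the first entry >= target.
    boolean reseek(long target) {
        // Eager part: just iterate forward a little and hope we reach the target.
        for (int steps = 0; steps < MAX_FORWARD_STEPS; steps++) {
            if (current != null && current >= target) return true;
            current = it.hasNext() ? it.next() : null;
        }
        // Not fruitful: fall back to the expensive seek. This is the part that
        // shows up in the jstacks as seekInSubLists()/getNext() over a tailset
        // with 300,000+ entries.
        it = kvset.tailSet(target).iterator();
        current = it.hasNext() ? it.next() : null;
        return current != null;
    }

    public static void main(String[] args) {
        EagerReseekSketch s = new EagerReseekSketch();
        System.out.println(s.reseek(42));      // satisfied by cheap forward steps
        System.out.println(s.reseek(500000));  // falls back to the tailSet seek
    }
}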

-- Lars


----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Saturday, August 11, 2012 12:26 AM
Subject: Re: Extremely long flush times

It turns out the problem is not the loop in MemStoreScanner, but excessive calls to StoreScanner.reseek.

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached), and I no longer see this behavior (the loop that previously would not get past 15 or so updates now happily goes on until I stop the client).

Not sure what the conclusion would be. Seeking the memstore seems to be expensive, so it should only be done when many KVs can be skipped with a seek; otherwise we should just iterate.
It is not clear how to find this out ahead of time.

I'm open to suggestions.

-- Lars



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Friday, August 10, 2012 11:43 PM
Subject: Re: Extremely long flush times

Ran your test code (thanks Carlos).

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...)

Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)



At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)


Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.


So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution, and this always shows up.
Need to dig a bit more; I do not see a good way to deal with this yet.

-- Lars


----- Original Message -----
From: "Espinoza,Carlos" <es...@oclc.org>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed hbase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then it creates a
row with 60,000 columns and repeatedly updates it. Each column has an
18-byte qualifier and a 50-byte value. In our tests, when we ran the
program, we usually never got beyond 15 updates before it would flush
for a really long time. The rows that are being updated are about 4 MB
each (minus any HBase metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and
didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush
http://pastebin.com/Fh5213mg

This is the test program we wrote. 
http://pastebin.com/aZ0k5tx2

You should be able to just compile it, and run it against a running
HBase cluster.
$ java TestTable

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC; for example, there's 7 minutes between the
last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried
to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session
timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm 
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With 
> each update, we were updating nearly every column of the row. Each 
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in

> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162931400, currentsize=2.7k/2760 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see flush times of around 8 - 10 minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler 
> slots into that region server that we had configured (50).  It looked 
> to our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, 
> but then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I 
> got a stack trace during one of the events.  The trace doesn't look 
> like GC was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was

> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same 
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>

Re: Extremely long flush times

Posted by lars hofhansl <lh...@yahoo.com>.
It turns out the problem is not the loop in MemStoreScanner, but excessive calls to StoreScanner.reseek.

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached), and I no longer see this behavior (the loop that previously would not get past 15 or so updates now happily goes on until I stop the client).

Not sure what the conclusion would be. Seeking the memstore seems to be expensive, so it should only be done when many KVs can be skipped with a seek; otherwise we should just iterate.
It is not clear how to find this out ahead of time.
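
A back-of-the-envelope toy shows the size of the tradeoff (not the real matcher; the versions-per-column figure is made up, and the column count loosely matches the 60,000-column test row):

public class MatcherDecisionSketch {
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }

    // Toy version of the decision once maxVersions is reached for a column:
    // ask the scanner to reseek to the next column, or just skip the extra
    // versions one by one.
    static MatchCode checkVersion(int versionsSeen, int maxVersions, boolean seekOnMax) {
        if (versionsSeen <= maxVersions) return MatchCode.INCLUDE;
        return seekOnMax ? MatchCode.SEEK_NEXT_COL : MatchCode.SKIP;
    }

    public static void main(String[] args) {
        int columns = 60000, versionsPerColumn = 5, maxVersions = 3;
        for (boolean seekOnMax : new boolean[] { true, false }) {
            long reseeks = 0, skips = 0;
            for (int c = 0; c < columns; c++) {
                for (int v = 1; v <= versionsPerColumn; v++) {
                    MatchCode mc = checkVersion(v, maxVersions, seekOnMax);
                    if (mc == MatchCode.SEEK_NEXT_COL) { reseeks++; break; } // jump the column
                    if (mc == MatchCode.SKIP) skips++;                       // cheap iterate
                }
            }
            System.out.println((seekOnMax ? "SEEK_NEXT_COL" : "SKIP")
                    + ": memstore reseeks=" + reseeks + ", plain skips=" + skips);
        }
    }
}

With SEEK_NEXT_COL the scan pays one memstore reseek per wide column; with SKIP it pays only cheap iterations, and which one wins depends on how many versions pile up per column - which is exactly the part we cannot know ahead of time.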

I'm open to suggestions.

-- Lars



----- Original Message -----
From: lars hofhansl <lh...@yahoo.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; hbase-dev <de...@hbase.apache.org>
Cc: 
Sent: Friday, August 10, 2012 11:43 PM
Subject: Re: Extremely long flush times

Ran your test code (thanks Carlos).

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...)

Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)



At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)


Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.


So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution, and this always shows up.
Need to dig a bit more; I do not see a good way to deal with this yet.

-- Lars


----- Original Message -----
From: "Espinoza,Carlos" <es...@oclc.org>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed hbase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then it creates a
row with 60,000 columns and repeatedly updates it. Each column has an
18-byte qualifier and a 50-byte value. In our tests, when we ran the
program, we usually never got beyond 15 updates before it would flush
for a really long time. The rows that are being updated are about 4 MB
each (minus any HBase metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and
didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush
http://pastebin.com/Fh5213mg

This is the test program we wrote. 
http://pastebin.com/aZ0k5tx2

You should be able to just compile it, and run it against a running
HBase cluster.
$ java TestTable

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC; for example, there's 7 minutes between the
last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried
to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session
timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm 
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With 
> each update, we were updating nearly every column of the row. Each 
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in

> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162931400, currentsize=2.7k/2760 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see flush times of around 8 - 10 minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler 
> slots into that region server that we had configured (50).  It looked 
> to our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, 
> but then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I 
> got a stack trace during one of the events.  The trace doesn't look 
> like GC was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was

> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same 
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>


Re: Extremely long flush times

Posted by lars hofhansl <lh...@yahoo.com>.
Ran your test code (thanks Carlos).

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...)

Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)



At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:443)
        - waiting to lock <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.Store.notifyChangedReadersObservers(Store.java:904)
        at org.apache.hadoop.hbase.regionserver.Store.updateStorefiles(Store.java:893)
        at org.apache.hadoop.hbase.regionserver.Store.access$600(Store.java:107)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2291)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1455)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1353)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1294)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)


Both StoreScanner.updateReaders and StoreScanner.reseek are synchronized.


So the problem seems to be that MemStoreScanner loops forever in getNext(...). I took a jstack a bunch of times during execution; it always shows up.
I need to dig a bit more; I do not see a good way to deal with this yet.
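
To make the monitor interplay concrete, here is a deliberately
simplified, standalone sketch (toy class and method names, not the
actual HBase code): one thread holds the scanner's monitor for as long
as a single reseek takes, so the flush commit cannot proceed until that
reseek returns, which is the shape of the two stack traces above.

// Toy model of the contention: reseek() and updateReaders() are
// synchronized on the same object, so a long reseek starves the flusher.
public class ScannerLockDemo {

    static class ToyStoreScanner {
        // Stands in for StoreScanner.reseek(): synchronized and, in the
        // pathological case, very slow because the memstore scanner steps
        // over a huge number of stale KeyValues one at a time.
        synchronized void reseek(long millisToSimulateSkipping)
                throws InterruptedException {
            Thread.sleep(millisToSimulateSkipping);
        }

        // Stands in for StoreScanner.updateReaders(): also synchronized,
        // called by the flusher when it commits the flushed store file.
        synchronized void updateReaders() {
            System.out.println("flush commit finally proceeded");
        }
    }

    public static void main(String[] args) throws Exception {
        final ToyStoreScanner scanner = new ToyStoreScanner();

        Thread handler = new Thread(new Runnable() {
            public void run() {
                try {
                    scanner.reseek(10000);  // one "reseek" taking 10 seconds
                } catch (InterruptedException ignored) {
                }
            }
        }, "ipc-handler");

        Thread flusher = new Thread(new Runnable() {
            public void run() {
                scanner.updateReaders();    // blocks until the monitor is free
            }
        }, "cacheFlusher");

        handler.start();
        Thread.sleep(100);  // let the handler grab the monitor first
        flusher.start();    // now BLOCKED, like the cacheFlusher jstack above
        handler.join();
        flusher.join();
    }
}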

-- Lars


----- Original Message -----
From: "Espinoza,Carlos" <es...@oclc.org>
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 10, 2012 3:12 PM
Subject: RE: Extremely long flush times

We were able to replicate this behavior in a pseudo-distributed hbase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then it creates a
row with 60,000 columns and repeatedly updates it. Each column has a 18
byte qualifier and a 50 byte value. In our tests, when we ran the
program, we usually never got beyond 15 updates before it would flush
for a really long time. The rows that are being updated are about 4MB
each (minues any hbase metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and
didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush
http://pastebin.com/Fh5213mg

This is the test program we wrote. 
http://pastebin.com/aZ0k5tx2

You should be able to just compile it, and run it against a running
HBase cluster.
$ java TestTable

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC, for example there's 7 minutes between the
last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried
to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session
timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm 
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With 
> each update, we were updating nearly every column of the row. Each 
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in

> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162931400, currentsize=2.7k/2760 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see a flush times around 8 - 10 Minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler 
> slots into that region server that we had configured (50).  It looked 
> to our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, 
> but then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I 
> got a stack trace during one of the events.  The trace doesn't look 
> like GC was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was

> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same 
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>

Re: Extremely long flush times

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Can you jstack while this happens? Anything weird in the datanode log?
Is the load high during that time?

Thx,

J-D

On Fri, Aug 10, 2012 at 3:12 PM, Espinoza,Carlos <es...@oclc.org> wrote:
> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
>
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
>
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
>
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
>
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
>
> Carlos
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
> Jean-Daniel Cryans
> Sent: Monday, August 06, 2012 8:04 PM
> To: user@hbase.apache.org
> Subject: Re: Extremely long flush times
>
> Hi Ron,
>
> I think this is caused by GC, for example there's 7 minutes between the
> last blocking message and the end of the flush:
>
> 2012-08-06 11:44:36,348 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
> Server handler 42 on 9009' on region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
> memstore size 271.9m is >= than blocking 256.0m size
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
> ~155.4m/162905016, currentsize=116.5m/122172880 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> 501337ms, sequenceid=1792264092, compaction requested=true
>
> During that time I would expect that clients got socket timeouts, tried
> to write again, and data would just pile up in the RPC queue.
>
> Do you have the gc log handy? Also did you set the zookeeper session
> timeout higher than the default of 3 minutes?
>
> Thx,
>
> J-D
>
> On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
>> We ran into this situation today on our production instance, I'm
>> wondering if anyone has seen or knows a way around it.
>>
>>
>>
>> We were repeatedly updating the same 60,000 column wide row.  With
>> each update, we were updating nearly every column of the row. Each
>> update generated nearly 9 MB in the WAL.
>>
>> This worked OK most of the time. We saw frequent messages like this in
>
>> the region server log, indicating a 2.6 second flush time:
>>
>> 2012-08-06 11:42:52,316 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
>> of ~155.4m/162931400, currentsize=2.7k/2760 for region
>> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
>> 2610ms, sequenceid=1792263953, compaction requested=false
>>
>>
>>
>> However, sometimes we would see a flush times around 8 - 10 Minutes.
>>
>> 2012-08-06 11:51:28,139 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
>> of ~155.4m/162905016, currentsize=116.5m/122172880 for region
>> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
>> 501337ms, sequenceid=1792264092, compaction requested=true
>>
>>
>>
>> During these long flushes, we would end up taking all the ipc handler
>> slots into that region server that we had configured (50).  It looked
>> to our application like the region server was unavailable.
>>
>>
>>
>> After the long delay, the region server would continue running fine,
>> but then do it again a few minutes later.
>>
>>
>>
>> I thought it was most likely something with Garbage Collection, so I
>> got a stack trace during one of the events.  The trace doesn't look
>> like GC was running.
>>
>>
>>
>>
>>
>> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was
>
>> unreachable
>>
>> http://pastebin.com/naTYQxZR  - Region Server Log during the same
>> interval
>>
>>
>>
>>
>>
>> Versions:
>>
>> HBase 0.92.1, r1298924
>>
>> Hadoop 0.20.2-cdh3u3
>>
>>
>>
>>
>>
>> ----------------------------------------------------------------------
>>
>> Ron Buckley
>>
>> x6365
>>
>>
>>
>
>

Re: Extremely long flush times

Posted by Ted Yu <yu...@gmail.com>.
Looks like time was spent in HRegion.internalFlushcache().
There is not much information in the server log.

I suggest you try out the patch in:
HBASE-6466 Enable multi-thread for memstore flush

It has been running in Taobao's production system.

BTW, have you noticed the following in the log?

2012-08-10 17:30:17,927 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
Too many consecutive RollWriter requests, it's a sign of the total number
of live datanodes is lower than the tolerable replicas.

How many datanodes did you use?

Thanks

On Fri, Aug 10, 2012 at 3:12 PM, Espinoza,Carlos <es...@oclc.org> wrote:

> We were able to replicate this behavior in a pseudo-distributed hbase
> (hbase-0.94.1) environment. We wrote a test program that creates a test
> table "MyTestTable" and populates it with random rows, then it creates a
> row with 60,000 columns and repeatedly updates it. Each column has a 18
> byte qualifier and a 50 byte value. In our tests, when we ran the
> program, we usually never got beyond 15 updates before it would flush
> for a really long time. The rows that are being updated are about 4MB
> each (minues any hbase metadata).
>
> It doesn't seem like it's caused by GC. I turned on gc logging, and
> didn't see any long pauses. This is the gc log during the flush.
> http://pastebin.com/vJKKXDx5
>
> This is the regionserver log with debug on during the same flush
> http://pastebin.com/Fh5213mg
>
> This is the test program we wrote.
> http://pastebin.com/aZ0k5tx2
>
> You should be able to just compile it, and run it against a running
> HBase cluster.
> $ java TestTable
>
> Carlos
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
> Jean-Daniel Cryans
> Sent: Monday, August 06, 2012 8:04 PM
> To: user@hbase.apache.org
> Subject: Re: Extremely long flush times
>
> Hi Ron,
>
> I think this is caused by GC, for example there's 7 minutes between the
> last blocking message and the end of the flush:
>
> 2012-08-06 11:44:36,348 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
> Server handler 42 on 9009' on region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
> memstore size 271.9m is >= than blocking 256.0m size
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
> ~155.4m/162905016, currentsize=116.5m/122172880 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> 501337ms, sequenceid=1792264092, compaction requested=true
>
> During that time I would expect that clients got socket timeouts, tried
> to write again, and data would just pile up in the RPC queue.
>
> Do you have the gc log handy? Also did you set the zookeeper session
> timeout higher than the default of 3 minutes?
>
> Thx,
>
> J-D
>
> On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> > We ran into this situation today on our production instance, I'm
> > wondering if anyone has seen or knows a way around it.
> >
> >
> >
> > We were repeatedly updating the same 60,000 column wide row.  With
> > each update, we were updating nearly every column of the row. Each
> > update generated nearly 9 MB in the WAL.
> >
> > This worked OK most of the time. We saw frequent messages like this in
>
> > the region server log, indicating a 2.6 second flush time:
> >
> > 2012-08-06 11:42:52,316 INFO
> > org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
> > of ~155.4m/162931400, currentsize=2.7k/2760 for region
> > Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> > 2610ms, sequenceid=1792263953, compaction requested=false
> >
> >
> >
> > However, sometimes we would see a flush times around 8 - 10 Minutes.
> >
> > 2012-08-06 11:51:28,139 INFO
> > org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
> > of ~155.4m/162905016, currentsize=116.5m/122172880 for region
> > Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> > 501337ms, sequenceid=1792264092, compaction requested=true
> >
> >
> >
> > During these long flushes, we would end up taking all the ipc handler
> > slots into that region server that we had configured (50).  It looked
> > to our application like the region server was unavailable.
> >
> >
> >
> > After the long delay, the region server would continue running fine,
> > but then do it again a few minutes later.
> >
> >
> >
> > I thought it was most likely something with Garbage Collection, so I
> > got a stack trace during one of the events.  The trace doesn't look
> > like GC was running.
> >
> >
> >
> >
> >
> > http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was
>
> > unreachable
> >
> > http://pastebin.com/naTYQxZR  - Region Server Log during the same
> > interval
> >
> >
> >
> >
> >
> > Versions:
> >
> > HBase 0.92.1, r1298924
> >
> > Hadoop 0.20.2-cdh3u3
> >
> >
> >
> >
> >
> > ----------------------------------------------------------------------
> >
> > Ron Buckley
> >
> > x6365
> >
> >
> >
>
>
>

RE: Extremely long flush times

Posted by "Espinoza,Carlos" <es...@oclc.org>.
We were able to replicate this behavior in a pseudo-distributed HBase
(hbase-0.94.1) environment. We wrote a test program that creates a test
table "MyTestTable" and populates it with random rows, then creates a
row with 60,000 columns and repeatedly updates it. Each column has an
18-byte qualifier and a 50-byte value. In our tests, the program usually
did not get beyond 15 updates before a flush would take a really long
time. The rows being updated are about 4 MB each (minus any HBase
metadata).

It doesn't seem like it's caused by GC. I turned on gc logging, and
didn't see any long pauses. This is the gc log during the flush.
http://pastebin.com/vJKKXDx5

This is the regionserver log with debug on during the same flush
http://pastebin.com/Fh5213mg

This is the test program we wrote. 
http://pastebin.com/aZ0k5tx2

You should be able to just compile it and run it against a running
HBase cluster.
$ java TestTable
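
In case the pastebin link ever goes stale, here is a rough sketch of a
comparable driver (hypothetical class name, illustrative table/family
names, standard 0.94 client API). It is not the exact program linked
above, just the same shape: one very wide row rewritten in a loop.

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowUpdater {

    private static final byte[] FAMILY = Bytes.toBytes("f");
    private static final byte[] ROW = Bytes.toBytes("wide-row");
    private static final int COLUMNS = 60000;
    private static final int UPDATES = 30;

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Create the table on first run. (The real test program also
        // pre-populates the table with random rows; omitted here.)
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists("MyTestTable")) {
            HTableDescriptor desc = new HTableDescriptor("MyTestTable");
            desc.addFamily(new HColumnDescriptor(FAMILY));
            admin.createTable(desc);
        }
        admin.close();

        HTable table = new HTable(conf, "MyTestTable");
        Random rnd = new Random();
        byte[] value = new byte[50];  // 50-byte values, as described above

        for (int update = 0; update < UPDATES; update++) {
            Put put = new Put(ROW);
            for (int col = 0; col < COLUMNS; col++) {
                rnd.nextBytes(value);
                // 18-byte qualifier: "qualifier-" plus a zero-padded index
                byte[] qualifier =
                    Bytes.toBytes(String.format("qualifier-%08d", col));
                put.add(FAMILY, qualifier, value);
            }
            long start = System.currentTimeMillis();
            table.put(put);  // roughly 4 MB of cells per update
            System.out.println("update " + update + " took "
                + (System.currentTimeMillis() - start) + " ms");
        }
        table.close();
    }
}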

Carlos

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Monday, August 06, 2012 8:04 PM
To: user@hbase.apache.org
Subject: Re: Extremely long flush times

Hi Ron,

I think this is caused by GC, for example there's 7 minutes between the
last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC
Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts, tried
to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session
timeout higher than the default of 3 minutes?

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm 
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With 
> each update, we were updating nearly every column of the row. Each 
> update generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in

> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162931400, currentsize=2.7k/2760 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see a flush times around 8 - 10 Minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush 
> of ~155.4m/162905016, currentsize=116.5m/122172880 for region 
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in 
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler 
> slots into that region server that we had configured (50).  It looked 
> to our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, 
> but then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I 
> got a stack trace during one of the events.  The trace doesn't look 
> like GC was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was

> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same 
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>



Re: Extremely long flush times

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hi Ron,

I think this is caused by GC; for example, there are 7 minutes between
the last blocking message and the end of the flush:

2012-08-06 11:44:36,348 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for
'IPC Server handler 42 on 9009' on region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6.:
memstore size 271.9m is >= than blocking 256.0m size
2012-08-06 11:51:28,139 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush
of ~155.4m/162905016, currentsize=116.5m/122172880 for region
Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
501337ms, sequenceid=1792264092, compaction requested=true

During that time I would expect that clients got socket timeouts,
tried to write again, and data would just pile up in the RPC queue.

Do you have the gc log handy? Also did you set the zookeeper session
timeout higher than the default of 3 minutes?
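
(For reference, the setting being asked about is the
"zookeeper.session.timeout" property, in milliseconds, normally raised
in hbase-site.xml on the region servers. A minimal, purely illustrative
Java snippet showing the property name follows; the 10-minute value is
an example, not a recommendation.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Prints the configured ZooKeeper session timeout and shows how it
// would be overridden programmatically on a Configuration object.
public class ShowZkSessionTimeout {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        System.out.println("zookeeper.session.timeout = "
            + conf.getInt("zookeeper.session.timeout", 180000) + " ms");
        // Normally this lives in hbase-site.xml; 600000 ms = 10 minutes.
        conf.setInt("zookeeper.session.timeout", 600000);
    }
}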

Thx,

J-D

On Mon, Aug 6, 2012 at 2:37 PM, Buckley,Ron <bu...@oclc.org> wrote:
> We ran into this situation today on our production instance, I'm
> wondering if anyone has seen or knows a way around it.
>
>
>
> We were repeatedly updating the same 60,000 column wide row.  With each
> update, we were updating nearly every column of the row. Each update
> generated nearly 9 MB in the WAL.
>
> This worked OK most of the time. We saw frequent messages like this in
> the region server log, indicating a 2.6 second flush time:
>
> 2012-08-06 11:42:52,316 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
> ~155.4m/162931400, currentsize=2.7k/2760 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> 2610ms, sequenceid=1792263953, compaction requested=false
>
>
>
> However, sometimes we would see a flush times around 8 - 10 Minutes.
>
> 2012-08-06 11:51:28,139 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of
> ~155.4m/162905016, currentsize=116.5m/122172880 for region
> Works,73658752:eng,1337915462169.1cfc92481a74ec278337871563d189d6. in
> 501337ms, sequenceid=1792264092, compaction requested=true
>
>
>
> During these long flushes, we would end up taking all the ipc handler
> slots into that region server that we had configured (50).  It looked to
> our application like the region server was unavailable.
>
>
>
> After the long delay, the region server would continue running fine, but
> then do it again a few minutes later.
>
>
>
> I thought it was most likely something with Garbage Collection, so I got
> a stack trace during one of the events.  The trace doesn't look like GC
> was running.
>
>
>
>
>
> http://pastebin.com/ANBiUzX4  - Stack Trace when the region server was
> unreachable
>
> http://pastebin.com/naTYQxZR  - Region Server Log during the same
> interval
>
>
>
>
>
> Versions:
>
> HBase 0.92.1, r1298924
>
> Hadoop 0.20.2-cdh3u3
>
>
>
>
>
> ----------------------------------------------------------------------
>
> Ron Buckley
>
> x6365
>
>
>