Posted to user@hbase.apache.org by Daniel Iancu <da...@1and1.ro> on 2011/12/21 17:56:48 UTC

very slow scan performance on just one region

Hi there
I'm investigating a problem we have with an MR job and I discovered that
the tasks that fail (scan lease expired while fetching the next row) were
processing one particular region.
I've written a small app that scans that region and counts its rows, and
ran it on the same machine where the region is hosted. The result is very,
very poor: the scan speed is on average 7 rows/sec, and sometimes when
scan caching is increased it gets a lease-expired exception. By contrast,
scanning the other regions of the same table on the same machine with the
same caching value gets ~3800 rows/sec. Any idea what can cause such
disastrous scan performance on a particular region?
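
For reference, a minimal sketch of such a region-scoped row counter,
assuming the 0.90 client API; the table name and the region start/stop
keys below are placeholders, not the real ones:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionRowCounter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");  // placeholder table name

    // Limit the scan to one region by using that region's start/end keys
    // (as shown in the web UI or .META.); placeholder values here.
    Scan scan = new Scan(Bytes.toBytes("region-start-key"),
                         Bytes.toBytes("region-stop-key"));
    scan.setCaching(100);  // rows fetched per RPC; raising it risks lease expiry on a slow region

    long rows = 0;
    long start = System.currentTimeMillis();
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        rows++;
      }
    } finally {
      scanner.close();
      table.close();
    }
    long ms = System.currentTimeMillis() - start;
    System.out.println(rows + " rows in " + ms + " ms ("
        + (rows * 1000.0 / Math.max(ms, 1)) + " rows/sec)");
  }
}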

Some extra info

hbase is 0.90.4
lease timeout is 4 minutes (see the config sketch after this list)
table has 1 family, cell values are empty, row keys and qualifiers are
small strings, the biggest row has 146 columns
row sizes are almost identical since the table was created by a load tool
and each row has almost the same number of columns with the same kind of
values...
all regions have 1 store file of ~655MB
cluster has no activity except the test app
GC activity looks normal
regions might have many deleted KVs (we were testing data cleanup with MR
jobs)
major compaction is deactivated and we haven't run it in some time
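
For what it's worth, that 4-minute lease is the scanner/region server
lease period; in the 0.90 line it would normally be set in hbase-site.xml
with something like the following (property name as of 0.90; value in
milliseconds):

<property>
  <name>hbase.regionserver.lease.period</name>
  <value>240000</value> <!-- 4 minutes -->
</property>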

Can this problem be caused by the last 2 points above: many deleted KVs
concentrated in that region that need to be skipped by the StoreScanners?
Any other thoughts?

Thanks
Daniel





Re: very slow scan performance on just one region

Posted by Daniel Iancu <da...@1and1.ro>.
> If you move the region to another host, do you see the same perf?
> (Perhaps some hardware issue?)
Done more testing today. It's not related to a particular region; it
happened today with another region on the same machine. It's also not a
permanent issue: after some time I retried and the scan was fast again.
I noticed that when the scan is slow the RS shows only a few requests
(7-8), while when it is fast it shows thousands; could this be related
to the client?
I've stopped the RS with problems (btw the cluster hung for 20 mins until
I killed the RS process) but I got the same problem on other machines.
The crash happens somewhere upstream, in HBase; it does not reach the
mapper setup method, so I cannot see which split is being processed. Do
you know where I should look to find which table split a task attempt is
using?
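
The failing attempts apparently die before setup() runs, but for attempts
that do get that far, the input split itself says which region is being
read. A rough sketch (hypothetical class name; standard TableMapper and
TableSplit API) that logs it from setup():

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

public class SplitLoggingMapper
    extends TableMapper<NullWritable, NullWritable> {

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    // For a TableInputFormat job each split is a TableSplit carrying the
    // region's start/end keys and the host serving it.
    TableSplit split = (TableSplit) context.getInputSplit();
    System.err.println("split: table=" + Bytes.toString(split.getTableName())
        + " start=" + Bytes.toStringBinary(split.getStartRow())
        + " end=" + Bytes.toStringBinary(split.getEndRow())
        + " location=" + split.getRegionLocation());
  }

  @Override
  protected void map(ImmutableBytesWritable key, Result value,
      Context context) {
    // no-op; this mapper only logs its split
  }
}
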
> Otherwise, if you look at the data under that region, what do you see.
100% puts, MC is actually on.
>   First do a listing of the hdfs content.
All regions have a single store file since there have been no inserts in
a while.

Re: very slow scan performance on just one region

Posted by Stack <st...@duboce.net>.
On Wed, Dec 21, 2011 at 8:56 AM, Daniel Iancu <da...@1and1.ro> wrote:
> Hi there
> I'm investigating a problem we have with an MR job and I discovered that the
> tasks that fail (scan lease expired while fetching the next row) were
> processing one particular region.
> I've written a small app that scans that region and counts its rows, and ran
> it on the same machine where the region is hosted. The result is very, very
> poor: the scan speed is on average 7 rows/sec, and sometimes when scan
> caching is increased it gets a lease-expired exception. By contrast, scanning
> the other regions of the same table on the same machine with the same caching
> value gets ~3800 rows/sec. Any idea what can cause such disastrous scan
> performance on a particular region?
>

If you move the region to another host, do you see the same perf?
(Perhaps some hardware issue?)
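
To try that, one could move the region off its current host and re-run
the scan; a rough sketch, assuming HBaseAdmin#move() as in the 0.90 admin
API, with a placeholder encoded region name (the hash suffix shown in the
web UI / .META.):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class MoveRegion {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Encoded region name is a placeholder. Passing null as the
    // destination lets the master pick a new server for the region.
    admin.move(Bytes.toBytes("1f1ba3af9a96f7fd3afb2e9c3b1b2c4d"), null);
  }
}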

Otherwise, if you look at the data under that region, what do you see?
First do a listing of the HDFS content. Next try looking at the actual
key values with the hfile main tool; poke down in here:
http://hbase.apache.org/book/regions.arch.html#store
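
Roughly, and hedging on the exact paths and options (the HFile tool prints
its usage when run without arguments), that boils down to something like:

# list the region's store files (0.90-style layout; names are placeholders)
hadoop fs -lsr /hbase/mytable/ENCODED_REGION_NAME

# dump metadata for one store file; add -p to also print the key/values
hbase org.apache.hadoop.hbase.io.hfile.HFile -m -v -f \
    /hbase/mytable/ENCODED_REGION_NAME/family/STOREFILE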

> Some extra info
>
> hbase is 0.90.4
> lease timeout is 4 minutes
> table has 1 family, cell values are empty, row keys and qualifiers are small
> strings, the biggest row has 146 columns
> row sizes are almost identical since the table was created by a load tool and
> each row has almost the same number of columns with the same kind of values...
> all regions have 1 store file of ~655MB
> cluster has no activity except the test app
> GC activity looks normal
> regions might have many deleted KV (we were testing data cleanup with MR
> jobs)


Looksee first w/ hfile tool.

If a major compaction 'fixes' it, then it could be having to pass over
lots of delete items.
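
One way to test that theory is to major-compact the suspect table (or just
that region) and re-run the scan; a small sketch, assuming the 0.90
HBaseAdmin API and a placeholder table name (the shell's major_compact
command does the same):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompactIt {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Accepts a table name or a full region name; the request is
    // asynchronous, so watch the RS logs / UI for it to finish.
    admin.majorCompact("mytable");
  }
}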

St.Ack