Posted to user@hbase.apache.org by Eric Hauser <ew...@gmail.com> on 2011/07/29 18:57:01 UTC

HBase / YCSB

Hi,
I've been running various experiments with YCSB against a 5-node cluster.
We have been testing a number of different configurations, so I have
been constantly wiping our cluster and setting it up again, since we
configure everything via Chef. At one point, I was able to get the
following stats from our cluster, which I was pretty happy with:
YCSB Client 0.1

Command line: -load -db com.yahoo.ycsb.db.HBaseClient
-Pworkloads/workloada -p columnfamily=family -p recordcount=10000000
-s

[OVERALL], RunTime(ms), 1057645.0

[OVERALL], Throughput(ops/sec), 9454.96834949345

[INSERT], Operations, 10000000

[INSERT], AverageLatency(ms), 0.0915235

[INSERT], MinLatency(ms), 0

[INSERT], MaxLatency(ms), 6925

[INSERT], 95thPercentileLatency(ms), 0

[INSERT], 99thPercentileLatency(ms), 0

[INSERT], Return=0, 10000000
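
One quick sanity check on a summary like this is that the reported throughput should equal the operation count divided by the runtime in seconds. A minimal Python sketch, assuming the `[SECTION], Metric, value` line layout shown above; the helper name `parse_ycsb_summary` is made up for illustration and is not part of YCSB:

```python
def parse_ycsb_summary(text):
    """Parse '[SECTION], Metric, value' lines into {(section, metric): float}."""
    stats = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip blank lines and anything that is not a three-field summary line.
        if len(parts) != 3 or not parts[0].startswith("["):
            continue
        stats[(parts[0].strip("[]"), parts[1])] = float(parts[2])
    return stats


# Values copied from the run above.
SUMMARY = """\
[OVERALL], RunTime(ms), 1057645.0
[OVERALL], Throughput(ops/sec), 9454.96834949345
[INSERT], Operations, 10000000
[INSERT], AverageLatency(ms), 0.0915235
[INSERT], MaxLatency(ms), 6925
"""

stats = parse_ycsb_summary(SUMMARY)
runtime_s = stats[("OVERALL", "RunTime(ms)")] / 1000.0
derived = stats[("INSERT", "Operations")] / runtime_s
reported = stats[("OVERALL", "Throughput(ops/sec)")]
print("derived ops/sec: %.3f, reported ops/sec: %.3f" % (derived, reported))
```

For this run the derived and reported figures agree, so the summary is at least internally consistent.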

However, in our most recent server builds, I seem to very quickly
deadlock something in HBase. I've gone back through all our old
revisions and reverted a number of different configuration settings,
but I can't figure out why the cluster is now so slow. Our
terasort M/R tests are returning the same values as before, so I do
not believe that anything is wrong outside of HBase.

The behavior that I see when I kick off the tests is this:

[UPDATE], 0, 4765

[UPDATE], 1, 248

[UPDATE], 2, 0

[UPDATE], 3, 0

[UPDATE], 4, 0

Basically, it kicks off a large number of inserts and HBase grinds to
a halt. Some of the writes do get inserted (usually around 50), but
then everything stops. Here's the behavior I see with the region
servers:

npin-172-16-12-203.np.local:60030  1311956094792  requests=50, regions=1, usedHeap=151, maxHeap=16358
npin-172-16-12-204.np.local:60030  1311956094776  requests=5, regions=2, usedHeap=157, maxHeap=16358
npin-172-16-12-205.np.local:60030  1311956093804  requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-206.np.local:60030  1311956093809  requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-207.np.local:60030  1311956094799  requests=0, regions=0, usedHeap=134, maxHeap=16358
Total:  servers: 5  requests=55, regions=3
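
To put a number on how lopsided that listing is, the per-server lines can be parsed and each server's share of the requests computed. A rough Python sketch, assuming each server's entry has been joined back onto a single line; `parse_status` and `request_share` are hypothetical helper names, and the second numeric column is matched but otherwise ignored:

```python
import re

# Per-server lines copied from the listing above, one server per line.
STATUS = """\
npin-172-16-12-203.np.local:60030 1311956094792 requests=50, regions=1, usedHeap=151, maxHeap=16358
npin-172-16-12-204.np.local:60030 1311956094776 requests=5, regions=2, usedHeap=157, maxHeap=16358
npin-172-16-12-205.np.local:60030 1311956093804 requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-206.np.local:60030 1311956093809 requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-207.np.local:60030 1311956094799 requests=0, regions=0, usedHeap=134, maxHeap=16358
"""

# host:port, one numeric column, then comma-separated key=value metrics.
LINE_RE = re.compile(r"^(?P<host>\S+):(?P<port>\d+)\s+\d+\s+(?P<metrics>.+)$")


def parse_status(text):
    """Return {host: {metric: int}} for every line that looks like a server entry."""
    servers = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        pairs = (kv.split("=") for kv in re.split(r",\s*", match.group("metrics")))
        servers[match.group("host")] = {key: int(value) for key, value in pairs}
    return servers


def request_share(servers):
    """Return each host's fraction of the total request count."""
    total = sum(s["requests"] for s in servers.values())
    return {h: (s["requests"] / total if total else 0.0) for h, s in servers.items()}


servers = parse_status(STATUS)
shares = request_share(servers)
for host in sorted(shares):
    print("%s regions=%d share=%.2f" % (host, servers[host]["regions"], shares[host]))
```

With the numbers above, the first server accounts for 50 of the 55 requests, and three of the five servers hold no regions at all.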

I did thread dumps on both the masters and region servers during this
time and did not see anything interesting. I'm using 0.90.3-CDH3U1.
Anyone have a suggestion on where to look next?

Re: HBase / YCSB

Posted by Jeff Whiting <je...@qualtrics.com>.

Check the region server logs. If they are blocking on something, it should show up there. For CDH3,
the logs are in /var/log/hbase/. You may also want to turn on DEBUG-level logging (either
in log4j or in the web interface). Finally, nearly all of your requests are going to just one region
server (npin-172-16-12-203.np.local, going by the listing you posted), so it may be stuck trying to
split a region or something. You could try pre-splitting the table's regions, which may help.
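
A minimal sketch of what choosing pre-split points could look like, in Python. It assumes the row keys are a fixed prefix followed by decimal digits (for example "user" plus a number) and that the two digits after the prefix are spread roughly evenly; both are assumptions to verify against the real keys. `split_points` and `region_for` are hypothetical helpers, not HBase APIs:

```python
import bisect


def split_points(prefix, num_regions, digits=2):
    """Return num_regions - 1 split keys of the form prefix + zero-padded digits."""
    if num_regions < 2:
        return []
    space = 10 ** digits
    if num_regions > space:
        raise ValueError("need more digits for this many regions")
    # Evenly spaced boundaries over the digit space, e.g. 20, 40, 60, 80 for 5 regions.
    return ["%s%0*d" % (prefix, digits, (i * space) // num_regions)
            for i in range(1, num_regions)]


def region_for(key, splits):
    """Index of the region a key falls into; a split key starts the next region."""
    return bisect.bisect_right(splits, key)


# One region per server on a 5-node cluster (assumed target).
splits = split_points("user", 5)
print(splits)
for key in ("user1999", "user20", "user4711", "user98"):
    print(key, "-> region", region_for(key, splits))
```

The resulting keys would then be passed to whatever table-creation call is in use; I believe the Java client's `HBaseAdmin.createTable` has an overload that accepts split keys, but check that against the version you are running.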

~Jeff


-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com

Re: HBase / YCSB

Posted by Gary Helmling <gh...@gmail.com>.

Is it possible that you have mismatched versions of either the hbase jar or
hadoop jar on the ycsb client versus the servers? In almost all cases where
I've run into mysterious rpc hangs right off the bat it's been attributable
to forgetting to update a jar file or an older version still being present
in the classpath.

If all of that checks out ok, you can enable rpc logging by adding the
following to log4j.properties on both the client and the server:

log4j.logger.org.apache.hadoop.ipc=DEBUG

This will produce a lot of output, but should make it easier to track what's
going on.
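
For the jar check, a rough Python sketch that scans a classpath string for more than one version of the same jar. It assumes jars are named `<artifact>-<version>.jar`; the paths in the example are made up for illustration, and `jar_versions` and `conflicts` are hypothetical helper names:

```python
import os
import re
from collections import defaultdict

# Only the two artifacts discussed here; the naming pattern is an assumption.
JAR_RE = re.compile(r"^(?P<artifact>hbase|hadoop-core)-(?P<version>\d[\w.\-]*)\.jar$")


def jar_versions(classpath, sep=os.pathsep):
    """Map each recognised artifact to the set of versions found on the classpath."""
    found = defaultdict(set)
    for entry in classpath.split(sep):
        match = JAR_RE.match(os.path.basename(entry))
        if match:
            found[match.group("artifact")].add(match.group("version"))
    return dict(found)


def conflicts(found):
    """Return only the artifacts that appear with more than one version."""
    return {a: sorted(v) for a, v in found.items() if len(v) > 1}


# Hypothetical client classpath with a stale hbase jar left behind.
client_cp = ":".join([
    "/opt/ycsb/lib/hbase-0.90.1.jar",
    "/usr/lib/hbase/hbase-0.90.3-cdh3u1.jar",
    "/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u1.jar",
])

found = jar_versions(client_cp, sep=":")
print(conflicts(found))
```

Running the same function over the client and server classpaths and comparing the two results would also show a client/server mismatch where each side is internally consistent.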

--gh


