Posted to user@hbase.apache.org by Dmitriy Lyfar <dl...@gmail.com> on 2010/01/04 14:18:41 UTC

Re: Problems with write performance (25kb rows)

Hello, Stack

> Of course I will insert less rows per second in
> > case of 25Kb, but throughput should stay the same. Now I'm trying to run
> > several instances of client each of them inserts 100K records (each
> record
> > is 25Kb). Time of execution grows for each client.
> >
> >
> > >
> > > In general, our client ain't to good at multiplexing because of such as
> > the
> > > above noted limitation (our client does not yet do nio).  If you want
> to
> > > test cluster performance, run multiple concurrent clients each to its
> own
> > > process.  MapReduce is good for doing this.  See the
> > PerformanceEvaluation
> > > code for a sample MR job that floats many clients doing different
> loading
> > > types.
> > >
> >
> > MapReduce is good idea, but actually we don't have data which is located
> in
> > hadoop, we processes data in realtime and insert it into hbase. So I
> think
> > it will be inefficient to write our data in hadoop and then run MapReduce
> > work which will insert that data into the tables.
> >
> >
> Agreed.  Was just suggesting it as a way of parallellizing clients.  I
> presume that the source of the data feed is multiple, that you can run
> multiple instances of your upload process?
>

Yes, I think I can run multiple instances of the uploader.


>
> > >
> > Time with several clients is growing. For example when I'm running four
> > processes, each of them have one inserter thread I got following results:
> > 1) Thread-1 have finished its work in 189 sec
> > 2) Thread-1 have finished its work in 198 sec
> > 3) Thread-1 have finished its work in 206 sec
> > 4) Thread-1 have finished its work in 208 sec
> > I.e. each next process works longer than previous. It was timings for
> test
> > where each process inserts 100K 25Kb rows with WAL on. Btw WAL have great
> > impact on performance when I increase size of row. I have about 80 sec
> for
> > this test with WAL off. Also when running several clients nodes seems
> still
> > almost idle.
> >
>
> Oh, how many regions in your cluster?  At the start, all clients will be
> hitting a single region (and thus a single server).  Check your master
> console at port 60010.
>
> You could rerun a second upload just after a first upload.


As I said, I have 6 nodes besides the master node, and each node has 235 regions;
1406 regions total.
Throughput is about 50 MB/sec without the WAL and about 15 MB/sec with the WAL
on. When I run the clients serially (i.e. only one script working at a time),
the time stays almost stable and does not grow.
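For reference, a minimal sketch of the client-side settings being compared here, assuming the 0.20.x Java client API; the "contents" family comes from the logs later in this thread, while the table name and column qualifier are only placeholders:

import java.util.Random;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UploadSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "testtable"); // placeholder table name
    table.setAutoFlush(false);                  // buffer puts on the client
    table.setWriteBufferSize(12 * 1024 * 1024); // push the buffer roughly every 12MB
    Random rnd = new Random();
    byte[] value = new byte[25 * 1024];         // one ~25Kb cell, as in the test
    for (int i = 0; i < 100 * 1000; i++) {
      Put put = new Put(Bytes.toBytes(rnd.nextInt(100 * 1000))); // random int row key
      put.setWriteToWAL(false);                 // the "WAL off" case; drop this line for WAL on
      put.add(Bytes.toBytes("contents"), Bytes.toBytes("c"), value);
      table.put(put);
    }
    table.flushCommits();                       // flush any puts still sitting in the buffer
  }
}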


> See what the
> numbers are like uploading into a table that is pre-split?


Sorry, what do you mean by pre-split? Do you mean splitting regions before running
the script?


-- 
Regards, Lyfar Dmitriy

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
From a quick perusal of the posted log, it looks like hbase is staying up?
 Is it having problems, Dmitry, other than slowness, after you made changes
like xceivers and the upped tickTime?  I'll take a closer look later.

St.Ack

On Wed, Jan 13, 2010 at 4:35 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:

> Sorry, forgot to insert link to DEBUG regionserver logs:
> http://pastebin.com/m70f01f36
>
> 2010/1/13 Dmitriy Lyfar <dl...@gmail.com>
>
> > Hi Stack,
> >
> > Thank you for you help. I set xceivers in hdfs xml config like:
> >
> > <property>
> >         <name>dfs.datanode.max.xcievers</name>
> >         <value>8192</value>
> > </property>
> >
> > And ulimit is 32K for sure. I turned off DEBUG logging level for hbase
> and
> > here is log for one of regionservers after I have inserted 200K records
> > (each row is 25Kb).
> > Speed still the same (about 1K rows per second).
> > Random ints plays a role of row keys now (i.e. uniform random
> distribution
> > on (0, 100 * 1000)).
> > What do you think is 5GB for hbase and 2GB for hdfs enough?
> >
> >
> >> What are you tasktrackers doing?   Are they doing the hbase loading?
>  You
> >> might try turning down how many task run concurrently on each
> tasktracker.
> >> The running tasktracker may be sucking resources from hdfs (and thus by
> >> association, from hbase): i.e. mapred.map.tasks and mapred.reduce.tasks
> >> (Pardon me if this advice has been given previous and you've already
> acted
> >> on it).
> >
> >
> > Tasktrackers is not used now (I planned them for future use in
> statistical
> > analysis). So I turned them off for last tests. Data uploader is several
> > clients which run simultaneously on name node and each of them inserts
> 100K
> > records.
> >
> > --
> > Regards, Lyfar Dmitriy
> >
>
>
>
> --
> Regards, Lyfar Dmitriy
> mailto: dlyfar@crystalnix.com
> jabber: dlyfar@gmail.com
>

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Sorry, forgot to insert link to DEBUG regionserver logs:
http://pastebin.com/m70f01f36

2010/1/13 Dmitriy Lyfar <dl...@gmail.com>

> Hi Stack,
>
> Thank you for you help. I set xceivers in hdfs xml config like:
>
> <property>
>         <name>dfs.datanode.max.xcievers</name>
>         <value>8192</value>
> </property>
>
> And ulimit is 32K for sure. I turned off DEBUG logging level for hbase and
> here is log for one of regionservers after I have inserted 200K records
> (each row is 25Kb).
> Speed still the same (about 1K rows per second).
> Random ints plays a role of row keys now (i.e. uniform random distribution
> on (0, 100 * 1000)).
> What do you think is 5GB for hbase and 2GB for hdfs enough?
>
>
>> What are you tasktrackers doing?   Are they doing the hbase loading?  You
>> might try turning down how many task run concurrently on each tasktracker.
>> The running tasktracker may be sucking resources from hdfs (and thus by
>> association, from hbase): i.e. mapred.map.tasks and mapred.reduce.tasks
>> (Pardon me if this advice has been given previous and you've already acted
>> on it).
>
>
> Tasktrackers is not used now (I planned them for future use in statistical
> analysis). So I turned them off for last tests. Data uploader is several
> clients which run simultaneously on name node and each of them inserts 100K
> records.
>
> --
> Regards, Lyfar Dmitriy
>



-- 
Regards, Lyfar Dmitriy
mailto: dlyfar@crystalnix.com
jabber: dlyfar@gmail.com

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
On Thu, Jan 14, 2010 at 5:20 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:

> Hi,
>
> > Speed still the same (about 1K rows per second).
>> >
>>
>> This seems low for your 6 node cluster.
>>
>> If you look at the servers, are they cpu or io bound-up in any way?
>>
>> How many clients you have running now?
>>
>
> Now I'm running 1-2 clients in parallel. If I run more -- timings grows.
> Also I not use namenode as datanode and as regionserver. There is only
> namenode/secondarynn/master/zk.
>

Understood, but is it because the regionservers+datanodes load is going up
if you add more clients?   Or are the timeouts because of something else?
 (Clients are running on the machine that has NN/Master/ZK?  If so, could
the clients be sucking resources from these servers in a way that slows
down the whole cluster?  Is the load on the machine high when the clients are
running?  That kinda thing.)


>
>
>>
>> This is not a new table right?  (I see there is an existing table in your
>> cluster looking at the regionserver log).   Its an existing table of many
>> regions?
>>
>
> Yes. I have 7 test tables. Client randomly select table which will be used
> at start.
> Now after some tests I have about 800 regions per region server and 7
> tables.
>
>
That's a lot of regions per regionserver.  Just FYI.



>
>> You have upped the handlers in hbase.  Have you done same for datanodes
>> (In
>> case we are bottlenecking here).
>>
>
> I've updated this setting for hadoop also. As I understand if something
> wrong with
> number of handles -- I will get an exception TooManyOpenFiles and datanode
> finish its work.
>

No. Your change to ulimit addresses this issue.  Upping the handlers makes
it so requests get into the server.  Otherwise, they are blocked until one
becomes available.  If the servers are powerful, as yours are, they can handle
more work concurrently than the handlers might otherwise allow in.
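For reference, the handler settings being discussed live in hbase-site.xml and the hadoop config respectively; a sketch with illustrative values only (the right numbers depend on the hardware):

<property>
        <name>hbase.regionserver.handler.count</name>
        <value>50</value>
</property>
<property>
        <name>dfs.datanode.handler.count</name>
        <value>10</value>
</property>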



> All works fine for now. I've attached metrics from one of datanodes. On
> other nodes we have almost same picture. Please look at the throughput
> picture. It seems illogical to me that node have almost equal inbound and
> outbound traffic (render.png). These pictures were snapped while running two
> clients and then after some break I've ran one client.
>

I'll take a look.


>
>
>>  > Random ints plays a role of row keys now (i.e. uniform random
>> distribution
>> > on (0, 100 * 1000)).
>> > What do you think is 5GB for hbase and 2GB for hdfs enough?
>> >
>> > Yes, that should be good.  Writing you are not using that memory in
>> regionserver though, maybe you should go with bigger regions if you have
>> 25k
>> cells.  You using compression?
>>
>
> Yes, 25Kb is important, but I think in production system we will have
> 70-80% of 5-10Kb rows,
> about 20% of 25Kb rows and 10% of > 25Kb rows. I'm not using any
> compression for columns because I was thinking about throughput. But I was
> planning to use compression when I can achieve 80-90 Mb/sec for this test.
>
>

Currently we are at what?


>
>> I took a look at your regionserver log.  Its just after an open of the
>> regionserver.  I see no activity other than the opening of a few regions.
>>  These regions do happen to have alot of store files so we're starting up
>> compactions but that all should be fine.  I'd be interested in seeing a
>> log
>> snippet from a regionserver under load.
>>
>
> Ok, there are some tests running now which will be interesting I think,
> I'll provide regionserver logs a bit later.
> Thank you for your help!
>

Thanks for your patience sticking with it.


St.Ack

RE: HBase as DB tier for Tomcat - best practices to instantiate HTables?

Posted by Jeyendran Balakrishnan <jb...@docomolabs-usa.com>.
Many thanks to Jean-Daniel, Andy and Vaibhav for the solutions.
In my case, I am not using Spring, so I am planning to use the
HTablePool approach.

-Jeyendran


-----Original Message-----
From: Vaibhav Puranik [mailto:vpuranik@gmail.com] 
Sent: Saturday, January 16, 2010 3:05 PM
To: hbase-user@hadoop.apache.org
Subject: Re: HBase as DB tier for Tomcat - best practices to instantiate
HTables?

In case you are using Spring Framework, take a look at the following
issue:

http://jira.springframework.org/browse/SPR-5950

This shows how to configure HBase connection as a Spring bean.

Regards,
Vaibhav

On Fri, Jan 15, 2010 at 7:04 PM, Andrew Purtell <ap...@apache.org>
wrote:

> A HTable object is not thread safe for writing. You'll need to watch
> for that.
>
> So use the pool of HTables as J-D suggests, or instantiate a group of
> worker threads and allocate one HTable object for each at the top of
> run() before entering the work loop.
>
>   - Andy
>
>
>
> ----- Original Message ----
> > From: Jean-Daniel Cryans <jd...@apache.org>
> > To: hbase-user@hadoop.apache.org
> > Sent: Fri, January 15, 2010 5:40:14 PM
> > Subject: Re: HBase as DB tier for Tomcat - best practices to
instantiate
>  HTables?
> >
> > You want to keep using a pool of HTables, see
> >
>
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/
client/HTablePool.html
> >
> > J-D
> >
> > On Fri, Jan 15, 2010 at 5:35 PM, Jeyendran Balakrishnan
> > wrote:
> > > What's the best way to instantiate HTables when the HBase cluster
is
> > > being accessed from client code running inside a Tomcat web app?
The
> web
> > > app dooes the typical CRUD operations [but no table alteration].
> > >
> > > I read from the lists some time ago that instantiating a HTable in
> > > servlet request is not advisable, since instantiating HTable is a
bit
> > > slow [accesses the meta regions?].
> > >
> > > So is the best practice for web apps to instantiate a HTable once
at
> > > startup and cache in the servlet context, and re-use it every
request?
> > >
> > > In that case, is this thread safe [since each request spawns a
> different
> > > thread]? On the other hand, syncing on the single HTable instance
at
> the
> > > web application level also slows down requests when contending for
this
> > > one instance.
> > >
> > > Also, is holding on to a single HTable instance in a long-running
web
> > > app a reliable approach?
> > >
> > > Or is it better to bite the bullet and instantiate an HTable per
> request
> > > after all?
> > >
> > > Finally, has anybody looked into or have some kind of HTable pool
[like
> > > DB connection pools], which is a sort of medium between one HTable
per
> > > web app and one HTable per request?
> > >
> > > Any advice on best practices from the community would be greatly
> > > appreciated.
> > >
> > > Thanks,
> > > Jeyendran
> > >
> > >
>
>
>
>
>
>

Re: HBase as DB tier for Tomcat - best practices to instantiate HTables?

Posted by Vaibhav Puranik <vp...@gmail.com>.
In case you are using Spring Framework, take a look at the following issue:

http://jira.springframework.org/browse/SPR-5950

This shows how to configure HBase connection as a Spring bean.

Regards,
Vaibhav

On Fri, Jan 15, 2010 at 7:04 PM, Andrew Purtell <ap...@apache.org> wrote:

> A HTable object is not thread safe for writing. You'll need to watch
> for that.
>
> So use the pool of HTables as J-D suggests, or instantiate a group of
> worker threads and allocate one HTable object for each at the top of
> run() before entering the work loop.
>
>   - Andy
>
>
>
> ----- Original Message ----
> > From: Jean-Daniel Cryans <jd...@apache.org>
> > To: hbase-user@hadoop.apache.org
> > Sent: Fri, January 15, 2010 5:40:14 PM
> > Subject: Re: HBase as DB tier for Tomcat - best practices to instantiate
>  HTables?
> >
> > You want to keep using a pool of HTables, see
> >
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTablePool.html
> >
> > J-D
> >
> > On Fri, Jan 15, 2010 at 5:35 PM, Jeyendran Balakrishnan
> > wrote:
> > > What's the best way to instantiate HTables when the HBase cluster is
> > > being accessed from client code running inside a Tomcat web app? The
> web
> > > app dooes the typical CRUD operations [but no table alteration].
> > >
> > > I read from the lists some time ago that instantiating a HTable in
> > > servlet request is not advisable, since instantiating HTable is a bit
> > > slow [accesses the meta regions?].
> > >
> > > So is the best practice for web apps to instantiate a HTable once at
> > > startup and cache in the servlet context, and re-use it every request?
> > >
> > > In that case, is this thread safe [since each request spawns a
> different
> > > thread]? On the other hand, syncing on the single HTable instance at
> the
> > > web application level also slows down requests when contending for this
> > > one instance.
> > >
> > > Also, is holding on to a single HTable instance in a long-running web
> > > app a reliable approach?
> > >
> > > Or is it better to bite the bullet and instantiate an HTable per
> request
> > > after all?
> > >
> > > Finally, has anybody looked into or have some kind of HTable pool [like
> > > DB connection pools], which is a sort of medium between one HTable per
> > > web app and one HTable per request?
> > >
> > > Any advice on best practices from the community would be greatly
> > > appreciated.
> > >
> > > Thanks,
> > > Jeyendran
> > >
> > >
>
>
>
>
>
>

Re: HBase as DB tier for Tomcat - best practices to instantiate HTables?

Posted by Andrew Purtell <ap...@apache.org>.
A HTable object is not thread safe for writing. You'll need to watch
for that. 

So use the pool of HTables as J-D suggests, or instantiate a group of
worker threads and allocate one HTable object for each at the top of
run() before entering the work loop. 

   - Andy



----- Original Message ----
> From: Jean-Daniel Cryans <jd...@apache.org>
> To: hbase-user@hadoop.apache.org
> Sent: Fri, January 15, 2010 5:40:14 PM
> Subject: Re: HBase as DB tier for Tomcat - best practices to instantiate  HTables?
> 
> You want to keep using a pool of HTables, see
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTablePool.html
> 
> J-D
> 
> On Fri, Jan 15, 2010 at 5:35 PM, Jeyendran Balakrishnan
> wrote:
> > What's the best way to instantiate HTables when the HBase cluster is
> > being accessed from client code running inside a Tomcat web app? The web
> > app dooes the typical CRUD operations [but no table alteration].
> >
> > I read from the lists some time ago that instantiating a HTable in
> > servlet request is not advisable, since instantiating HTable is a bit
> > slow [accesses the meta regions?].
> >
> > So is the best practice for web apps to instantiate a HTable once at
> > startup and cache in the servlet context, and re-use it every request?
> >
> > In that case, is this thread safe [since each request spawns a different
> > thread]? On the other hand, syncing on the single HTable instance at the
> > web application level also slows down requests when contending for this
> > one instance.
> >
> > Also, is holding on to a single HTable instance in a long-running web
> > app a reliable approach?
> >
> > Or is it better to bite the bullet and instantiate an HTable per request
> > after all?
> >
> > Finally, has anybody looked into or have some kind of HTable pool [like
> > DB connection pools], which is a sort of medium between one HTable per
> > web app and one HTable per request?
> >
> > Any advice on best practices from the community would be greatly
> > appreciated.
> >
> > Thanks,
> > Jeyendran
> >
> >



      


Re: HBase as DB tier for Tomcat - best practices to instantiate HTables?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
You want to keep using a pool of HTables, see
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTablePool.html

J-D
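A minimal sketch of that approach, assuming the HTablePool API at the link above (getTable/putTable) and a hypothetical "users" table with an "info" family; check the javadoc for the exact signatures:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UserDao {
  // One pool per web app, created at startup and cached (e.g. in the servlet context).
  private final HTablePool pool = new HTablePool(new HBaseConfiguration(), 10);

  public void save(byte[] row, byte[] value) throws IOException {
    HTable table = pool.getTable("users");  // borrow an HTable from the pool
    try {
      Put put = new Put(row);
      put.add(Bytes.toBytes("info"), Bytes.toBytes("data"), value);
      table.put(put);
    } finally {
      pool.putTable(table);                 // return it so other request threads can reuse it
    }
  }
}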

On Fri, Jan 15, 2010 at 5:35 PM, Jeyendran Balakrishnan
<jb...@docomolabs-usa.com> wrote:
> What's the best way to instantiate HTables when the HBase cluster is
> being accessed from client code running inside a Tomcat web app? The web
> app dooes the typical CRUD operations [but no table alteration].
>
> I read from the lists some time ago that instantiating a HTable in
> servlet request is not advisable, since instantiating HTable is a bit
> slow [accesses the meta regions?].
>
> So is the best practice for web apps to instantiate a HTable once at
> startup and cache in the servlet context, and re-use it every request?
>
> In that case, is this thread safe [since each request spawns a different
> thread]? On the other hand, syncing on the single HTable instance at the
> web application level also slows down requests when contending for this
> one instance.
>
> Also, is holding on to a single HTable instance in a long-running web
> app a reliable approach?
>
> Or is it better to bite the bullet and instantiate an HTable per request
> after all?
>
> Finally, has anybody looked into or have some kind of HTable pool [like
> DB connection pools], which is a sort of medium between one HTable per
> web app and one HTable per request?
>
> Any advice on best practices from the community would be greatly
> appreciated.
>
> Thanks,
> Jeyendran
>
>

HBase as DB tier for Tomcat - best practices to instantiate HTables?

Posted by Jeyendran Balakrishnan <jb...@docomolabs-usa.com>.
What's the best way to instantiate HTables when the HBase cluster is
being accessed from client code running inside a Tomcat web app? The web
app does the typical CRUD operations [but no table alteration].

I read on the lists some time ago that instantiating an HTable in a
servlet request is not advisable, since instantiating an HTable is a bit
slow [it accesses the meta regions?].

So is the best practice for web apps to instantiate an HTable once at
startup, cache it in the servlet context, and re-use it on every request?

In that case, is this thread safe [since each request spawns a different
thread]? On the other hand, syncing on the single HTable instance at the
web application level also slows down requests when contending for this
one instance. 

Also, is holding on to a single HTable instance in a long-running web
app a reliable approach? 

Or is it better to bite the bullet and instantiate an HTable per request
after all?

Finally, has anybody looked into or built some kind of HTable pool [like
DB connection pools], which is a sort of middle ground between one HTable per
web app and one HTable per request?

Any advice on best practices from the community would be greatly
appreciated.

Thanks,
Jeyendran


Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hi Stack,

I would like to summarize in this email all the results we have so far.

At the start of this discussion I had the following config:
http://pastebin.com/m6c7358e6

which allowed me 10-15 MB per second of throughput (with WAL on) for
a serial client that inserts 100K 25Kb records and exits.

Now I've removed the block.multiplier option (the default is used now) and set
the region size to 1GB. In both cases I had
hbase.hregion.memstore.flush.size = 67108864 (i.e. the default).
Changing the region size makes sense, as you said: for the 25Kb test,
throughput with WAL on is now about 25-27 MB per second when I have about
200 regions per server.
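For reference, a sketch of the corresponding hbase-site.xml entries (the 1GB region size plus the default flush size quoted above):

<property>
        <name>hbase.hregion.max.filesize</name>
        <value>1073741824</value>
</property>
<property>
        <name>hbase.hregion.memstore.flush.size</name>
        <value>67108864</value>
</property>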

As for your questions:

Understood, but is it because the regionservers+datanodes load is going up
> if you add more clients?   Or are the timeouts because of something else?
>  (Clients are running on the machine that has NN/Master/ZK?  If so, could
> the clients be sucking resources from  these servers in a way that slows
> down whole cluster?  Is load on machine high when clients are running?
>  That
> kinda thing).
>

Yes, the clients are running on the namenode machine, and that may be a problem
when running several of them in parallel. I think I need to move them off the
cluster to check, but I don't see any reason for it in the performance
graphs.
Also, I'm running tests on LZO-enabled tables and will provide the results
soon.

Dmitriy.

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
On Sat, Jan 16, 2010 at 5:32 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:

>
> As I understood hbase.hregion.memstore.block.multiplier parameter impacts
> on
> this parameter. I have multiplier equal to 12. It helped me in case 5Kb
> rows, but seems not effective with 25Kb rows. I'll provide test results a
> bit later.
>
>
What do you have for hbase.hregion.memstore.flush.size?  Is it not 67108864?
St.Ack

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hello Stack,

Thank you for the analysis. For now, just a quick answer to your question.

2010/1/16 stack <st...@duboce.net>

> Looking at our log again Dmitry, you are flushing a lot under load --
> sometimes multiple times per second.
>
> 2010-01-15 13:52:52,430 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> Added hdfs://nn1/hbase/2/1459207030/contents/3021941585147682122,
> entries=492, sequenceid=129985024, memsize=5.3m, filesize=5.2m to
> 2,\x00\x01\x27D,1263549378148
>
> I notice that we are flushing at ~6.4M or so rather than at the usual 64M.
>  Did you change that setting?  If so, that'd help explain why we're having
> some trouble keeping up -- lots of small files.
>

As I understood it, the hbase.hregion.memstore.block.multiplier parameter affects
this. I have the multiplier set to 12. It helped in the 5Kb-row case, but it
seems not to be effective with 25Kb rows. I'll provide test results a
bit later.


Dmitriy.

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
Looking at your log again, Dmitry, you are flushing a lot under load --
sometimes multiple times per second.

2010-01-15 13:52:52,430 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Added hdfs://nn1/hbase/2/1459207030/contents/3021941585147682122,
entries=492, sequenceid=129985024, memsize=5.3m, filesize=5.2m to
2,\x00\x01\x27D,1263549378148

I notice that we are flushing at ~6.4M or so rather than at the usual 64M.
 Did you change that setting?  If so, that'd help explain why we're having
some trouble keeping up -- lots of small files.

Also, a reason to consider lzo is that you could hold the same amount of data
in many fewer than the 800 regions per server.  It might also smooth out the
differences in cell sizes.

St.Ack



On Fri, Jan 15, 2010 at 10:48 AM, stack <st...@duboce.net> wrote:

> Your pngs of traffic didn't come across.  Please put them somewhere I can
> pull.
>
> On Fri, Jan 15, 2010 at 5:40 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:
>
>>
>> After some night tests I have log of one regionserver in debug mode.
>> I've uploaded it here: http://slil.ru/28491882 (downloading begins after
>> 10
>> second)
>>
>
> Thats an interesting site Dmitry (smile).
>
>
>
>> But there is some problems I see after these tests, I regularly have
>> following exception in client logs:
>>
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> contact
>> region server Some server, retryOnlyOne=true, index=0, islastrow=false,
>> tries=9, numtries=10, i=179, listsize=883,
>> region=4,\x00\x00F\x16,1263403845332 for region
>> 4,\x00\x00E5,1263403845332,
>> row '\x00\x00E\xA2', but failed after 10 attempts.
>>
>>
> Your log is interesting.  I see a bunch of this:
>
> 2010-01-15 14:17:39,064 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
> 0,\x00\x01T\xE5,1263450046617 because global memstore limit of 1.9g
> exceeded; currently 1.8g and flushing till 1.2g
>
> Which would explain some of your slowness.
>
> That you have 800 regions per server likely makes the above happen more
> frequently that it should... this and your randomized keys.  The latter are
> probably putting little pieces into each of the regions making it harder for
> a good fat flush to happen to free up the above.
>
> I also see forced flushing happening because you have "too many log files".
>  My guess this latter is a new phenomeon because of the randomized keys.
>  You are running hbase 0.20.2 or the head of the 0.20 branch.  The latter
> might help with the log issue.  This issue shouldn't get in the way of
> slowing your servers.  That'd be the former issue.
>
>
>>
>> But I see that all servers are online. I can only suppose that sometimes
>> there is insufficient number of RPC handlers. Also I would like to ask how
>> replication in hadoop works. You can see in pictures from previous post
>> that
>> inbound traffic = outbound for server under load. Is that mean that hadoop
>> creates replication for block on another server as we wrote this block on
>> current server? Is there any influence of replication on read/write speed
>> (I
>> mean is there any case when replications impacts on network throughput and
>> read/write operations became slower)?
>>
>
>
> Yes.  Hadoop replicates as you write.
>
>
> Do you need 800 regions per server?  You might want to up the size of your
> regions... make them 1G regions rather than 256M.  It would depend on your
> write rate.
>
> Let me get back to you.  I have to go at the moment.  This log is
> interesting.  I want to look at it more.
>
> St.Ack
>
>
>>
>> 2010/1/14 Dmitriy Lyfar <dl...@gmail.com>
>>
>> > Hi,
>> >
>> > > Speed still the same (about 1K rows per second).
>> >> >
>> >>
>> >> This seems low for your 6 node cluster.
>> >>
>> >> If you look at the servers, are they cpu or io bound-up in any way?
>> >>
>> >> How many clients you have running now?
>> >>
>> >
>> > Now I'm running 1-2 clients in parallel. If I run more -- timings grows.
>> > Also I not use namenode as datanode and as regionserver. There is only
>> > namenode/secondarynn/master/zk.
>> >
>> >
>> >>
>> >> This is not a new table right?  (I see there is an existing table in
>> your
>> >> cluster looking at the regionserver log).   Its an existing table of
>> many
>> >> regions?
>> >>
>> >
>> > Yes. I have 7 test tables. Client randomly select table which will be
>> used
>> > at start.
>> > Now after some tests I have about 800 regions per region server and 7
>> > tables.
>> >
>> >
>> >>
>> >> You have upped the handlers in hbase.  Have you done same for datanodes
>> >> (In
>> >> case we are bottlenecking here).
>> >>
>> >
>> > I've updated this setting for hadoop also. As I understand if something
>> > wrong with
>> > number of handles -- I will get an exception TooManyOpenFiles and
>> datanode
>> > finish its work.
>> > All works fine for now. I've attached metrics from one of datanodes. On
>> > other nodes we have almost same picture. Please look at the throughput
>> > picture. It seems illogical to me that node have almost equal inbound
>> and
>> > outbound traffic (render.png). These pictures were snapped while running
>> two
>> > clients and then after some break I've ran one client.
>> >
>> >
>> >>  > Random ints plays a role of row keys now (i.e. uniform random
>> >> distribution
>> >> > on (0, 100 * 1000)).
>> >> > What do you think is 5GB for hbase and 2GB for hdfs enough?
>> >> >
>> >> > Yes, that should be good.  Writing you are not using that memory in
>> >> regionserver though, maybe you should go with bigger regions if you
>> have
>> >> 25k
>> >> cells.  You using compression?
>> >>
>> >
>> > Yes, 25Kb is important, but I think in production system we will have
>> > 70-80% of 5-10Kb rows,
>> > about 20% of 25Kb rows and 10% of > 25Kb rows. I'm not using any
>> > compression for columns because I was thinking about throughput. But I
>> was
>> > planning to use compression when I can achieve 80-90 Mb/sec for this
>> test.
>> >
>> >
>> >>
>> >> I took a look at your regionserver log.  Its just after an open of the
>> >> regionserver.  I see no activity other than the opening of a few
>> regions.
>> >>  These regions do happen to have alot of store files so we're starting
>> up
>> >> compactions but that all should be fine.  I'd be interested in seeing a
>> >> log
>> >> snippet from a regionserver under load.
>> >>
>> >
>> > Ok, there are some tests running now which will be interesting I think,
>> > I'll provide regionserver logs a bit later.
>> > Thank you for your help!
>> >
>> > --
>> > Regards, Lyfar Dmitriy
>> >
>> >
>>
>>
>> --
>> Regards, Lyfar Dmitriy
>> mailto: dlyfar@crystalnix.com
>> jabber: dlyfar@gmail.com
>>
>
>

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
Your pngs of traffic didn't come across.  Please put them somewhere I can
pull.

On Fri, Jan 15, 2010 at 5:40 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:

>
> After some night tests I have log of one regionserver in debug mode.
> I've uploaded it here: http://slil.ru/28491882 (downloading begins after
> 10
> second)
>

That's an interesting site, Dmitry (smile).



> But there is some problems I see after these tests, I regularly have
> following exception in client logs:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server, retryOnlyOne=true, index=0, islastrow=false,
> tries=9, numtries=10, i=179, listsize=883,
> region=4,\x00\x00F\x16,1263403845332 for region 4,\x00\x00E5,1263403845332,
> row '\x00\x00E\xA2', but failed after 10 attempts.
>
>
Your log is interesting.  I see a bunch of this:

2010-01-15 14:17:39,064 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
0,\x00\x01T\xE5,1263450046617 because global memstore limit of 1.9g
exceeded; currently 1.8g and flushing till 1.2g

Which would explain some of your slowness.

That you have 800 regions per server likely makes the above happen more
frequently than it should... this, and your randomized keys.  The latter are
probably putting little pieces into each of the regions, making it harder for
a good fat flush to happen to free up the above.

I also see forced flushing happening because you have "too many log files".
 My guess is that this latter is a new phenomenon because of the randomized keys.
 You are running hbase 0.20.2 or the head of the 0.20 branch; the latter
might help with the log issue.  This issue shouldn't be what is slowing
your servers, though.  That'd be the former issue.


>
> But I see that all servers are online. I can only suppose that sometimes
> there is insufficient number of RPC handlers. Also I would like to ask how
> replication in hadoop works. You can see in pictures from previous post
> that
> inbound traffic = outbound for server under load. Is that mean that hadoop
> creates replication for block on another server as we wrote this block on
> current server? Is there any influence of replication on read/write speed
> (I
> mean is there any case when replications impacts on network throughput and
> read/write operations became slower)?
>


Yes.  Hadoop replicates as you write.


Do you need 800 regions per server?  You might want to up the size of your
regions... make them 1G regions rather than 256M.  It would depend on your
write rate.

Let me get back to you.  I have to go at the moment.  This log is
interesting.  I want to look at it more.

St.Ack


>
> 2010/1/14 Dmitriy Lyfar <dl...@gmail.com>
>
> > Hi,
> >
> > > Speed still the same (about 1K rows per second).
> >> >
> >>
> >> This seems low for your 6 node cluster.
> >>
> >> If you look at the servers, are they cpu or io bound-up in any way?
> >>
> >> How many clients you have running now?
> >>
> >
> > Now I'm running 1-2 clients in parallel. If I run more -- timings grows.
> > Also I not use namenode as datanode and as regionserver. There is only
> > namenode/secondarynn/master/zk.
> >
> >
> >>
> >> This is not a new table right?  (I see there is an existing table in
> your
> >> cluster looking at the regionserver log).   Its an existing table of
> many
> >> regions?
> >>
> >
> > Yes. I have 7 test tables. Client randomly select table which will be
> used
> > at start.
> > Now after some tests I have about 800 regions per region server and 7
> > tables.
> >
> >
> >>
> >> You have upped the handlers in hbase.  Have you done same for datanodes
> >> (In
> >> case we are bottlenecking here).
> >>
> >
> > I've updated this setting for hadoop also. As I understand if something
> > wrong with
> > number of handles -- I will get an exception TooManyOpenFiles and
> datanode
> > finish its work.
> > All works fine for now. I've attached metrics from one of datanodes. On
> > other nodes we have almost same picture. Please look at the throughput
> > picture. It seems illogical to me that node have almost equal inbound and
> > outbound traffic (render.png). These pictures were snapped while running
> two
> > clients and then after some break I've ran one client.
> >
> >
> >>  > Random ints plays a role of row keys now (i.e. uniform random
> >> distribution
> >> > on (0, 100 * 1000)).
> >> > What do you think is 5GB for hbase and 2GB for hdfs enough?
> >> >
> >> > Yes, that should be good.  Writing you are not using that memory in
> >> regionserver though, maybe you should go with bigger regions if you have
> >> 25k
> >> cells.  You using compression?
> >>
> >
> > Yes, 25Kb is important, but I think in production system we will have
> > 70-80% of 5-10Kb rows,
> > about 20% of 25Kb rows and 10% of > 25Kb rows. I'm not using any
> > compression for columns because I was thinking about throughput. But I
> was
> > planning to use compression when I can achieve 80-90 Mb/sec for this
> test.
> >
> >
> >>
> >> I took a look at your regionserver log.  Its just after an open of the
> >> regionserver.  I see no activity other than the opening of a few
> regions.
> >>  These regions do happen to have alot of store files so we're starting
> up
> >> compactions but that all should be fine.  I'd be interested in seeing a
> >> log
> >> snippet from a regionserver under load.
> >>
> >
> > Ok, there are some tests running now which will be interesting I think,
> > I'll provide regionserver logs a bit later.
> > Thank you for your help!
> >
> > --
> > Regards, Lyfar Dmitriy
> >
> >
>
>
> --
> Regards, Lyfar Dmitriy
> mailto: dlyfar@crystalnix.com
> jabber: dlyfar@gmail.com
>

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hi Stack,

After some night tests I have the log of one regionserver in debug mode.
I've uploaded it here: http://slil.ru/28491882 (the download begins after 10
seconds)
But there are some problems I see after these tests; I regularly get the
following exception in the client logs:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server, retryOnlyOne=true, index=0, islastrow=false,
tries=9, numtries=10, i=179, listsize=883,
region=4,\x00\x00F\x16,1263403845332 for region 4,\x00\x00E5,1263403845332,
row '\x00\x00E\xA2', but failed after 10 attempts.


But I see that all servers are online. I can only suppose that sometimes
there is an insufficient number of RPC handlers. Also, I would like to ask how
replication in hadoop works. You can see in the pictures from the previous post
that inbound traffic = outbound traffic for a server under load. Does that mean
that hadoop replicates a block to another server as we write that block on the
current server? Is there any influence of replication on read/write speed (I
mean, is there any case where replication impacts network throughput and
read/write operations become slower)?

2010/1/14 Dmitriy Lyfar <dl...@gmail.com>

> Hi,
>
> > Speed still the same (about 1K rows per second).
>> >
>>
>> This seems low for your 6 node cluster.
>>
>> If you look at the servers, are they cpu or io bound-up in any way?
>>
>> How many clients you have running now?
>>
>
> Now I'm running 1-2 clients in parallel. If I run more -- timings grows.
> Also I not use namenode as datanode and as regionserver. There is only
> namenode/secondarynn/master/zk.
>
>
>>
>> This is not a new table right?  (I see there is an existing table in your
>> cluster looking at the regionserver log).   Its an existing table of many
>> regions?
>>
>
> Yes. I have 7 test tables. Client randomly select table which will be used
> at start.
> Now after some tests I have about 800 regions per region server and 7
> tables.
>
>
>>
>> You have upped the handlers in hbase.  Have you done same for datanodes
>> (In
>> case we are bottlenecking here).
>>
>
> I've updated this setting for hadoop also. As I understand if something
> wrong with
> number of handles -- I will get an exception TooManyOpenFiles and datanode
> finish its work.
> All works fine for now. I've attached metrics from one of datanodes. On
> other nodes we have almost same picture. Please look at the throughput
> picture. It seems illogical to me that node have almost equal inbound and
> outbound traffic (render.png). These pictures were snapped while running two
> clients and then after some break I've ran one client.
>
>
>>  > Random ints plays a role of row keys now (i.e. uniform random
>> distribution
>> > on (0, 100 * 1000)).
>> > What do you think is 5GB for hbase and 2GB for hdfs enough?
>> >
>> > Yes, that should be good.  Writing you are not using that memory in
>> regionserver though, maybe you should go with bigger regions if you have
>> 25k
>> cells.  You using compression?
>>
>
> Yes, 25Kb is important, but I think in production system we will have
> 70-80% of 5-10Kb rows,
> about 20% of 25Kb rows and 10% of > 25Kb rows. I'm not using any
> compression for columns because I was thinking about throughput. But I was
> planning to use compression when I can achieve 80-90 Mb/sec for this test.
>
>
>>
>> I took a look at your regionserver log.  Its just after an open of the
>> regionserver.  I see no activity other than the opening of a few regions.
>>  These regions do happen to have alot of store files so we're starting up
>> compactions but that all should be fine.  I'd be interested in seeing a
>> log
>> snippet from a regionserver under load.
>>
>
> Ok, there are some tests running now which will be interesting I think,
> I'll provide regionserver logs a bit later.
> Thank you for your help!
>
> --
> Regards, Lyfar Dmitriy
>
>


-- 
Regards, Lyfar Dmitriy
mailto: dlyfar@crystalnix.com
jabber: dlyfar@gmail.com

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hi,

> Speed still the same (about 1K rows per second).
> >
>
> This seems low for your 6 node cluster.
>
> If you look at the servers, are they cpu or io bound-up in any way?
>
> How many clients you have running now?
>

Now I'm running 1-2 clients in parallel. If I run more, the timings grow.
Also, I do not use the namenode as a datanode or a regionserver. It runs only
the namenode/secondarynn/master/zk.


>
> This is not a new table right?  (I see there is an existing table in your
> cluster looking at the regionserver log).   Its an existing table of many
> regions?
>

Yes. I have 7 test tables. The client randomly selects the table to be used
at start.
Now, after some tests, I have about 800 regions per regionserver and 7
tables.


>
> You have upped the handlers in hbase.  Have you done same for datanodes (In
> case we are bottlenecking here).
>

I've updated this setting for hadoop also. As I understand it, if something is
wrong with
the number of handlers, I will get a TooManyOpenFiles exception and the datanode
will stop working.
All works fine for now. I've attached metrics from one of the datanodes; the
other nodes show almost the same picture. Please look at the throughput
graph: it seems illogical to me that a node has almost equal inbound and
outbound traffic (render.png). These pictures were snapped while running two
clients and then, after a break, running one client.


>  > Random ints plays a role of row keys now (i.e. uniform random
> distribution
> > on (0, 100 * 1000)).
> > What do you think is 5GB for hbase and 2GB for hdfs enough?
> >
> > Yes, that should be good.  Writing you are not using that memory in
> regionserver though, maybe you should go with bigger regions if you have
> 25k
> cells.  You using compression?
>

Yes, 25Kb is important, but I think in the production system we will have 70-80%
5-10Kb rows,
about 20% 25Kb rows and 10% rows > 25Kb. I'm not using any compression
for columns because I was thinking about throughput, but I was planning to
use compression once I can achieve 80-90 MB/sec for this test.


>
> I took a look at your regionserver log.  Its just after an open of the
> regionserver.  I see no activity other than the opening of a few regions.
>  These regions do happen to have alot of store files so we're starting up
> compactions but that all should be fine.  I'd be interested in seeing a log
> snippet from a regionserver under load.
>

Ok, there are some tests running now which I think will be interesting; I'll
provide regionserver logs a bit later.
Thank you for your help!

-- 
Regards, Lyfar Dmitriy

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
On Wed, Jan 13, 2010 at 4:35 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:

> And ulimit is 32K for sure.


Yes, I see that in the log.



> Speed still the same (about 1K rows per second).
>

This seems low for your 6 node cluster.

If you look at the servers, are they cpu or io bound-up in any way?

How many clients do you have running now?

This is not a new table, right?  (I see there is an existing table in your
cluster, looking at the regionserver log.)   It's an existing table of many
regions?

You have upped the handlers in hbase.  Have you done the same for the datanodes
(in case we are bottlenecking here)?



> Random ints plays a role of row keys now (i.e. uniform random distribution
> on (0, 100 * 1000)).
> What do you think is 5GB for hbase and 2GB for hdfs enough?
>
Yes, that should be good.  When writing you are not using that memory in the
regionserver though; maybe you should go with bigger regions if you have 25k
cells.  Are you using compression?

I took a look at your regionserver log.  It's just after an open of the
regionserver.  I see no activity other than the opening of a few regions.
 These regions do happen to have a lot of store files, so we're starting up
compactions, but that all should be fine.  I'd be interested in seeing a log
snippet from a regionserver under load.

St.Ack

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hi Stack,

Thank you for your help. I set xceivers in the hdfs xml config like this:

<property>
        <name>dfs.datanode.max.xcievers</name>
        <value>8192</value>
</property>

And ulimit is 32K for sure. I turned off the DEBUG logging level for hbase, and
here is the log for one of the regionservers after I inserted 200K records
(each row is 25Kb).
Speed is still the same (about 1K rows per second).
Random ints play the role of row keys now (i.e. a uniform random distribution
on (0, 100 * 1000)).
What do you think, is 5GB for hbase and 2GB for hdfs enough?


> What are you tasktrackers doing?   Are they doing the hbase loading?  You
> might try turning down how many task run concurrently on each tasktracker.
> The running tasktracker may be sucking resources from hdfs (and thus by
> association, from hbase): i.e. mapred.map.tasks and mapred.reduce.tasks
> (Pardon me if this advice has been given previous and you've already acted
> on it).


Tasktrackers are not used now (I planned them for future use in statistical
analysis), so I turned them off for the last tests. The data uploader is several
clients which run simultaneously on the name node, each of them inserting 100K
records.

-- 
Regards, Lyfar Dmitriy

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
Thanks for posting config.

The reason I suggest removing i-cms is because of this paragraph (pointed to
from the hbase performance page up on wiki):
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#icms
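Illustratively, after dropping the i-cms flag, the relevant lines in conf/hbase-env.sh might look like the following; the heap size is just the 5GB mentioned elsewhere in this thread, and the flags are a sketch, not a prescription:

export HBASE_HEAPSIZE=5000
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -verbose:gc -Xloggc:/tmp/hbase-gc.log"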

Are you sure this works in your hbase-env.sh:

ulimit -n 32000

Check the first line in your master or regionserver log after startup.
It'll print out the ulimit that hbase sees.  Make sure it's 32k.

Yes, xceivers is important, especially when loading gets up to where you
are.

I took a look at the regionserver log you posted.  Please enable DEBUG going
forward (See FAQ in wiki for how).

What are your tasktrackers doing?   Are they doing the hbase loading?  You
might try turning down how many tasks run concurrently on each tasktracker.
The running tasktracker may be sucking resources from hdfs (and thus, by
association, from hbase): i.e. mapred.map.tasks and mapred.reduce.tasks.
(Pardon me if this advice has been given previously and you've already acted
on it.)

St.Ack

On Tue, Jan 12, 2010 at 5:36 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:

> Hi Stack,
>
> I have following configuration: http://pastebin.com/m3a471fab
> so I'll remove incremental mode option.
>
> 2010/1/11 stack <st...@duboce.net>
>
> > What do you have for your HBASE_OPTS in conf/hbase-env.sh Dmitry?
> >
> > Remove this:
> >
> > -XX:+CMSIncrementalMode
> >
> > ... if its present on your HBASE_OPTS.
> >
> > Change your hbase.zookeeper.property.tickTime from 2 to 3 so that zk
> > session
> > goes for longer (See the comment in the head of the 0.20 branch for
> > explaination).
> >
>
> Ok, I'll change.
>
>
> >
> > What else is running on the machines where the regionserver times out its
> > session w/ zk?  Anything?  The uploader?
> >
>
> Nothing, just hadoop datanode, tasktracker and regionserver (no zk).
>
>
> >
> > You've set the ulimit > 1024 and xceivers?  Right (I don't see that in
> your
> > old messages)
> >
>
> As for ulimit I have 32K. But did not change xceivers, I will try and
> report
> about results, thanks.
> Btw, is 2Gb heap size is enough for hadoop in such configuration?
>
> --
> Regards, Lyfar Dmitriy
>

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hi Stack,

I have the following configuration: http://pastebin.com/m3a471fab
so I'll remove the incremental mode option.

2010/1/11 stack <st...@duboce.net>

> What do you have for your HBASE_OPTS in conf/hbase-env.sh Dmitry?
>
> Remove this:
>
> -XX:+CMSIncrementalMode
>
> ... if its present on your HBASE_OPTS.
>
> Change your hbase.zookeeper.property.tickTime from 2 to 3 so that zk
> session
> goes for longer (See the comment in the head of the 0.20 branch for
> explaination).
>

Ok, I'll change.


>
> What else is running on the machines where the regionserver times out its
> session w/ zk?  Anything?  The uploader?
>

Nothing, just hadoop datanode, tasktracker and regionserver (no zk).


>
> You've set the ulimit > 1024 and xceivers?  Right (I don't see that in your
> old messages)
>

As for ulimit, I have 32K, but I did not change xceivers; I will try that and
report the results, thanks.
Btw, is a 2Gb heap size enough for hadoop in such a configuration?

-- 
Regards, Lyfar Dmitriy

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
What do you have for your HBASE_OPTS in conf/hbase-env.sh Dmitry?

Remove this:

-XX:+CMSIncrementalMode

... if its present on your HBASE_OPTS.

Change your hbase.zookeeper.property.tickTime from 2 to 3 so that the zk session
goes for longer (see the comment in the head of the 0.20 branch for an
explanation).

What else is running on the machines where the regionserver times out its
session w/ zk?  Anything?  The uploader?

You've set the ulimit > 1024 and xceivers?  Right (I don't see that in your
old messages)

St.Ack



On Mon, Jan 11, 2010 at 6:21 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:

> Hi,
>
> 2010/1/10 Jean-Daniel Cryans <jd...@apache.org>
>
> > You have this line:
> >
> > 2010-01-08 21:25:24,709 WARN org.apache.hadoop.hbase.util.Sleeper: We
> > slept 66413ms, ten times longer than scheduled: 3000
> >
> > That's a garbage collector pause that lasted more than a minute which
> > is higher than the default timeout to consider a region server dead
> > (40 seconds in 0.20 unless you are using 0.20.3RC1). The master
> > replayed the write-ahead-logs and reopened the regions elsewhere.
> >
> > You want to set a higher heap space in conf/hbase-env.sh because the
> > default 1GB is way too low, give it a much as you can without
> > swapping.
> >
> > J-D
> >
> >
> I can try to add more memory to regionservers. But now I already have 5Gb
> per each node.
> (I'm using 0.20.2).
>
> --
> Thank you, Lyfar Dmitriy
>

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hi,

2010/1/10 Jean-Daniel Cryans <jd...@apache.org>

> You have this line:
>
> 2010-01-08 21:25:24,709 WARN org.apache.hadoop.hbase.util.Sleeper: We
> slept 66413ms, ten times longer than scheduled: 3000
>
> That's a garbage collector pause that lasted more than a minute which
> is higher than the default timeout to consider a region server dead
> (40 seconds in 0.20 unless you are using 0.20.3RC1). The master
> replayed the write-ahead-logs and reopened the regions elsewhere.
>
> You want to set a higher heap space in conf/hbase-env.sh because the
> default 1GB is way too low, give it a much as you can without
> swapping.
>
> J-D
>
>
I can try to add more memory to the regionservers, but I already have 5Gb
per node.
(I'm using 0.20.2).

-- 
Thank you, Lyfar Dmitriy

Re: Problems with write performance (25kb rows)

Posted by Jean-Daniel Cryans <jd...@apache.org>.
You have this line:

2010-01-08 21:25:24,709 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept 66413ms, ten times longer than scheduled: 3000

That's a garbage collector pause that lasted more than a minute which
is higher than the default timeout to consider a region server dead
(40 seconds in 0.20 unless you are using 0.20.3RC1). The master
replayed the write-ahead-logs and reopened the regions elsewhere.

You want to set a higher heap space in conf/hbase-env.sh because the
default 1GB is way too low; give it as much as you can without
swapping.

J-D

On Sat, Jan 9, 2010 at 4:06 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:
> Hello,
>
> 2010/1/5 Jean-Daniel Cryans <jd...@apache.org>
>
>> WRT your last 2 emails, HBase ships with defaults that are working
>> safely for most of the users and in no way tuned for one time upload.
>> Playing with the memstore size like you did makes sense.
>>
>> Now you said you were inserting with row key being reversed ts... are
>> all threads using the same key space when uploading? I ask this
>> because if all 60 threads are hitting almost always the same region
>> (different one in time), then all 60 threads are just filling up
>> really fast the same memstore, then all wait for the snapshot,
>> eventually all wait for the same region split and in the mean time
>> fills the same WAL which will probably be rolled some times. Is it the
>> case?
>>
>> You could also post a region server log for us to analyze.
>>
>
> Now I'm using random int keys to distribute loading between regionservers.
> Now I not use threaded client, but multiprocessed one. And timings still
> almost same (sometimes random keys are faster).
> I left cluster for night stress testing. I've ran several clients, each of
> them inserts 100K of 25Kb records. I noticed that one of my regionservers
> were closed. I've analyzed logs and seems there were timeout with zookeeper
> service which caused closing of regionserver.
> Cluster continued its work, but test's timings were increased. I have few
> questions.
> Should I shutdown all cluster in such case to return closed regionserver to
> work?
> What master will do in such cases? Will it reassign regions to another
> servers? How it impacts on read/write performance?
> Logs of this regionserver is here: http://pastebin.com/m1c25e2ae
>
> --
> Thank you, Lyfar Dmitriy
>

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hello,

2010/1/5 Jean-Daniel Cryans <jd...@apache.org>

> WRT your last 2 emails, HBase ships with defaults that are working
> safely for most of the users and in no way tuned for one time upload.
> Playing with the memstore size like you did makes sense.
>
> Now you said you were inserting with row key being reversed ts... are
> all threads using the same key space when uploading? I ask this
> because if all 60 threads are hitting almost always the same region
> (different one in time), then all 60 threads are just filling up
> really fast the same memstore, then all wait for the snapshot,
> eventually all wait for the same region split and in the mean time
> fills the same WAL which will probably be rolled some times. Is it the
> case?
>
> You could also post a region server log for us to analyze.
>

Now I'm using random int keys to distribute the load between regionservers,
and I no longer use a threaded client but a multiprocess one. The timings are
still almost the same (sometimes the random keys are faster).
I left the cluster for a night of stress testing: I ran several clients, each of
them inserting 100K 25Kb records. I noticed that one of my regionservers
was closed. I've analyzed the logs and it seems there was a timeout with the
zookeeper service which caused the regionserver to close.
The cluster continued its work, but the test timings increased. I have a few
questions.
Should I shut down the whole cluster in such a case to return the closed
regionserver to work?
What will the master do in such cases? Will it reassign regions to other
servers? How does it impact read/write performance?
The log of this regionserver is here: http://pastebin.com/m1c25e2ae

-- 
Thank you, Lyfar Dmitriy

Re: Problems with write performance (25kb rows)

Posted by Jean-Daniel Cryans <jd...@apache.org>.
WRT your last 2 emails, HBase ships with defaults that are working
safely for most of the users and in no way tuned for one time upload.
Playing with the memstore size like you did makes sense.

Now you said you were inserting with row key being reversed ts... are
all threads using the same key space when uploading? I ask this
because if all 60 threads are hitting almost always the same region
(different one in time), then all 60 threads are just filling up
really fast the same memstore, then all wait for the snapshot,
eventually all wait for the same region split and in the mean time
fills the same WAL which will probably be rolled some times. Is it the
case?

You could also post a region server log for us to analyze.

J-D

On Tue, Jan 5, 2010 at 5:56 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:
> Stack,
>
> I did some tests with different flushing parameters. I've touched following
> params:
> hbase.hregion.memstore.block.multiplier
> hbase.hregion.memstore.flush.size
>
> When I've increased flushsize to 256Mb (64Mb by default) time of 25Kb test
> grew.
> When I've changed block.multiplier to 12 (was 10) I won about 40 seconds in
> 25Kb test (was 150-160 secs, became ~110 secs for one running instance that
> inserts 100K records).
> All tests I did was with WAL on.
>
>
> 2010/1/5 Dmitriy Lyfar <dl...@gmail.com>
>
>> Hello Stack,
>>
>>
>>> > And throughput without WAL is about 50 Mb/sec and  about 15 Mb/sec with
>>> WAL
>>> > on. When I run clients in serial order (i.e. at the moment there is only
>>> > one
>>> > working script) time almost stable and not grows.
>>> >
>>> >
>>> > > See what the
>>> > > numbers are like uploading into a table that is pre-split?
>>> >
>>> >
>>> > Sorry, what you mean pre-split? You mean splitting regions before
>>> running
>>> > script?
>>> >
>>> >
>>> I was thinking you were uploading into a new table and that the region
>>> splits were happening inline with your upload.  I was asking what the
>>> performance was like if the table had already had all its regions pre-made
>>> wondering if it ran faster but sounds like your table is already
>>> pre-split.
>>>
>>> So where are we at now?  You tried running multiple separate upload
>>> processes and it still runs too slow?
>>>
>>
>> Yes, still too slow, especially with WAL on. Btw, I see the greater row
>> size, the greater impact has WAL. I'm not an expert in hbase internals, but
>> I begin think that the reason of throughput fall in case of 25Kb size
>> connected with flushing. I mean looks like we begin flush too often and it
>> impacts on throughput.
>> Also as I see from architecture description there are could be several
>> reasons, like rolling hlog too often and long compaction period. Would you
>> advice which log messages in region/master logs should warn me that
>> something going wrong?
>>
>>
>> --
>> Regards, Lyfar Dmitriy
>>
>>
>
>
> --
> Regards, Lyfar Dmitriy
> mailto: dlyfar@crystalnix.com
> jabber: dlyfar@gmail.com
>

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Stack,

I did some tests with different flushing parameters. I've touched the following
params:
hbase.hregion.memstore.block.multiplier
hbase.hregion.memstore.flush.size

When I increased the flush size to 256Mb (64Mb by default), the time of the 25Kb
test grew.
When I changed block.multiplier to 12 (it was 10), I gained about 40 seconds on
the 25Kb test (it was 150-160 secs and became ~110 secs for one running instance
that inserts 100K records).
All the tests were with WAL on.


2010/1/5 Dmitriy Lyfar <dl...@gmail.com>

> Hello Stack,
>
>
>> > And throughput without WAL is about 50 Mb/sec and  about 15 Mb/sec with
>> WAL
>> > on. When I run clients in serial order (i.e. at the moment there is only
>> > one
>> > working script) time almost stable and not grows.
>> >
>> >
>> > > See what the
>> > > numbers are like uploading into a table that is pre-split?
>> >
>> >
>> > Sorry, what you mean pre-split? You mean splitting regions before
>> running
>> > script?
>> >
>> >
>> I was thinking you were uploading into a new table and that the region
>> splits were happening inline with your upload.  I was asking what the
>> performance was like if the table had already had all its regions pre-made
>> wondering if it ran faster but sounds like your table is already
>> pre-split.
>>
>> So where are we at now?  You tried running multiple separate upload
>> processes and it still runs too slow?
>>
>
> Yes, still too slow, especially with WAL on. Btw, I see the greater row
> size, the greater impact has WAL. I'm not an expert in hbase internals, but
> I begin think that the reason of throughput fall in case of 25Kb size
> connected with flushing. I mean looks like we begin flush too often and it
> impacts on throughput.
> Also as I see from architecture description there are could be several
> reasons, like rolling hlog too often and long compaction period. Would you
> advice which log messages in region/master logs should warn me that
> something going wrong?
>
>
> --
> Regards, Lyfar Dmitriy
>
>


-- 
Regards, Lyfar Dmitriy
mailto: dlyfar@crystalnix.com
jabber: dlyfar@gmail.com

Re: Problems with write performance (25kb rows)

Posted by Dmitriy Lyfar <dl...@gmail.com>.
Hello Stack,


> > And throughput without WAL is about 50 Mb/sec and  about 15 Mb/sec with
> WAL
> > on. When I run clients in serial order (i.e. at the moment there is only
> > one
> > working script) time almost stable and not grows.
> >
> >
> > > See what the
> > > numbers are like uploading into a table that is pre-split?
> >
> >
> > Sorry, what you mean pre-split? You mean splitting regions before running
> > script?
> >
> >
> I was thinking you were uploading into a new table and that the region
> splits were happening inline with your upload.  I was asking what the
> performance was like if the table had already had all its regions pre-made
> wondering if it ran faster but sounds like your table is already pre-split.
>
> So where are we at now?  You tried running multiple separate upload
> processes and it still runs too slow?
>

Yes, still too slow, especially with WAL on. Btw, I see that the greater the row
size, the greater the impact of the WAL. I'm not an expert in hbase internals, but
I'm starting to think that the reason for the throughput drop in the 25Kb case is
connected with flushing; I mean, it looks like we begin to flush too often and it
hurts throughput.
Also, as I see from the architecture description, there could be several
reasons, like rolling the hlog too often or a long compaction period. Would you
advise which log messages in the region/master logs should warn me that
something is going wrong?


-- 
Regards, Lyfar Dmitriy

Re: Problems with write performance (25kb rows)

Posted by stack <st...@duboce.net>.
On Mon, Jan 4, 2010 at 5:18 AM, Dmitriy Lyfar <dl...@gmail.com> wrote:

> As I said I have 6 nodes except master node and each node has 235 regions.
> 1406 regions total.
>

Pardon me.  I overlooked this bit of info.



> And throughput without WAL is about 50 Mb/sec and  about 15 Mb/sec with WAL
> on. When I run clients in serial order (i.e. at the moment there is only
> one
> working script) time almost stable and not grows.
>
>
> > See what the
> > numbers are like uploading into a table that is pre-split?
>
>
> Sorry, what you mean pre-split? You mean splitting regions before running
> script?
>
>
I was thinking you were uploading into a new table and that the region
splits were happening inline with your upload.  I was asking what the
performance was like if the table already had all its regions pre-made,
wondering if it ran faster, but it sounds like your table is already pre-split.

So where are we at now?  You tried running multiple separate upload
processes and it still runs too slow?

St.Ack