Posted to user@hbase.apache.org by Eric K <co...@gmail.com> on 2014/09/05 09:34:43 UTC

Re: HBase balancer performance

Hi Esteban,

I would expect poor performance for a while with the HDFS balancer running,
but the mystery on my end was that the HDFS balancer logs claimed it was done
in 5 minutes while the bad performance continued for 45.  I did try your
suggestion to leave the HDFS balancer out, though, and it helps a lot, so the
poor performance is almost certainly coming from the HDFS balancer.  I want to
make use of more machines at runtime to get better performance when it is
needed, and the good news is that since HBase appears to be the performance
bottleneck, it helps to just let the HBase balancer do its work without
explicitly calling the HDFS balancer.
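
For anyone who hits this thread later: with that approach, the region-side
rebalancing boils down to roughly the following (HBase shell; the second
command just forces an immediate balance pass instead of waiting for
hbase.balancer.period):

  $ hbase shell
  hbase> balance_switch true   # make sure the balancer is enabled (it is by default)
  hbase> balancer              # trigger a balance pass right away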

Thanks again for your help,
Eric


On Fri, Aug 29, 2014 at 5:10 PM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> Hi Eric,
>
> Sounds pretty much like what you would expect when the HDFS balancer is
> running. Can you try this approach and tell us if it helps?
>
> 1. add new nodes, start DNs and RSs. (don't start the HDFS balancer)
> 2. wait until the HBase balancer stops and all region servers have about
> the same number of regions.
> 3. then major compact the table (wait until the compaction queue drops close
> to 0 or the compaction-related I/O has subsided)
> 4. run YCSB with Workload C
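>
> For step 3, something along these lines in the HBase shell should do it
> (assuming the table is still the default YCSB 'usertable'); one way to watch
> the queue drain is the compactionQueueSize metric on each region server's
> web UI:
>
>   hbase> major_compact 'usertable'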
>
> > So why isn't life good like this when invoking the balancer?
>
> Basically the HDFS balancer is killing data locality, and it gets worse due to
> the fact that you have only one replica per block. If you increase the number
> of HDFS replicas to 3 and don't run the HDFS balancer, things shouldn't be
> that bad.
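>
> If you want to try that, a rough sketch: set dfs.replication to 3 in
> hdfs-site.xml so new HFiles get three replicas, and bump the data that
> already exists (the default /hbase root dir) with something like:
>
>   $ hadoop fs -setrep -R 3 /hbase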
>
> Please keep us posted,
> Esteban.
>
>
>
>
> --
> Cloudera, Inc.
>
>
>
> On Fri, Aug 29, 2014 at 1:56 PM, Eric K <co...@gmail.com> wrote:
>
> > Thanks, Esteban and others, for your help on this.  I'm still having issues
> > after digging into this further, so let me provide more info.  It's a bit
> > of a mystery.
> >
> > I am using the YCSB Workload C to achieve 100% reads; however, I had changed
> > the request distribution from zipfian to uniform.  I changed it back to
> > zipfian, though, using the original Workload C, with the same poor results
> > -- it takes about 45 minutes after doubling the HBase nodes from 4 to 8
> > before YCSB performance goes back to normal.  I expect a small drop in
> > performance due to relocation of data that is no longer cached, but not
> > this bad:  700 ops/sec over the 45-minute period after adding the nodes,
> > compared to 7000 ops/sec once things get better.
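> >
> > For reference, the relevant lines of the workload file look roughly like
> > this (stock CoreWorkload properties; requestdistribution is the one line I
> > had toggled to 'uniform' and back):
> >
> >   readproportion=1
> >   updateproportion=0
> >   scanproportion=0
> >   insertproportion=0
> >   requestdistribution=zipfian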
> >
> > I double-checked and the same poor performance happens if I add new nodes
> > but wait 15 minutes before actually starting the workload.  If I wait an
> > extended 45 minutes before starting the workload, then the time of poor
> > performance is shortened a lot (~15 minutes).  Next I tried waiting an hour
> > and there was no poor performance, so something long-running appears to be
> > happening within HBase/HDFS, or left over as a result of adding more nodes
> > to balance over.  But top and iostat don't really show any activity during
> > that extended 45-minute period (after the 5 minutes of actual balancing).
> > Is there anything else helpful that I could check?
> >
> > My HBase heap size is set to 5120 MB in hbase-env.sh.  I'm also running
> > with HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=6
> > -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
> > -XX:MaxGCPauseMillis=100".
> >
> > I'm not sure how useful this was, but I flush the user table and do a major
> > compact after loading in the data, but before starting the actual read
> > workload and adding the 4 extra nodes.
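> >
> > Concretely, that step was just the following in the HBase shell (the default
> > YCSB table name):
> >
> >   hbase> flush 'usertable'
> >   hbase> major_compact 'usertable'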
> >
> > When I'm not balancing onto more nodes, but just running a secondary
> > workload of read requests for a different table over the same HBase
> > servers, the read requests for the primary user table will suffer as
> > expected, and the performance will jump back up to normal after the
> > secondary read requests finish, also as expected.  So why isn't life good
> > like this when invoking the balancer?
> >
> > Thanks,
> > Eric
> >
> >
> >
> >
> >
> > On Fri, Aug 22, 2014 at 8:18 PM, Esteban Gutierrez <esteban@cloudera.com> wrote:
> >
> > > Hello Eric,
> > >
> > > Can you tell us more about the YCSB workload? For instance, for workload B
> > > you should expect a drop in performance after new RSs are added, since data
> > > locality will be impacted by the region re-assignment. If you major compact
> > > the tables or the regions with poor locality you should see an improvement,
> > > especially if you are running workload B or any workload that is heavy on
> > > reads. However, you shouldn't see a huge drop in performance if the RSs and
> > > DNs are correctly configured and there are no networking issues.
> > >
> > > cheers,
> > > esteban.
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > >
> > >
> > > On Fri, Aug 22, 2014 at 4:26 PM, Eric K <co...@gmail.com> wrote:
> > >
> > > > I wish that explained it, but the last reassignment message happens around
> > > > T+1 (min) and then it's just 44 minutes of "skipping because balanced."  Is
> > > > there anything else that could be preventing HBase from serving data from
> > > > the newly reassigned nodes?
> > > >
> > > > Perhaps HDFS could also be the culprit.  The Hadoop namenode logs show a
> > > > lot of this for T+5 minutes:
> > > >
> > > > 2014-08-15 16:45:21,169 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK*
> > > > NameSystem.addStoredBlock: blockMap updated: 172.19.152.251:9002 is
> > > added
> > > > to blk_5924440019130413908_2033 size 14
> > > > 2014-08-15 16:45:21,183 INFO org.apache.hadoop.hdfs.StateChange:
> > Removing
> > > > lease on  file /system/balancer.id from client DFSClient_-1186888991
> > > > 2014-08-15 16:45:21,183 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> > > > NameSystem.completeFile: file /system/balancer.id is closed by
> > > > DFSClient_-1186888991
> > > > 2014-08-15 16:45:21,183 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> > > > transactions: 198 Total time for transactions(ms): 3Number of
> > > transactions
> > > > batched in Syncs: 0 Number of syncs: 9 SyncTimes(ms): 209
> > > > 2014-08-15 16:45:21,201 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK*
> > > > NameSystem.addToInvalidates: blk_5924440019130413908 is added to
> > > invalidSet
> > > > of 172.19.152.251:9002
> > > > 2014-08-15 16:45:23,017 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK*
> > > ask
> > > > 172.19.152.251:9002 to delete  blk_5924440019130413908_2033
> > > > 2014-08-15 16:46:10,991 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK*
> > > > NameSystem.addStoredBlock: blockMap updated: 172.19.152.241:9002 is
> > > added
> > > > to blk_3473904193447805515_2028 size 311
> > > > 2014-08-15 16:46:10,995 INFO org.apache.hadoop.hdfs.StateChange:
> > Removing
> > > > lease on  file /hbase/.logs/
> > > >
> > > > But then there is not much log activity afterwards for the next hour.
> > > >
> > > > NameSystem.processReport: from 172.19.149.160:9002, blocks: 109,
> > > > processing
> > > > time: 16 msecs
> > > > 2014-08-15 16:50:58,871 INFO org.apache.hadoop.hdfs.StateChange:
> > *BLOCK*
> > > > NameSystem.processReport: from 172.19.149.89:9002, blocks: 67,
> > > processing
> > > > time: 0 msecs
> > > > 2014-08-15 16:54:27,788 INFO org.apache.hadoop.hdfs.StateChange:
> > *BLOCK*
> > > > NameSystem.processReport: from 172.19.152.247:9002, blocks: 84,
> > > processing
> > > > time: 3 msecs
> > > > 2014-08-15 16:56:08,408 INFO org.apache.hadoop.hdfs.StateChange:
> > *BLOCK*
> > > > NameSystem.processReport: from 172.19.152.251:9002, blocks: 58,
> > > processing
> > > > time: 1 msecs
> > > > 2014-08-15 16:57:57,491 INFO org.apache.hadoop.hdfs.StateChange:
> > *BLOCK*
> > > > NameSystem.processReport: from 172.19.152.241:9002, blocks: 83,
> > > processing
> > > > time: 3 msecs
> > > > 2014-08-15 16:59:37,278 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
> from
> > > > 172.19.152.221
> > > > 2014-08-15 16:59:37,278 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> > > > transactions: 205 Total time for transactions(ms): 3Number of
> > > transactions
> > > > batched in Syncs: 0 Number of syncs: 15 SyncTimes(ms): 307
> > > > 2014-08-15 16:59:37,920 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage
> from
> > > > 172.19.152.221
> > > > 2014-08-15 16:59:37,920 INFO
> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> > > > transactions: 0 Total time for transactions(ms): 0Number of
> > transactions
> > > > batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 33
> > > > 2014-08-15 17:21:45,268 INFO org.apache.hadoop.hdfs.StateChange:
> > *BLOCK*
> > > > NameSystem.processReport: from 172.19.149.42:9002, blocks: 81,
> > > processing
> > > > time: 0 msecs
> > > > 2014-08-15 17:23:43,104 INFO org.apache.hadoop.hdfs.StateChange:
> > *BLOCK*
> > > > NameSystem.processReport: from 172.19.149.210:9002, blocks: 90,
> > > processing
> > > > time: 2 msecs
> > > > 2014-08-15 17:33:26,596 INFO org.apache.hadoop.hdfs.StateChange:
> > *BLOCK*
> > > > NameSystem.processReport: from 172.19.149.231:9002, blocks: 74,
> > > processing
> > > > time: 1 msecs
> > > >
> > > > Thanks,
> > > > Eric
> > > >
> > > >
> > > > On Fri, Aug 22, 2014 at 5:00 PM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > Was the last 'Successfully deleted unassigned node' log close to
> > > > > T+45 (min)?
> > > > >
> > > > > If so, it means that AssignmentManager was busy assigning these regions
> > > > > during this period.
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > On Fri, Aug 22, 2014 at 1:50 PM, Eric K <co...@gmail.com> wrote:
> > > > >
> > > > > > After about a minute everything appears to have been moved, since the
> > > > > > master only prints out lines like:
> > > > > >
> > > > > > 2014-08-15 16:40:56,945 INFO
> > > > org.apache.hadoop.hbase.master.LoadBalancer:
> > > > > > Skipping load balancing because balanced cluster; servers=8
> > > regions=126
> > > > > > average=15.75 mostloaded=16 leastloaded=15
> > > > > >
> > > > > > Within the first minute, though, after adding the new nodes, I see a
> > > > > > lot of activity like this:
> > > > > >
> > > > > > 2014-08-15 16:40:49,142 INFO
> > org.apache.hadoop.hbase.master.HMaster:
> > > > > > balance
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> hri=usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31.,
> > > > > > src=172-19-152-247,60020,1408131968632,
> > > > > > dest=172-19-149-231,60020,1408135240788
> > > > > > 2014-08-15 16:40:49,142 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Starting
> > > unassignment
> > > > > of
> > > > > > region
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31.
> > > > > > (offlining)
> > > > > > 2014-08-15 16:40:49,142 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x147db34a2c80000 Creating unassigned node for
> > > > > > e3c3e82d52f54692a70e7b59aeb56b31 in a CLOSING state
> > > > > > 2014-08-15 16:40:49,181 DEBUG
> > > > > org.apache.hadoop.hbase.master.ServerManager:
> > > > > > New connection to 172-19-152-247,60020,1408131968632
> > > > > > 2014-08-15 16:40:49,189 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
> > > > > > 172-19-152-247,60020,1408131968632 for region
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> usertable,user9673372036854775807,1408128925670.e3c3e82d52f54692a70e7b59aeb56b31.
> > > > > > 2014-08-15 16:40:49,189 INFO
> > org.apache.hadoop.hbase.master.HMaster:
> > > > > > balance
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> hri=usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a.,
> > > > > > src=172-19-152-241,60020,1408131968630,
> > > > > > dest=172-19-149-89,60020,1408135238164
> > > > > > 2014-08-15 16:40:49,189 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Starting
> > > unassignment
> > > > > of
> > > > > > region
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a.
> > > > > > (offlining)
> > > > > > 2014-08-15 16:40:49,189 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x147db34a2c80000 Creating unassigned node for
> > > > > > 1fe73c5f470b78e7e9d2a0691659967a in a CLOSING state
> > > > > > 2014-08-15 16:40:49,223 DEBUG
> > > > > org.apache.hadoop.hbase.master.ServerManager:
> > > > > > New connection to 172-19-152-241,60020,1408131968630
> > > > > > 2014-08-15 16:40:49,223 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling
> > > > > > transition=RS_ZK_REGION_OPENED,
> > > > > server=172-19-149-231,60020,1408135240788,
> > > > > > region=7b08e35b1fb9f445d91a90ae89bb401a
> > > > > > 2014-08-15 16:40:49,224 DEBUG
> > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
> > Handling
> > > > > OPENED
> > > > > > event for
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> othertable,user1342640501535615655,1408128955190.7b08e35b1fb9f445d91a90ae89bb401a.
> > > > > > from 172-19-149-231,60020,1408135240788; deleting unassigned node
> > > > > > 2014-08-15 16:40:49,224 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x147db34a2c80000 Deleting existing unassigned node
> > for
> > > > > > 7b08e35b1fb9f445d91a90ae89bb401a that is in expected state
> > > > > > RS_ZK_REGION_OPENED
> > > > > > 2014-08-15 16:40:49,225 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
> > > > > > 172-19-152-241,60020,1408131968630 for region
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> usertable,user1068528100307123131,1408128925659.1fe73c5f470b78e7e9d2a0691659967a.
> > > > > > 2014-08-15 16:40:49,225 INFO
> > org.apache.hadoop.hbase.master.HMaster:
> > > > > > balance
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> hri=usertable,user9873372036854775807,1408128925670.3eeb1e68835cbb98c9f796d437fe7ebc.,
> > > > > > src=172-19-152-241,60020,1408131968630,
> > > > > > dest=172-19-149-89,60020,1408135238164
> > > > > > 2014-08-15 16:40:49,225 DEBUG
> > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Starting
> > > unassignment
> > > > > of
> > > > > > region
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> usertable,user9873372036854775807,1408128925670.3eeb1e68835cbb98c9f796d437fe7ebc.
> > > > > > (offlining)
> > > > > > 2014-08-15 16:40:49,225 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x147db34a2c80000 Creating unassigned node for
> > > > > > 3eeb1e68835cbb98c9f796d437fe7ebc in a CLOSING state
> > > > > > 2014-08-15 16:40:49,281 DEBUG
> > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > > master:60000-0x147db34a2c80000 Successfully deleted unassigned
> node
> > > for
> > > > > > region 7b08e35b1fb9f445d91a90ae89bb401a in expected state
> > > > > > RS_ZK_REGION_OPENED
> > > > > >
> > > > > > Thanks,
> > > > > > Eric
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 22, 2014 at 2:24 PM, Ted Yu <yu...@gmail.com> wrote:
> > > > > >
> > > > > > > Suppose you add the new nodes at time T; can you check the master log
> > > > > > > between T and T+45 (min) to see how long the region movements took?
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Aug 22, 2014 at 11:19 AM, Eric K <codemonkeeez@gmail.com> wrote:
> > > > > > >
> > > > > > > > I'm experimenting with HBase (0.94.4) and Hadoop (1.0.4) runtime
> > > > > > > > balancers on just a very tiny 4-node cluster and finding that
> > > > > > > > performance is bad for an hour after adding new nodes, even though
> > > > > > > > all the data is supposedly offloaded within a few minutes.
> > > > > > > >
> > > > > > > > Using YCSB-generated requests, I load the database with about 18 GB
> > > > > > > > worth of data across 12 million records.  The keyspace is initially
> > > > > > > > pre-split to have ~30 regions per node with single replication of
> > > > > > > > data.  Then I hit HBase with read requests from a set of clients so
> > > > > > > > that there are 2000 requests outstanding, and new requests are
> > > > > > > > immediately made after replies are received.  While these requests
> > > > > > > > are running, after about 2 minutes, I double the nodes from 4 to 8,
> > > > > > > > add the new node information to the slaves and regionservers files,
> > > > > > > > start up the new datanode and regionserver processes, and call the
> > > > > > > > HDFS balancer with the smallest possible threshold of 1.  The
> > > > > > > > hbase.balancer.period is also set to 10 seconds so as to respond
> > > > > > > > fast to new nodes.  dfs.balance.bandwidthPerSec is set to 8 MB/s per
> > > > > > > > node, but I have also tried higher numbers that don't bottleneck the
> > > > > > > > offload rate and gotten similar results.
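> > > > > > > >
> > > > > > > > For reference, those knobs boil down to roughly this (values
> > > > > > > > converted to the units the configs expect):
> > > > > > > >
> > > > > > > >   # hbase-site.xml:  hbase.balancer.period = 10000         (10 s)
> > > > > > > >   # hdfs-site.xml:   dfs.balance.bandwidthPerSec = 8388608 (8 MB/s)
> > > > > > > >   $ hadoop balancer -threshold 1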
> > > > > > > >
> > > > > > > > The expected response is that about half the data and regions are
> > > > > > > > offloaded to the new nodes in the next few minutes, and the logs
> > > > > > > > (and dfsadmin reports) confirm that this is indeed happening.
> > > > > > > > However, I'm seeing the throughput drop from 3000 to 500
> > > > > > > > requests/sec when the nodes are added, with latencies jumping from
> > > > > > > > ~1s to 4 or 5 seconds, and this poor performance persists for
> > > > > > > > almost 45 minutes, when it abruptly gets better to 4000 requests/sec
> > > > > > > > and 500ms latencies.
> > > > > > > >
> > > > > > > > I'd appreciate any ideas as to what could be causing that 45-minute
> > > > > > > > performance delay, and how I can debug this further.  If I add the
> > > > > > > > new nodes and balance onto them before starting up any read
> > > > > > > > requests, performance is much better from the start.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Eric
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>