Posted to user@hbase.apache.org by Suraj Varma <sv...@gmail.com> on 2012/07/04 00:17:49 UTC

Re: Blocking Inserts

In your case, you are likely hitting the blocking store files limit
(hbase.hstore.blockingStoreFiles, default: 7) and/or
hbase.hregion.memstore.block.multiplier - check out
http://hbase.apache.org/book/config.files.html for more details on
these configurations and how they affect your insert performance.
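For reference, those settings live in hbase-site.xml. A sketch with
illustrative values only (the defaults shown in comments are from 0.94;
tune to your own heap and write volume, don't copy these blindly):

```xml
<!-- hbase-site.xml fragment: example values, not a recommendation -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
  <!-- default 7; flushes block once a store has this many files -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
  <!-- default 2; updates block at multiplier * memstore flush size -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
  <!-- 128 MB, the default flush threshold per region -->
</property>
```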

On ganglia, also check whether you have a compaction queue spiking
during these timeouts.
--Suraj


On Thu, Jun 21, 2012 at 4:27 AM, Martin Alig <ma...@gmail.com> wrote:
> Thank you for the suggestions.
>
> So I changed the setup and now have:
> 1 Master running Namenode, SecondaryNamenode, ZK and the HMaster
> 7 Slaves running Datanode and Regionserver
> 2 Clients to insert data
>
>
> What I forgot to mention in my first post: sometimes the clients even get a
> SocketTimeoutException when inserting the data. (Of course, during that time
> 0 inserts are done.)
> By looking at the logs, (I also turned on the gc logs) I see the following:
>
> Multiple consecutive entries like:
> 2012-06-21 11:42:13,962 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Blocking updates for 'IPC Server handler 6 on 60020' on region
> usertable,user600,1340200683555.a45b03dd65a62afa676488921e47dbaa.: memstore
> size 1.0g is >= than blocking 1.0g size
>
> Shortly after those entries, many entries like:
> 2012-06-21 12:43:53,028 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooSlow):
> {"processingtimems":35046,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2642a14d),
> rpc version=1, client version=29, methodsFingerPrint=-1508511443","client":"
> 10.110.129.12:54624
> ","starttimems":1340275397981,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
>
> Looking at the gc-logs, many entries like:
> 2870.329: [GC 2870.330: [ParNew: 108450K->3401K(118016K), 0.0182570 secs]
> 4184711K->4079843K(12569856K), 0.0183510 secs] [Times: user=0.24 sys=0.00,
> real=0.01 secs]
>
> But always around 0.01 secs - 0.04 secs.
>
> And also from the gc-log:
> 2696.013: [CMS-concurrent-sweep: 8.999/10.448 secs] [Times: user=46.93
> sys=2.24, real=10.45 secs]
>
> Is the 10.45 secs too long?
> Or what exactly should I watch out for in the gc logs?
>
>
> I also configured ganglia to have a look at some more metrics. Looking at
> io_wait (which should matter concerning my question to the disks), I can
> observe values between 10 % and 25 % on the regionserver.
> Should that be lower?
>
> Btw. I'm using HBase 0.94 and Hadoop 1.0.3.
>
>
> Thank you again.
>
>
> Martin
>
>
>
> On Wed, Jun 20, 2012 at 7:04 PM, Dave Wang <ds...@cloudera.com> wrote:
>
>> I'd also remove the DN and RS from the node running ZK, NN, etc. as you
>> don't want heavyweight processes on that node.
>>
>> - Dave
>>
>> On Wed, Jun 20, 2012 at 9:31 AM, Elliott Clark <eclark@stumbleupon.com
>> >wrote:
>>
>> > Basically without metrics on what's going on it's tough to know for sure.
>> >
>> > I would turn on GC logging and make sure that is not playing a part, get
>> > metrics on IO while this is going on, and look through the logs to see
>> what
>> > is happening when you notice the pause.
>> >
>> > On Wed, Jun 20, 2012 at 6:39 AM, Martin Alig <ma...@gmail.com>
>> > wrote:
>> >
>> > > Hi
>> > >
>> > > I'm doing some evaluations with HBase. The workload I'm facing is
>> mainly
>> > > insert-only.
>> > > Currently I'm inserting 1KB rows, where 100Bytes go into one column.
>> > >
>> > > I have the following cluster machines at disposal:
>> > >
>> > > Intel Xeon L5520 2.26 Ghz (Nehalem, with HT enabled)
>> > > 24 GiB Memory
>> > > 1 GigE
>> > > 2x 15k RPM Sas 73 GB (RAID1)
>> > >
>> > > I have 10 Nodes.
>> > > The first node runs:
>> > >
>> > > Namenode, SecondaryNamenode, Datanode, HMaster, Zookeeper, and a
>> > > RegionServer
>> > >
>> > > The other nodes run:
>> > >
>> > > Datanode and RegionServer
>> > >
>> > >
>> > > Now running my test client and inserting rows, the throughput goes up
>> to
>> > > 150'000 inserts/sec. But then after some time the throughput drops down
>> > to
>> > > 0 inserts/sec for quite some time, before it goes up again.
>> > > My assumption is, that it happens when the RegionServers start to write
>> > the
>> > > data from memory to the disks. I know, that the recommended hardware
>> for
>> > > HBase should contain multiple disks using JBOD or RAID 0.
>> > > But at that point I am limited right now.
>> > >
>> > > I am just asking if in my hardware setup, the blocking periods are
>> really
>> > > caused by the non-optimal disk configuration.
>> > >
>> > >
>> > > Thank you in advance for any suggestions.
>> > >
>> > >
>> > > Martin
>> > >
>> >
>>

Re: Blocking Inserts

Posted by Martin Alig <ma...@gmail.com>.
Thank you for the comment.

Compaction queue seems to be at 0 (?) all the time.
About the blocking store files: I already increased this value, but I could
not see any improvement.

Going through the logs during a "blocking" period, I often see a
"CompactionRequest". Then nothing for a minute or so, and then it
continues.
Similarly, I see "Finished memstore flush" and then nothing for 2
minutes before it continues. And of course, insertions continue as well.

Is this just normal behavior? Or did I misconfigure something?
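As an aside, the 1.0g blocking size in the "Blocking updates" log line is
derived from the two memstore settings discussed above. A rough sketch of
the arithmetic (plain Python, not HBase code; the 128 MB flush size and
multiplier of 2 are the 0.94 defaults, so a 1.0g threshold implies
non-default values on this cluster):

```python
# Sketch of how the per-region memstore update-blocking threshold is
# derived. Not HBase code; defaults below are from HBase 0.94.

def memstore_blocking_threshold(flush_size_bytes, block_multiplier):
    """Updates to a region are blocked once its memstore reaches
    flush size * multiplier (the 'memstore size ... >= than blocking'
    message in the region server log)."""
    return flush_size_bytes * block_multiplier

DEFAULT_FLUSH_SIZE = 128 * 1024 * 1024  # hbase.hregion.memstore.flush.size
DEFAULT_MULTIPLIER = 2                  # hbase.hregion.memstore.block.multiplier

# With the defaults, updates block at 256 MB per region:
gib = memstore_blocking_threshold(DEFAULT_FLUSH_SIZE, DEFAULT_MULTIPLIER) / (1024 ** 3)
print(gib)  # 0.25

# A 1.0g blocking size, as in the log above, could come from e.g. a
# 256 MB flush size combined with a multiplier of 4:
print(memstore_blocking_threshold(256 * 1024 * 1024, 4) / (1024 ** 3))  # 1.0
```

Raising either value lets more data pile up in memory before writers
block, which trades longer flushes for fewer stalls.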



