Posted to user@hbase.apache.org by Alok Singh <al...@urbanairship.com> on 2012/04/02 19:15:51 UTC

Re: 0.92 and Read/writes not scaling

Sorry for jumping on this thread late, but I have seen very similar
behavior in our cluster with Hadoop 0.23.2 (CDH4B2 snapshot) and HBase
0.92.1. We have a small, 7-node cluster (48GB/16-core/6x10K disks/GigE
network) with about 500M rows/4 TB of data. The random read performance
is excellent, but random write throughput maxes out around 10K/sec.
Turning off the WAL takes it up to 40-50K/sec, but that's not
something we will leave off in production.
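
(In case it helps anyone comparing notes, here is a minimal sketch of how
the WAL can be skipped per-Put with the 0.92-era client API. It is only an
illustration, not our actual load generator; the table, family and
qualifier names are placeholders.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalToggleSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Placeholder table name; assumes the table already exists.
            HTable table = new HTable(conf, "write_test");

            Put put = new Put(Bytes.toBytes("row-00000001"));
            put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            // Skip the write-ahead log for this Put. Much faster, but edits
            // still in the memstore are lost if the region server dies.
            put.setWriteToWAL(false);
            table.put(put);

            table.close();
        }
    }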

One of the settings I experimented with was
hbase.hregion.max.filesize. Increasing it to 10GB actually made the
write throughput worse, so I have set it back down to 2GB. Later this
week I will attempt another cycle of tests and hopefully have
some thread dumps to report back with.
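
(For anyone repeating the experiment: the cluster-wide value lives in
hbase-site.xml as hbase.hregion.max.filesize, but the same limit can also
be set per table, which is handy when testing one table at a time. A rough
sketch with the 0.92 admin API follows; the table and family names are
placeholders, and 2GB matches the value mentioned above.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RegionSizeSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();

            // Placeholder table/family names.
            HTableDescriptor desc = new HTableDescriptor("write_test");
            desc.addFamily(new HColumnDescriptor("f"));
            // Split a region once a store file reaches 2GB (per-table
            // override of hbase.hregion.max.filesize).
            desc.setMaxFileSize(2L * 1024 * 1024 * 1024);

            HBaseAdmin admin = new HBaseAdmin(conf);
            admin.createTable(desc);
        }
    }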

Alok


2012/3/30 Doug Meil <do...@explorysmedical.com>:
>
> Just as a quick reminder regarding what Todd mentioned, that's exactly
> what was happening in this case study...
>
> http://hbase.apache.org/book.html#casestudies.slownode
>
> ... although it doesn't appear to be the problem in this particular
> situation.
>
>
>
>
> On 3/29/12 8:22 PM, "Juhani Connolly" <ju...@gmail.com> wrote:
>
>>On Fri, Mar 30, 2012 at 7:36 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>> On the other hand, I've seen that "frame errors" are often correlated
>>> with NICs auto-negotiating to the wrong speed, etc. Double check with
>>> ethtool that all of your machines are gigabit full-duplex and not
>>> doing something strange. Also double check your bonding settings, etc.
>>>
>>> -Todd
>>>
>>
>>I did this after seeing the errors on ifconfig, but everything looks
>>ok on that front:
>>Settings for eth0:
>>       Supported ports: [ TP ]
>>       Supported link modes:   10baseT/Half 10baseT/Full
>>                               100baseT/Half 100baseT/Full
>>                               1000baseT/Full
>>       Supports auto-negotiation: Yes
>>       Advertised link modes:  10baseT/Half 10baseT/Full
>>                               100baseT/Half 100baseT/Full
>>                               1000baseT/Full
>>       Advertised auto-negotiation: Yes
>>       Speed: 1000Mb/s
>>       Duplex: Full
>>       Port: Twisted Pair
>>       PHYAD: 1
>>       Transceiver: internal
>>       Auto-negotiation: on
>>       Supports Wake-on: g
>>       Wake-on: d
>>       Link detected: yes
>>
>>Also, since yesterday the error counts have not increased at all so I
>>guess that was just a red herring...
>>
>>
>>> 2012/3/28 Dave Wang <ds...@cloudera.com>:
>>>> As you said, the amount of errors and drops you are seeing are very
>>>>small
>>>> compared to your overall traffic, so I doubt that is a significant
>>>> contributor to the throughput problems you are seeing.
>>>>
>>>> - Dave
>>>>
>>>> On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly <
>>>> juhani_connolly@cyberagent.co.jp> wrote:
>>>>
>>>>> Ron,
>>>>>
>>>>> thanks for sharing those settings. Unfortunately they didn't help
>>>>>with our
>>>>> read throughput, but every little bit helps.
>>>>>
>>>>> Another suspicious thing that has come up is the network... While
>>>>> overall throughput has been verified to go much higher than the load
>>>>> HBase is putting on it right now, there seem to be errors and dropped
>>>>> packets (though this is relative to a massive amount of traffic):
>>>>>
>>>>> [juhani_connolly@hornet-**slave01 ~]$ sudo /sbin/ifconfig bond0
>>>>> Password:
>>>>> bond0 Link encap:Ethernet HWaddr 78:2B:CB:59:A9:34
>>>>> inet addr:******** Bcast:********** Mask:255.255.0.0
>>>>> inet6 addr: fe80::7a2b:cbff:fe59:a934/64 Scope:Link
>>>>> UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
>>>>> RX packets:9422705447 errors:605 dropped:6222 overruns:0 frame:605
>>>>> TX packets:9317689449 errors:0 dropped:0 overruns:0 carrier:0
>>>>> collisions:0 txqueuelen:0
>>>>> RX bytes:6609813756075 (6.0 TiB) TX bytes:6033761947482 (5.4 TiB)
>>>>>
>>>>> Could this possibly be causing the problem?
>>>>> Since we haven't heard anything on expected throughput, we're
>>>>> downgrading our HDFS back to 0.20.2. I'd be curious to hear how other
>>>>> people do with 0.23 and the throughput they're getting.
>>>>>
>>>>>
>>>>> On 03/29/2012 02:56 AM, Buckley,Ron wrote:
>>>>>
>>>>>> Stack,
>>>>>>
>>>>>> We're about 80% random read and 20% random write. So, that would have
>>>>>> been the mix that we were running.
>>>>>>
>>>>>> We'll try a test with Nagle on and then Nagle off, random write only,
>>>>>> later this afternoon and see if the same pattern emerges.
>>>>>>
>>>>>> Ron
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>>>>>>Stack
>>>>>> Sent: Wednesday, March 28, 2012 1:12 PM
>>>>>> To: user@hbase.apache.org
>>>>>> Subject: Re: 0.92 and Read/writes not scaling
>>>>>>
>>>>>> On Wed, Mar 28, 2012 at 5:41 AM, Buckley,Ron<bu...@oclc.org>
>>>>>>wrote:
>>>>>>
>>>>>>> For us, setting these two got rid of all of the 20 and 40 ms response
>>>>>>> times and dropped the average response time we measured from HBase by
>>>>>>> more than half.  Plus, we can push HBase a lot harder.
>>>>>>>
>>>>>>>  That had an effect on random read workload only Ron?
>>>>>> Thanks,
>>>>>> St.Ack
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>
>

Re: 0.92 and Read/writes not scaling

Posted by Stack <st...@duboce.net>.
2012/4/2 Alok Singh <al...@urbanairship.com>:
> Sorry for jumping on this thread late, but I have seen very similar
> behavior in our cluster with Hadoop 0.23.2 (CDH4B2 snapshot) and HBase
> 0.92.1. We have a small, 7-node cluster (48GB/16-core/6x10K disks/GigE
> network) with about 500M rows/4 TB of data. The random read performance
> is excellent, but random write throughput maxes out around 10K/sec.
> Turning off the WAL takes it up to 40-50K/sec, but that's not
> something we will leave off in production.
>
> One of the settings I experimented with was
> hbase.hregion.max.filesize. Increasing it to 10GB actually made the
> write throughput worse, so I have set it back down to 2GB. Later this
> week I will attempt another cycle of tests and hopefully have
> some thread dumps to report back with.
>

Thanks for writing the list, Alok.  Juhani is going to come back on
this thread saying he went back to 0.20.x Hadoop to get his write
performance back; I'll let him respond.  Going by your experience and
Juhani's, it seems we have an issue with WAL writes on 0.23 Hadoop.

St.Ack

Re: 0.92 and Read/writes not scaling

Posted by Juhani Connolly <ju...@gmail.com>.
Hi Alok, please refer to my previous post where I detailed some of the
stuff we did.

At this point, I'm unsure whether it is actually possible to get good
autoFlushed throughput with 0.23; we weren't able to, and have switched
back to 0.20.2.
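
(By autoFlushed I mean each put() going to the servers as its own RPC
rather than through the client write buffer. A rough sketch of the two
modes with the 0.92 client API, using placeholder table/family names, in
case the terminology is unclear:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AutoFlushSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "write_test");  // placeholder name

            // autoFlush on (the default): every put() is shipped immediately.
            table.setAutoFlush(true);
            // Buffered alternative: comment the line above and use these two
            // to batch puts client-side before they go to the region servers.
            // table.setAutoFlush(false);
            // table.setWriteBufferSize(2 * 1024 * 1024);  // 2MB client buffer

            for (int i = 0; i < 1000; i++) {
                Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
                put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
                table.put(put);
            }
            table.flushCommits();  // flush anything left in the client buffer
            table.close();
        }
    }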

If you want to persevere however, please let us know if you make any
breakthroughs!

2012/4/3 Alok Singh <al...@urbanairship.com>:
> Sorry for jumping on this thread late, but I have seen very similar
> behavior in our cluster with Hadoop 0.23.2 (CDH4B2 snapshot) and HBase
> 0.92.1. We have a small, 7-node cluster (48GB/16-core/6x10K disks/GigE
> network) with about 500M rows/4 TB of data. The random read performance
> is excellent, but random write throughput maxes out around 10K/sec.
> Turning off the WAL takes it up to 40-50K/sec, but that's not
> something we will leave off in production.
>
> One of the settings I experimented with was
> hbase.hregion.max.filesize. Increasing it to 10GB actually made the
> write throughput worse, so I have set it back down to 2GB. Later this
> week I will attempt another cycle of tests and hopefully have
> some thread dumps to report back with.
>
> Alok
>
>
> 2012/3/30 Doug Meil <do...@explorysmedical.com>:
>>
>> Just as a quick reminder regarding what Todd mentioned, that's exactly
>> what was happening in this case study...
>>
>> http://hbase.apache.org/book.html#casestudies.slownode
>>
>> ... although it doesn't appear to be the problem in this particular
>> situation.
>>
>>
>>
>>
>> On 3/29/12 8:22 PM, "Juhani Connolly" <ju...@gmail.com> wrote:
>>
>>>On Fri, Mar 30, 2012 at 7:36 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>>> On the other hand, I've seen that "frame errors" are often correlated
>>>> with NICs auto-negotiating to the wrong speed, etc. Double check with
>>>> ethtool that all of your machines are gigabit full-duplex and not
>>>> doing something strange. Also double check your bonding settings, etc.
>>>>
>>>> -Todd
>>>>
>>>
>>>I did this after seeing the errors on ifconfig, but everything looks
>>>ok on that front:
>>>Settings for eth0:
>>>       Supported ports: [ TP ]
>>>       Supported link modes:   10baseT/Half 10baseT/Full
>>>                               100baseT/Half 100baseT/Full
>>>                               1000baseT/Full
>>>       Supports auto-negotiation: Yes
>>>       Advertised link modes:  10baseT/Half 10baseT/Full
>>>                               100baseT/Half 100baseT/Full
>>>                               1000baseT/Full
>>>       Advertised auto-negotiation: Yes
>>>       Speed: 1000Mb/s
>>>       Duplex: Full
>>>       Port: Twisted Pair
>>>       PHYAD: 1
>>>       Transceiver: internal
>>>       Auto-negotiation: on
>>>       Supports Wake-on: g
>>>       Wake-on: d
>>>       Link detected: yes
>>>
>>>Also, since yesterday the error counts have not increased at all so I
>>>guess that was just a red herring...
>>>
>>>
>>>> 2012/3/28 Dave Wang <ds...@cloudera.com>:
>>>>> As you said, the amount of errors and drops you are seeing are very
>>>>>small
>>>>> compared to your overall traffic, so I doubt that is a significant
>>>>> contributor to the throughput problems you are seeing.
>>>>>
>>>>> - Dave
>>>>>
>>>>> On Wed, Mar 28, 2012 at 7:36 PM, Juhani Connolly <
>>>>> juhani_connolly@cyberagent.co.jp> wrote:
>>>>>
>>>>>> Ron,
>>>>>>
>>>>>> thanks for sharing those settings. Unfortunately they didn't help
>>>>>>with our
>>>>>> read throughput, but every little bit helps.
>>>>>>
>>>>>> Another suspicious thing that has come up is the network... While
>>>>>> overall throughput has been verified to go much higher than the load
>>>>>> HBase is putting on it right now, there seem to be errors and dropped
>>>>>> packets (though this is relative to a massive amount of traffic):
>>>>>>
>>>>>> [juhani_connolly@hornet-**slave01 ~]$ sudo /sbin/ifconfig bond0
>>>>>> Password:
>>>>>> bond0 Link encap:Ethernet HWaddr 78:2B:CB:59:A9:34
>>>>>> inet addr:******** Bcast:********** Mask:255.255.0.0
>>>>>> inet6 addr: fe80::7a2b:cbff:fe59:a934/64 Scope:Link
>>>>>> UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
>>>>>> RX packets:9422705447 errors:605 dropped:6222 overruns:0 frame:605
>>>>>> TX packets:9317689449 errors:0 dropped:0 overruns:0 carrier:0
>>>>>> collisions:0 txqueuelen:0
>>>>>> RX bytes:6609813756075 (6.0 TiB) TX bytes:6033761947482 (5.4 TiB)
>>>>>>
>>>>>> Could this possibly be causing the problem?
>>>>>> Since we haven't heard anything on expected throughput, we're
>>>>>> downgrading our HDFS back to 0.20.2. I'd be curious to hear how other
>>>>>> people do with 0.23 and the throughput they're getting.
>>>>>>
>>>>>>
>>>>>> On 03/29/2012 02:56 AM, Buckley,Ron wrote:
>>>>>>
>>>>>>> Stack,
>>>>>>>
>>>>>>> We're about 80% random read and 20% random write. So, that would have
>>>>>>> been the mix that we were running.
>>>>>>>
>>>>>>> We'll try a test with Nagle on and then Nagle off, random write only,
>>>>>>> later this afternoon and see if the same pattern emerges.
>>>>>>>
>>>>>>> Ron
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>>>>>>>Stack
>>>>>>> Sent: Wednesday, March 28, 2012 1:12 PM
>>>>>>> To: user@hbase.apache.org
>>>>>>> Subject: Re: 0.92 and Read/writes not scaling
>>>>>>>
>>>>>>> On Wed, Mar 28, 2012 at 5:41 AM, Buckley,Ron<bu...@oclc.org>
>>>>>>>wrote:
>>>>>>>
>>>>>>>> For us, setting these two got rid of all of the 20 and 40 ms response
>>>>>>>> times and dropped the average response time we measured from HBase by
>>>>>>>> more than half.  Plus, we can push HBase a lot harder.
>>>>>>>>
>>>>>>>>  That had an effect on random read workload only Ron?
>>>>>>> Thanks,
>>>>>>> St.Ack
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>
>>
>>