Posted to dev@hbase.apache.org by Jonathan Hsieh <jo...@cloudera.com> on 2012/01/06 18:43:57 UTC

Stress testing release candidates

Hey all,

I'm curious about the kinds of testing you all do and which paths you
exercise before +1'ing a release candidate (or putting it into production).
Do you just simulate the expected workloads you have at your installations?
How much testing do you do on error recovery paths, or when HBase is pushed
into a stressed state?

Jimmy and I have been running long TestLoadAndVerify jobs from Bigtop using
different configurations, including a stressful one (flush/split/compact
heavy; properties below), against the recent 0.92 release candidates.
TestLoadAndVerify is basically two sequentially executed MR jobs: one that
loads data with "dependency chains" on previous writes, and one that
verifies that all chains are satisfied (link below). At the moment we've
been manually injecting faults (killing meta, root, masters, and random
RS's, and pausing them to simulate GCs), but we will likely be injecting
faults and exercising recovery paths more regularly and systematically.
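
For anyone who hasn't looked at the test, here is a rough, self-contained
sketch of the dependency-chain idea. This is not the actual Bigtop code:
the table name, column family, and key layout below are made up for
illustration, and the real test runs as MR jobs at much larger scale.

----
// Sketch of the load/verify idea (hypothetical schema, HBase 0.92-era API).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ChainSketch {
  static final byte[] FAM = Bytes.toBytes("f");      // hypothetical family
  static final byte[] PREV = Bytes.toBytes("prev");  // pointer to prior row

  // Load: each row stores a reference to the previously written row,
  // forming a chain of dependent writes.
  static void load(HTable table, int chainLen) throws Exception {
    byte[] prev = null;
    for (int i = 0; i < chainLen; i++) {
      byte[] row = Bytes.toBytes("chain-" + i);
      Put p = new Put(row);
      p.add(FAM, PREV, prev == null ? row : prev);  // first row points to itself
      table.put(p);
      prev = row;
    }
  }

  // Verify: read every row and make sure the row it points to exists.
  // A missing referenced row means a write was lost, or a region was
  // unavailable when the verify pass ran.
  static int verify(HTable table, int chainLen) throws Exception {
    int broken = 0;
    for (int i = 0; i < chainLen; i++) {
      Result r = table.get(new Get(Bytes.toBytes("chain-" + i)));
      if (r.isEmpty()) { broken++; continue; }
      byte[] prev = r.getValue(FAM, PREV);
      if (table.get(new Get(prev)).isEmpty()) broken++;
    }
    return broken;
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "chain_test");   // assumed to already exist
    load(table, 1000);
    System.out.println("broken links: " + verify(table, 1000));
    table.close();
  }
}
----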

This approach has turned up some of the recent distributed log splitting
deadlocks Jimmy has been working on.

I've encountered a few "transient" missing-data problems that I'm still
trying to reproduce and isolate. The best I can say for now is that they
seem to happen when region servers are carrying a large number of regions
(roughly in the 900-2000 range). More specifically, in these particular
cases the verify job returns a list of sequential missing rows, suggesting
that a region was temporarily unavailable or not returning data.
Interestingly, when I rerun just the verify job later on the same table,
all rows are present. Since the load and verify jobs are two consecutively
run MR jobs, my guess is that it is related to something time-delayed
(balancing, splitting, compaction?).
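
One way to check whether this is transient unavailability rather than real
loss is to poll one of the reported rows and see whether it comes back. A
minimal sketch only; the table name is the one from the earlier sketch and
the row key would come from the verify output:

----
// Poll a row that the verify job reported missing, to see whether it
// reappears once the region settles. Table name and row key are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RowProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "chain_test");
    byte[] row = Bytes.toBytes(args[0]);     // a row flagged by the verify job
    for (int attempt = 1; attempt <= 60; attempt++) {
      boolean present = !table.get(new Get(row)).isEmpty();
      System.out.println("attempt " + attempt + ": present=" + present);
      if (present) break;
      Thread.sleep(10000);                   // wait 10s between probes
    }
    table.close();
  }
}
----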

Thanks,
Jon.

Here's how to set up Bigtop:
https://cwiki.apache.org/confluence/display/BIGTOP/Setting+up+Bigtop+to+run+HBase+system+tests


Here's the patch I've been using.
https://issues.apache.org/jira/browse/BIGTOP-321

Here's part of the configuration used to stress flushing, splitting, and
balancing:

----
 <!-- stress settings -->
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Hadoop setting </description>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>4194304</value>  <!-- 4MB -->
    <!-- <value>268435456</value> 256MB, for lots of flushes without splits -->
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.
    Default: 256M.
    </description>
  </property>
  <property>
    <name>hbase.balancer.period</name>
    <value>2000</value>
    <description>Period at which the region balancer runs in the Master.
    </description>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>262144</value> <!-- 256KB -->
    <description>
    Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes.  Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency. (normally 64 MB)
    </description>
  </property>
----
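
As an aside, if you'd rather not change cluster-wide defaults, the split and
flush thresholds can also be set per table when it is created; the balancer
period is Master-wide and still has to go into the config. A minimal sketch,
with assumed table and family names:

----
// Create a test table with aggressive split/flush thresholds (0.92-era API).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateStressTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor htd = new HTableDescriptor("chain_test"); // assumed name
    htd.setMaxFileSize(4L * 1024 * 1024);   // split regions at 4MB
    htd.setMemStoreFlushSize(256L * 1024);  // flush memstores at 256KB
    htd.addFamily(new HColumnDescriptor("f"));

    admin.createTable(htd);
  }
}
----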

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: Stress testing release candidates

Posted by Ted Yu <yu...@gmail.com>.
Thanks Jon for sharing your methodology.

I have backported LoadTestTool to 0.92 (HBASE-5124) and added two more
config parameters. I ran it against a 5-node cluster. More testing is
underway.

Cheers

RE: Stress testing release candidates

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
+1.

A large number of regions (> 1000) per RS plus heavy load is what we have in
our environment (staging and production).

It's not 0.92, of course; it's 0.90.4 (we just upgraded staging to 0.90.5).

Some observations:

1. Holes in .META., which can usually be resolved by a cluster restart (but
we had one incident recently where neither a restart nor the
OfflineMetaRepair utility helped, and we lost ALL data in a staging grid).
Most of the time this manifests itself as NotServingRegionException.
2. Real loss of data (a small fraction). We observe this regularly: when we
restart the cluster, some tables report different (smaller) row counts. We
have almost given up on chasing it and decided to completely rewrite our
data load path to use the bulk loader (the standard route is sketched below).

I think our operations engineers would add more bullets to this list.
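
For reference, the usual bulk load route is an MR job that writes HFiles via
HFileOutputFormat, followed by LoadIncrementalHFiles (completebulkload) to
move them into the table. This is a rough sketch only; the job wiring,
mapper, and table name are placeholders:

----
// Rough sketch of the standard bulk load path (0.90-era API). A real job
// needs a mapper emitting ImmutableBytesWritable keys and KeyValue/Put values.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // placeholder table name

    // 1. MR job that writes HFiles instead of doing puts against the RS's.
    Job job = new Job(conf, "bulk-load-prepare");
    job.setJarByClass(BulkLoadSketch.class);
    // job.setMapperClass(MyHFileMapper.class);    // supply your own mapper
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    HFileOutputFormat.configureIncrementalLoad(job, table); // reducer + partitioner
    if (!job.waitForCompletion(true)) System.exit(1);

    // 2. Atomically move the generated HFiles into the table's regions.
    new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
  }
}
----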

These issues have probably been resolved (to some extent) in 0.92, but we
are not ready to jump on the Titanic yet :)
We will wait until it safely crosses the Atlantic.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

Re: Stress testing release candidates

Posted by Andrew Purtell <ap...@apache.org>.
I have tools for testing (in what I guess we should now call our fork) that
are similar, I think, to TestLoadAndVerify from BigTop: a shell/Ruby script
like the Facebook LoadTestTool but write-side only, another script that runs
rowcounter, plus a script that writes a list of over a billion rows and
another that verifies there are no "broken links". (I've not seen the
transient availability problem you describe, but I haven't been testing 0.92
like this either.) So it sounds like we've all converged on the same
baseline techniques. Unlike the FB tool, I can specify different value size
distributions, and I have some settings that approximate what our
applications produce. It's a bit crufty and use-case specific. I also have a
complete canned application model that I can rehydrate on EC2 for smoke
tests.

The FB tool is useful on its own now that it's upstream. Maybe we will move
our testing capabilities over to it.

Great to see Cloudera/BigTop doing this for ASF releases. Would be great to see BigTop evolve HBase testing beyond the baseline.

> At the moment we've been manually injecting faults (killing meta, masters,
> root, random rs's, pausing them to simulate GC's) but will be likely
> injecting faults and exercising recovery paths more regularly and
> systematically. 


This is something I don't have the resources to do for ASF releases, so it's
great to hear it's happening elsewhere. I'm hoping mid-year to switch over
to something as close to upstream 0.94 (guessing at its availability) as
possible, as part of a larger project involving a federated 0.23 deployment,
god help us.

I'd not enable distributed log splitting for production at first, though
that is painful, so that specifically wouldn't be tested here right away.


Best regards,


  - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

