Posted to issues@hbase.apache.org by "Karthik Ranganathan (JIRA)" <ji...@apache.org> on 2010/04/07 02:22:33 UTC

[jira] Updated: (HBASE-2341) Suite of test scripts that a.) load a cluster with a verifiable dataset and b.) do random kills of regionserver+datanodes in small cluster

     [ https://issues.apache.org/jira/browse/HBASE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan updated HBASE-2341:
---------------------------------------

    Attachment: HBASE-2341-0.20.3.patch

This is a manual test that kills region servers and data nodes, restarts them, and verifies the loaded data. I ran it on a 3 RS cluster, but did not get as far as confirming that the read verification works.

The patch will add the source code to the following location in the HBase source:
src/test/org/apache/hadoop/hbase/manual

usage:
HBaseTest   
  -zk <Zookeeper node>  
  -load <num keys>:<average cols per key>:<avg data size>[:<num threads = 20>]  
  -read <start key>:<end key>[:<verify percent = 0>:<num threads = 20>]  
  -kill <HBase root path>:<HDFS root path>:<minutes between kills>:<RS kill %>:<#keys to verify>
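
To make the colon-separated option format above concrete, here is a minimal sketch of how the -load argument could be parsed, with the thread count falling back to the default of 20 shown in the usage text. The class LoadSpec and its field names are hypothetical and are not taken from the attached patch; the patch may parse its arguments differently.

```java
// Hypothetical helper for illustration only; not part of HBASE-2341-0.20.3.patch.
final class LoadSpec {
    // Default taken from the usage text: [:<num threads = 20>]
    static final int DEFAULT_THREADS = 20;

    final long numKeys;
    final int avgColsPerKey;
    final int avgDataSize;
    final int numThreads;

    private LoadSpec(long numKeys, int avgColsPerKey, int avgDataSize, int numThreads) {
        this.numKeys = numKeys;
        this.avgColsPerKey = avgColsPerKey;
        this.avgDataSize = avgDataSize;
        this.numThreads = numThreads;
    }

    // Parses "<num keys>:<average cols per key>:<avg data size>[:<num threads>]".
    static LoadSpec parse(String arg) {
        String[] parts = arg.split(":");
        if (parts.length < 3 || parts.length > 4) {
            throw new IllegalArgumentException(
                "expected <num keys>:<average cols per key>:<avg data size>[:<num threads>], got: " + arg);
        }
        // NumberFormatException is a subclass of IllegalArgumentException,
        // so malformed numbers are reported the same way as a wrong field count.
        long numKeys = Long.parseLong(parts[0]);
        int avgColsPerKey = Integer.parseInt(parts[1]);
        int avgDataSize = Integer.parseInt(parts[2]);
        int numThreads = parts.length == 4 ? Integer.parseInt(parts[3]) : DEFAULT_THREADS;
        return new LoadSpec(numKeys, avgColsPerKey, avgDataSize, numThreads);
    }
}
```

Under this sketch, LoadSpec.parse("1000:10:256") would describe 1000 keys with about 10 columns each, about 256 bytes per value, and 20 loader threads.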


> Suite of test scripts that a.) load a cluster with a verifiable dataset and b.) do random kills of regionserver+datanodes in small cluster
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2341
>                 URL: https://issues.apache.org/jira/browse/HBASE-2341
>             Project: Hadoop HBase
>          Issue Type: Task
>            Reporter: stack
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: count-slaves.rb, HBASE-2341-0.20.3.patch, test.sh, VerifiableEditor.java, VerifiableEditor.java
>
>
> We just filed hbase-2340 but discussion up on irc has it that we need something more hardcore than pussy-footing inside a single jvm as hdfs-2340 does.  The point was made (tlipcon) that it's hard to be sure real recovery works if everything runs in the one JVM.
> So, this issue is about scripts that can:
> + Load a cluster with a dataset that we can 'verify', meaning we can tell whether it has holes in it, i.e. whether data has been lost.
> + Kill a random node at random intervals.
> + Check the cluster for data loss.
> All above should work while cluster is under load.
> The above would not sit under junit.
> This looks like a suite that we'd want to run up in ec2 using Andrew's scripts and our donated aws credits.
> {code}
> 16:12 < tlipcon> here's my goal: we have a 5 node cluster in the back room. I want to run hbase on that at near full load for a week straight while some process goes around screwing with it
> 16:12 < tlipcon> then I want to verify that I didn't lose a single edit over that week
> {code}
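
One way to get a dataset that can be 'verified' in the sense described above is to derive every value deterministically from its key, so that a checker can recompute what should be stored and report anything missing or altered. The sketch below illustrates that idea only. The class VerifiableData, the MD5-based value scheme, and the in-memory Map standing in for reads from the cluster are all assumptions made for illustration; they are not taken from the attached patch or from VerifiableEditor.java.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the Map stands in for values read back from the cluster.
final class VerifiableData {
    // The value for a cell is a digest of "<key>:<column>", so it can be
    // recomputed at verification time without storing a reference copy.
    static byte[] expectedValue(long key, int column) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest((key + ":" + column).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Returns the keys in [startKey, endKey) whose stored value for column 0
    // is missing or differs from the recomputed value.
    static List<Long> findHoles(long startKey, long endKey, Map<Long, byte[]> stored) {
        List<Long> bad = new ArrayList<>();
        for (long key = startKey; key < endKey; key++) {
            byte[] actual = stored.get(key);
            if (actual == null || !Arrays.equals(actual, expectedValue(key, 0))) {
                bad.add(key);
            }
        }
        return bad;
    }
}
```

Because the key range is contiguous and the values are recomputable, a missing key shows up as a hole and a corrupted value shows up as a mismatch, without the checker needing any state from the loader.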

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.