You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hbase.apache.org by "Sharma, Avani" <ag...@ebay.com> on 2010/09/02 02:49:40 UTC

RE: HBase table lost on upgrade

That email was just informational. Below are the details on my cluster - let me know if more is needed. 

I have 2 hbase clusters setup 
-	for production, 6 node cluster,  32G, 8 processors
-	for dev, 3 node cluster , 16GRAM , 4 processors 

1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters, successfully. 
2. After that I loaded 2G+ files into HDFS and HBASE table. 
	An example Hbase table looks like this: 
		{NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS => '100', COM true
		 PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMO
		 RY => 'false', BLOCKCACHE => 'true'}]}
3. I started stargate on one server and accessed Hbase for reading from another 3rd party application successfully. 
	It took 600 seconds on dev cluster and 250 on production to read .5M records from Hbase via stargate.
4. later to boost read performance, it was suggested that upgrading to Hbase0.20.6 will be helpful. I did that on production (w/o running the migrate script) and re-started stargate and everything was running fine, though I did not see a bump in performance. 

5. Eventually, I had to move to dev cluster from production because of some resource issues at our end. Dev cluster had 0.20.3 at this time. As I started loading more files into Hbase (<10 versions of <1G files) and converting my app to use hbase more heavily (via more stargate clients), the performance started degrading. I decided it was time to upgrade dev cluster as well to 0.20.6.  (I did not run the migrate script here as well, I missed this step in the doc).

6. When Hbase 0.20.6 came back up on dev cluster (with increased block cache (.6) and region server handler counts (75) ), pointing to the same rootdir, I noticed that some tables were missing. I could see a mention of them in the logs, but not when I did 'list' in the shell. I recovered those tables using add_table.rb script. 
	a. Is there a way to check the health of all Hbase tables in the cluster after an upgrade or even periodically, to make sure that everything is healthy ? 
	b. I would like to be able to force this error again and check the health of hbase and want it to report to me that some tables were lost. Currently, I just found out because I had very less data and it was easy to tell. 

7. Here are the issues I face after this upgrade 
	a. when I run stop-hbase.sh, it  does not stop my regionservers on other boxes. 
	b. It does start them using start-hbase.sh. 
	c. Is it that stopping regionservers is not reported, but it does stop them (I see that happening on production cluster) ? 
  	
8. I started stargate in the upgraded 0.20.6 in dev cluster 
	a. earlier when I sent a URL to look for a data row that did not exist, the return value was NULL , now I get an xml stating HTTP error 404/405. 	Everything works as expected for an existing data row.
	b. and this works okay on the production cluster after upgrade, it's the dev cluster that gives this error. 
	c. examples : 
	On production cluster:
			:~ hadoop$curl http://localhost:8080/version
				Stargate 0.0.1 [JVM: Sun Microsystems Inc. 1.6.0_20-16.3-b01] [OS: SunOS 5.10 x86] [Server: jetty/6.1.14] [Jersey: 1.1.0-ea]
			:~ hadoop$curl http://localhost:8080/verison
			:~ hadoop$curl http://localhost:8080/version/cluster
				0.20.6

	On dev cluster:
			:~ hadoop$curl http://localhost:8080/version
			Stargate 1.0 [JVM: Sun Microsystems Inc. 1.6.0_20-16.3-b01] [OS: SunOS 5.10 x86] [Server: jetty/6.1.14] [Jersey: 1.1.5.1]
:~ hadoop$curl http://localhost:8080/verison
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 405 METHOD_NOT_ALLOWED</title>
</head>
<body><h2>HTTP ERROR: 405</h2><pre>METHOD_NOT_ALLOWED</pre>
<p>RequestURI=/verson</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

</body>
</html>

9. Therefore, I thought I should try downgrading to 0.20.3, basically start hbase from that old dir I still have on the dev cluster since stargate was working as desired before the upgrade. I changed all my classpaths to point to the old dir and restarted hbase and stargate from hbase-0.20.3 dir. 
	a. but I think that doesn't really work. It recognizes 0.20.6 somehow... since my hbase shell kept pointing to 0.20.6 and 
		also stargate URL "curl http://localhost:8080/version/cluster" reports 0.20.6
	b. I am not sure if there is any such thing as downgrading hbase.

10. Now I started pointing back to 0.20.6 ( running everything out of here). I still get the same http error as above.
	Below is another error ... HTTP 404 this time with 0.20.6
hadoop$curl http://localhost:8080/<table_name>/75



<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 NOT_FOUND</title>
</head>
<body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre>
<p>RequestURI=/VRS/75</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

</body>
</html>


That was a long email. Please let me know if futher clarifications are needed. 

Thank you, 
-Avani 

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Tuesday, August 31, 2010 12:24 PM
To: user@hbase.apache.org
Subject: Re: HBase table lost on upgrade

On Tue, Aug 31, 2010 at 12:14 PM, Sharma, Avani <ag...@ebay.com> wrote:
> Thanks, Stack. Well, I was able to get the basic hbase cluster to run, but now that I am trying to boost read performance, I am running into stuff that is either not working or I cannot easily find solutions to on the net.
>

This mail that you've just written above gives us nothing to go on.
You want to boost read performance saying nothing about what current
performance, datasize, hardware, nor schema looks like.

St.Ack

Re: HBase table lost on upgrade

Posted by Ted Yu <yu...@gmail.com>.

dfs.datanode.max.xcievers is read in DataXceiverServer ctor.
If you change its value, you need to restart the cluster.

On Sat, Sep 4, 2010 at 9:23 PM, Ted Yu <yu...@gmail.com> wrote:

> The tool Stack mentioned is hbck. If you want to port it to 0.20, see email
> thread entitled:
> compiling HBaseFsck.java for 0.20.5You should try reducing the number of
> tables in your system, possibly through HBASE-2473
>
> Cheers
>
>
> On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com> wrote:
>
>>
>>
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>> Sent: Wednesday, September 01, 2010 10:45 PM
>> To: user@hbase.apache.org
>> Subject: Re: HBase table lost on upgrade
>>
>> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com> wrote:
>> > That email was just informational. Below are the details on my cluster -
>> let me know if more is needed.
>> >
>> > I have 2 hbase clusters setup
>> > -       for production, 6 node cluster,  32G, 8 processors
>> > -       for dev, 3 node cluster , 16GRAM , 4 processors
>> >
>> > 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
>> successfully.
>>
>> Why not latest stable version, 0.20.6?
>>
>> This was couple of months ago.
>>
>>
>> > 2. After that I loaded 2G+ files into HDFS and HBASE table.
>>
>>
>> Whats this mean?  Each of the .5M cells was 2G in size or the total size
>> was 2G?
>>
>> The total file size is 2G. Cells are of the order of hundreds of bytes.
>>
>>
>> >        An example Hbase table looks like this:
>> >                {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS
>> => '100', COM true
>> >                 PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
>> '65536', IN_MEMO
>> >                 RY => 'false', BLOCKCACHE => 'true'}]}
>>
>> That looks fine.
>>
>> > 3. I started stargate on one server and accessed Hbase for reading from
>> another 3rd party application successfully.
>> >        It took 600 seconds on dev cluster and 250 on production to read
>> .5M records from Hbase via stargate.
>>
>>
>> That don't sound so good.
>>
>>
>>
>> > 4. later to boost read performance, it was suggested that upgrading to
>> Hbase0.20.6 will be helpful. I did that on production (w/o running the
>> migrate script) and re-started stargate and everything was running fine,
>> though I did not see a bump in performance.
>> >
>> > 5. Eventually, I had to move to dev cluster from production because of
>> some resource issues at our end. Dev cluster had 0.20.3 at this time. As I
>> started loading more files into Hbase (<10 versions of <1G files) and
>> converting my app to use hbase more heavily (via more stargate clients), the
>> performance started degrading. I decided it was time to upgrade dev cluster
>> as well to 0.20.6.  (I did not run the migrate script here as well, I missed
>> this step in the doc).
>> >
>>
>> What kinda perf you looking for from REST?
>>
>> Do you have to use REST?  All is base64'd so its safe to transport.
>>
>> I also have the Java Api code (for testing purposes) and that gave similar
>> performance results (520 seconds on dev and 250 on production cluster). Is
>> there a way to flush the cache before we run the next experiment? I doubt
>> that the first lookup always takes longer and then the later ones perform
>> better.
>>
>> I need something that can integrate with C++ - libcurl and stargate were
>> the easiest to start with. I could look at thrift or anything else the Hbase
>> gurus think might be a better fit performance-wise.
>>
>>
>> > 6. When Hbase 0.20.6 came back up on dev cluster (with increased block
>> cache (.6) and region server handler counts (75) ), pointing to the same
>> rootdir, I noticed that some tables were missing. I could see a mention of
>> them in the logs, but not when I did 'list' in the shell. I recovered those
>> tables using add_table.rb script.
>>
>>
>> How did you shutdown this cluster?  Did you reboot machines?  Was your
>> hdfs homed on /tmp?  What is going on on your systems?  Are they
>> swapping?  Did you give HBase more than its default memory?  You read
>> the requirements and made sure ulimit and xceivers had been upped on
>> these machines?
>>
>>
>> Did not reboot machines. hdfs or hbase do not store data/logs in /tmp.
>> They are not swapping.
>> Hbase heap size is 2G.  I have upped the xcievers now on your
>> recommanedation.  Do I need to restart hdfs after making this change in
>> hdfs-site.xml ?
>> ulimit -n
>> 2048
>>
>>
>>
>> >        a. Is there a way to check the health of all Hbase tables in the
>> cluster after an upgrade or even periodically, to make sure that everything
>> is healthy ?
>> >        b. I would like to be able to force this error again and check
>> the health of hbase and want it to report to me that some tables were lost.
>> Currently, I just found out because I had very less data and it was easy to
>> tell.
>> >
>>
>> Iin trunk there is such a tool.  In 0.20.x, run a count against our
>> table.  See the hbase shell.  Type help to see how.
>>
>>
>> What tool are you talking about here - it wasn't clear ? Count against
>> which table ? I want hbase to check all tables and I don't know how many
>> tables I have since there are too many - is that possible?
>>
>> > 7. Here are the issues I face after this upgrade
>> >        a. when I run stop-hbase.sh, it  does not stop my regionservers
>> on other boxes.
>>
>> Why not.  Whats going on on those machines?  If you tail the logs on
>> the hosts that won't go down and/or on master, what do they say?
>> Tail the logs.  Should give you (us) clue.
>>
>> They do go down with some errors in the log, but down't report it on the
>> terminal.
>> http://pastebin.com/0hYwaffL  regionserver log
>>
>>
>>
>> >        b. It does start them using start-hbase.sh.
>> >        c. Is it that stopping regionservers is not reported, but it does
>> stop them (I see that happening on production cluster) ?
>> >
>>
>>
>>
>> > 8. I started stargate in the upgraded 0.20.6 in dev cluster
>> >        a. earlier when I sent a URL to look for a data row that did not
>> exist, the return value was NULL , now I get an xml stating HTTP error
>> 404/405.        Everything works as expected for an existing data row.
>>
>> The latter sounds RESTy.  What would you expect of it?  The null?
>>
>>
>> Yes, it should send NULL like it does in the production server. Is there
>> anyone else you can point to who would have used REST ? This is the main
>> showstopper for me currently.
>>
>>
>>
>

Re: HBase table lost on upgrade - compiling HBaseFsck.java

Posted by Ted Yu <yu...@gmail.com>.

Here is the file from my earlier post which compiles in 0.20.5.

On Wed, Sep 15, 2010 at 12:50 PM, Ted Yu <yu...@gmail.com> wrote:

> The scope of change to compile HBaseFsck.java in 0.20.x is bigger than it
> used to.
> Here are the errors I got - the last 3 depend on other HBase files.
>
> compile-core:
>     [javac] Compiling 2 source files to
> /Users/tyu/hbase-0.20.5/build/classes
>     [javac]
> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:46:
> cannot find symbol
>     [javac] symbol  : class ZooKeeperConnectionException
>     [javac] location: package org.apache.hadoop.hbase
>     [javac] import org.apache.hadoop.hbase.ZooKeeperConnectionException;
>     [javac]                               ^
>     [javac]
> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:81:
> cannot find symbol
>     [javac] symbol  : class ZooKeeperConnectionException
>     [javac] location: class org.apache.hadoop.hbase.client.HBaseFsck
>     [javac]     throws MasterNotRunningException,
> ZooKeeperConnectionException, IOException {
>     [javac]                                       ^
>     [javac]
> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:82:
> cannot find symbol
>     [javac] symbol  : constructor
> HBaseAdmin(org.apache.hadoop.conf.Configuration)
>     [javac] location: class org.apache.hadoop.hbase.client.HBaseAdmin
>     [javac]     super(conf);
>     [javac]     ^
>     [javac]
> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:267:
> cannot find symbol
>     [javac] symbol  : method getOnlineRegions()
>     [javac] location: interface
> org.apache.hadoop.hbase.ipc.HRegionInterface
>     [javac]         NavigableSet<HRegionInfo> regions =
> server.getOnlineRegions();
>     [javac]                                                   ^
>     [javac]
> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:440:
> cannot find symbol
>     [javac] symbol  : method
> metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
>
>     [javac] location: class org.apache.hadoop.hbase.client.MetaScanner
>     [javac]       MetaScanner.metaScan(conf, visitor);
>     [javac]                  ^
>
> On Wed, Sep 15, 2010 at 12:19 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> If you show us the errors, that would help me understand your situation
>> better.
>> HBaseFsck.java has changed a lot since I last tried to compile it.
>>
>>
>> On Wed, Sep 15, 2010 at 11:06 AM, Sharma, Avani <ag...@ebay.com>wrote:
>>
>>> Ted,
>>>
>>> I am trying to compile the file and am getting the same errors like you
>>> mentioned and more:
>>> [javac] symbol  : method
>>>
>>> metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
>>>    [javac] location: class org.apache.hadoop.hbase.client.MetaScanner
>>>    [javac]       MetaScanner.metaScan(conf, visitor);
>>>    [javac]                  ^
>>>    [javac]
>>>
>>> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:503:
>>> cannot find symbol
>>>    [javac] symbol  : method create()
>>>    [javac] location: class org.apache.hadoop.hbase.HBaseConfiguration
>>>    [javac]     Configuration conf = HBaseConfiguration.create();
>>>
>>> I got around a few of these by adding the logging jar to the CLASSPATH.
>>> But I still have some. I see that you sent out a fix, but I am unable to see
>>> the attachment.
>>>
>>> I have the conf dirs in CLASSPATH as well as hadoop, zk and hbase jars.
>>>
>>> Would you recall how can these be fixed? I guess some jars are needed in
>>> the CLASSPATH
>>>
>>> -----Original Message-----
>>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>>> Sent: Wednesday, September 08, 2010 10:11 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: HBase table lost on upgrade
>>>
>>> You can copy HBaseFsck.java from trunk and compile in 0.20.6
>>>
>>> On Wed, Sep 8, 2010 at 3:43 PM, Sharma, Avani <ag...@ebay.com> wrote:
>>>
>>> > Right.
>>> >
>>> > Anyway, where can I get this file from ? Any pointers?
>>> > I can't find it at
>>> > src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.
>>> >
>>> > -----Original Message-----
>>> > From: Ted Yu [mailto:yuzhihong@gmail.com]
>>> > Sent: Wednesday, September 08, 2010 3:09 PM
>>> > To: user@hbase.apache.org
>>> > Subject: Re: HBase table lost on upgrade
>>> >
>>> > master.jsp shows tables, not regions.
>>> > I personally haven't encountered the problem you're facing.
>>> >
>>> > On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <ag...@ebay.com>
>>> wrote:
>>> >
>>> > > Ted,
>>> > > I did look at that thread. It seems I need to modify the code in that
>>> > file?
>>> > > Could you point me to the exact steps to get it and compile it?
>>> > >
>>> > > Did you get through the issue if regions being added to catalog , but
>>> do
>>> > > not show up in master.jsp?
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:
>>> > >
>>> > > > The tool Stack mentioned is hbck. If you want to port it to 0.20,
>>> see
>>> > > email
>>> > > > thread entitled:
>>> > > > compiling HBaseFsck.java for 0.20.5You should try reducing the
>>> number
>>> > of
>>> > > > tables in your system, possibly through HBASE-2473
>>> > > >
>>> > > > Cheers
>>> > > >
>>> > > > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com>
>>> > > wrote:
>>> > > >
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> -----Original Message-----
>>> > > >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf
>>> Of
>>> > > Stack
>>> > > >> Sent: Wednesday, September 01, 2010 10:45 PM
>>> > > >> To: user@hbase.apache.org
>>> > > >> Subject: Re: HBase table lost on upgrade
>>> > > >>
>>> > > >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com>
>>> > > wrote:
>>> > > >>> That email was just informational. Below are the details on my
>>> > cluster
>>> > > -
>>> > > >> let me know if more is needed.
>>> > > >>>
>>> > > >>> I have 2 hbase clusters setup
>>> > > >>> -       for production, 6 node cluster,  32G, 8 processors
>>> > > >>> -       for dev, 3 node cluster , 16GRAM , 4 processors
>>> > > >>>
>>> > > >>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these
>>> clusters,
>>> > > >> successfully.
>>> > > >>
>>> > > >> Why not latest stable version, 0.20.6?
>>> > > >>
>>> > > >> This was couple of months ago.
>>> > > >>
>>> > > >>
>>> > > >>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
>>> > > >>
>>> > > >>
>>> > > >> Whats this mean?  Each of the .5M cells was 2G in size or the
>>> total
>>> > size
>>> > > >> was 2G?
>>> > > >>
>>> > > >> The total file size is 2G. Cells are of the order of hundreds of
>>> > bytes.
>>> > > >>
>>> > > >>
>>> > > >>>       An example Hbase table looks like this:
>>> > > >>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data',
>>> VERSIONS
>>> > > =>
>>> > > >> '100', COM true
>>> > > >>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE
>>> =>
>>> > > >> '65536', IN_MEMO
>>> > > >>>                RY => 'false', BLOCKCACHE => 'true'}]}
>>> > > >>
>>> > > >> That looks fine.
>>> > > >>
>>> > > >>> 3. I started stargate on one server and accessed Hbase for
>>> reading
>>> > from
>>> > > >> another 3rd party application successfully.
>>> > > >>>       It took 600 seconds on dev cluster and 250 on production to
>>> > read
>>> > > >> .5M records from Hbase via stargate.
>>> > > >>
>>> > > >>
>>> > > >> That don't sound so good.
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>> 4. later to boost read performance, it was suggested that
>>> upgrading
>>> > to
>>> > > >> Hbase0.20.6 will be helpful. I did that on production (w/o running
>>> the
>>> > > >> migrate script) and re-started stargate and everything was running
>>> > fine,
>>> > > >> though I did not see a bump in performance.
>>> > > >>>
>>> > > >>> 5. Eventually, I had to move to dev cluster from production
>>> because
>>> > of
>>> > > >> some resource issues at our end. Dev cluster had 0.20.3 at this
>>> time.
>>> > As
>>> > > I
>>> > > >> started loading more files into Hbase (<10 versions of <1G files)
>>> and
>>> > > >> converting my app to use hbase more heavily (via more stargate
>>> > clients),
>>> > > the
>>> > > >> performance started degrading. I decided it was time to upgrade
>>> dev
>>> > > cluster
>>> > > >> as well to 0.20.6.  (I did not run the migrate script here as
>>> well, I
>>> > > missed
>>> > > >> this step in the doc).
>>> > > >>>
>>> > > >>
>>> > > >> What kinda perf you looking for from REST?
>>> > > >>
>>> > > >> Do you have to use REST?  All is base64'd so its safe to
>>> transport.
>>> > > >>
>>> > > >> I also have the Java Api code (for testing purposes) and that gave
>>> > > similar
>>> > > >> performance results (520 seconds on dev and 250 on production
>>> > cluster).
>>> > > Is
>>> > > >> there a way to flush the cache before we run the next experiment?
>>> I
>>> > > doubt
>>> > > >> that the first lookup always takes longer and then the later ones
>>> > > perform
>>> > > >> better.
>>> > > >>
>>> > > >> I need something that can integrate with C++ - libcurl and
>>> stargate
>>> > were
>>> > > >> the easiest to start with. I could look at thrift or anything else
>>> the
>>> > > Hbase
>>> > > >> gurus think might be a better fit performance-wise.
>>> > > >>
>>> > > >>
>>> > > >>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased
>>> > block
>>> > > >> cache (.6) and region server handler counts (75) ), pointing to
>>> the
>>> > same
>>> > > >> rootdir, I noticed that some tables were missing. I could see a
>>> > mention
>>> > > of
>>> > > >> them in the logs, but not when I did 'list' in the shell. I
>>> recovered
>>> > > those
>>> > > >> tables using add_table.rb script.
>>> > > >>
>>> > > >>
>>> > > >> How did you shutdown this cluster?  Did you reboot machines?  Was
>>> your
>>> > > >> hdfs homed on /tmp?  What is going on on your systems?  Are they
>>> > > >> swapping?  Did you give HBase more than its default memory?  You
>>> read
>>> > > >> the requirements and made sure ulimit and xceivers had been upped
>>> on
>>> > > >> these machines?
>>> > > >>
>>> > > >>
>>> > > >> Did not reboot machines. hdfs or hbase do not store data/logs in
>>> /tmp.
>>> > > They
>>> > > >> are not swapping.
>>> > > >> Hbase heap size is 2G.  I have upped the xcievers now on your
>>> > > >> recommanedation.  Do I need to restart hdfs after making this
>>> change
>>> > in
>>> > > >> hdfs-site.xml ?
>>> > > >> ulimit -n
>>> > > >> 2048
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>>       a. Is there a way to check the health of all Hbase tables
>>> in
>>> > the
>>> > > >> cluster after an upgrade or even periodically, to make sure that
>>> > > everything
>>> > > >> is healthy ?
>>> > > >>>       b. I would like to be able to force this error again and
>>> check
>>> > > the
>>> > > >> health of hbase and want it to report to me that some tables were
>>> > lost.
>>> > > >> Currently, I just found out because I had very less data and it
>>> was
>>> > easy
>>> > > to
>>> > > >> tell.
>>> > > >>>
>>> > > >>
>>> > > >> Iin trunk there is such a tool.  In 0.20.x, run a count against
>>> our
>>> > > >> table.  See the hbase shell.  Type help to see how.
>>> > > >>
>>> > > >>
>>> > > >> What tool are you talking about here - it wasn't clear ? Count
>>> against
>>> > > >> which table ? I want hbase to check all tables and I don't know
>>> how
>>> > many
>>> > > >> tables I have since there are too many - is that possible?
>>> > > >>
>>> > > >>> 7. Here are the issues I face after this upgrade
>>> > > >>>       a. when I run stop-hbase.sh, it  does not stop my
>>> regionservers
>>> > > on
>>> > > >> other boxes.
>>> > > >>
>>> > > >> Why not.  Whats going on on those machines?  If you tail the logs
>>> on
>>> > > >> the hosts that won't go down and/or on master, what do they say?
>>> > > >> Tail the logs.  Should give you (us) clue.
>>> > > >>
>>> > > >> They do go down with some errors in the log, but down't report it
>>> on
>>> > the
>>> > > >> terminal.
>>> > > >> http://pastebin.com/0hYwaffL  regionserver log
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>>       b. It does start them using start-hbase.sh.
>>> > > >>>       c. Is it that stopping regionservers is not reported, but
>>> it
>>> > does
>>> > > >> stop them (I see that happening on production cluster) ?
>>> > > >>>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
>>> > > >>>       a. earlier when I sent a URL to look for a data row that
>>> did
>>> > not
>>> > > >> exist, the return value was NULL , now I get an xml stating HTTP
>>> error
>>> > > >> 404/405.        Everything works as expected for an existing data
>>> row.
>>> > > >>
>>> > > >> The latter sounds RESTy.  What would you expect of it?  The null?
>>> > > >>
>>> > > >>
>>> > > >> Yes, it should send NULL like it does in the production server. Is
>>> > there
>>> > > >> anyone else you can point to who would have used REST ? This is
>>> the
>>> > main
>>> > > >> showstopper for me currently.
>>> > > >>
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>>
>>
>>
>

RE: HBase table lost on upgrade - compiling HBaseFsck.java

Posted by "Sharma, Avani" <ag...@ebay.com>.

Hello Ted,

Here are the actual errors I get -

HBaseFsck.java:46: cannot find symbol
symbol  : class ZooKeeperConnectionException
location: package org.apache.hadoop.hbase
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
                              ^
HBaseFsck.java:81: cannot find symbol
symbol  : class ZooKeeperConnectionException
location: class org.apache.hadoop.hbase.client.HBaseFsck
    throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
                                      ^
HBaseFsck.java:82: cannot find symbol
symbol  : constructor HBaseAdmin(org.apache.hadoop.conf.Configuration)
location: class org.apache.hadoop.hbase.client.HBaseAdmin
    super(conf);
    ^
HBaseFsck.java:267: cannot find symbol
symbol  : method getOnlineRegions()
location: interface org.apache.hadoop.hbase.ipc.HRegionInterface
        NavigableSet<HRegionInfo> regions = server.getOnlineRegions();
                                                  ^
HBaseFsck.java:440: cannot find symbol
symbol  : method metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
location: class org.apache.hadoop.hbase.client.MetaScanner
      MetaScanner.metaScan(conf, visitor);
                 ^
HBaseFsck.java:496: cannot find symbol
symbol  : method create()
location: class org.apache.hadoop.hbase.HBaseConfiguration
    Configuration conf = HBaseConfiguration.create();
                                           ^
6 errors


-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Wednesday, September 15, 2010 12:50 PM
To: user@hbase.apache.org
Subject: Re: HBase table lost on upgrade - compiling HBaseFsck.java

The scope of change to compile HBaseFsck.java in 0.20.x is bigger than it
used to.
Here are the errors I got - the last 3 depend on other HBase files.

compile-core:
    [javac] Compiling 2 source files to
/Users/tyu/hbase-0.20.5/build/classes
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:46:
cannot find symbol
    [javac] symbol  : class ZooKeeperConnectionException
    [javac] location: package org.apache.hadoop.hbase
    [javac] import org.apache.hadoop.hbase.ZooKeeperConnectionException;
    [javac]                               ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:81:
cannot find symbol
    [javac] symbol  : class ZooKeeperConnectionException
    [javac] location: class org.apache.hadoop.hbase.client.HBaseFsck
    [javac]     throws MasterNotRunningException,
ZooKeeperConnectionException, IOException {
    [javac]                                       ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:82:
cannot find symbol
    [javac] symbol  : constructor
HBaseAdmin(org.apache.hadoop.conf.Configuration)
    [javac] location: class org.apache.hadoop.hbase.client.HBaseAdmin
    [javac]     super(conf);
    [javac]     ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:267:
cannot find symbol
    [javac] symbol  : method getOnlineRegions()
    [javac] location: interface org.apache.hadoop.hbase.ipc.HRegionInterface
    [javac]         NavigableSet<HRegionInfo> regions =
server.getOnlineRegions();
    [javac]                                                   ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:440:
cannot find symbol
    [javac] symbol  : method
metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
    [javac] location: class org.apache.hadoop.hbase.client.MetaScanner
    [javac]       MetaScanner.metaScan(conf, visitor);
    [javac]                  ^

On Wed, Sep 15, 2010 at 12:19 PM, Ted Yu <yu...@gmail.com> wrote:

> If you show us the errors, that would help me understand your situation
> better.
> HBaseFsck.java has changed a lot since I last tried to compile it.
>
>
> On Wed, Sep 15, 2010 at 11:06 AM, Sharma, Avani <ag...@ebay.com> wrote:
>
>> Ted,
>>
>> I am trying to compile the file and am getting the same errors like you
>> mentioned and more:
>> [javac] symbol  : method
>>
>> metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
>>    [javac] location: class org.apache.hadoop.hbase.client.MetaScanner
>>    [javac]       MetaScanner.metaScan(conf, visitor);
>>    [javac]                  ^
>>    [javac]
>>
>> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:503:
>> cannot find symbol
>>    [javac] symbol  : method create()
>>    [javac] location: class org.apache.hadoop.hbase.HBaseConfiguration
>>    [javac]     Configuration conf = HBaseConfiguration.create();
>>
>> I got around a few of these by adding the logging jar to the CLASSPATH.
>> But I still have some. I see that you sent out a fix, but I am unable to see
>> the attachment.
>>
>> I have the conf dirs in CLASSPATH as well as hadoop, zk and hbase jars.
>>
>> Would you recall how can these be fixed? I guess some jars are needed in
>> the CLASSPATH
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>> Sent: Wednesday, September 08, 2010 10:11 PM
>> To: user@hbase.apache.org
>> Subject: Re: HBase table lost on upgrade
>>
>> You can copy HBaseFsck.java from trunk and compile in 0.20.6
>>
>> On Wed, Sep 8, 2010 at 3:43 PM, Sharma, Avani <ag...@ebay.com> wrote:
>>
>> > Right.
>> >
>> > Anyway, where can I get this file from ? Any pointers?
>> > I can't find it at
>> > src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.
>> >
>> > -----Original Message-----
>> > From: Ted Yu [mailto:yuzhihong@gmail.com]
>> > Sent: Wednesday, September 08, 2010 3:09 PM
>> > To: user@hbase.apache.org
>> > Subject: Re: HBase table lost on upgrade
>> >
>> > master.jsp shows tables, not regions.
>> > I personally haven't encountered the problem you're facing.
>> >
>> > On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <ag...@ebay.com>
>> wrote:
>> >
>> > > Ted,
>> > > I did look at that thread. It seems I need to modify the code in that
>> > file?
>> > > Could you point me to the exact steps to get it and compile it?
>> > >
>> > > Did you get through the issue if regions being added to catalog , but
>> do
>> > > not show up in master.jsp?
>> > >
>> > >
>> > >
>> > >
>> > > On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:
>> > >
>> > > > The tool Stack mentioned is hbck. If you want to port it to 0.20,
>> see
>> > > email
>> > > > thread entitled:
>> > > > compiling HBaseFsck.java for 0.20.5You should try reducing the
>> number
>> > of
>> > > > tables in your system, possibly through HBASE-2473
>> > > >
>> > > > Cheers
>> > > >
>> > > > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com>
>> > > wrote:
>> > > >
>> > > >>
>> > > >>
>> > > >>
>> > > >> -----Original Message-----
>> > > >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf
>> Of
>> > > Stack
>> > > >> Sent: Wednesday, September 01, 2010 10:45 PM
>> > > >> To: user@hbase.apache.org
>> > > >> Subject: Re: HBase table lost on upgrade
>> > > >>
>> > > >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com>
>> > > wrote:
>> > > >>> That email was just informational. Below are the details on my
>> > cluster
>> > > -
>> > > >> let me know if more is needed.
>> > > >>>
>> > > >>> I have 2 hbase clusters setup
>> > > >>> -       for production, 6 node cluster,  32G, 8 processors
>> > > >>> -       for dev, 3 node cluster , 16GRAM , 4 processors
>> > > >>>
>> > > >>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these
>> clusters,
>> > > >> successfully.
>> > > >>
>> > > >> Why not latest stable version, 0.20.6?
>> > > >>
>> > > >> This was couple of months ago.
>> > > >>
>> > > >>
>> > > >>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
>> > > >>
>> > > >>
>> > > >> Whats this mean?  Each of the .5M cells was 2G in size or the total
>> > size
>> > > >> was 2G?
>> > > >>
>> > > >> The total file size is 2G. Cells are of the order of hundreds of
>> > bytes.
>> > > >>
>> > > >>
>> > > >>>       An example Hbase table looks like this:
>> > > >>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data',
>> VERSIONS
>> > > =>
>> > > >> '100', COM true
>> > > >>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE
>> =>
>> > > >> '65536', IN_MEMO
>> > > >>>                RY => 'false', BLOCKCACHE => 'true'}]}
>> > > >>
>> > > >> That looks fine.
>> > > >>
>> > > >>> 3. I started stargate on one server and accessed Hbase for reading
>> > from
>> > > >> another 3rd party application successfully.
>> > > >>>       It took 600 seconds on dev cluster and 250 on production to
>> > read
>> > > >> .5M records from Hbase via stargate.
>> > > >>
>> > > >>
>> > > >> That don't sound so good.
>> > > >>
>> > > >>
>> > > >>
>> > > >>> 4. later to boost read performance, it was suggested that
>> upgrading
>> > to
>> > > >> Hbase0.20.6 will be helpful. I did that on production (w/o running
>> the
>> > > >> migrate script) and re-started stargate and everything was running
>> > fine,
>> > > >> though I did not see a bump in performance.
>> > > >>>
>> > > >>> 5. Eventually, I had to move to dev cluster from production
>> because
>> > of
>> > > >> some resource issues at our end. Dev cluster had 0.20.3 at this
>> time.
>> > As
>> > > I
>> > > >> started loading more files into Hbase (<10 versions of <1G files)
>> and
>> > > >> converting my app to use hbase more heavily (via more stargate
>> > clients),
>> > > the
>> > > >> performance started degrading. I decided it was time to upgrade dev
>> > > cluster
>> > > >> as well to 0.20.6.  (I did not run the migrate script here as well,
>> I
>> > > missed
>> > > >> this step in the doc).
>> > > >>>
>> > > >>
>> > > >> What kinda perf you looking for from REST?
>> > > >>
>> > > >> Do you have to use REST?  All is base64'd so its safe to transport.
>> > > >>
>> > > >> I also have the Java Api code (for testing purposes) and that gave
>> > > similar
>> > > >> performance results (520 seconds on dev and 250 on production
>> > cluster).
>> > > Is
>> > > >> there a way to flush the cache before we run the next experiment? I
>> > > doubt
>> > > >> that the first lookup always takes longer and then the later ones
>> > > perform
>> > > >> better.
>> > > >>
>> > > >> I need something that can integrate with C++ - libcurl and stargate
>> > were
>> > > >> the easiest to start with. I could look at thrift or anything else
>> the
>> > > Hbase
>> > > >> gurus think might be a better fit performance-wise.
>> > > >>
>> > > >>
>> > > >>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased
>> > block
>> > > >> cache (.6) and region server handler counts (75) ), pointing to the
>> > same
>> > > >> rootdir, I noticed that some tables were missing. I could see a
>> > mention
>> > > of
>> > > >> them in the logs, but not when I did 'list' in the shell. I
>> recovered
>> > > those
>> > > >> tables using add_table.rb script.
>> > > >>
>> > > >>
>> > > >> How did you shutdown this cluster?  Did you reboot machines?  Was
>> your
>> > > >> hdfs homed on /tmp?  What is going on on your systems?  Are they
>> > > >> swapping?  Did you give HBase more than its default memory?  You
>> read
>> > > >> the requirements and made sure ulimit and xceivers had been upped
>> on
>> > > >> these machines?
>> > > >>
>> > > >>
>> > > >> Did not reboot machines. hdfs or hbase do not store data/logs in
>> /tmp.
>> > > They
>> > > >> are not swapping.
>> > > >> Hbase heap size is 2G.  I have upped the xcievers now on your
>> > > >> recommanedation.  Do I need to restart hdfs after making this
>> change
>> > in
>> > > >> hdfs-site.xml ?
>> > > >> ulimit -n
>> > > >> 2048
>> > > >>
>> > > >>
>> > > >>
>> > > >>>       a. Is there a way to check the health of all Hbase tables in
>> > the
>> > > >> cluster after an upgrade or even periodically, to make sure that
>> > > everything
>> > > >> is healthy ?
>> > > >>>       b. I would like to be able to force this error again and
>> check
>> > > the
>> > > >> health of hbase and want it to report to me that some tables were
>> > lost.
>> > > >> Currently, I just found out because I had very less data and it was
>> > easy
>> > > to
>> > > >> tell.
>> > > >>>
>> > > >>
>> > > >> Iin trunk there is such a tool.  In 0.20.x, run a count against our
>> > > >> table.  See the hbase shell.  Type help to see how.
>> > > >>
>> > > >>
>> > > >> What tool are you talking about here - it wasn't clear ? Count
>> against
>> > > >> which table ? I want hbase to check all tables and I don't know how
>> > many
>> > > >> tables I have since there are too many - is that possible?
>> > > >>
>> > > >>> 7. Here are the issues I face after this upgrade
>> > > >>>       a. when I run stop-hbase.sh, it  does not stop my
>> regionservers
>> > > on
>> > > >> other boxes.
>> > > >>
>> > > >> Why not.  Whats going on on those machines?  If you tail the logs
>> on
>> > > >> the hosts that won't go down and/or on master, what do they say?
>> > > >> Tail the logs.  Should give you (us) clue.
>> > > >>
>> > > >> They do go down with some errors in the log, but down't report it
>> on
>> > the
>> > > >> terminal.
>> > > >> http://pastebin.com/0hYwaffL  regionserver log
>> > > >>
>> > > >>
>> > > >>
>> > > >>>       b. It does start them using start-hbase.sh.
>> > > >>>       c. Is it that stopping regionservers is not reported, but it
>> > does
>> > > >> stop them (I see that happening on production cluster) ?
>> > > >>>
>> > > >>
>> > > >>
>> > > >>
>> > > >>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
>> > > >>>       a. earlier when I sent a URL to look for a data row that did
>> > not
>> > > >> exist, the return value was NULL , now I get an xml stating HTTP
>> error
>> > > >> 404/405.        Everything works as expected for an existing data
>> row.
>> > > >>
>> > > >> The latter sounds RESTy.  What would you expect of it?  The null?
>> > > >>
>> > > >>
>> > > >> Yes, it should send NULL like it does in the production server. Is
>> > there
>> > > >> anyone else you can point to who would have used REST ? This is the
>> > main
>> > > >> showstopper for me currently.
>> > > >>
>> > > >>
>> > > >>
>> > >
>> >
>>
>
>

Re: HBase table lost on upgrade - compiling HBaseFsck.java

Posted by Ted Yu <yu...@gmail.com>.

The scope of change to compile HBaseFsck.java in 0.20.x is bigger than it
used to.
Here are the errors I got - the last 3 depend on other HBase files.

compile-core:
    [javac] Compiling 2 source files to
/Users/tyu/hbase-0.20.5/build/classes
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:46:
cannot find symbol
    [javac] symbol  : class ZooKeeperConnectionException
    [javac] location: package org.apache.hadoop.hbase
    [javac] import org.apache.hadoop.hbase.ZooKeeperConnectionException;
    [javac]                               ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:81:
cannot find symbol
    [javac] symbol  : class ZooKeeperConnectionException
    [javac] location: class org.apache.hadoop.hbase.client.HBaseFsck
    [javac]     throws MasterNotRunningException,
ZooKeeperConnectionException, IOException {
    [javac]                                       ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:82:
cannot find symbol
    [javac] symbol  : constructor
HBaseAdmin(org.apache.hadoop.conf.Configuration)
    [javac] location: class org.apache.hadoop.hbase.client.HBaseAdmin
    [javac]     super(conf);
    [javac]     ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:267:
cannot find symbol
    [javac] symbol  : method getOnlineRegions()
    [javac] location: interface org.apache.hadoop.hbase.ipc.HRegionInterface
    [javac]         NavigableSet<HRegionInfo> regions =
server.getOnlineRegions();
    [javac]                                                   ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:440:
cannot find symbol
    [javac] symbol  : method
metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
    [javac] location: class org.apache.hadoop.hbase.client.MetaScanner
    [javac]       MetaScanner.metaScan(conf, visitor);
    [javac]                  ^

On Wed, Sep 15, 2010 at 12:19 PM, Ted Yu <yu...@gmail.com> wrote:

> If you show us the errors, that would help me understand your situation
> better.
> HBaseFsck.java has changed a lot since I last tried to compile it.
>
>
> On Wed, Sep 15, 2010 at 11:06 AM, Sharma, Avani <ag...@ebay.com> wrote:
>
>> Ted,
>>
>> I am trying to compile the file and am getting the same errors like you
>> mentioned and more:
>> [javac] symbol  : method
>>
>> metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
>>    [javac] location: class org.apache.hadoop.hbase.client.MetaScanner
>>    [javac]       MetaScanner.metaScan(conf, visitor);
>>    [javac]                  ^
>>    [javac]
>>
>> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:503:
>> cannot find symbol
>>    [javac] symbol  : method create()
>>    [javac] location: class org.apache.hadoop.hbase.HBaseConfiguration
>>    [javac]     Configuration conf = HBaseConfiguration.create();
>>
>> I got around a few of these by adding the logging jar to the CLASSPATH.
>> But I still have some. I see that you sent out a fix, but I am unable to see
>> the attachment.
>>
>> I have the conf dirs in CLASSPATH as well as hadoop, zk and hbase jars.
>>
>> Would you recall how can these be fixed? I guess some jars are needed in
>> the CLASSPATH
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>> Sent: Wednesday, September 08, 2010 10:11 PM
>> To: user@hbase.apache.org
>> Subject: Re: HBase table lost on upgrade
>>
>> You can copy HBaseFsck.java from trunk and compile in 0.20.6
>>
>> On Wed, Sep 8, 2010 at 3:43 PM, Sharma, Avani <ag...@ebay.com> wrote:
>>
>> > Right.
>> >
>> > Anyway, where can I get this file from ? Any pointers?
>> > I can't find it at
>> > src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.
>> >
>> > -----Original Message-----
>> > From: Ted Yu [mailto:yuzhihong@gmail.com]
>> > Sent: Wednesday, September 08, 2010 3:09 PM
>> > To: user@hbase.apache.org
>> > Subject: Re: HBase table lost on upgrade
>> >
>> > master.jsp shows tables, not regions.
>> > I personally haven't encountered the problem you're facing.
>> >
>> > On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <ag...@ebay.com>
>> wrote:
>> >
>> > > Ted,
>> > > I did look at that thread. It seems I need to modify the code in that
>> > file?
>> > > Could you point me to the exact steps to get it and compile it?
>> > >
>> > > Did you get through the issue if regions being added to catalog , but
>> do
>> > > not show up in master.jsp?
>> > >
>> > >
>> > >
>> > >
>> > > On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:
>> > >
>> > > > The tool Stack mentioned is hbck. If you want to port it to 0.20,
>> see
>> > > email
>> > > > thread entitled:
>> > > > compiling HBaseFsck.java for 0.20.5You should try reducing the
>> number
>> > of
>> > > > tables in your system, possibly through HBASE-2473
>> > > >
>> > > > Cheers
>> > > >
>> > > > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com>
>> > > wrote:
>> > > >
>> > > >>
>> > > >>
>> > > >>
>> > > >> -----Original Message-----
>> > > >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf
>> Of
>> > > Stack
>> > > >> Sent: Wednesday, September 01, 2010 10:45 PM
>> > > >> To: user@hbase.apache.org
>> > > >> Subject: Re: HBase table lost on upgrade
>> > > >>
>> > > >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com>
>> > > wrote:
>> > > >>> That email was just informational. Below are the details on my
>> > cluster
>> > > -
>> > > >> let me know if more is needed.
>> > > >>>
>> > > >>> I have 2 hbase clusters setup
>> > > >>> -       for production, 6 node cluster,  32G, 8 processors
>> > > >>> -       for dev, 3 node cluster , 16GRAM , 4 processors
>> > > >>>
>> > > >>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these
>> clusters,
>> > > >> successfully.
>> > > >>
>> > > >> Why not latest stable version, 0.20.6?
>> > > >>
>> > > >> This was couple of months ago.
>> > > >>
>> > > >>
>> > > >>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
>> > > >>
>> > > >>
>> > > >> Whats this mean?  Each of the .5M cells was 2G in size or the total
>> > size
>> > > >> was 2G?
>> > > >>
>> > > >> The total file size is 2G. Cells are of the order of hundreds of
>> > bytes.
>> > > >>
>> > > >>
>> > > >>>       An example Hbase table looks like this:
>> > > >>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data',
>> VERSIONS
>> > > =>
>> > > >> '100', COM true
>> > > >>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE
>> =>
>> > > >> '65536', IN_MEMO
>> > > >>>                RY => 'false', BLOCKCACHE => 'true'}]}
>> > > >>
>> > > >> That looks fine.
>> > > >>
>> > > >>> 3. I started stargate on one server and accessed Hbase for reading
>> > from
>> > > >> another 3rd party application successfully.
>> > > >>>       It took 600 seconds on dev cluster and 250 on production to
>> > read
>> > > >> .5M records from Hbase via stargate.
>> > > >>
>> > > >>
>> > > >> That don't sound so good.
>> > > >>
>> > > >>
>> > > >>
>> > > >>> 4. later to boost read performance, it was suggested that
>> upgrading
>> > to
>> > > >> Hbase0.20.6 will be helpful. I did that on production (w/o running
>> the
>> > > >> migrate script) and re-started stargate and everything was running
>> > fine,
>> > > >> though I did not see a bump in performance.
>> > > >>>
>> > > >>> 5. Eventually, I had to move to dev cluster from production
>> because
>> > of
>> > > >> some resource issues at our end. Dev cluster had 0.20.3 at this
>> time.
>> > As
>> > > I
>> > > >> started loading more files into Hbase (<10 versions of <1G files)
>> and
>> > > >> converting my app to use hbase more heavily (via more stargate
>> > clients),
>> > > the
>> > > >> performance started degrading. I decided it was time to upgrade dev
>> > > cluster
>> > > >> as well to 0.20.6.  (I did not run the migrate script here as well,
>> I
>> > > missed
>> > > >> this step in the doc).
>> > > >>>
>> > > >>
>> > > >> What kinda perf you looking for from REST?
>> > > >>
>> > > >> Do you have to use REST?  All is base64'd so its safe to transport.
>> > > >>
>> > > >> I also have the Java Api code (for testing purposes) and that gave
>> > > similar
>> > > >> performance results (520 seconds on dev and 250 on production
>> > cluster).
>> > > Is
>> > > >> there a way to flush the cache before we run the next experiment? I
>> > > doubt
>> > > >> that the first lookup always takes longer and then the later ones
>> > > perform
>> > > >> better.
>> > > >>
>> > > >> I need something that can integrate with C++ - libcurl and stargate
>> > were
>> > > >> the easiest to start with. I could look at thrift or anything else
>> the
>> > > Hbase
>> > > >> gurus think might be a better fit performance-wise.
>> > > >>
>> > > >>
>> > > >>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased
>> > block
>> > > >> cache (.6) and region server handler counts (75) ), pointing to the
>> > same
>> > > >> rootdir, I noticed that some tables were missing. I could see a
>> > mention
>> > > of
>> > > >> them in the logs, but not when I did 'list' in the shell. I
>> recovered
>> > > those
>> > > >> tables using add_table.rb script.
>> > > >>
>> > > >>
>> > > >> How did you shutdown this cluster?  Did you reboot machines?  Was
>> your
>> > > >> hdfs homed on /tmp?  What is going on on your systems?  Are they
>> > > >> swapping?  Did you give HBase more than its default memory?  You
>> read
>> > > >> the requirements and made sure ulimit and xceivers had been upped
>> on
>> > > >> these machines?
>> > > >>
>> > > >>
>> > > >> Did not reboot machines. hdfs or hbase do not store data/logs in
>> /tmp.
>> > > They
>> > > >> are not swapping.
>> > > >> Hbase heap size is 2G.  I have upped the xcievers now on your
>> > > >> recommanedation.  Do I need to restart hdfs after making this
>> change
>> > in
>> > > >> hdfs-site.xml ?
>> > > >> ulimit -n
>> > > >> 2048
>> > > >>
>> > > >>
>> > > >>
>> > > >>>       a. Is there a way to check the health of all Hbase tables in
>> > the
>> > > >> cluster after an upgrade or even periodically, to make sure that
>> > > everything
>> > > >> is healthy ?
>> > > >>>       b. I would like to be able to force this error again and
>> check
>> > > the
>> > > >> health of hbase and want it to report to me that some tables were
>> > lost.
>> > > >> Currently, I just found out because I had very less data and it was
>> > easy
>> > > to
>> > > >> tell.
>> > > >>>
>> > > >>
>> > > >> Iin trunk there is such a tool.  In 0.20.x, run a count against our
>> > > >> table.  See the hbase shell.  Type help to see how.
>> > > >>
>> > > >>
>> > > >> What tool are you talking about here - it wasn't clear ? Count
>> against
>> > > >> which table ? I want hbase to check all tables and I don't know how
>> > many
>> > > >> tables I have since there are too many - is that possible?
>> > > >>
>> > > >>> 7. Here are the issues I face after this upgrade
>> > > >>>       a. when I run stop-hbase.sh, it  does not stop my
>> regionservers
>> > > on
>> > > >> other boxes.
>> > > >>
>> > > >> Why not.  Whats going on on those machines?  If you tail the logs
>> on
>> > > >> the hosts that won't go down and/or on master, what do they say?
>> > > >> Tail the logs.  Should give you (us) clue.
>> > > >>
>> > > >> They do go down with some errors in the log, but down't report it
>> on
>> > the
>> > > >> terminal.
>> > > >> http://pastebin.com/0hYwaffL  regionserver log
>> > > >>
>> > > >>
>> > > >>
>> > > >>>       b. It does start them using start-hbase.sh.
>> > > >>>       c. Is it that stopping regionservers is not reported, but it
>> > does
>> > > >> stop them (I see that happening on production cluster) ?
>> > > >>>
>> > > >>
>> > > >>
>> > > >>
>> > > >>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
>> > > >>>       a. earlier when I sent a URL to look for a data row that did
>> > not
>> > > >> exist, the return value was NULL , now I get an xml stating HTTP
>> error
>> > > >> 404/405.        Everything works as expected for an existing data
>> row.
>> > > >>
>> > > >> The latter sounds RESTy.  What would you expect of it?  The null?
>> > > >>
>> > > >>
>> > > >> Yes, it should send NULL like it does in the production server. Is
>> > there
>> > > >> anyone else you can point to who would have used REST ? This is the
>> > main
>> > > >> showstopper for me currently.
>> > > >>
>> > > >>
>> > > >>
>> > >
>> >
>>
>
>

Re: HBase table lost on upgrade - compiling HBaseFsck.java

Posted by Ted Yu <yu...@gmail.com>.

If you show us the errors, that would help me understand your situation
better.
HBaseFsck.java has changed a lot since I last tried to compile it.

On Wed, Sep 15, 2010 at 11:06 AM, Sharma, Avani <ag...@ebay.com> wrote:

> Ted,
>
> I am trying to compile the file and am getting the same errors like you
> mentioned and more:
> [javac] symbol  : method
>
> metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
>    [javac] location: class org.apache.hadoop.hbase.client.MetaScanner
>    [javac]       MetaScanner.metaScan(conf, visitor);
>    [javac]                  ^
>    [javac]
>
> /Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:503:
> cannot find symbol
>    [javac] symbol  : method create()
>    [javac] location: class org.apache.hadoop.hbase.HBaseConfiguration
>    [javac]     Configuration conf = HBaseConfiguration.create();
>
> I got around a few of these by adding the logging jar to the CLASSPATH. But
> I still have some. I see that you sent out a fix, but I am unable to see the
> attachment.
>
> I have the conf dirs in CLASSPATH as well as hadoop, zk and hbase jars.
>
> Would you recall how can these be fixed? I guess some jars are needed in
> the CLASSPATH
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Wednesday, September 08, 2010 10:11 PM
> To: user@hbase.apache.org
> Subject: Re: HBase table lost on upgrade
>
> You can copy HBaseFsck.java from trunk and compile in 0.20.6
>
> On Wed, Sep 8, 2010 at 3:43 PM, Sharma, Avani <ag...@ebay.com> wrote:
>
> > Right.
> >
> > Anyway, where can I get this file from ? Any pointers?
> > I can't find it at
> > src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.
> >
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Wednesday, September 08, 2010 3:09 PM
> > To: user@hbase.apache.org
> > Subject: Re: HBase table lost on upgrade
> >
> > master.jsp shows tables, not regions.
> > I personally haven't encountered the problem you're facing.
> >
> > On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <ag...@ebay.com> wrote:
> >
> > > Ted,
> > > I did look at that thread. It seems I need to modify the code in that
> > file?
> > > Could you point me to the exact steps to get it and compile it?
> > >
> > > Did you get through the issue if regions being added to catalog , but
> do
> > > not show up in master.jsp?
> > >
> > >
> > >
> > >
> > > On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > The tool Stack mentioned is hbck. If you want to port it to 0.20, see
> > > email
> > > > thread entitled:
> > > > compiling HBaseFsck.java for 0.20.5You should try reducing the number
> > of
> > > > tables in your system, possibly through HBASE-2473
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com>
> > > wrote:
> > > >
> > > >>
> > > >>
> > > >>
> > > >> -----Original Message-----
> > > >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> > > Stack
> > > >> Sent: Wednesday, September 01, 2010 10:45 PM
> > > >> To: user@hbase.apache.org
> > > >> Subject: Re: HBase table lost on upgrade
> > > >>
> > > >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com>
> > > wrote:
> > > >>> That email was just informational. Below are the details on my
> > cluster
> > > -
> > > >> let me know if more is needed.
> > > >>>
> > > >>> I have 2 hbase clusters setup
> > > >>> -       for production, 6 node cluster,  32G, 8 processors
> > > >>> -       for dev, 3 node cluster , 16GRAM , 4 processors
> > > >>>
> > > >>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
> > > >> successfully.
> > > >>
> > > >> Why not latest stable version, 0.20.6?
> > > >>
> > > >> This was couple of months ago.
> > > >>
> > > >>
> > > >>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
> > > >>
> > > >>
> > > >> Whats this mean?  Each of the .5M cells was 2G in size or the total
> > size
> > > >> was 2G?
> > > >>
> > > >> The total file size is 2G. Cells are of the order of hundreds of
> > bytes.
> > > >>
> > > >>
> > > >>>       An example Hbase table looks like this:
> > > >>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data',
> VERSIONS
> > > =>
> > > >> '100', COM true
> > > >>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE
> =>
> > > >> '65536', IN_MEMO
> > > >>>                RY => 'false', BLOCKCACHE => 'true'}]}
> > > >>
> > > >> That looks fine.
> > > >>
> > > >>> 3. I started stargate on one server and accessed Hbase for reading
> > from
> > > >> another 3rd party application successfully.
> > > >>>       It took 600 seconds on dev cluster and 250 on production to
> > read
> > > >> .5M records from Hbase via stargate.
> > > >>
> > > >>
> > > >> That don't sound so good.
> > > >>
> > > >>
> > > >>
> > > >>> 4. later to boost read performance, it was suggested that upgrading
> > to
> > > >> Hbase0.20.6 will be helpful. I did that on production (w/o running
> the
> > > >> migrate script) and re-started stargate and everything was running
> > fine,
> > > >> though I did not see a bump in performance.
> > > >>>
> > > >>> 5. Eventually, I had to move to dev cluster from production because
> > of
> > > >> some resource issues at our end. Dev cluster had 0.20.3 at this
> time.
> > As
> > > I
> > > >> started loading more files into Hbase (<10 versions of <1G files)
> and
> > > >> converting my app to use hbase more heavily (via more stargate
> > clients),
> > > the
> > > >> performance started degrading. I decided it was time to upgrade dev
> > > cluster
> > > >> as well to 0.20.6.  (I did not run the migrate script here as well,
> I
> > > missed
> > > >> this step in the doc).
> > > >>>
> > > >>
> > > >> What kinda perf you looking for from REST?
> > > >>
> > > >> Do you have to use REST?  All is base64'd so its safe to transport.
> > > >>
> > > >> I also have the Java Api code (for testing purposes) and that gave
> > > similar
> > > >> performance results (520 seconds on dev and 250 on production
> > cluster).
> > > Is
> > > >> there a way to flush the cache before we run the next experiment? I
> > > doubt
> > > >> that the first lookup always takes longer and then the later ones
> > > perform
> > > >> better.
> > > >>
> > > >> I need something that can integrate with C++ - libcurl and stargate
> > were
> > > >> the easiest to start with. I could look at thrift or anything else
> the
> > > Hbase
> > > >> gurus think might be a better fit performance-wise.
> > > >>
> > > >>
> > > >>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased
> > block
> > > >> cache (.6) and region server handler counts (75) ), pointing to the
> > same
> > > >> rootdir, I noticed that some tables were missing. I could see a
> > mention
> > > of
> > > >> them in the logs, but not when I did 'list' in the shell. I
> recovered
> > > those
> > > >> tables using add_table.rb script.
> > > >>
> > > >>
> > > >> How did you shutdown this cluster?  Did you reboot machines?  Was
> your
> > > >> hdfs homed on /tmp?  What is going on on your systems?  Are they
> > > >> swapping?  Did you give HBase more than its default memory?  You
> read
> > > >> the requirements and made sure ulimit and xceivers had been upped on
> > > >> these machines?
> > > >>
> > > >>
> > > >> Did not reboot machines. hdfs or hbase do not store data/logs in
> /tmp.
> > > They
> > > >> are not swapping.
> > > >> Hbase heap size is 2G.  I have upped the xcievers now on your
> > > >> recommanedation.  Do I need to restart hdfs after making this change
> > in
> > > >> hdfs-site.xml ?
> > > >> ulimit -n
> > > >> 2048
> > > >>
> > > >>
> > > >>
> > > >>>       a. Is there a way to check the health of all Hbase tables in
> > the
> > > >> cluster after an upgrade or even periodically, to make sure that
> > > everything
> > > >> is healthy ?
> > > >>>       b. I would like to be able to force this error again and
> check
> > > the
> > > >> health of hbase and want it to report to me that some tables were
> > lost.
> > > >> Currently, I just found out because I had very less data and it was
> > easy
> > > to
> > > >> tell.
> > > >>>
> > > >>
> > > >> Iin trunk there is such a tool.  In 0.20.x, run a count against our
> > > >> table.  See the hbase shell.  Type help to see how.
> > > >>
> > > >>
> > > >> What tool are you talking about here - it wasn't clear ? Count
> against
> > > >> which table ? I want hbase to check all tables and I don't know how
> > many
> > > >> tables I have since there are too many - is that possible?
> > > >>
> > > >>> 7. Here are the issues I face after this upgrade
> > > >>>       a. when I run stop-hbase.sh, it  does not stop my
> regionservers
> > > on
> > > >> other boxes.
> > > >>
> > > >> Why not.  Whats going on on those machines?  If you tail the logs on
> > > >> the hosts that won't go down and/or on master, what do they say?
> > > >> Tail the logs.  Should give you (us) clue.
> > > >>
> > > >> They do go down with some errors in the log, but down't report it on
> > the
> > > >> terminal.
> > > >> http://pastebin.com/0hYwaffL  regionserver log
> > > >>
> > > >>
> > > >>
> > > >>>       b. It does start them using start-hbase.sh.
> > > >>>       c. Is it that stopping regionservers is not reported, but it
> > does
> > > >> stop them (I see that happening on production cluster) ?
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
> > > >>>       a. earlier when I sent a URL to look for a data row that did
> > not
> > > >> exist, the return value was NULL , now I get an xml stating HTTP
> error
> > > >> 404/405.        Everything works as expected for an existing data
> row.
> > > >>
> > > >> The latter sounds RESTy.  What would you expect of it?  The null?
> > > >>
> > > >>
> > > >> Yes, it should send NULL like it does in the production server. Is
> > there
> > > >> anyone else you can point to who would have used REST ? This is the
> > main
> > > >> showstopper for me currently.
> > > >>
> > > >>
> > > >>
> > >
> >
>

RE: HBase table lost on upgrade - compiling HBaseFsck.java

Posted by "Sharma, Avani" <ag...@ebay.com>.

Ted,

I am trying to compile the file and am getting the same errors like you mentioned and more:
[javac] symbol  : method
metaScan(org.apache.hadoop.conf.Configuration,org.apache.hadoop.hbase.client.MetaScanner.MetaScannerVisitor)
    [javac] location: class org.apache.hadoop.hbase.client.MetaScanner
    [javac]       MetaScanner.metaScan(conf, visitor);
    [javac]                  ^
    [javac]
/Users/tyu/hbase-0.20.5/src/java/org/apache/hadoop/hbase/client/HBaseFsck.java:503:
cannot find symbol
    [javac] symbol  : method create()
    [javac] location: class org.apache.hadoop.hbase.HBaseConfiguration
    [javac]     Configuration conf = HBaseConfiguration.create();

I got around a few of these by adding the logging jar to the CLASSPATH. But I still have some. I see that you sent out a fix, but I am unable to see the attachment. 

I have the conf dirs in CLASSPATH as well as hadoop, zk and hbase jars.

Would you recall how can these be fixed? I guess some jars are needed in the CLASSPATH 

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Wednesday, September 08, 2010 10:11 PM
To: user@hbase.apache.org
Subject: Re: HBase table lost on upgrade

You can copy HBaseFsck.java from trunk and compile in 0.20.6

On Wed, Sep 8, 2010 at 3:43 PM, Sharma, Avani <ag...@ebay.com> wrote:

> Right.
>
> Anyway, where can I get this file from ? Any pointers?
> I can't find it at
> src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Wednesday, September 08, 2010 3:09 PM
> To: user@hbase.apache.org
> Subject: Re: HBase table lost on upgrade
>
> master.jsp shows tables, not regions.
> I personally haven't encountered the problem you're facing.
>
> On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <ag...@ebay.com> wrote:
>
> > Ted,
> > I did look at that thread. It seems I need to modify the code in that
> file?
> > Could you point me to the exact steps to get it and compile it?
> >
> > Did you get through the issue if regions being added to catalog , but do
> > not show up in master.jsp?
> >
> >
> >
> >
> > On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > The tool Stack mentioned is hbck. If you want to port it to 0.20, see
> > email
> > > thread entitled:
> > > compiling HBaseFsck.java for 0.20.5You should try reducing the number
> of
> > > tables in your system, possibly through HBASE-2473
> > >
> > > Cheers
> > >
> > > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com>
> > wrote:
> > >
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> > Stack
> > >> Sent: Wednesday, September 01, 2010 10:45 PM
> > >> To: user@hbase.apache.org
> > >> Subject: Re: HBase table lost on upgrade
> > >>
> > >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com>
> > wrote:
> > >>> That email was just informational. Below are the details on my
> cluster
> > -
> > >> let me know if more is needed.
> > >>>
> > >>> I have 2 hbase clusters setup
> > >>> -       for production, 6 node cluster,  32G, 8 processors
> > >>> -       for dev, 3 node cluster , 16GRAM , 4 processors
> > >>>
> > >>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
> > >> successfully.
> > >>
> > >> Why not latest stable version, 0.20.6?
> > >>
> > >> This was couple of months ago.
> > >>
> > >>
> > >>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
> > >>
> > >>
> > >> Whats this mean?  Each of the .5M cells was 2G in size or the total
> size
> > >> was 2G?
> > >>
> > >> The total file size is 2G. Cells are of the order of hundreds of
> bytes.
> > >>
> > >>
> > >>>       An example Hbase table looks like this:
> > >>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS
> > =>
> > >> '100', COM true
> > >>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
> > >> '65536', IN_MEMO
> > >>>                RY => 'false', BLOCKCACHE => 'true'}]}
> > >>
> > >> That looks fine.
> > >>
> > >>> 3. I started stargate on one server and accessed Hbase for reading
> from
> > >> another 3rd party application successfully.
> > >>>       It took 600 seconds on dev cluster and 250 on production to
> read
> > >> .5M records from Hbase via stargate.
> > >>
> > >>
> > >> That don't sound so good.
> > >>
> > >>
> > >>
> > >>> 4. later to boost read performance, it was suggested that upgrading
> to
> > >> Hbase0.20.6 will be helpful. I did that on production (w/o running the
> > >> migrate script) and re-started stargate and everything was running
> fine,
> > >> though I did not see a bump in performance.
> > >>>
> > >>> 5. Eventually, I had to move to dev cluster from production because
> of
> > >> some resource issues at our end. Dev cluster had 0.20.3 at this time.
> As
> > I
> > >> started loading more files into Hbase (<10 versions of <1G files) and
> > >> converting my app to use hbase more heavily (via more stargate
> clients),
> > the
> > >> performance started degrading. I decided it was time to upgrade dev
> > cluster
> > >> as well to 0.20.6.  (I did not run the migrate script here as well, I
> > missed
> > >> this step in the doc).
> > >>>
> > >>
> > >> What kinda perf you looking for from REST?
> > >>
> > >> Do you have to use REST?  All is base64'd so its safe to transport.
> > >>
> > >> I also have the Java Api code (for testing purposes) and that gave
> > similar
> > >> performance results (520 seconds on dev and 250 on production
> cluster).
> > Is
> > >> there a way to flush the cache before we run the next experiment? I
> > doubt
> > >> that the first lookup always takes longer and then the later ones
> > perform
> > >> better.
> > >>
> > >> I need something that can integrate with C++ - libcurl and stargate
> were
> > >> the easiest to start with. I could look at thrift or anything else the
> > Hbase
> > >> gurus think might be a better fit performance-wise.
> > >>
> > >>
> > >>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased
> block
> > >> cache (.6) and region server handler counts (75) ), pointing to the
> same
> > >> rootdir, I noticed that some tables were missing. I could see a
> mention
> > of
> > >> them in the logs, but not when I did 'list' in the shell. I recovered
> > those
> > >> tables using add_table.rb script.
> > >>
> > >>
> > >> How did you shutdown this cluster?  Did you reboot machines?  Was your
> > >> hdfs homed on /tmp?  What is going on on your systems?  Are they
> > >> swapping?  Did you give HBase more than its default memory?  You read
> > >> the requirements and made sure ulimit and xceivers had been upped on
> > >> these machines?
> > >>
> > >>
> > >> Did not reboot machines. hdfs or hbase do not store data/logs in /tmp.
> > They
> > >> are not swapping.
> > >> Hbase heap size is 2G.  I have upped the xcievers now on your
> > >> recommanedation.  Do I need to restart hdfs after making this change
> in
> > >> hdfs-site.xml ?
> > >> ulimit -n
> > >> 2048
> > >>
> > >>
> > >>
> > >>>       a. Is there a way to check the health of all Hbase tables in
> the
> > >> cluster after an upgrade or even periodically, to make sure that
> > everything
> > >> is healthy ?
> > >>>       b. I would like to be able to force this error again and check
> > the
> > >> health of hbase and want it to report to me that some tables were
> lost.
> > >> Currently, I just found out because I had very less data and it was
> easy
> > to
> > >> tell.
> > >>>
> > >>
> > >> Iin trunk there is such a tool.  In 0.20.x, run a count against our
> > >> table.  See the hbase shell.  Type help to see how.
> > >>
> > >>
> > >> What tool are you talking about here - it wasn't clear ? Count against
> > >> which table ? I want hbase to check all tables and I don't know how
> many
> > >> tables I have since there are too many - is that possible?
> > >>
> > >>> 7. Here are the issues I face after this upgrade
> > >>>       a. when I run stop-hbase.sh, it  does not stop my regionservers
> > on
> > >> other boxes.
> > >>
> > >> Why not.  Whats going on on those machines?  If you tail the logs on
> > >> the hosts that won't go down and/or on master, what do they say?
> > >> Tail the logs.  Should give you (us) clue.
> > >>
> > >> They do go down with some errors in the log, but down't report it on
> the
> > >> terminal.
> > >> http://pastebin.com/0hYwaffL  regionserver log
> > >>
> > >>
> > >>
> > >>>       b. It does start them using start-hbase.sh.
> > >>>       c. Is it that stopping regionservers is not reported, but it
> does
> > >> stop them (I see that happening on production cluster) ?
> > >>>
> > >>
> > >>
> > >>
> > >>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
> > >>>       a. earlier when I sent a URL to look for a data row that did
> not
> > >> exist, the return value was NULL , now I get an xml stating HTTP error
> > >> 404/405.        Everything works as expected for an existing data row.
> > >>
> > >> The latter sounds RESTy.  What would you expect of it?  The null?
> > >>
> > >>
> > >> Yes, it should send NULL like it does in the production server. Is
> there
> > >> anyone else you can point to who would have used REST ? This is the
> main
> > >> showstopper for me currently.
> > >>
> > >>
> > >>
> >
>

Re: HBase table lost on upgrade

Posted by Ted Yu <yu...@gmail.com>.

You can copy HBaseFsck.java from trunk and compile in 0.20.6

On Wed, Sep 8, 2010 at 3:43 PM, Sharma, Avani <ag...@ebay.com> wrote:

> Right.
>
> Anyway, where can I get this file from ? Any pointers?
> I can't find it at
> src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Wednesday, September 08, 2010 3:09 PM
> To: user@hbase.apache.org
> Subject: Re: HBase table lost on upgrade
>
> master.jsp shows tables, not regions.
> I personally haven't encountered the problem you're facing.
>
> On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <ag...@ebay.com> wrote:
>
> > Ted,
> > I did look at that thread. It seems I need to modify the code in that
> file?
> > Could you point me to the exact steps to get it and compile it?
> >
> > Did you get through the issue if regions being added to catalog , but do
> > not show up in master.jsp?
> >
> >
> >
> >
> > On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > The tool Stack mentioned is hbck. If you want to port it to 0.20, see
> > email
> > > thread entitled:
> > > compiling HBaseFsck.java for 0.20.5You should try reducing the number
> of
> > > tables in your system, possibly through HBASE-2473
> > >
> > > Cheers
> > >
> > > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com>
> > wrote:
> > >
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> > Stack
> > >> Sent: Wednesday, September 01, 2010 10:45 PM
> > >> To: user@hbase.apache.org
> > >> Subject: Re: HBase table lost on upgrade
> > >>
> > >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com>
> > wrote:
> > >>> That email was just informational. Below are the details on my
> cluster
> > -
> > >> let me know if more is needed.
> > >>>
> > >>> I have 2 hbase clusters setup
> > >>> -       for production, 6 node cluster,  32G, 8 processors
> > >>> -       for dev, 3 node cluster , 16GRAM , 4 processors
> > >>>
> > >>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
> > >> successfully.
> > >>
> > >> Why not latest stable version, 0.20.6?
> > >>
> > >> This was couple of months ago.
> > >>
> > >>
> > >>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
> > >>
> > >>
> > >> Whats this mean?  Each of the .5M cells was 2G in size or the total
> size
> > >> was 2G?
> > >>
> > >> The total file size is 2G. Cells are of the order of hundreds of
> bytes.
> > >>
> > >>
> > >>>       An example Hbase table looks like this:
> > >>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS
> > =>
> > >> '100', COM true
> > >>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
> > >> '65536', IN_MEMO
> > >>>                RY => 'false', BLOCKCACHE => 'true'}]}
> > >>
> > >> That looks fine.
> > >>
> > >>> 3. I started stargate on one server and accessed Hbase for reading
> from
> > >> another 3rd party application successfully.
> > >>>       It took 600 seconds on dev cluster and 250 on production to
> read
> > >> .5M records from Hbase via stargate.
> > >>
> > >>
> > >> That don't sound so good.
> > >>
> > >>
> > >>
> > >>> 4. later to boost read performance, it was suggested that upgrading
> to
> > >> Hbase0.20.6 will be helpful. I did that on production (w/o running the
> > >> migrate script) and re-started stargate and everything was running
> fine,
> > >> though I did not see a bump in performance.
> > >>>
> > >>> 5. Eventually, I had to move to dev cluster from production because
> of
> > >> some resource issues at our end. Dev cluster had 0.20.3 at this time.
> As
> > I
> > >> started loading more files into Hbase (<10 versions of <1G files) and
> > >> converting my app to use hbase more heavily (via more stargate
> clients),
> > the
> > >> performance started degrading. I decided it was time to upgrade dev
> > cluster
> > >> as well to 0.20.6.  (I did not run the migrate script here as well, I
> > missed
> > >> this step in the doc).
> > >>>
> > >>
> > >> What kinda perf you looking for from REST?
> > >>
> > >> Do you have to use REST?  All is base64'd so its safe to transport.
> > >>
> > >> I also have the Java Api code (for testing purposes) and that gave
> > similar
> > >> performance results (520 seconds on dev and 250 on production
> cluster).
> > Is
> > >> there a way to flush the cache before we run the next experiment? I
> > doubt
> > >> that the first lookup always takes longer and then the later ones
> > perform
> > >> better.
> > >>
> > >> I need something that can integrate with C++ - libcurl and stargate
> were
> > >> the easiest to start with. I could look at thrift or anything else the
> > Hbase
> > >> gurus think might be a better fit performance-wise.
> > >>
> > >>
> > >>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased
> block
> > >> cache (.6) and region server handler counts (75) ), pointing to the
> same
> > >> rootdir, I noticed that some tables were missing. I could see a
> mention
> > of
> > >> them in the logs, but not when I did 'list' in the shell. I recovered
> > those
> > >> tables using add_table.rb script.
> > >>
> > >>
> > >> How did you shutdown this cluster?  Did you reboot machines?  Was your
> > >> hdfs homed on /tmp?  What is going on on your systems?  Are they
> > >> swapping?  Did you give HBase more than its default memory?  You read
> > >> the requirements and made sure ulimit and xceivers had been upped on
> > >> these machines?
> > >>
> > >>
> > >> Did not reboot machines. hdfs or hbase do not store data/logs in /tmp.
> > They
> > >> are not swapping.
> > >> Hbase heap size is 2G.  I have upped the xcievers now on your
> > >> recommanedation.  Do I need to restart hdfs after making this change
> in
> > >> hdfs-site.xml ?
> > >> ulimit -n
> > >> 2048
> > >>
> > >>
> > >>
> > >>>       a. Is there a way to check the health of all Hbase tables in
> the
> > >> cluster after an upgrade or even periodically, to make sure that
> > everything
> > >> is healthy ?
> > >>>       b. I would like to be able to force this error again and check
> > the
> > >> health of hbase and want it to report to me that some tables were
> lost.
> > >> Currently, I just found out because I had very less data and it was
> easy
> > to
> > >> tell.
> > >>>
> > >>
> > >> Iin trunk there is such a tool.  In 0.20.x, run a count against our
> > >> table.  See the hbase shell.  Type help to see how.
> > >>
> > >>
> > >> What tool are you talking about here - it wasn't clear ? Count against
> > >> which table ? I want hbase to check all tables and I don't know how
> many
> > >> tables I have since there are too many - is that possible?
> > >>
> > >>> 7. Here are the issues I face after this upgrade
> > >>>       a. when I run stop-hbase.sh, it  does not stop my regionservers
> > on
> > >> other boxes.
> > >>
> > >> Why not.  Whats going on on those machines?  If you tail the logs on
> > >> the hosts that won't go down and/or on master, what do they say?
> > >> Tail the logs.  Should give you (us) clue.
> > >>
> > >> They do go down with some errors in the log, but down't report it on
> the
> > >> terminal.
> > >> http://pastebin.com/0hYwaffL  regionserver log
> > >>
> > >>
> > >>
> > >>>       b. It does start them using start-hbase.sh.
> > >>>       c. Is it that stopping regionservers is not reported, but it
> does
> > >> stop them (I see that happening on production cluster) ?
> > >>>
> > >>
> > >>
> > >>
> > >>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
> > >>>       a. earlier when I sent a URL to look for a data row that did
> not
> > >> exist, the return value was NULL , now I get an xml stating HTTP error
> > >> 404/405.        Everything works as expected for an existing data row.
> > >>
> > >> The latter sounds RESTy.  What would you expect of it?  The null?
> > >>
> > >>
> > >> Yes, it should send NULL like it does in the production server. Is
> there
> > >> anyone else you can point to who would have used REST ? This is the
> main
> > >> showstopper for me currently.
> > >>
> > >>
> > >>
> >
>

RE: HBase table lost on upgrade

Posted by "Sharma, Avani" <ag...@ebay.com>.

Right.

Anyway, where can I get this file from ? Any pointers? 
I can't find it at src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Wednesday, September 08, 2010 3:09 PM
To: user@hbase.apache.org
Subject: Re: HBase table lost on upgrade

master.jsp shows tables, not regions.
I personally haven't encountered the problem you're facing.

On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <ag...@ebay.com> wrote:

> Ted,
> I did look at that thread. It seems I need to modify the code in that file?
> Could you point me to the exact steps to get it and compile it?
>
> Did you get through the issue if regions being added to catalog , but do
> not show up in master.jsp?
>
>
>
>
> On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > The tool Stack mentioned is hbck. If you want to port it to 0.20, see
> email
> > thread entitled:
> > compiling HBaseFsck.java for 0.20.5You should try reducing the number of
> > tables in your system, possibly through HBASE-2473
> >
> > Cheers
> >
> > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com>
> wrote:
> >
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> >> Sent: Wednesday, September 01, 2010 10:45 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: HBase table lost on upgrade
> >>
> >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com>
> wrote:
> >>> That email was just informational. Below are the details on my cluster
> -
> >> let me know if more is needed.
> >>>
> >>> I have 2 hbase clusters setup
> >>> -       for production, 6 node cluster,  32G, 8 processors
> >>> -       for dev, 3 node cluster , 16GRAM , 4 processors
> >>>
> >>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
> >> successfully.
> >>
> >> Why not latest stable version, 0.20.6?
> >>
> >> This was couple of months ago.
> >>
> >>
> >>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
> >>
> >>
> >> Whats this mean?  Each of the .5M cells was 2G in size or the total size
> >> was 2G?
> >>
> >> The total file size is 2G. Cells are of the order of hundreds of bytes.
> >>
> >>
> >>>       An example Hbase table looks like this:
> >>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS
> =>
> >> '100', COM true
> >>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
> >> '65536', IN_MEMO
> >>>                RY => 'false', BLOCKCACHE => 'true'}]}
> >>
> >> That looks fine.
> >>
> >>> 3. I started stargate on one server and accessed Hbase for reading from
> >> another 3rd party application successfully.
> >>>       It took 600 seconds on dev cluster and 250 on production to read
> >> .5M records from Hbase via stargate.
> >>
> >>
> >> That don't sound so good.
> >>
> >>
> >>
> >>> 4. later to boost read performance, it was suggested that upgrading to
> >> Hbase0.20.6 will be helpful. I did that on production (w/o running the
> >> migrate script) and re-started stargate and everything was running fine,
> >> though I did not see a bump in performance.
> >>>
> >>> 5. Eventually, I had to move to dev cluster from production because of
> >> some resource issues at our end. Dev cluster had 0.20.3 at this time. As
> I
> >> started loading more files into Hbase (<10 versions of <1G files) and
> >> converting my app to use hbase more heavily (via more stargate clients),
> the
> >> performance started degrading. I decided it was time to upgrade dev
> cluster
> >> as well to 0.20.6.  (I did not run the migrate script here as well, I
> missed
> >> this step in the doc).
> >>>
> >>
> >> What kinda perf you looking for from REST?
> >>
> >> Do you have to use REST?  All is base64'd so its safe to transport.
> >>
> >> I also have the Java Api code (for testing purposes) and that gave
> similar
> >> performance results (520 seconds on dev and 250 on production cluster).
> Is
> >> there a way to flush the cache before we run the next experiment? I
> doubt
> >> that the first lookup always takes longer and then the later ones
> perform
> >> better.
> >>
> >> I need something that can integrate with C++ - libcurl and stargate were
> >> the easiest to start with. I could look at thrift or anything else the
> Hbase
> >> gurus think might be a better fit performance-wise.
> >>
> >>
> >>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased block
> >> cache (.6) and region server handler counts (75) ), pointing to the same
> >> rootdir, I noticed that some tables were missing. I could see a mention
> of
> >> them in the logs, but not when I did 'list' in the shell. I recovered
> those
> >> tables using add_table.rb script.
> >>
> >>
> >> How did you shutdown this cluster?  Did you reboot machines?  Was your
> >> hdfs homed on /tmp?  What is going on on your systems?  Are they
> >> swapping?  Did you give HBase more than its default memory?  You read
> >> the requirements and made sure ulimit and xceivers had been upped on
> >> these machines?
> >>
> >>
> >> Did not reboot machines. hdfs or hbase do not store data/logs in /tmp.
> They
> >> are not swapping.
> >> Hbase heap size is 2G.  I have upped the xcievers now on your
> >> recommanedation.  Do I need to restart hdfs after making this change in
> >> hdfs-site.xml ?
> >> ulimit -n
> >> 2048
> >>
> >>
> >>
> >>>       a. Is there a way to check the health of all Hbase tables in the
> >> cluster after an upgrade or even periodically, to make sure that
> everything
> >> is healthy ?
> >>>       b. I would like to be able to force this error again and check
> the
> >> health of hbase and want it to report to me that some tables were lost.
> >> Currently, I just found out because I had very less data and it was easy
> to
> >> tell.
> >>>
> >>
> >> Iin trunk there is such a tool.  In 0.20.x, run a count against our
> >> table.  See the hbase shell.  Type help to see how.
> >>
> >>
> >> What tool are you talking about here - it wasn't clear ? Count against
> >> which table ? I want hbase to check all tables and I don't know how many
> >> tables I have since there are too many - is that possible?
> >>
> >>> 7. Here are the issues I face after this upgrade
> >>>       a. when I run stop-hbase.sh, it  does not stop my regionservers
> on
> >> other boxes.
> >>
> >> Why not.  Whats going on on those machines?  If you tail the logs on
> >> the hosts that won't go down and/or on master, what do they say?
> >> Tail the logs.  Should give you (us) clue.
> >>
> >> They do go down with some errors in the log, but down't report it on the
> >> terminal.
> >> http://pastebin.com/0hYwaffL  regionserver log
> >>
> >>
> >>
> >>>       b. It does start them using start-hbase.sh.
> >>>       c. Is it that stopping regionservers is not reported, but it does
> >> stop them (I see that happening on production cluster) ?
> >>>
> >>
> >>
> >>
> >>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
> >>>       a. earlier when I sent a URL to look for a data row that did not
> >> exist, the return value was NULL , now I get an xml stating HTTP error
> >> 404/405.        Everything works as expected for an existing data row.
> >>
> >> The latter sounds RESTy.  What would you expect of it?  The null?
> >>
> >>
> >> Yes, it should send NULL like it does in the production server. Is there
> >> anyone else you can point to who would have used REST ? This is the main
> >> showstopper for me currently.
> >>
> >>
> >>
>

Re: HBase table lost on upgrade

Posted by Ted Yu <yu...@gmail.com>.

master.jsp shows tables, not regions.
I personally haven't encountered the problem you're facing.

On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <ag...@ebay.com> wrote:

> Ted,
> I did look at that thread. It seems I need to modify the code in that file?
> Could you point me to the exact steps to get it and compile it?
>
> Did you get through the issue if regions being added to catalog , but do
> not show up in master.jsp?
>
>
>
>
> On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > The tool Stack mentioned is hbck. If you want to port it to 0.20, see
> email
> > thread entitled:
> > compiling HBaseFsck.java for 0.20.5You should try reducing the number of
> > tables in your system, possibly through HBASE-2473
> >
> > Cheers
> >
> > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com>
> wrote:
> >
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> >> Sent: Wednesday, September 01, 2010 10:45 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: HBase table lost on upgrade
> >>
> >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com>
> wrote:
> >>> That email was just informational. Below are the details on my cluster
> -
> >> let me know if more is needed.
> >>>
> >>> I have 2 hbase clusters setup
> >>> -       for production, 6 node cluster,  32G, 8 processors
> >>> -       for dev, 3 node cluster , 16GRAM , 4 processors
> >>>
> >>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
> >> successfully.
> >>
> >> Why not latest stable version, 0.20.6?
> >>
> >> This was couple of months ago.
> >>
> >>
> >>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
> >>
> >>
> >> Whats this mean?  Each of the .5M cells was 2G in size or the total size
> >> was 2G?
> >>
> >> The total file size is 2G. Cells are of the order of hundreds of bytes.
> >>
> >>
> >>>       An example Hbase table looks like this:
> >>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS
> =>
> >> '100', COM true
> >>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
> >> '65536', IN_MEMO
> >>>                RY => 'false', BLOCKCACHE => 'true'}]}
> >>
> >> That looks fine.
> >>
> >>> 3. I started stargate on one server and accessed Hbase for reading from
> >> another 3rd party application successfully.
> >>>       It took 600 seconds on dev cluster and 250 on production to read
> >> .5M records from Hbase via stargate.
> >>
> >>
> >> That don't sound so good.
> >>
> >>
> >>
> >>> 4. later to boost read performance, it was suggested that upgrading to
> >> Hbase0.20.6 will be helpful. I did that on production (w/o running the
> >> migrate script) and re-started stargate and everything was running fine,
> >> though I did not see a bump in performance.
> >>>
> >>> 5. Eventually, I had to move to dev cluster from production because of
> >> some resource issues at our end. Dev cluster had 0.20.3 at this time. As
> I
> >> started loading more files into Hbase (<10 versions of <1G files) and
> >> converting my app to use hbase more heavily (via more stargate clients),
> the
> >> performance started degrading. I decided it was time to upgrade dev
> cluster
> >> as well to 0.20.6.  (I did not run the migrate script here as well, I
> missed
> >> this step in the doc).
> >>>
> >>
> >> What kinda perf you looking for from REST?
> >>
> >> Do you have to use REST?  All is base64'd so its safe to transport.
> >>
> >> I also have the Java Api code (for testing purposes) and that gave
> similar
> >> performance results (520 seconds on dev and 250 on production cluster).
> Is
> >> there a way to flush the cache before we run the next experiment? I
> doubt
> >> that the first lookup always takes longer and then the later ones
> perform
> >> better.
> >>
> >> I need something that can integrate with C++ - libcurl and stargate were
> >> the easiest to start with. I could look at thrift or anything else the
> Hbase
> >> gurus think might be a better fit performance-wise.
> >>
> >>
> >>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased block
> >> cache (.6) and region server handler counts (75) ), pointing to the same
> >> rootdir, I noticed that some tables were missing. I could see a mention
> of
> >> them in the logs, but not when I did 'list' in the shell. I recovered
> those
> >> tables using add_table.rb script.
> >>
> >>
> >> How did you shutdown this cluster?  Did you reboot machines?  Was your
> >> hdfs homed on /tmp?  What is going on on your systems?  Are they
> >> swapping?  Did you give HBase more than its default memory?  You read
> >> the requirements and made sure ulimit and xceivers had been upped on
> >> these machines?
> >>
> >>
> >> Did not reboot machines. hdfs or hbase do not store data/logs in /tmp.
> They
> >> are not swapping.
> >> Hbase heap size is 2G.  I have upped the xcievers now on your
> >> recommanedation.  Do I need to restart hdfs after making this change in
> >> hdfs-site.xml ?
> >> ulimit -n
> >> 2048
> >>
> >>
> >>
> >>>       a. Is there a way to check the health of all Hbase tables in the
> >> cluster after an upgrade or even periodically, to make sure that
> everything
> >> is healthy ?
> >>>       b. I would like to be able to force this error again and check
> the
> >> health of hbase and want it to report to me that some tables were lost.
> >> Currently, I just found out because I had very less data and it was easy
> to
> >> tell.
> >>>
> >>
> >> Iin trunk there is such a tool.  In 0.20.x, run a count against our
> >> table.  See the hbase shell.  Type help to see how.
> >>
> >>
> >> What tool are you talking about here - it wasn't clear ? Count against
> >> which table ? I want hbase to check all tables and I don't know how many
> >> tables I have since there are too many - is that possible?
> >>
> >>> 7. Here are the issues I face after this upgrade
> >>>       a. when I run stop-hbase.sh, it  does not stop my regionservers
> on
> >> other boxes.
> >>
> >> Why not.  Whats going on on those machines?  If you tail the logs on
> >> the hosts that won't go down and/or on master, what do they say?
> >> Tail the logs.  Should give you (us) clue.
> >>
> >> They do go down with some errors in the log, but down't report it on the
> >> terminal.
> >> http://pastebin.com/0hYwaffL  regionserver log
> >>
> >>
> >>
> >>>       b. It does start them using start-hbase.sh.
> >>>       c. Is it that stopping regionservers is not reported, but it does
> >> stop them (I see that happening on production cluster) ?
> >>>
> >>
> >>
> >>
> >>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
> >>>       a. earlier when I sent a URL to look for a data row that did not
> >> exist, the return value was NULL , now I get an xml stating HTTP error
> >> 404/405.        Everything works as expected for an existing data row.
> >>
> >> The latter sounds RESTy.  What would you expect of it?  The null?
> >>
> >>
> >> Yes, it should send NULL like it does in the production server. Is there
> >> anyone else you can point to who would have used REST ? This is the main
> >> showstopper for me currently.
> >>
> >>
> >>
>

Re: HBase table lost on upgrade

Posted by "Sharma, Avani" <ag...@ebay.com>.

Ted,
I did look at that thread. It seems I need to modify the code in that file? Could you point me to the exact steps to get it and compile it? 

Did you get through the issue if regions being added to catalog , but do not show up in master.jsp?




On Sep 4, 2010, at 9:24 PM, Ted Yu <yu...@gmail.com> wrote:

> The tool Stack mentioned is hbck. If you want to port it to 0.20, see email
> thread entitled:
> compiling HBaseFsck.java for 0.20.5You should try reducing the number of
> tables in your system, possibly through HBASE-2473
> 
> Cheers
> 
> On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com> wrote:
> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>> Sent: Wednesday, September 01, 2010 10:45 PM
>> To: user@hbase.apache.org
>> Subject: Re: HBase table lost on upgrade
>> 
>> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com> wrote:
>>> That email was just informational. Below are the details on my cluster -
>> let me know if more is needed.
>>> 
>>> I have 2 hbase clusters setup
>>> -       for production, 6 node cluster,  32G, 8 processors
>>> -       for dev, 3 node cluster , 16GRAM , 4 processors
>>> 
>>> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
>> successfully.
>> 
>> Why not latest stable version, 0.20.6?
>> 
>> This was couple of months ago.
>> 
>> 
>>> 2. After that I loaded 2G+ files into HDFS and HBASE table.
>> 
>> 
>> Whats this mean?  Each of the .5M cells was 2G in size or the total size
>> was 2G?
>> 
>> The total file size is 2G. Cells are of the order of hundreds of bytes.
>> 
>> 
>>>       An example Hbase table looks like this:
>>>               {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS =>
>> '100', COM true
>>>                PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
>> '65536', IN_MEMO
>>>                RY => 'false', BLOCKCACHE => 'true'}]}
>> 
>> That looks fine.
>> 
>>> 3. I started stargate on one server and accessed Hbase for reading from
>> another 3rd party application successfully.
>>>       It took 600 seconds on dev cluster and 250 on production to read
>> .5M records from Hbase via stargate.
>> 
>> 
>> That don't sound so good.
>> 
>> 
>> 
>>> 4. later to boost read performance, it was suggested that upgrading to
>> Hbase0.20.6 will be helpful. I did that on production (w/o running the
>> migrate script) and re-started stargate and everything was running fine,
>> though I did not see a bump in performance.
>>> 
>>> 5. Eventually, I had to move to dev cluster from production because of
>> some resource issues at our end. Dev cluster had 0.20.3 at this time. As I
>> started loading more files into Hbase (<10 versions of <1G files) and
>> converting my app to use hbase more heavily (via more stargate clients), the
>> performance started degrading. I decided it was time to upgrade dev cluster
>> as well to 0.20.6.  (I did not run the migrate script here as well, I missed
>> this step in the doc).
>>> 
>> 
>> What kinda perf you looking for from REST?
>> 
>> Do you have to use REST?  All is base64'd so its safe to transport.
>> 
>> I also have the Java Api code (for testing purposes) and that gave similar
>> performance results (520 seconds on dev and 250 on production cluster). Is
>> there a way to flush the cache before we run the next experiment? I doubt
>> that the first lookup always takes longer and then the later ones perform
>> better.
>> 
>> I need something that can integrate with C++ - libcurl and stargate were
>> the easiest to start with. I could look at thrift or anything else the Hbase
>> gurus think might be a better fit performance-wise.
>> 
>> 
>>> 6. When Hbase 0.20.6 came back up on dev cluster (with increased block
>> cache (.6) and region server handler counts (75) ), pointing to the same
>> rootdir, I noticed that some tables were missing. I could see a mention of
>> them in the logs, but not when I did 'list' in the shell. I recovered those
>> tables using add_table.rb script.
>> 
>> 
>> How did you shutdown this cluster?  Did you reboot machines?  Was your
>> hdfs homed on /tmp?  What is going on on your systems?  Are they
>> swapping?  Did you give HBase more than its default memory?  You read
>> the requirements and made sure ulimit and xceivers had been upped on
>> these machines?
>> 
>> 
>> Did not reboot machines. hdfs or hbase do not store data/logs in /tmp. They
>> are not swapping.
>> Hbase heap size is 2G.  I have upped the xcievers now on your
>> recommanedation.  Do I need to restart hdfs after making this change in
>> hdfs-site.xml ?
>> ulimit -n
>> 2048
>> 
>> 
>> 
>>>       a. Is there a way to check the health of all Hbase tables in the
>> cluster after an upgrade or even periodically, to make sure that everything
>> is healthy ?
>>>       b. I would like to be able to force this error again and check the
>> health of hbase and want it to report to me that some tables were lost.
>> Currently, I just found out because I had very less data and it was easy to
>> tell.
>>> 
>> 
>> Iin trunk there is such a tool.  In 0.20.x, run a count against our
>> table.  See the hbase shell.  Type help to see how.
>> 
>> 
>> What tool are you talking about here - it wasn't clear ? Count against
>> which table ? I want hbase to check all tables and I don't know how many
>> tables I have since there are too many - is that possible?
>> 
>>> 7. Here are the issues I face after this upgrade
>>>       a. when I run stop-hbase.sh, it  does not stop my regionservers on
>> other boxes.
>> 
>> Why not.  Whats going on on those machines?  If you tail the logs on
>> the hosts that won't go down and/or on master, what do they say?
>> Tail the logs.  Should give you (us) clue.
>> 
>> They do go down with some errors in the log, but down't report it on the
>> terminal.
>> http://pastebin.com/0hYwaffL  regionserver log
>> 
>> 
>> 
>>>       b. It does start them using start-hbase.sh.
>>>       c. Is it that stopping regionservers is not reported, but it does
>> stop them (I see that happening on production cluster) ?
>>> 
>> 
>> 
>> 
>>> 8. I started stargate in the upgraded 0.20.6 in dev cluster
>>>       a. earlier when I sent a URL to look for a data row that did not
>> exist, the return value was NULL , now I get an xml stating HTTP error
>> 404/405.        Everything works as expected for an existing data row.
>> 
>> The latter sounds RESTy.  What would you expect of it?  The null?
>> 
>> 
>> Yes, it should send NULL like it does in the production server. Is there
>> anyone else you can point to who would have used REST ? This is the main
>> showstopper for me currently.
>> 
>> 
>>

Re: HBase table lost on upgrade

Posted by Ted Yu <yu...@gmail.com>.

The tool Stack mentioned is hbck. If you want to port it to 0.20, see email
thread entitled:
compiling HBaseFsck.java for 0.20.5You should try reducing the number of
tables in your system, possibly through HBASE-2473

Cheers

On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com> wrote:

>
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Wednesday, September 01, 2010 10:45 PM
> To: user@hbase.apache.org
> Subject: Re: HBase table lost on upgrade
>
> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com> wrote:
> > That email was just informational. Below are the details on my cluster -
> let me know if more is needed.
> >
> > I have 2 hbase clusters setup
> > -       for production, 6 node cluster,  32G, 8 processors
> > -       for dev, 3 node cluster , 16GRAM , 4 processors
> >
> > 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
> successfully.
>
> Why not latest stable version, 0.20.6?
>
> This was couple of months ago.
>
>
> > 2. After that I loaded 2G+ files into HDFS and HBASE table.
>
>
> Whats this mean?  Each of the .5M cells was 2G in size or the total size
> was 2G?
>
> The total file size is 2G. Cells are of the order of hundreds of bytes.
>
>
> >        An example Hbase table looks like this:
> >                {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS =>
> '100', COM true
> >                 PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
> '65536', IN_MEMO
> >                 RY => 'false', BLOCKCACHE => 'true'}]}
>
> That looks fine.
>
> > 3. I started stargate on one server and accessed Hbase for reading from
> another 3rd party application successfully.
> >        It took 600 seconds on dev cluster and 250 on production to read
> .5M records from Hbase via stargate.
>
>
> That don't sound so good.
>
>
>
> > 4. later to boost read performance, it was suggested that upgrading to
> Hbase0.20.6 will be helpful. I did that on production (w/o running the
> migrate script) and re-started stargate and everything was running fine,
> though I did not see a bump in performance.
> >
> > 5. Eventually, I had to move to dev cluster from production because of
> some resource issues at our end. Dev cluster had 0.20.3 at this time. As I
> started loading more files into Hbase (<10 versions of <1G files) and
> converting my app to use hbase more heavily (via more stargate clients), the
> performance started degrading. I decided it was time to upgrade dev cluster
> as well to 0.20.6.  (I did not run the migrate script here as well, I missed
> this step in the doc).
> >
>
> What kinda perf you looking for from REST?
>
> Do you have to use REST?  All is base64'd so its safe to transport.
>
> I also have the Java Api code (for testing purposes) and that gave similar
> performance results (520 seconds on dev and 250 on production cluster). Is
> there a way to flush the cache before we run the next experiment? I doubt
> that the first lookup always takes longer and then the later ones perform
> better.
>
> I need something that can integrate with C++ - libcurl and stargate were
> the easiest to start with. I could look at thrift or anything else the Hbase
> gurus think might be a better fit performance-wise.
>
>
> > 6. When Hbase 0.20.6 came back up on dev cluster (with increased block
> cache (.6) and region server handler counts (75) ), pointing to the same
> rootdir, I noticed that some tables were missing. I could see a mention of
> them in the logs, but not when I did 'list' in the shell. I recovered those
> tables using add_table.rb script.
>
>
> How did you shutdown this cluster?  Did you reboot machines?  Was your
> hdfs homed on /tmp?  What is going on on your systems?  Are they
> swapping?  Did you give HBase more than its default memory?  You read
> the requirements and made sure ulimit and xceivers had been upped on
> these machines?
>
>
> Did not reboot machines. hdfs or hbase do not store data/logs in /tmp. They
> are not swapping.
> Hbase heap size is 2G.  I have upped the xcievers now on your
> recommanedation.  Do I need to restart hdfs after making this change in
> hdfs-site.xml ?
> ulimit -n
> 2048
>
>
>
> >        a. Is there a way to check the health of all Hbase tables in the
> cluster after an upgrade or even periodically, to make sure that everything
> is healthy ?
> >        b. I would like to be able to force this error again and check the
> health of hbase and want it to report to me that some tables were lost.
> Currently, I just found out because I had very less data and it was easy to
> tell.
> >
>
> Iin trunk there is such a tool.  In 0.20.x, run a count against our
> table.  See the hbase shell.  Type help to see how.
>
>
> What tool are you talking about here - it wasn't clear ? Count against
> which table ? I want hbase to check all tables and I don't know how many
> tables I have since there are too many - is that possible?
>
> > 7. Here are the issues I face after this upgrade
> >        a. when I run stop-hbase.sh, it  does not stop my regionservers on
> other boxes.
>
> Why not.  Whats going on on those machines?  If you tail the logs on
> the hosts that won't go down and/or on master, what do they say?
> Tail the logs.  Should give you (us) clue.
>
> They do go down with some errors in the log, but down't report it on the
> terminal.
> http://pastebin.com/0hYwaffL  regionserver log
>
>
>
> >        b. It does start them using start-hbase.sh.
> >        c. Is it that stopping regionservers is not reported, but it does
> stop them (I see that happening on production cluster) ?
> >
>
>
>
> > 8. I started stargate in the upgraded 0.20.6 in dev cluster
> >        a. earlier when I sent a URL to look for a data row that did not
> exist, the return value was NULL , now I get an xml stating HTTP error
> 404/405.        Everything works as expected for an existing data row.
>
> The latter sounds RESTy.  What would you expect of it?  The null?
>
>
> Yes, it should send NULL like it does in the production server. Is there
> anyone else you can point to who would have used REST ? This is the main
> showstopper for me currently.
>
>
>

Re: HBase table lost on upgrade

Posted by Stack <st...@duboce.net>.

On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <ag...@ebay.com> wrote:
>
> I also have the Java Api code (for testing purposes) and that gave similar performance results (520 seconds on dev and 250 on production cluster). Is there a way to flush the cache before we run the next experiment? I doubt that the first lookup always takes longer and then the later ones perform better.
>

~2k/second into a six node cluster?  I'd say your perf is slow because
you've so little data.  Could that be it?  You are talking 500k rows
and 2G total data.  Go up a couple of orders of magnitude.   Add
500Million rows.  See what perf. is like then?

> I need something that can integrate with C++ - libcurl and stargate were the easiest to start with. I could look at thrift or anything else the Hbase gurus think might be a better fit performance-wise.
>

You could also try thrift and c++

St.Ack

RE: HBase table lost on upgrade

Posted by "Sharma, Avani" <ag...@ebay.com>.

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Wednesday, September 01, 2010 10:45 PM
To: user@hbase.apache.org
Subject: Re: HBase table lost on upgrade

On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com> wrote:
> That email was just informational. Below are the details on my cluster - let me know if more is needed.
>
> I have 2 hbase clusters setup
> -       for production, 6 node cluster,  32G, 8 processors
> -       for dev, 3 node cluster , 16GRAM , 4 processors
>
> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters, successfully.

Why not latest stable version, 0.20.6?

This was couple of months ago.

> 2. After that I loaded 2G+ files into HDFS and HBASE table.

Whats this mean?  Each of the .5M cells was 2G in size or the total size was 2G?

The total file size is 2G. Cells are of the order of hundreds of bytes.

>        An example Hbase table looks like this:
>                {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS => '100', COM true
>                 PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMO
>                 RY => 'false', BLOCKCACHE => 'true'}]}

That looks fine.

> 3. I started stargate on one server and accessed Hbase for reading from another 3rd party application successfully.
>        It took 600 seconds on dev cluster and 250 on production to read .5M records from Hbase via stargate.

That don't sound so good. 

> 4. later to boost read performance, it was suggested that upgrading to Hbase0.20.6 will be helpful. I did that on production (w/o running the migrate script) and re-started stargate and everything was running fine, though I did not see a bump in performance.
>
> 5. Eventually, I had to move to dev cluster from production because of some resource issues at our end. Dev cluster had 0.20.3 at this time. As I started loading more files into Hbase (<10 versions of <1G files) and converting my app to use hbase more heavily (via more stargate clients), the performance started degrading. I decided it was time to upgrade dev cluster as well to 0.20.6.  (I did not run the migrate script here as well, I missed this step in the doc).
>

What kinda perf you looking for from REST?

Do you have to use REST?  All is base64'd so its safe to transport.

I also have the Java Api code (for testing purposes) and that gave similar performance results (520 seconds on dev and 250 on production cluster). Is there a way to flush the cache before we run the next experiment? I doubt that the first lookup always takes longer and then the later ones perform better.

I need something that can integrate with C++ - libcurl and stargate were the easiest to start with. I could look at thrift or anything else the Hbase gurus think might be a better fit performance-wise.

> 6. When Hbase 0.20.6 came back up on dev cluster (with increased block cache (.6) and region server handler counts (75) ), pointing to the same rootdir, I noticed that some tables were missing. I could see a mention of them in the logs, but not when I did 'list' in the shell. I recovered those tables using add_table.rb script.

How did you shutdown this cluster?  Did you reboot machines?  Was your
hdfs homed on /tmp?  What is going on on your systems?  Are they
swapping?  Did you give HBase more than its default memory?  You read
the requirements and made sure ulimit and xceivers had been upped on
these machines?

Did not reboot machines. hdfs or hbase do not store data/logs in /tmp. They are not swapping. 
Hbase heap size is 2G.  I have upped the xcievers now on your recommanedation.  Do I need to restart hdfs after making this change in hdfs-site.xml ? 
ulimit -n
2048

>        a. Is there a way to check the health of all Hbase tables in the cluster after an upgrade or even periodically, to make sure that everything is healthy ?
>        b. I would like to be able to force this error again and check the health of hbase and want it to report to me that some tables were lost. Currently, I just found out because I had very less data and it was easy to tell.
>

Iin trunk there is such a tool.  In 0.20.x, run a count against our
table.  See the hbase shell.  Type help to see how.

What tool are you talking about here - it wasn't clear ? Count against which table ? I want hbase to check all tables and I don't know how many tables I have since there are too many - is that possible?

> 7. Here are the issues I face after this upgrade
>        a. when I run stop-hbase.sh, it  does not stop my regionservers on other boxes.

Why not.  Whats going on on those machines?  If you tail the logs on
the hosts that won't go down and/or on master, what do they say?
Tail the logs.  Should give you (us) clue.

They do go down with some errors in the log, but down't report it on the terminal. 
http://pastebin.com/0hYwaffL  regionserver log 

>        b. It does start them using start-hbase.sh.
>        c. Is it that stopping regionservers is not reported, but it does stop them (I see that happening on production cluster) ?
>

> 8. I started stargate in the upgraded 0.20.6 in dev cluster
>        a. earlier when I sent a URL to look for a data row that did not exist, the return value was NULL , now I get an xml stating HTTP error 404/405.        Everything works as expected for an existing data row.

The latter sounds RESTy.  What would you expect of it?  The null?

Yes, it should send NULL like it does in the production server. Is there anyone else you can point to who would have used REST ? This is the main showstopper for me currently.

Re: HBase table lost on upgrade

Posted by Stack <st...@duboce.net>.

On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <ag...@ebay.com> wrote:
> That email was just informational. Below are the details on my cluster - let me know if more is needed.
>
> I have 2 hbase clusters setup
> -       for production, 6 node cluster,  32G, 8 processors
> -       for dev, 3 node cluster , 16GRAM , 4 processors
>
> 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters, successfully.

Why not latest stable version, 0.20.6?

> 2. After that I loaded 2G+ files into HDFS and HBASE table.


Whats this mean?  Each of the .5M cells was 2G in size or the total size was 2G?

>        An example Hbase table looks like this:
>                {NAME =>'TABLE', FAMILIES => [{NAME => 'data', VERSIONS => '100', COM true
>                 PRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMO
>                 RY => 'false', BLOCKCACHE => 'true'}]}

That looks fine.

> 3. I started stargate on one server and accessed Hbase for reading from another 3rd party application successfully.
>        It took 600 seconds on dev cluster and 250 on production to read .5M records from Hbase via stargate.


That don't sound so good.


> 4. later to boost read performance, it was suggested that upgrading to Hbase0.20.6 will be helpful. I did that on production (w/o running the migrate script) and re-started stargate and everything was running fine, though I did not see a bump in performance.
>
> 5. Eventually, I had to move to dev cluster from production because of some resource issues at our end. Dev cluster had 0.20.3 at this time. As I started loading more files into Hbase (<10 versions of <1G files) and converting my app to use hbase more heavily (via more stargate clients), the performance started degrading. I decided it was time to upgrade dev cluster as well to 0.20.6.  (I did not run the migrate script here as well, I missed this step in the doc).
>

What kinda perf you looking for from REST?

Do you have to use REST?  All is base64'd so its safe to transport.


> 6. When Hbase 0.20.6 came back up on dev cluster (with increased block cache (.6) and region server handler counts (75) ), pointing to the same rootdir, I noticed that some tables were missing. I could see a mention of them in the logs, but not when I did 'list' in the shell. I recovered those tables using add_table.rb script.


How did you shutdown this cluster?  Did you reboot machines?  Was your
hdfs homed on /tmp?  What is going on on your systems?  Are they
swapping?  Did you give HBase more than its default memory?  You read
the requirements and made sure ulimit and xceivers had been upped on
these machiens?


>        a. Is there a way to check the health of all Hbase tables in the cluster after an upgrade or even periodically, to make sure that everything is healthy ?
>        b. I would like to be able to force this error again and check the health of hbase and want it to report to me that some tables were lost. Currently, I just found out because I had very less data and it was easy to tell.
>

Iin trunk there is such a tool.  In 0.20.x, run a count against our
table.  See the hbase shell.  Type help to see how.


> 7. Here are the issues I face after this upgrade
>        a. when I run stop-hbase.sh, it  does not stop my regionservers on other boxes.

Why not.  Whats going on on those machines?  If you tail the logs on
the hosts that won't go down and/or on master, what do they say?


>        b. It does start them using start-hbase.sh.
>        c. Is it that stopping regionservers is not reported, but it does stop them (I see that happening on production cluster) ?
>

Tail the logs.  Should give you (us) clue.


> 8. I started stargate in the upgraded 0.20.6 in dev cluster
>        a. earlier when I sent a URL to look for a data row that did not exist, the return value was NULL , now I get an xml stating HTTP error 404/405.        Everything works as expected for an existing data row.

The latter sounds RESTy.  What would you expect of it?  The null?

St.Ack