Posted to user@hbase.apache.org by Stanislaw Kogut <sk...@sistyma.net> on 2010/06/29 15:21:01 UTC

HBase 0.20.5 issues

Hi everyone!

Has anyone noticed the same behaviour of hbase-0.20.5 after upgrading from
0.20.3?

$hadoop jar hbase/hbase-0.20.5-test.jar sequentialWrite 1
10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:host.name=se002.cluster.local
10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_20
10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_20/jre
10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop/common/bin/../conf:/usr/java/latest/lib/tools.jar:/opt/hadoop/common/bin/..:/opt/hadoop/common/bin/../hadoop-0.20.2-core.jar:/opt/hadoop/common/bin/../lib/commons-cli-1.2.jar:/opt/hadoop/common/bin/../lib/commons-codec-1.3.jar:/opt/hadoop/common/bin/../lib/commons-el-1.0.jar:/opt/hadoop/common/bin/../lib/commons-httpclient-3.0.1.jar:/opt/hadoop/common/bin/../lib/commons-logging-1.0.4.jar:/opt/hadoop/common/bin/../lib/commons-logging-api-1.0.4.jar:/opt/hadoop/common/bin/../lib/commons-net-1.4.1.jar:/opt/hadoop/common/bin/../lib/core-3.1.1.jar:/opt/hadoop/common/bin/../lib/hsqldb-1.8.0.10.jar:/opt/hadoop/common/bin/../lib/jasper-compiler-5.5.12.jar:/opt/hadoop/common/bin/../lib/jasper-runtime-5.5.12.jar:/opt/hadoop/common/bin/../lib/jets3t-0.6.1.jar:/opt/hadoop/common/bin/../lib/jetty-6.1.14.jar:/opt/hadoop/common/bin/../lib/jetty-util-6.1.14.jar:/opt/hadoop/common/bin/../lib/junit-3.8.1.jar:/opt/hadoop/common/bin/../lib/kfs-0.2.2.jar:/opt/hadoop/common/bin/../lib/log4j-1.2.15.jar:/opt/hadoop/common/bin/../lib/mockito-all-1.8.0.jar:/opt/hadoop/common/bin/../lib/oro-2.0.8.jar:/opt/hadoop/common/bin/../lib/servlet-api-2.5-6.1.14.jar:/opt/hadoop/common/bin/../lib/slf4j-api-1.4.3.jar:/opt/hadoop/common/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/hadoop/common/bin/../lib/xmlenc-0.52.jar:/opt/hadoop/common/bin/../lib/jsp-2.1/jsp-2.1.jar:/opt/hadoop/common/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/opt/hadoop/hbase/lib/zookeeper-3.2.2.jar:/opt/hadoop/hbase/conf:/opt/hadoop/hbase/hbase-0.20.5.jar
10/06/29 16:03:22 INFO hbase.PerformanceEvaluation: Table {NAME => 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} created
10/06/29 16:03:22 INFO hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at offset 0 for 1048576 rows
10/06/29 16:03:37 INFO hbase.PerformanceEvaluation: 0/104857/1048576
10/06/29 16:03:52 INFO hbase.PerformanceEvaluation: 0/209714/1048576
10/06/29 16:04:09 INFO hbase.PerformanceEvaluation: 0/314571/1048576
10/06/29 16:04:27 INFO hbase.PerformanceEvaluation: 0/419428/1048576
10/06/29 16:06:06 ERROR hbase.PerformanceEvaluation: Failed
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=false, tries=9, numtries=10, i=0, listsize=9650, region=TestTable,,1277816601856 for region TestTable,,1277816601856, row '0000511450', but failed after 10 attempts.
Exceptions:

    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1149)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
    at org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:621)
    at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:637)
    at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:889)
    at org.apache.hadoop.hbase.PerformanceEvaluation.runNIsOne(PerformanceEvaluation.java:907)
    at org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:939)
    at org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(PerformanceEvaluation.java:1036)
    at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:1061)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

It looks like this happens during region splits.
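
The 10 attempts match hbase.client.retries.number (default 10), so I can
probably buy the client more time to ride over a split by raising that
together with hbase.client.pause in hbase-site.xml (illustrative values,
and it would only mask the problem, not fix it):

  <property>
    <name>hbase.client.retries.number</name>
    <value>20</value>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <!-- milliseconds to sleep between retries; default is 1000 -->
    <value>2000</value>
  </property>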

Also, some other strange things:
1. After writing something to TestTable, some regionservers log this:
2010-06-29 16:05:06,458 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: .META.,,1

After that, another .META. region appears on this server in the 'status
detailed' output. What's more, sometimes it opens not only .META. but also
regions from data tables, such as the TestTable regions from
PerformanceEvaluation.

2. After disabling and dropping a table, its regions still appear assigned
to regionservers in 'status detailed'.

3. After 3-4 attempts to write into TestTable from PerformanceEvaluation,
another strange thing happens:
10/06/29 16:01:59 ERROR hbase.PerformanceEvaluation: Failed
org.apache.hadoop.hbase.TableExistsException:
org.apache.hadoop.hbase.TableExistsException: TestTable

but the table does not exist: you cannot disable or drop it, and the hbase
shell does not show it in the 'list' output. Yet you also cannot create it,
because it "exists" and its regions are assigned to regionservers. Note
that nobody dropped this table.
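
One way to see what the cluster still believes about the table is to scan
the catalog directly from the hbase shell (0.20 syntax); this should show
whether stale TestTable rows are still sitting in .META. even though
'list' shows nothing:

  hbase(main):001:0> scan '.META.', {COLUMNS => ['info:regioninfo', 'info:server']}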

I spent some time trying to find out why all this happens, playing around
with Hadoop cluster versions (first Cloudera, then "vanilla" 0.20.2), but I
still hit this issue. I hope someone can help find the cause.

-- 
Regards,
Stanislaw Kogut

Re: dilemma of memory and CPU for hbase.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Check the requirements:
http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements

You can confirm that you have an xcievers problem by grepping the
datanode logs for the error message quoted in the last bullet point.
If so, it will explain a lot!
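
For reference, the usual bump is something like this in hdfs-site.xml on
every datanode (2048 is just a common starting point, not a magic number),
followed by a restart of the datanodes:

  <property>
    <!-- note: the property name really is spelled "xcievers" -->
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>

Something like 'grep -i xceiver' over the datanode logs should surface the
messages about exceeding the limit, if that's what you're hitting.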

J-D

On Thu, Jul 1, 2010 at 5:49 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>
> I do have some errors, such as
>
> 2010-07-01 22:53:30,187 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.110.8.85:50010
> java.io.EOFException
>
> 2010-07-01 23:00:49,976 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out
> 2010-07-01 23:04:13,356 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection timed out
>
>
> It seems they are all Hadoop datanode errors.
>
> I searched, and people say I need to increase dfs.datanode.max.xcievers to
> 2K and increase the ulimit to 32K (currently it is set at 16K).
>
> I will get that done and do more testing.
>
> Jimmy.
>
> --------------------------------------------------
> From: "Jean-Daniel Cryans" <jd...@apache.org>
> Sent: Thursday, July 01, 2010 5:41 PM
> To: <us...@hbase.apache.org>
> Subject: Re: dilemma of memory and CPU for hbase.
>
>> When I start HBase I usually just tail the master log, but it's
>> actually just a few seconds for -ROOT-, then another few seconds for
>> .META., then it starts assigning all other regions.
>>
>> Did you make sure your master log was clean of errors?
>>
>> J-D
>>
>> On Thu, Jul 1, 2010 at 5:40 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>>>
>>> Yes, it terminated correctly; there was no exception while running
>>> add_table.
>>>
>>> Are you saying that after a restart I need to wait some time for
>>> -ROOT- to be assigned? Usually how long do I need to wait?
>>>
>>> Jimmy
>>>
>>> --------------------------------------------------
>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>> Sent: Thursday, July 01, 2010 5:27 PM
>>> To: <us...@hbase.apache.org>
>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>
>>>> Did you see any exception when you ran add_table? Did it even
>>>> terminate correctly?
>>>>
>>>> After a restart, the regions aren't readily available. If something
>>>> blocked the master from assigning -ROOT-, it should be pretty evident
>>>> by looking at the master log.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 1, 2010 at 5:23 PM, Jinsong Hu <ji...@hotmail.com>
>>>> wrote:
>>>>>
>>>>> After I ran add_table.rb, I refreshed the master's UI page and then
>>>>> clicked on the table to show the regions. I expected all the regions
>>>>> to be there.
>>>>> But I found there were significantly fewer regions; lots of regions
>>>>> that were there before were gone.
>>>>>
>>>>> I then restarted the whole hbase master and region server. And now it
>>>>> is even worse: the master UI page doesn't even load, saying the -ROOT-
>>>>> region and .META. are not served by any regionserver. The whole
>>>>> cluster is not in a usable state.
>>>>>
>>>>> That forced me to rename /hbase to /hbase-0.20.4, restart all hbase
>>>>> masters and regionservers, recreate all tables, etc., essentially
>>>>> starting from scratch.
>>>>>
>>>>> Jimmy
>>>>>
>>>>> --------------------------------------------------
>>>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>>>> Sent: Thursday, July 01, 2010 5:10 PM
>>>>> To: <us...@hbase.apache.org>
>>>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>>>
>>>>>> add_table.rb doesn't actually write much in the file system; all your
>>>>>> data is still there. It just wipes all the .META. entries and replaces
>>>>>> them with the .regioninfo files found in every region directory.
>>>>>>
>>>>>> Can you define what you mean by "corrupted"? It's really an
>>>>>> overloaded term.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Thu, Jul 1, 2010 at 5:01 PM, Jinsong Hu <ji...@hotmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi, Jean:
>>>>>>>  Thanks! I will run the add_table.rb and see if it fixes the problem.
>>>>>>>  Our namenode is backed up with HA and DRBD, and the hbase master
>>>>>>> machine is colocated with the name node and job tracker, so we are
>>>>>>> not wasting resources.
>>>>>>>
>>>>>>>  The region hole probably comes from the earlier 0.20.4 hbase
>>>>>>> operation. The 0.20.4 hbase was very unstable; lots of times the
>>>>>>> master said a region was not there while the region server said it
>>>>>>> was serving that region.
>>>>>>>
>>>>>>>
>>>>>>> I followed the instructions and ran commands like
>>>>>>>
>>>>>>> bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name
>>>>>>>
>>>>>>> After the execution, I found all my tables were corrupted and I
>>>>>>> couldn't use them any more. Restarting hbase didn't help either. I
>>>>>>> had to wipe out the whole /hbase directory and start from scratch.
>>>>>>>
>>>>>>>
>>>>>>> It looks like add_table.rb can corrupt the whole hbase. Anyway, I am
>>>>>>> regenerating the data from scratch; let's see if it works out.
>>>>>>>
>>>>>>> Jimmy.
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------
>>>>>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>>>>>> Sent: Thursday, July 01, 2010 2:17 PM
>>>>>>> To: <us...@hbase.apache.org>
>>>>>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>>>>>
>>>>>>>> (taking the conversation back to the list after receiving logs and
>>>>>>>> heap
>>>>>>>> dump)
>>>>>>>>
>>>>>>>> The issue here is actually much more nasty than it seems. But before
>>>>>>>> I
>>>>>>>> describe the problem, you said:
>>>>>>>>
>>>>>>>>>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers.
>>>>>>>>> 8 regionservers.
>>>>>>>>
>>>>>>>> If those are all distinct machines, you are wasting a lot of hardware.
>>>>>>>> Unless you have an HA Namenode (which I highly doubt), you already have
>>>>>>>> a SPOF there, so you might as well put every service on that single
>>>>>>>> node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
>>>>>>>> node, but unless you share the zookeeper ensemble between clusters
>>>>>>>> then losing the Namenode is as bad as losing ZK so might as well put
>>>>>>>> them together. At StumbleUpon we have 2-3 clusters using the same
>>>>>>>> ensembles, so it makes more sense to put them in a HA setup.
>>>>>>>>
>>>>>>>> That said, in your log I see:
>>>>>>>>
>>>>>>>> 2010-06-29 00:00:00,064 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts interrupted at index=0 because:Requested row out of range for HRegion Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>>>> ...
>>>>>>>> 2010-06-29 12:26:13,352 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts interrupted at index=0 because:Requested row out of range for HRegion Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>>>>
>>>>>>>> So for 12 hours (and probably more), the same row was requested almost
>>>>>>>> every 100ms but it was always failing on a WrongRegionException
>>>>>>>> (that's the name of what we see here). You probably use the write
>>>>>>>> buffer since you want to import as fast as possible, so all these
>>>>>>>> buffers are left unused after the clients terminate their RPC. That
>>>>>>>> rate of failed insertion must have kept your garbage collector _very_
>>>>>>>> busy, and at some point the JVM OOMEd. This is the stack from your
>>>>>>>> OOME:
>>>>>>>>
>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>> at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
>>>>>>>> at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
>>>>>>>> at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
>>>>>>>> at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>>>>>>>> at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>>>>>>>>
>>>>>>>> This is where we deserialize client data, so it correlates with what I
>>>>>>>> just described.
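>>>>>>>>
>>>>>>>> For context, by "write buffer" I mean the standard client-side one; a
>>>>>>>> minimal sketch of a buffered import with the 0.20 API (the family and
>>>>>>>> row layout are made up, only the table name is from your log):
>>>>>>>>
>>>>>>>> import java.io.IOException;
>>>>>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>>>>> import org.apache.hadoop.hbase.client.HTable;
>>>>>>>> import org.apache.hadoop.hbase.client.Put;
>>>>>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>>>>>
>>>>>>>> public class BufferedImport {
>>>>>>>>   public static void main(String[] args) throws IOException {
>>>>>>>>     HTable table = new HTable(new HBaseConfiguration(), "Spam_MsgEventTable");
>>>>>>>>     table.setAutoFlush(false);                  // queue puts client-side
>>>>>>>>     table.setWriteBufferSize(12 * 1024 * 1024); // auto-flush around 12MB
>>>>>>>>     for (int i = 0; i < 1000000; i++) {
>>>>>>>>       Put put = new Put(Bytes.toBytes("row-" + i));
>>>>>>>>       put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i));
>>>>>>>>       table.put(put);   // buffered, not yet sent to the server
>>>>>>>>     }
>>>>>>>>     table.flushCommits(); // ships the tail of the buffer in one batch
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Each flush arrives at the server as one big batch RPC, and that RPC is
>>>>>>>> deserialized in HBaseRPC$Invocation.readFields(), the top frame of the
>>>>>>>> stack above.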
>>>>>>>>
>>>>>>>> Now, this means that you probably have a hole (or more) in your .META.
>>>>>>>> table. It usually happens after a region server fails if it was
>>>>>>>> carrying it (since data loss is possible with that version of HDFS) or
>>>>>>>> if a bug in the master messes up the .META. region. Now 2 things:
>>>>>>>>
>>>>>>>> - It would be nice to know why you have a hole. Look at your .META.
>>>>>>>> table around the row in your region server log; you should see that
>>>>>>>> the start/end keys don't match. Then you can look in the master log
>>>>>>>> from yesterday to search for what went wrong, maybe see some
>>>>>>>> exceptions, or maybe a region server failed for any reason and it was
>>>>>>>> hosting .META.
>>>>>>>>
>>>>>>>> - You probably want to fix your table. Use the bin/add_table.rb
>>>>>>>> script (other people on this list used it in the past, search the
>>>>>>>> archive for more info).
>>>>>>>>
>>>>>>>> Finally (whew!), if you are still developing your solution around
>>>>>>>> HBase, you might want to try one of our dev releases, which do work
>>>>>>>> with a durable Hadoop release. See
>>>>>>>> http://hbase.apache.org/docs/r0.89.20100621/ for more info.
>>>>>>>> Cloudera's CDH3b2 also has everything you need.
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans
>>>>>>>> <jd...@apache.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> 653 regions is very low; even if you had a total of 3 region servers I
>>>>>>>>> wouldn't expect any problem.
>>>>>>>>>
>>>>>>>>> So to me it seems to point towards either a configuration issue or a
>>>>>>>>> usage issue. Can you:
>>>>>>>>>
>>>>>>>>>  - Put the log of one region server that OOMEd on a public server.
>>>>>>>>>  - Tell us more about your setup: # of nodes, hardware, configuration file
>>>>>>>>>  - Tell us more about how you insert data into HBase
>>>>>>>>>
>>>>>>>>> And BTW are you trying to do an initial import of your data set? If
>>>>>>>>> so, have you considered using HFileOutputFormat?
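>>>>>>>>>
>>>>>>>>> The rough shape, from memory (the mapper class and paths here are
>>>>>>>>> placeholders; the job's output must be totally ordered, e.g. a single
>>>>>>>>> reducer or a TotalOrderPartitioner, and the generated HFiles are then
>>>>>>>>> loaded with bin/loadtable.rb in 0.20):
>>>>>>>>>
>>>>>>>>> Job job = new Job(new HBaseConfiguration(), "bulk import");
>>>>>>>>> job.setJarByClass(MyImportMapper.class);
>>>>>>>>> job.setMapperClass(MyImportMapper.class);  // emits (row, KeyValue)
>>>>>>>>> job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>>>>>>>> job.setMapOutputValueClass(KeyValue.class);
>>>>>>>>> job.setOutputFormatClass(HFileOutputFormat.class);
>>>>>>>>> FileInputFormat.addInputPath(job, new Path("/input"));
>>>>>>>>> FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));
>>>>>>>>> job.waitForCompletion(true);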
>>>>>>>>>
>>>>>>>>> Thx,
>>>>>>>>>
>>>>>>>>> J-D
>>>>>>>>>
>>>>>>>>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu
>>>>>>>>> <ji...@hotmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi, Sir:
>>>>>>>>>>  I am using hbase 0.20.5, and this morning I found that 3 of my
>>>>>>>>>> region servers ran out of memory.
>>>>>>>>>> Each regionserver is given 6G of memory, and on average I have 653
>>>>>>>>>> regions in total. The max store size is 256M. I analyzed the dump
>>>>>>>>>> and it shows that there are too many HRegion objects in memory.
>>>>>>>>>>
>>>>>>>>>>  Previously I set the max store size to 2G, but then I found the
>>>>>>>>>> region server constantly does minor compactions and the CPU usage
>>>>>>>>>> is very high. It also blocks the heavy client record insertion.
>>>>>>>>>>
>>>>>>>>>>  So now I am limited on one side by memory and on the other side
>>>>>>>>>> by CPU.
>>>>>>>>>> Is there any way to get out of this dilemma?
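>>>>>>>>>>
>>>>>>>>>> The "max store size" I mean is, I assume, hbase.hregion.max.filesize
>>>>>>>>>> in hbase-site.xml; what I have now:
>>>>>>>>>>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>hbase.hregion.max.filesize</name>
>>>>>>>>>>     <!-- 268435456 = 256M (the default); previously 2G, which drove
>>>>>>>>>>          the constant compactions -->
>>>>>>>>>>     <value>268435456</value>
>>>>>>>>>>   </property>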
>>>>>>>>>>
>>>>>>>>>> Jimmy.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: dilemma of memory and CPU for hbase.

Posted by Jinsong Hu <ji...@hotmail.com>.
I do have some errors , such as

2010-07-01 22:53:30,187 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
crea
teBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 
10.11
0.8.85:50010
java.io.EOFException

2010-07-01 23:00:49,976 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
crea
teBlockOutputStream java.net.ConnectException: Connection timed out
2010-07-01 23:04:13,356 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
crea
teBlockOutputStream java.net.ConnectException: Connection timed out


seems they are all hadoop data node errors.

I searched and people say I need to increase dfs.datanode.max.xcievers to 
2K, and increase
ulimit to 32K ( currently it is set at 16K).

I will get that done and do more testing.

Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <jd...@apache.org>
Sent: Thursday, July 01, 2010 5:41 PM
To: <us...@hbase.apache.org>
Subject: Re: dilemma of memory and CPU for hbase.

> When I start HBase I usually just tail the master log, but it's
> actually just a few seconds then another few seconds for .META. then
> it starts assigning all other regions.
>
> Did you make sure your master log was clean of errors?
>
> J-D
>
> On Thu, Jul 1, 2010 at 5:40 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>> yes, it terminated correctely. there is no exception while running the
>> add_table.
>>
>> are you saying that after restart, I need to wait for some time for the
>> -ROOT- to
>> be assigned ? usually how long I need to wait ?
>>
>> Jimmy
>>
>> --------------------------------------------------
>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>> Sent: Thursday, July 01, 2010 5:27 PM
>> To: <us...@hbase.apache.org>
>> Subject: Re: dilemma of memory and CPU for hbase.
>>
>>> Did you see any exception when you ran add_table? Did it even
>>> terminated correctly?
>>>
>>> After a restart, the regions aren't readily available. If something
>>> blocked the master from assigning -ROOT-, it should be pretty evident
>>> by looking at the master log.
>>>
>>> J-D
>>>
>>> On Thu, Jul 1, 2010 at 5:23 PM, Jinsong Hu <ji...@hotmail.com> 
>>> wrote:
>>>>
>>>> After I run the add_table.rb, I  refreshed the master's UI page, and 
>>>> then
>>>> clicked on the table to show the regions. I expect that all regions 
>>>> will
>>>> be
>>>> there.
>>>> But , I found that there are significantly fewer regions. Lots of 
>>>> regions
>>>> that was there before were gone.
>>>>
>>>> I then restarted the whole hbase master and region server. And now it 
>>>> is
>>>> even worse. the master UI page doesn't even load. saying the _ROOT 
>>>> region
>>>> is and .META is not served by any regionserver.  The whole cluster is 
>>>> not
>>>> in
>>>> a usable state.
>>>>
>>>> That forced me to rename the /hbase to /hbase-0.20.4, and restart all
>>>> hbase
>>>> master and regionservers. recreate all tables, etc.essentially starting
>>>> from scratch.
>>>>
>>>> Jimmy
>>>>
>>>> --------------------------------------------------
>>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>>> Sent: Thursday, July 01, 2010 5:10 PM
>>>> To: <us...@hbase.apache.org>
>>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>>
>>>>> add_table.rb doesn't actually write much in the file system, all your
>>>>> data is still there. It just wipes all the .META. entries and replaces
>>>>> them with the .regioninfo files found in every region directory.
>>>>>
>>>>> Can you define what you mean by "corrupted". It's really an
>>>>> overloaded-term.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Jul 1, 2010 at 5:01 PM, Jinsong Hu <ji...@hotmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi, Jean:
>>>>>>  Thanks! I will run the add_table.rb and see if it fixes the problem.
>>>>>>  Our namenode is backed up with  HA and DRBD, and the hbase master
>>>>>> machine
>>>>>> colocates with name node , job tracker so we are not wasting 
>>>>>> resources.
>>>>>>
>>>>>>  The region hole probably comes from previous 0.20.4 hbase operation.
>>>>>> the
>>>>>> 0.20.4 hbase was
>>>>>> very unstable during its operation. lots of times the master says the
>>>>>> region
>>>>>> is not there but actually
>>>>>> the region server says it was serving the region.
>>>>>>
>>>>>>
>>>>>> I followed the instruction and run commands like
>>>>>>
>>>>>> bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name
>>>>>>
>>>>>> After the execution, I found all my tables are corrupted and I can't
>>>>>> use
>>>>>> it
>>>>>> any more. restarting hbase
>>>>>> doesn't help either. I have to wipe out all the /hbase directory and
>>>>>> start
>>>>>> from scratch.
>>>>>>
>>>>>>
>>>>>> it looks that the add_table.rb can corrupt the whole hbase.  Anyway, 
>>>>>> I
>>>>>> am
>>>>>> regenerating the data from
>>>>>> scratch and let's see if it will work out.
>>>>>>
>>>>>> Jimmy.
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------
>>>>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>>>>> Sent: Thursday, July 01, 2010 2:17 PM
>>>>>> To: <us...@hbase.apache.org>
>>>>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>>>>
>>>>>>> (taking the conversation back to the list after receiving logs and
>>>>>>> heap
>>>>>>> dump)
>>>>>>>
>>>>>>> The issue here is actually much more nasty than it seems. But before 
>>>>>>> I
>>>>>>> describe the problem, you said:
>>>>>>>
>>>>>>>>  I have 3 machines as hbase master (only 1 is active), 3 
>>>>>>>> zookeepers.
>>>>>>>> 8
>>>>>>>> regionservers.
>>>>>>>
>>>>>>> If those are all distinct machines, you are wasting a lot of 
>>>>>>> hardware.
>>>>>>> Unless you have a HA Namenode (I highly doubt), then you already 
>>>>>>> have
>>>>>>> a SPOF there so you might as well put every service on that single
>>>>>>> node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
>>>>>>> node, but unless you share the zookeeper ensemble between clusters
>>>>>>> then losing the Namenode is as bad as losing ZK so might as well put
>>>>>>> them together. At StumbleUpon we have 2-3 clusters using the same
>>>>>>> ensembles, so it makes more sense to put them in a HA setup.
>>>>>>>
>>>>>>> That said, in your log I see:
>>>>>>>
>>>>>>> 2010-06-29 00:00:00,064 DEBUG
>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>>>>> interrupted at index=0 because:Requested row out of range for 
>>>>>>> HRegion
>>>>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>>> ...
>>>>>>> 2010-06-29 12:26:13,352 DEBUG
>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>>>>> interrupted at index=0 because:Requested row out of range for 
>>>>>>> HRegion
>>>>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>>>
>>>>>>> So for 12 hours (and probably more), the same row was requested 
>>>>>>> almost
>>>>>>> every 100ms but it was always failing on a WrongRegionException
>>>>>>> (that's the name of what we see here). You probably use the write
>>>>>>> buffer since you want to import as fast as possible, so all these
>>>>>>> buffers are left unused after the clients terminate their RPC. That
>>>>>>> rate of failed insertion must have kept your garbage collector 
>>>>>>> _very_
>>>>>>> busy, and at some point the JVM OOMEd. This is the stack from your
>>>>>>> OOME:
>>>>>>>
>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>> at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
>>>>>>> at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
>>>>>>> at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
>>>>>>> at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>>>>>>> at
>>>>>>>
>>>>>>>
>>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>>>>>>>
>>>>>>> This is where we deserialize client data, so it correlates with what 
>>>>>>> I
>>>>>>> just described.
>>>>>>>
>>>>>>> Now, this means that you probably have a hole (or more) in your 
>>>>>>> .META.
>>>>>>> table. It usually happens after a region server fails if it was
>>>>>>> carrying it (since data loss is possible with that version of HDFS) 
>>>>>>> or
>>>>>>> if a bug in the master messes up the .META. region. Now 2 things:
>>>>>>>
>>>>>>> - It would be nice to know why you have a hole. Look at your .META.
>>>>>>> table around the row in your region server log, you should see that
>>>>>>> the start/end keys don't match. Then you can look in the master log
>>>>>>> from yesterday to search for what went wrong, maybe see some
>>>>>>> exceptions, or maybe a region server failed for any reason and it 
>>>>>>> was
>>>>>>> hosting .META.
>>>>>>>
>>>>>>> - You probably want to fix your table. Use the bin/add_table.rb
>>>>>>> script (other people on this list used it in the past, search the
>>>>>>> archive for more info).
>>>>>>>
>>>>>>> Finally (whew!), if you are still developing your solution around
>>>>>>> HBase, you might want to try out one of our dev release that does 
>>>>>>> work
>>>>>>> with a durable Hadoop release. See
>>>>>>> http://hbase.apache.org/docs/r0.89.20100621/ for more info. 
>>>>>>> Cloudera's
>>>>>>> CDH3b2 also has everything you need.
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans
>>>>>>> <jd...@apache.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> 653 regions is very low, even if you had a total of 3 region 
>>>>>>>> servers
>>>>>>>> I
>>>>>>>> wouldn't expect any problem.
>>>>>>>>
>>>>>>>> So to me it seems to point towards either a configuration issue or 
>>>>>>>> a
>>>>>>>> usage issue. Can you:
>>>>>>>>
>>>>>>>>  - Put the log of one region server that OOMEd on a public server.
>>>>>>>>  - Tell us more about your setup: # of nodes, hardware, 
>>>>>>>> configuration
>>>>>>>> file
>>>>>>>>  - Tell us more about how you insert data into HBase
>>>>>>>>
>>>>>>>> And BTW are you trying to do an initial import of your data set? If
>>>>>>>> so, have you considered using HFileOutputFormat?
>>>>>>>>
>>>>>>>> Thx,
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu 
>>>>>>>> <ji...@hotmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi, Sir:
>>>>>>>>>  I am using hbase 0.20.5 and this morning I found that 3 of  my
>>>>>>>>> region
>>>>>>>>> server running out of memory.
>>>>>>>>> the regionserver is given 6G memory each, and on average, I have 
>>>>>>>>> 653
>>>>>>>>> regions
>>>>>>>>> in total. max store size
>>>>>>>>> is 256M. I analyzed the dump and it shows that there are too many
>>>>>>>>> HRegion in
>>>>>>>>> memory.
>>>>>>>>>
>>>>>>>>>  Previously set max store size to 2G, but then I found the region
>>>>>>>>> server
>>>>>>>>> constantly does minor compaction and the CPU usage is very high, 
>>>>>>>>> It
>>>>>>>>> also
>>>>>>>>> blocks the heavy client record insertion.
>>>>>>>>>
>>>>>>>>>  So now I am limited on one side by memory,  limited on another 
>>>>>>>>> size
>>>>>>>>> by
>>>>>>>>> CPU.
>>>>>>>>> Is there anyway to get out of this dilemma ?
>>>>>>>>>
>>>>>>>>> Jimmy.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: dilemma of memory and CPU for hbase.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
When I start HBase I usually just tail the master log, but it's
actually just a few seconds then another few seconds for .META. then
it starts assigning all other regions.

Did you make sure your master log was clean of errors?

J-D

On Thu, Jul 1, 2010 at 5:40 PM, Jinsong Hu <ji...@hotmail.com> wrote:
> yes, it terminated correctely. there is no exception while running the
> add_table.
>
> are you saying that after restart, I need to wait for some time for the
> -ROOT- to
> be assigned ? usually how long I need to wait ?
>
> Jimmy
>
> --------------------------------------------------
> From: "Jean-Daniel Cryans" <jd...@apache.org>
> Sent: Thursday, July 01, 2010 5:27 PM
> To: <us...@hbase.apache.org>
> Subject: Re: dilemma of memory and CPU for hbase.
>
>> Did you see any exception when you ran add_table? Did it even
>> terminated correctly?
>>
>> After a restart, the regions aren't readily available. If something
>> blocked the master from assigning -ROOT-, it should be pretty evident
>> by looking at the master log.
>>
>> J-D
>>
>> On Thu, Jul 1, 2010 at 5:23 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>>>
>>> After I run the add_table.rb, I  refreshed the master's UI page, and then
>>> clicked on the table to show the regions. I expect that all regions will
>>> be
>>> there.
>>> But , I found that there are significantly fewer regions. Lots of regions
>>> that was there before were gone.
>>>
>>> I then restarted the whole hbase master and region server. And now it is
>>> even worse. the master UI page doesn't even load. saying the _ROOT region
>>> is and .META is not served by any regionserver.  The whole cluster is not
>>> in
>>> a usable state.
>>>
>>> That forced me to rename the /hbase to /hbase-0.20.4, and restart all
>>> hbase
>>> master and regionservers. recreate all tables, etc.essentially starting
>>> from scratch.
>>>
>>> Jimmy
>>>
>>> --------------------------------------------------
>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>> Sent: Thursday, July 01, 2010 5:10 PM
>>> To: <us...@hbase.apache.org>
>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>
>>>> add_table.rb doesn't actually write much in the file system, all your
>>>> data is still there. It just wipes all the .META. entries and replaces
>>>> them with the .regioninfo files found in every region directory.
>>>>
>>>> Can you define what you mean by "corrupted". It's really an
>>>> overloaded-term.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 1, 2010 at 5:01 PM, Jinsong Hu <ji...@hotmail.com>
>>>> wrote:
>>>>>
>>>>> Hi, Jean:
>>>>>  Thanks! I will run the add_table.rb and see if it fixes the problem.
>>>>>  Our namenode is backed up with  HA and DRBD, and the hbase master
>>>>> machine
>>>>> colocates with name node , job tracker so we are not wasting resources.
>>>>>
>>>>>  The region hole probably comes from previous 0.20.4 hbase operation.
>>>>> the
>>>>> 0.20.4 hbase was
>>>>> very unstable during its operation. lots of times the master says the
>>>>> region
>>>>> is not there but actually
>>>>> the region server says it was serving the region.
>>>>>
>>>>>
>>>>> I followed the instruction and run commands like
>>>>>
>>>>> bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name
>>>>>
>>>>> After the execution, I found all my tables are corrupted and I can't
>>>>> use
>>>>> it
>>>>> any more. restarting hbase
>>>>> doesn't help either. I have to wipe out all the /hbase directory and
>>>>> start
>>>>> from scratch.
>>>>>
>>>>>
>>>>> it looks that the add_table.rb can corrupt the whole hbase.  Anyway, I
>>>>> am
>>>>> regenerating the data from
>>>>> scratch and let's see if it will work out.
>>>>>
>>>>> Jimmy.
>>>>>
>>>>>
>>>>> --------------------------------------------------
>>>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>>>> Sent: Thursday, July 01, 2010 2:17 PM
>>>>> To: <us...@hbase.apache.org>
>>>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>>>
>>>>>> (taking the conversation back to the list after receiving logs and
>>>>>> heap
>>>>>> dump)
>>>>>>
>>>>>> The issue here is actually much more nasty than it seems. But before I
>>>>>> describe the problem, you said:
>>>>>>
>>>>>>>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers.
>>>>>>> 8
>>>>>>> regionservers.
>>>>>>
>>>>>> If those are all distinct machines, you are wasting a lot of hardware.
>>>>>> Unless you have a HA Namenode (I highly doubt), then you already have
>>>>>> a SPOF there so you might as well put every service on that single
>>>>>> node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
>>>>>> node, but unless you share the zookeeper ensemble between clusters
>>>>>> then losing the Namenode is as bad as losing ZK so might as well put
>>>>>> them together. At StumbleUpon we have 2-3 clusters using the same
>>>>>> ensembles, so it makes more sense to put them in a HA setup.
>>>>>>
>>>>>> That said, in your log I see:
>>>>>>
>>>>>> 2010-06-29 00:00:00,064 DEBUG
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>>>> interrupted at index=0 because:Requested row out of range for HRegion
>>>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>> ...
>>>>>> 2010-06-29 12:26:13,352 DEBUG
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>>>> interrupted at index=0 because:Requested row out of range for HRegion
>>>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>>
>>>>>> So for 12 hours (and probably more), the same row was requested almost
>>>>>> every 100ms but it was always failing on a WrongRegionException
>>>>>> (that's the name of what we see here). You probably use the write
>>>>>> buffer since you want to import as fast as possible, so all these
>>>>>> buffers are left unused after the clients terminate their RPC. That
>>>>>> rate of failed insertion must have kept your garbage collector _very_
>>>>>> busy, and at some point the JVM OOMEd. This is the stack from your
>>>>>> OOME:
>>>>>>
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>> at
>>>>>>
>>>>>>
>>>>>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
>>>>>> at
>>>>>>
>>>>>>
>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
>>>>>> at
>>>>>>
>>>>>>
>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
>>>>>> at
>>>>>>
>>>>>>
>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>>>>>> at
>>>>>>
>>>>>>
>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>>>>>>
>>>>>> This is where we deserialize client data, so it correlates with what I
>>>>>> just described.
>>>>>>
>>>>>> Now, this means that you probably have a hole (or more) in your .META.
>>>>>> table. It usually happens after a region server fails if it was
>>>>>> carrying it (since data loss is possible with that version of HDFS) or
>>>>>> if a bug in the master messes up the .META. region. Now 2 things:
>>>>>>
>>>>>> - It would be nice to know why you have a hole. Look at your .META.
>>>>>> table around the row in your region server log, you should see that
>>>>>> the start/end keys don't match. Then you can look in the master log
>>>>>> from yesterday to search for what went wrong, maybe see some
>>>>>> exceptions, or maybe a region server failed for any reason and it was
>>>>>> hosting .META.
>>>>>>
>>>>>> - You probably want to fix your table. Use the bin/add_table.rb
>>>>>> script (other people on this list used it in the past, search the
>>>>>> archive for more info).
>>>>>>
>>>>>> Finally (whew!), if you are still developing your solution around
>>>>>> HBase, you might want to try out one of our dev release that does work
>>>>>> with a durable Hadoop release. See
>>>>>> http://hbase.apache.org/docs/r0.89.20100621/ for more info. Cloudera's
>>>>>> CDH3b2 also has everything you need.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans
>>>>>> <jd...@apache.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> 653 regions is very low, even if you had a total of 3 region servers
>>>>>>> I
>>>>>>> wouldn't expect any problem.
>>>>>>>
>>>>>>> So to me it seems to point towards either a configuration issue or a
>>>>>>> usage issue. Can you:
>>>>>>>
>>>>>>>  - Put the log of one region server that OOMEd on a public server.
>>>>>>>  - Tell us more about your setup: # of nodes, hardware, configuration
>>>>>>> file
>>>>>>>  - Tell us more about how you insert data into HBase
>>>>>>>
>>>>>>> And BTW are you trying to do an initial import of your data set? If
>>>>>>> so, have you considered using HFileOutputFormat?
>>>>>>>
>>>>>>> Thx,
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <ji...@hotmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi, Sir:
>>>>>>>>  I am using hbase 0.20.5 and this morning I found that 3 of  my
>>>>>>>> region
>>>>>>>> server running out of memory.
>>>>>>>> the regionserver is given 6G memory each, and on average, I have 653
>>>>>>>> regions
>>>>>>>> in total. max store size
>>>>>>>> is 256M. I analyzed the dump and it shows that there are too many
>>>>>>>> HRegion in
>>>>>>>> memory.
>>>>>>>>
>>>>>>>>  Previously set max store size to 2G, but then I found the region
>>>>>>>> server
>>>>>>>> constantly does minor compaction and the CPU usage is very high, It
>>>>>>>> also
>>>>>>>> blocks the heavy client record insertion.
>>>>>>>>
>>>>>>>>  So now I am limited on one side by memory,  limited on another size
>>>>>>>> by
>>>>>>>> CPU.
>>>>>>>> Is there anyway to get out of this dilemma ?
>>>>>>>>
>>>>>>>> Jimmy.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: dilemma of memory and CPU for hbase.

Posted by Jinsong Hu <ji...@hotmail.com>.
yes, it terminated correctely. there is no exception while running the 
add_table.

are you saying that after restart, I need to wait for some time for 
the -ROOT- to
be assigned ? usually how long I need to wait ?

Jimmy

--------------------------------------------------
From: "Jean-Daniel Cryans" <jd...@apache.org>
Sent: Thursday, July 01, 2010 5:27 PM
To: <us...@hbase.apache.org>
Subject: Re: dilemma of memory and CPU for hbase.

> Did you see any exception when you ran add_table? Did it even
> terminated correctly?
>
> After a restart, the regions aren't readily available. If something
> blocked the master from assigning -ROOT-, it should be pretty evident
> by looking at the master log.
>
> J-D
>
> On Thu, Jul 1, 2010 at 5:23 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>> After I run the add_table.rb, I  refreshed the master's UI page, and then
>> clicked on the table to show the regions. I expect that all regions will 
>> be
>> there.
>> But , I found that there are significantly fewer regions. Lots of regions
>> that was there before were gone.
>>
>> I then restarted the whole hbase master and region server. And now it is
>> even worse. the master UI page doesn't even load. saying the _ROOT region
>> is and .META is not served by any regionserver.  The whole cluster is not 
>> in
>> a usable state.
>>
>> That forced me to rename the /hbase to /hbase-0.20.4, and restart all 
>> hbase
>> master and regionservers. recreate all tables, etc.essentially starting
>> from scratch.
>>
>> Jimmy
>>
>> --------------------------------------------------
>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>> Sent: Thursday, July 01, 2010 5:10 PM
>> To: <us...@hbase.apache.org>
>> Subject: Re: dilemma of memory and CPU for hbase.
>>
>>> add_table.rb doesn't actually write much in the file system, all your
>>> data is still there. It just wipes all the .META. entries and replaces
>>> them with the .regioninfo files found in every region directory.
>>>
>>> Can you define what you mean by "corrupted". It's really an
>>> overloaded-term.
>>>
>>> J-D
>>>
>>> On Thu, Jul 1, 2010 at 5:01 PM, Jinsong Hu <ji...@hotmail.com> 
>>> wrote:
>>>>
>>>> Hi, Jean:
>>>>  Thanks! I will run the add_table.rb and see if it fixes the problem.
>>>>  Our namenode is backed up with  HA and DRBD, and the hbase master
>>>> machine
>>>> colocates with name node , job tracker so we are not wasting resources.
>>>>
>>>>  The region hole probably comes from previous 0.20.4 hbase operation. 
>>>> the
>>>> 0.20.4 hbase was
>>>> very unstable during its operation. lots of times the master says the
>>>> region
>>>> is not there but actually
>>>> the region server says it was serving the region.
>>>>
>>>>
>>>> I followed the instruction and run commands like
>>>>
>>>> bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name
>>>>
>>>> After the execution, I found all my tables are corrupted and I can't 
>>>> use
>>>> it
>>>> any more. restarting hbase
>>>> doesn't help either. I have to wipe out all the /hbase directory and
>>>> start
>>>> from scratch.
>>>>
>>>>
>>>> it looks that the add_table.rb can corrupt the whole hbase.  Anyway, I 
>>>> am
>>>> regenerating the data from
>>>> scratch and let's see if it will work out.
>>>>
>>>> Jimmy.
>>>>
>>>>
>>>> --------------------------------------------------
>>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>>> Sent: Thursday, July 01, 2010 2:17 PM
>>>> To: <us...@hbase.apache.org>
>>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>>
>>>>> (taking the conversation back to the list after receiving logs and 
>>>>> heap
>>>>> dump)
>>>>>
>>>>> The issue here is actually much more nasty than it seems. But before I
>>>>> describe the problem, you said:
>>>>>
>>>>>>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers. 
>>>>>> 8
>>>>>> regionservers.
>>>>>
>>>>> If those are all distinct machines, you are wasting a lot of hardware.
>>>>> Unless you have a HA Namenode (I highly doubt), then you already have
>>>>> a SPOF there so you might as well put every service on that single
>>>>> node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
>>>>> node, but unless you share the zookeeper ensemble between clusters
>>>>> then losing the Namenode is as bad as losing ZK so might as well put
>>>>> them together. At StumbleUpon we have 2-3 clusters using the same
>>>>> ensembles, so it makes more sense to put them in a HA setup.
>>>>>
>>>>> That said, in your log I see:
>>>>>
>>>>> 2010-06-29 00:00:00,064 DEBUG
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>>> interrupted at index=0 because:Requested row out of range for HRegion
>>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>> ...
>>>>> 2010-06-29 12:26:13,352 DEBUG
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>>> interrupted at index=0 because:Requested row out of range for HRegion
>>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>
>>>>> So for 12 hours (and probably more), the same row was requested almost
>>>>> every 100ms but it was always failing on a WrongRegionException
>>>>> (that's the name of what we see here). You probably use the write
>>>>> buffer since you want to import as fast as possible, so all these
>>>>> buffers are left unused after the clients terminate their RPC. That
>>>>> rate of failed insertion must have kept your garbage collector _very_
>>>>> busy, and at some point the JVM OOMEd. This is the stack from your
>>>>> OOME:
>>>>>
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>> at
>>>>>
>>>>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
>>>>> at
>>>>>
>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
>>>>> at
>>>>>
>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
>>>>> at
>>>>>
>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>>>>> at
>>>>>
>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>>>>>
>>>>> This is where we deserialize client data, so it correlates with what I
>>>>> just described.
>>>>>
>>>>> Now, this means that you probably have a hole (or more) in your .META.
>>>>> table. It usually happens after a region server fails if it was
>>>>> carrying it (since data loss is possible with that version of HDFS) or
>>>>> if a bug in the master messes up the .META. region. Now 2 things:
>>>>>
>>>>> - It would be nice to know why you have a hole. Look at your .META.
>>>>> table around the row in your region server log, you should see that
>>>>> the start/end keys don't match. Then you can look in the master log
>>>>> from yesterday to search for what went wrong, maybe see some
>>>>> exceptions, or maybe a region server failed for any reason and it was
>>>>> hosting .META.
>>>>>
>>>>> - You probably want to fix your table. Use the bin/add_table.rb
>>>>> script (other people on this list used it in the past, search the
>>>>> archive for more info).
>>>>>
>>>>> Finally (whew!), if you are still developing your solution around
>>>>> HBase, you might want to try out one of our dev release that does work
>>>>> with a durable Hadoop release. See
>>>>> http://hbase.apache.org/docs/r0.89.20100621/ for more info. Cloudera's
>>>>> CDH3b2 also has everything you need.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans
>>>>> <jd...@apache.org>
>>>>> wrote:
>>>>>>
>>>>>> 653 regions is very low, even if you had a total of 3 region servers 
>>>>>> I
>>>>>> wouldn't expect any problem.
>>>>>>
>>>>>> So to me it seems to point towards either a configuration issue or a
>>>>>> usage issue. Can you:
>>>>>>
>>>>>>  - Put the log of one region server that OOMEd on a public server.
>>>>>>  - Tell us more about your setup: # of nodes, hardware, configuration
>>>>>> file
>>>>>>  - Tell us more about how you insert data into HBase
>>>>>>
>>>>>> And BTW are you trying to do an initial import of your data set? If
>>>>>> so, have you considered using HFileOutputFormat?
>>>>>>
>>>>>> Thx,
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <ji...@hotmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi, Sir:
>>>>>>>  I am using hbase 0.20.5 and this morning I found that 3 of  my 
>>>>>>> region
>>>>>>> server running out of memory.
>>>>>>> the regionserver is given 6G memory each, and on average, I have 653
>>>>>>> regions
>>>>>>> in total. max store size
>>>>>>> is 256M. I analyzed the dump and it shows that there are too many
>>>>>>> HRegion in
>>>>>>> memory.
>>>>>>>
>>>>>>>  Previously set max store size to 2G, but then I found the region
>>>>>>> server
>>>>>>> constantly does minor compaction and the CPU usage is very high, It
>>>>>>> also
>>>>>>> blocks the heavy client record insertion.
>>>>>>>
>>>>>>>  So now I am limited on one side by memory,  limited on another size
>>>>>>> by
>>>>>>> CPU.
>>>>>>> Is there anyway to get out of this dilemma ?
>>>>>>>
>>>>>>> Jimmy.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: dilemma of memory and CPU for hbase.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Did you see any exception when you ran add_table? Did it even
terminated correctly?

After a restart, the regions aren't readily available. If something
blocked the master from assigning -ROOT-, it should be pretty evident
by looking at the master log.

J-D

On Thu, Jul 1, 2010 at 5:23 PM, Jinsong Hu <ji...@hotmail.com> wrote:
> After I run the add_table.rb, I  refreshed the master's UI page, and then
> clicked on the table to show the regions. I expect that all regions will be
> there.
> But , I found that there are significantly fewer regions. Lots of regions
> that was there before were gone.
>
> I then restarted the whole hbase master and region server. And now it is
> even worse. the master UI page doesn't even load. saying the _ROOT region
> is and .META is not served by any regionserver.  The whole cluster is not in
> a usable state.
>
> That forced me to rename the /hbase to /hbase-0.20.4, and restart all hbase
> master and regionservers. recreate all tables, etc.essentially starting
> from scratch.
>
> Jimmy
>
> --------------------------------------------------
> From: "Jean-Daniel Cryans" <jd...@apache.org>
> Sent: Thursday, July 01, 2010 5:10 PM
> To: <us...@hbase.apache.org>
> Subject: Re: dilemma of memory and CPU for hbase.
>
>> add_table.rb doesn't actually write much in the file system, all your
>> data is still there. It just wipes all the .META. entries and replaces
>> them with the .regioninfo files found in every region directory.
>>
>> Can you define what you mean by "corrupted". It's really an
>> overloaded-term.
>>
>> J-D
>>
>> On Thu, Jul 1, 2010 at 5:01 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>>>
>>> Hi, Jean:
>>>  Thanks! I will run the add_table.rb and see if it fixes the problem.
>>>  Our namenode is backed up with  HA and DRBD, and the hbase master
>>> machine
>>> colocates with name node , job tracker so we are not wasting resources.
>>>
>>>  The region hole probably comes from previous 0.20.4 hbase operation. the
>>> 0.20.4 hbase was
>>> very unstable during its operation. lots of times the master says the
>>> region
>>> is not there but actually
>>> the region server says it was serving the region.
>>>
>>>
>>> I followed the instruction and run commands like
>>>
>>> bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name
>>>
>>> After the execution, I found all my tables are corrupted and I can't use
>>> it
>>> any more. restarting hbase
>>> doesn't help either. I have to wipe out all the /hbase directory and
>>> start
>>> from scratch.
>>>
>>>
>>> it looks that the add_table.rb can corrupt the whole hbase.  Anyway, I am
>>> regenerating the data from
>>> scratch and let's see if it will work out.
>>>
>>> Jimmy.
>>>
>>>
>>> --------------------------------------------------
>>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>>> Sent: Thursday, July 01, 2010 2:17 PM
>>> To: <us...@hbase.apache.org>
>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>
>>>> (taking the conversation back to the list after receiving logs and heap
>>>> dump)
>>>>
>>>> The issue here is actually much more nasty than it seems. But before I
>>>> describe the problem, you said:
>>>>
>>>>>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers. 8
>>>>> regionservers.
>>>>
>>>> If those are all distinct machines, you are wasting a lot of hardware.
>>>> Unless you have a HA Namenode (I highly doubt), then you already have
>>>> a SPOF there so you might as well put every service on that single
>>>> node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
>>>> node, but unless you share the zookeeper ensemble between clusters
>>>> then losing the Namenode is as bad as losing ZK so might as well put
>>>> them together. At StumbleUpon we have 2-3 clusters using the same
>>>> ensembles, so it makes more sense to put them in a HA setup.
>>>>
>>>> That said, in your log I see:
>>>>
>>>> 2010-06-29 00:00:00,064 DEBUG
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>> interrupted at index=0 because:Requested row out of range for HRegion
>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>> ...
>>>> 2010-06-29 12:26:13,352 DEBUG
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>> interrupted at index=0 because:Requested row out of range for HRegion
>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>
>>>> So for 12 hours (and probably more), the same row was requested almost
>>>> every 100ms but it was always failing on a WrongRegionException
>>>> (that's the name of what we see here). You probably use the write
>>>> buffer since you want to import as fast as possible, so all these
>>>> buffers are left unused after the clients terminate their RPC. That
>>>> rate of failed insertion must have kept your garbage collector _very_
>>>> busy, and at some point the JVM OOMEd. This is the stack from your
>>>> OOME:
>>>>
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>>>> at
>>>>
>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>>>>
>>>> This is where we deserialize client data, so it correlates with what I
>>>> just described.
>>>>
>>>> Now, this means that you probably have a hole (or more) in your .META.
>>>> table. It usually happens after a region server fails if it was
>>>> carrying it (since data loss is possible with that version of HDFS) or
>>>> if a bug in the master messes up the .META. region. Now 2 things:
>>>>
>>>> - It would be nice to know why you have a hole. Look at your .META.
>>>> table around the row in your region server log, you should see that
>>>> the start/end keys don't match. Then you can look in the master log
>>>> from yesterday to search for what went wrong, maybe see some
>>>> exceptions, or maybe a region server failed for any reason and it was
>>>> hosting .META.
>>>>
>>>> - You probably want to fix your table. Use the bin/add_table.rb
>>>> script (other people on this list used it in the past, search the
>>>> archive for more info).
>>>>
>>>> Finally (whew!), if you are still developing your solution around
>>>> HBase, you might want to try out one of our dev release that does work
>>>> with a durable Hadoop release. See
>>>> http://hbase.apache.org/docs/r0.89.20100621/ for more info. Cloudera's
>>>> CDH3b2 also has everything you need.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans
>>>> <jd...@apache.org>
>>>> wrote:
>>>>>
>>>>> 653 regions is very low, even if you had a total of 3 region servers I
>>>>> wouldn't expect any problem.
>>>>>
>>>>> So to me it seems to point towards either a configuration issue or a
>>>>> usage issue. Can you:
>>>>>
>>>>>  - Put the log of one region server that OOMEd on a public server.
>>>>>  - Tell us more about your setup: # of nodes, hardware, configuration
>>>>> file
>>>>>  - Tell us more about how you insert data into HBase
>>>>>
>>>>> And BTW are you trying to do an initial import of your data set? If
>>>>> so, have you considered using HFileOutputFormat?
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <ji...@hotmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi, Sir:
>>>>>>  I am using hbase 0.20.5 and this morning I found that 3 of  my region
>>>>>> server running out of memory.
>>>>>> the regionserver is given 6G memory each, and on average, I have 653
>>>>>> regions
>>>>>> in total. max store size
>>>>>> is 256M. I analyzed the dump and it shows that there are too many
>>>>>> HRegion in
>>>>>> memory.
>>>>>>
>>>>>>  Previously set max store size to 2G, but then I found the region
>>>>>> server
>>>>>> constantly does minor compaction and the CPU usage is very high, It
>>>>>> also
>>>>>> blocks the heavy client record insertion.
>>>>>>
>>>>>>  So now I am limited on one side by memory,  limited on another size
>>>>>> by
>>>>>> CPU.
>>>>>> Is there anyway to get out of this dilemma ?
>>>>>>
>>>>>> Jimmy.
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: dilemma of memory and CPU for hbase.

Posted by Jinsong Hu <ji...@hotmail.com>.
After I ran add_table.rb, I refreshed the master's UI page and then
clicked on the table to show its regions. I expected all the regions to
be there, but I found significantly fewer; lots of regions that were
there before were gone.

I then restarted the whole HBase master and all region servers, and now
it is even worse: the master UI page doesn't even load, saying the
-ROOT- and .META. regions are not served by any regionserver. The whole
cluster is not in a usable state.

That forced me to rename /hbase to /hbase-0.20.4, restart all HBase
masters and regionservers, recreate all tables, etc., essentially
starting from scratch.
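
The rename itself was just a filesystem-level move, something like:

  hadoop fs -mv /hbase /hbase-0.20.4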

Jimmy

--------------------------------------------------
From: "Jean-Daniel Cryans" <jd...@apache.org>
Sent: Thursday, July 01, 2010 5:10 PM
To: <us...@hbase.apache.org>
Subject: Re: dilemma of memory and CPU for hbase.

> add_table.rb doesn't actually write much in the file system, all your
> data is still there. It just wipes all the .META. entries and replaces
> them with the .regioninfo files found in every region directory.
>
> Can you define what you mean by "corrupted". It's really an 
> overloaded-term.
>
> J-D
>
> On Thu, Jul 1, 2010 at 5:01 PM, Jinsong Hu <ji...@hotmail.com> wrote:
>> Hi, Jean:
>>  Thanks! I will run the add_table.rb and see if it fixes the problem.
>>  Our namenode is backed up with  HA and DRBD, and the hbase master 
>> machine
>> colocates with name node , job tracker so we are not wasting resources.
>>
>>  The region hole probably comes from previous 0.20.4 hbase operation. the
>> 0.20.4 hbase was
>> very unstable during its operation. lots of times the master says the 
>> region
>> is not there but actually
>> the region server says it was serving the region.
>>
>>
>> I followed the instruction and run commands like
>>
>> bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name
>>
>> After the execution, I found all my tables are corrupted and I can't use 
>> it
>> any more. restarting hbase
>> doesn't help either. I have to wipe out all the /hbase directory and 
>> start
>> from scratch.
>>
>>
>> it looks that the add_table.rb can corrupt the whole hbase.  Anyway, I am
>> regenerating the data from
>> scratch and let's see if it will work out.
>>
>> Jimmy.
>>
>>
>> --------------------------------------------------
>> From: "Jean-Daniel Cryans" <jd...@apache.org>
>> Sent: Thursday, July 01, 2010 2:17 PM
>> To: <us...@hbase.apache.org>
>> Subject: Re: dilemma of memory and CPU for hbase.
>>
>>> (taking the conversation back to the list after receiving logs and heap
>>> dump)
>>>
>>> The issue here is actually much more nasty than it seems. But before I
>>> describe the problem, you said:
>>>
>>>>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers. 8
>>>> regionservers.
>>>
>>> If those are all distinct machines, you are wasting a lot of hardware.
>>> Unless you have a HA Namenode (I highly doubt), then you already have
>>> a SPOF there so you might as well put every service on that single
>>> node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
>>> node, but unless you share the zookeeper ensemble between clusters
>>> then losing the Namenode is as bad as losing ZK so might as well put
>>> them together. At StumbleUpon we have 2-3 clusters using the same
>>> ensembles, so it makes more sense to put them in a HA setup.
>>>
>>> That said, in your log I see:
>>>
>>> 2010-06-29 00:00:00,064 DEBUG
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>> interrupted at index=0 because:Requested row out of range for HRegion
>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>> ...
>>> 2010-06-29 12:26:13,352 DEBUG
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>> interrupted at index=0 because:Requested row out of range for HRegion
>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>
>>> So for 12 hours (and probably more), the same row was requested almost
>>> every 100ms but it was always failing on a WrongRegionException
>>> (that's the name of what we see here). You probably use the write
>>> buffer since you want to import as fast as possible, so all these
>>> buffers are left unused after the clients terminate their RPC. That
>>> rate of failed insertion must have kept your garbage collector _very_
>>> busy, and at some point the JVM OOMEd. This is the stack from your
>>> OOME:
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>> at
>>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
>>> at
>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
>>> at
>>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
>>> at
>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>>> at
>>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>>>
>>> This is where we deserialize client data, so it correlates with what I
>>> just described.
>>>
>>> Now, this means that you probably have a hole (or more) in your .META.
>>> table. It usually happens after a region server fails if it was
>>> carrying it (since data loss is possible with that version of HDFS) or
>>> if a bug in the master messes up the .META. region. Now 2 things:
>>>
>>> - It would be nice to know why you have a hole. Look at your .META.
>>> table around the row in your region server log, you should see that
>>> the start/end keys don't match. Then you can look in the master log
>>> from yesterday to search for what went wrong, maybe see some
>>> exceptions, or maybe a region server failed for any reason and it was
>>> hosting .META.
>>>
>>> - You probably want to fix your table. Use the bin/add_table.rb
>>> script (other people on this list used it in the past, search the
>>> archive for more info).
>>>
>>> Finally (whew!), if you are still developing your solution around
>>> HBase, you might want to try out one of our dev release that does work
>>> with a durable Hadoop release. See
>>> http://hbase.apache.org/docs/r0.89.20100621/ for more info. Cloudera's
>>> CDH3b2 also has everything you need.
>>>
>>> J-D
>>>
>>> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans 
>>> <jd...@apache.org>
>>> wrote:
>>>>
>>>> 653 regions is very low, even if you had a total of 3 region servers I
>>>> wouldn't expect any problem.
>>>>
>>>> So to me it seems to point towards either a configuration issue or a
>>>> usage issue. Can you:
>>>>
>>>>  - Put the log of one region server that OOMEd on a public server.
>>>>  - Tell us more about your setup: # of nodes, hardware, configuration
>>>> file
>>>>  - Tell us more about how you insert data into HBase
>>>>
>>>> And BTW are you trying to do an initial import of your data set? If
>>>> so, have you considered using HFileOutputFormat?
>>>>
>>>> Thx,
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <ji...@hotmail.com>
>>>> wrote:
>>>>>
>>>>> Hi, Sir:
>>>>>  I am using hbase 0.20.5 and this morning I found that 3 of  my region
>>>>> server running out of memory.
>>>>> the regionserver is given 6G memory each, and on average, I have 653
>>>>> regions
>>>>> in total. max store size
>>>>> is 256M. I analyzed the dump and it shows that there are too many
>>>>> HRegion in
>>>>> memory.
>>>>>
>>>>>  Previously set max store size to 2G, but then I found the region 
>>>>> server
>>>>> constantly does minor compaction and the CPU usage is very high, It 
>>>>> also
>>>>> blocks the heavy client record insertion.
>>>>>
>>>>>  So now I am limited on one side by memory,  limited on another size 
>>>>> by
>>>>> CPU.
>>>>> Is there anyway to get out of this dilemma ?
>>>>>
>>>>> Jimmy.
>>>>>
>>>>
>>>
>>
> 

Re: dilemma of memory and CPU for hbase.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
add_table.rb doesn't actually write much in the file system, all your
data is still there. It just wipes all the .META. entries and replaces
them with the .regioninfo files found in every region directory.
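
The gist of it, as a simplified Java sketch rather than the actual JRuby
script (no error handling, and the real script also clears the table's
old .META. rows first):

  import org.apache.hadoop.fs.*;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HRegionInfo;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.hbase.util.Writables;

  public class RebuildMeta {
    public static void main(String[] args) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration();
      FileSystem fs = FileSystem.get(conf);
      HTable meta = new HTable(conf, ".META.");
      // Walk the table dir (e.g. /hbase/table_name) given as args[0]
      // and re-insert one .META. row per region directory found.
      for (FileStatus dir : fs.listStatus(new Path(args[0]))) {
        Path info = new Path(dir.getPath(), ".regioninfo");
        if (!fs.exists(info)) continue;    // skip files and non-region dirs
        FSDataInputStream in = fs.open(info);
        HRegionInfo hri = new HRegionInfo();
        hri.readFields(in);                // HRegionInfo is a Writable
        in.close();
        Put p = new Put(hri.getRegionName());
        p.add(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"),
            Writables.getBytes(hri));
        meta.put(p);                       // re-create the catalog row
      }
    }
  }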

Can you define what you mean by "corrupted"? It's really an overloaded
term.

J-D

On Thu, Jul 1, 2010 at 5:01 PM, Jinsong Hu <ji...@hotmail.com> wrote:
> Hi, Jean:
>  Thanks! I will run the add_table.rb and see if it fixes the problem.
>  Our namenode is backed up with  HA and DRBD, and the hbase master machine
> colocates with name node , job tracker so we are not wasting resources.
>
>  The region hole probably comes from previous 0.20.4 hbase operation. the
> 0.20.4 hbase was
> very unstable during its operation. lots of times the master says the region
> is not there but actually
> the region server says it was serving the region.
>
>
> I followed the instruction and run commands like
>
> bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name
>
> After the execution, I found all my tables are corrupted and I can't use it
> any more. restarting hbase
> doesn't help either. I have to wipe out all the /hbase directory and start
> from scratch.
>
>
> it looks that the add_table.rb can corrupt the whole hbase.  Anyway, I am
> regenerating the data from
> scratch and let's see if it will work out.
>
> Jimmy.
>
>
> --------------------------------------------------
> From: "Jean-Daniel Cryans" <jd...@apache.org>
> Sent: Thursday, July 01, 2010 2:17 PM
> To: <us...@hbase.apache.org>
> Subject: Re: dilemma of memory and CPU for hbase.
>
>> (taking the conversation back to the list after receiving logs and heap
>> dump)
>>
>> The issue here is actually much more nasty than it seems. But before I
>> describe the problem, you said:
>>
>>>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers. 8
>>> regionservers.
>>
>> If those are all distinct machines, you are wasting a lot of hardware.
>> Unless you have a HA Namenode (I highly doubt), then you already have
>> a SPOF there so you might as well put every service on that single
>> node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
>> node, but unless you share the zookeeper ensemble between clusters
>> then losing the Namenode is as bad as losing ZK so might as well put
>> them together. At StumbleUpon we have 2-3 clusters using the same
>> ensembles, so it makes more sense to put them in a HA setup.
>>
>> That said, in your log I see:
>>
>> 2010-06-29 00:00:00,064 DEBUG
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>> interrupted at index=0 because:Requested row out of range for HRegion
>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>> ...
>> 2010-06-29 12:26:13,352 DEBUG
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>> interrupted at index=0 because:Requested row out of range for HRegion
>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>
>> So for 12 hours (and probably more), the same row was requested almost
>> every 100ms but it was always failing on a WrongRegionException
>> (that's the name of what we see here). You probably use the write
>> buffer since you want to import as fast as possible, so all these
>> buffers are left unused after the clients terminate their RPC. That
>> rate of failed insertion must have kept your garbage collector _very_
>> busy, and at some point the JVM OOMEd. This is the stack from your
>> OOME:
>>
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
>> at
>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
>> at
>> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
>> at
>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>> at
>> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>>
>> This is where we deserialize client data, so it correlates with what I
>> just described.
>>
>> Now, this means that you probably have a hole (or more) in your .META.
>> table. It usually happens after a region server fails if it was
>> carrying it (since data loss is possible with that version of HDFS) or
>> if a bug in the master messes up the .META. region. Now 2 things:
>>
>> - It would be nice to know why you have a hole. Look at your .META.
>> table around the row in your region server log, you should see that
>> the start/end keys don't match. Then you can look in the master log
>> from yesterday to search for what went wrong, maybe see some
>> exceptions, or maybe a region server failed for any reason and it was
>> hosting .META.
>>
>> - You probably want to fix your table. Use the bin/add_table.rb
>> script (other people on this list used it in the past, search the
>> archive for more info).
>>
>> Finally (whew!), if you are still developing your solution around
>> HBase, you might want to try out one of our dev release that does work
>> with a durable Hadoop release. See
>> http://hbase.apache.org/docs/r0.89.20100621/ for more info. Cloudera's
>> CDH3b2 also has everything you need.
>>
>> J-D
>>
>> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans <jd...@apache.org>
>> wrote:
>>>
>>> 653 regions is very low, even if you had a total of 3 region servers I
>>> wouldn't expect any problem.
>>>
>>> So to me it seems to point towards either a configuration issue or a
>>> usage issue. Can you:
>>>
>>>  - Put the log of one region server that OOMEd on a public server.
>>>  - Tell us more about your setup: # of nodes, hardware, configuration
>>> file
>>>  - Tell us more about how you insert data into HBase
>>>
>>> And BTW are you trying to do an initial import of your data set? If
>>> so, have you considered using HFileOutputFormat?
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <ji...@hotmail.com>
>>> wrote:
>>>>
>>>> Hi, Sir:
>>>>  I am using hbase 0.20.5 and this morning I found that 3 of  my region
>>>> server running out of memory.
>>>> the regionserver is given 6G memory each, and on average, I have 653
>>>> regions
>>>> in total. max store size
>>>> is 256M. I analyzed the dump and it shows that there are too many
>>>> HRegion in
>>>> memory.
>>>>
>>>>  Previously set max store size to 2G, but then I found the region server
>>>> constantly does minor compaction and the CPU usage is very high, It also
>>>> blocks the heavy client record insertion.
>>>>
>>>>  So now I am limited on one side by memory,  limited on another size by
>>>> CPU.
>>>> Is there anyway to get out of this dilemma ?
>>>>
>>>> Jimmy.
>>>>
>>>
>>
>

Re: dilemma of memory and CPU for hbase.

Posted by Jinsong Hu <ji...@hotmail.com>.
Hi, Jean:
  Thanks! I will run add_table.rb and see if it fixes the problem.
  Our namenode is backed up with HA and DRBD, and the HBase master
machine colocates with the namenode and job tracker, so we are not
wasting resources.

  The region hole probably comes from the previous 0.20.4 HBase
operation; 0.20.4 was very unstable while it ran. Lots of times the
master said a region was not there while the region server said it was
serving that region.


I followed the instructions and ran commands like

bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name

After the execution, I found all my tables were corrupted and I
couldn't use them any more. Restarting HBase didn't help either; I had
to wipe out the whole /hbase directory and start from scratch.


It looks like add_table.rb can corrupt the whole HBase install. Anyway,
I am regenerating the data from scratch; let's see if it works out.

Jimmy.


--------------------------------------------------
From: "Jean-Daniel Cryans" <jd...@apache.org>
Sent: Thursday, July 01, 2010 2:17 PM
To: <us...@hbase.apache.org>
Subject: Re: dilemma of memory and CPU for hbase.

> (taking the conversation back to the list after receiving logs and heap 
> dump)
>
> The issue here is actually much more nasty than it seems. But before I
> describe the problem, you said:
>
>>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers. 8
>> regionservers.
>
> If those are all distinct machines, you are wasting a lot of hardware.
> Unless you have a HA Namenode (I highly doubt), then you already have
> a SPOF there so you might as well put every service on that single
> node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
> node, but unless you share the zookeeper ensemble between clusters
> then losing the Namenode is as bad as losing ZK so might as well put
> them together. At StumbleUpon we have 2-3 clusters using the same
> ensembles, so it makes more sense to put them in a HA setup.
>
> That said, in your log I see:
>
> 2010-06-29 00:00:00,064 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
> interrupted at index=0 because:Requested row out of range for HRegion
> Spam_MsgEventTable,2010-06-28 11:34:02blah
> ...
> 2010-06-29 12:26:13,352 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
> interrupted at index=0 because:Requested row out of range for HRegion
> Spam_MsgEventTable,2010-06-28 11:34:02blah
>
> So for 12 hours (and probably more), the same row was requested almost
> every 100ms but it was always failing on a WrongRegionException
> (that's the name of what we see here). You probably use the write
> buffer since you want to import as fast as possible, so all these
> buffers are left unused after the clients terminate their RPC. That
> rate of failed insertion must have kept your garbage collector _very_
> busy, and at some point the JVM OOMEd. This is the stack from your
> OOME:
>
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>
> This is where we deserialize client data, so it correlates with what I
> just described.
>
> Now, this means that you probably have a hole (or more) in your .META.
> table. It usually happens after a region server fails if it was
> carrying it (since data loss is possible with that version of HDFS) or
> if a bug in the master messes up the .META. region. Now 2 things:
>
> - It would be nice to know why you have a hole. Look at your .META.
> table around the row in your region server log, you should see that
> the start/end keys don't match. Then you can look in the master log
> from yesterday to search for what went wrong, maybe see some
> exceptions, or maybe a region server failed for any reason and it was
> hosting .META.
>
> - You probably want to fix your table. Use the bin/add_table.rb
> script (other people on this list used it in the past, search the
> archive for more info).
>
> Finally (whew!), if you are still developing your solution around
> HBase, you might want to try out one of our dev release that does work
> with a durable Hadoop release. See
> http://hbase.apache.org/docs/r0.89.20100621/ for more info. Cloudera's
> CDH3b2 also has everything you need.
>
> J-D
>
> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans <jd...@apache.org> 
> wrote:
>> 653 regions is very low, even if you had a total of 3 region servers I
>> wouldn't expect any problem.
>>
>> So to me it seems to point towards either a configuration issue or a
>> usage issue. Can you:
>>
>>  - Put the log of one region server that OOMEd on a public server.
>>  - Tell us more about your setup: # of nodes, hardware, configuration 
>> file
>>  - Tell us more about how you insert data into HBase
>>
>> And BTW are you trying to do an initial import of your data set? If
>> so, have you considered using HFileOutputFormat?
>>
>> Thx,
>>
>> J-D
>>
>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <ji...@hotmail.com> 
>> wrote:
>>> Hi, Sir:
>>>  I am using hbase 0.20.5 and this morning I found that 3 of  my region
>>> server running out of memory.
>>> the regionserver is given 6G memory each, and on average, I have 653 
>>> regions
>>> in total. max store size
>>> is 256M. I analyzed the dump and it shows that there are too many 
>>> HRegion in
>>> memory.
>>>
>>>  Previously set max store size to 2G, but then I found the region server
>>> constantly does minor compaction and the CPU usage is very high, It also
>>> blocks the heavy client record insertion.
>>>
>>>  So now I am limited on one side by memory,  limited on another size by 
>>> CPU.
>>> Is there anyway to get out of this dilemma ?
>>>
>>> Jimmy.
>>>
>>
> 

Re: dilemma of memory and CPU for hbase.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
(taking the conversation back to the list after receiving logs and heap dump)

The issue here is actually much nastier than it seems. But before I
describe the problem, you said:

>  I have 3 machines as hbase master (only 1 is active), 3 zookeepers. 8
> regionservers.

If those are all distinct machines, you are wasting a lot of hardware.
Unless you have a HA Namenode (I highly doubt), then you already have
a SPOF there so you might as well put every service on that single
node (1 master, 1 zookeeper). You might be afraid of using only 1 ZK
node, but unless you share the zookeeper ensemble between clusters
then losing the Namenode is as bad as losing ZK so might as well put
them together. At StumbleUpon we have 2-3 clusters using the same
ensembles, so it makes more sense to put them in a HA setup.
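
For example, with a single shared master/NN/ZK node, the HBase side of
that boils down to one hbase-site.xml property (the host name here is
hypothetical):

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master-node.example.com</value>
  </property>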

That said, in your log I see:

2010-06-29 00:00:00,064 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
interrupted at index=0 because:Requested row out of range for HRegion
Spam_MsgEventTable,2010-06-28 11:34:02blah
...
2010-06-29 12:26:13,352 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
interrupted at index=0 because:Requested row out of range for HRegion
Spam_MsgEventTable,2010-06-28 11:34:02blah

So for 12 hours (and probably more), the same row was requested almost
every 100ms but it was always failing on a WrongRegionException
(that's the name of what we see here). You probably use the write
buffer since you want to import as fast as possible, so all these
buffers are left unused after the clients terminate their RPC. That
rate of failed insertion must have kept your garbage collector _very_
busy, and at some point the JVM OOMEd. This is the stack from your
OOME:

java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)

This is where we deserialize client data, so it correlates with what I
just described.
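
For clarity, the client-side write buffer I mean is the one enabled
with something like this (a minimal sketch; the table name is taken
from the log above):

  HBaseConfiguration conf = new HBaseConfiguration();
  HTable table = new HTable(conf, "Spam_MsgEventTable");
  table.setAutoFlush(false);                  // buffer Puts on the client
  table.setWriteBufferSize(2 * 1024 * 1024);  // flush roughly every 2MB
  // ... many table.put(put) calls ...
  table.flushCommits();                       // ship whatever is still buffered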

Now, this means that you probably have a hole (or more) in your .META.
table. It usually happens after a region server fails if it was
carrying it (since data loss is possible with that version of HDFS) or
if a bug in the master messes up the .META. region. Now 2 things:

 - It would be nice to know why you have a hole. Look at your .META.
table around the row in your region server log, you should see that
the start/end keys don't match. Then you can look in the master log
from yesterday to search for what went wrong, maybe see some
exceptions, or maybe a region server failed for any reason and it was
hosting .META.

 - You probably want to fix your table. Use the bin/add_table.rb
script (other people on this list used it in the past, search the
archive for more info).
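
On the first point, you can eyeball the hole from the shell with
something like (the start row is made up; use the region name from your
log):

  hbase> scan '.META.', {STARTROW => 'Spam_MsgEventTable,', COLUMNS => ['info:regioninfo'], LIMIT => 5}

In a healthy table each region's end key equals the next region's start
key, so a hole shows up as a gap between consecutive entries. On the
second point, the add_table.rb invocation is along the lines of (the
table name is just an example):

  bin/hbase org.jruby.Main bin/add_table.rb /hbase/Spam_MsgEventTable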

Finally (whew!), if you are still developing your solution around
HBase, you might want to try out one of our dev releases, which do work
with a durable Hadoop release. See
http://hbase.apache.org/docs/r0.89.20100621/ for more info. Cloudera's
CDH3b2 also has everything you need.

J-D

On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> 653 regions is very low, even if you had a total of 3 region servers I
> wouldn't expect any problem.
>
> So to me it seems to point towards either a configuration issue or a
> usage issue. Can you:
>
>  - Put the log of one region server that OOMEd on a public server.
>  - Tell us more about your setup: # of nodes, hardware, configuration file
>  - Tell us more about how you insert data into HBase
>
> And BTW are you trying to do an initial import of your data set? If
> so, have you considered using HFileOutputFormat?
>
> Thx,
>
> J-D
>
> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <ji...@hotmail.com> wrote:
>> Hi, Sir:
>>  I am using hbase 0.20.5 and this morning I found that 3 of  my region
>> server running out of memory.
>> the regionserver is given 6G memory each, and on average, I have 653 regions
>> in total. max store size
>> is 256M. I analyzed the dump and it shows that there are too many HRegion in
>> memory.
>>
>>  Previously set max store size to 2G, but then I found the region server
>> constantly does minor compaction and the CPU usage is very high, It also
>> blocks the heavy client record insertion.
>>
>>  So now I am limited on one side by memory,  limited on another size by CPU.
>> Is there anyway to get out of this dilemma ?
>>
>> Jimmy.
>>
>

Re: dilemma of memory and CPU for hbase.

Posted by Jean-Daniel Cryans <jd...@apache.org>.
653 regions is very low; even if you had a total of only 3 region
servers I wouldn't expect any problem.

So to me it seems to point towards either a configuration issue or a
usage issue. Can you:

 - Put the log of one region server that OOMEd on a public server.
 - Tell us more about your setup: # of nodes, hardware, configuration file
 - Tell us more about how you insert data into HBase

And BTW are you trying to do an initial import of your data set? If
so, have you considered using HFileOutputFormat?
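
For reference, the rough shape of that approach (a sketch from memory
of the 0.20-era API; the mapper, reducer and total-ordering details are
glossed over, and the output path is made up):

  // Uses org.apache.hadoop.mapreduce.Job plus org.apache.hadoop.hbase
  // KeyValue, io.ImmutableBytesWritable and mapreduce.HFileOutputFormat.
  HBaseConfiguration conf = new HBaseConfiguration();
  Job job = new Job(conf, "bulk import");
  job.setMapOutputKeyClass(ImmutableBytesWritable.class);
  job.setMapOutputValueClass(KeyValue.class);
  job.setOutputFormatClass(HFileOutputFormat.class);
  FileOutputFormat.setOutputPath(job, new Path("/tmp/bulk-hfiles"));
  job.waitForCompletion(true);

The HFiles written there then get moved under the table directory (I
believe bin/loadtable.rb does that part in 0.20) instead of going
through the regionserver write path at all.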

Thx,

J-D

On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <ji...@hotmail.com> wrote:
> Hi, Sir:
>  I am using hbase 0.20.5 and this morning I found that 3 of  my region
> server running out of memory.
> the regionserver is given 6G memory each, and on average, I have 653 regions
> in total. max store size
> is 256M. I analyzed the dump and it shows that there are too many HRegion in
> memory.
>
>  Previously set max store size to 2G, but then I found the region server
> constantly does minor compaction and the CPU usage is very high, It also
> blocks the heavy client record insertion.
>
>  So now I am limited on one side by memory,  limited on another size by CPU.
> Is there anyway to get out of this dilemma ?
>
> Jimmy.
>

dilemma of memory and CPU for hbase.

Posted by Jinsong Hu <ji...@hotmail.com>.
Hi, Sir:
  I am using HBase 0.20.5, and this morning I found that 3 of my region
servers had run out of memory.
Each regionserver is given 6G of memory, and I have about 653 regions
in total; the max store size is 256M. I analyzed the heap dump, and it
shows that there are too many HRegion objects in memory.

  Previously I set the max store size to 2G, but then I found the
region servers constantly doing minor compactions with very high CPU
usage, which also blocked the heavy client record insertion.

  So now I am limited on one side by memory and on the other side by
CPU. Is there any way to get out of this dilemma?
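
For reference, the store size knob I mean here is, I believe,
hbase.hregion.max.filesize in hbase-site.xml:

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>268435456</value><!-- 256M now; it was 2147483648 (2G) -->
  </property>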

Jimmy. 


Re: HBase 0.20.5 issues

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I removed that part of the code in
http://svn.apache.org/viewvc/hbase/branches/0.20/src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?r1=948360&r2=948631

So you don't need to apply that hunk :)
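
Concretely, something like this should do it (the reject file name is
taken from Ted's output below):

  patch -p0 < 2599-0.20.txt
  # Hunk #1 against HRegionServer.java will fail; that change is no
  # longer needed, so the reject can simply be discarded:
  rm src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej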

J-D

On Tue, Jul 6, 2010 at 5:14 PM, Ted Yu <yu...@gmail.com> wrote:

> Here is the output from patch:
>
> tyumac:hbase-0.20.5 tyu$ patch -p0 --dry-run < 2599-0.20.txt
> patching file src/java/org/apache/hadoop/hbase/ClusterStatus.java
> patching file
> src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> Hunk #1 FAILED at 520.
> Hunk #2 succeeded at 616 (offset -28 lines).
> Hunk #3 succeeded at 638 (offset -28 lines).
> Hunk #4 succeeded at 1037 (offset -11 lines).
> Hunk #5 succeeded at 1173 (offset -9 lines).
> Hunk #6 succeeded at 1335 (offset -9 lines).
> Hunk #7 succeeded at 1358 (offset -9 lines).
> 1 out of 7 hunks FAILED -- saving rejects to file
> src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
> patching file src/java/org/apache/hadoop/hbase/HServerInfo.java
> patching file src/java/org/apache/hadoop/hbase/master/ServerManager.java
> Hunk #2 succeeded at 73 (offset -2 lines).
> Hunk #3 succeeded at 105 (offset -2 lines).
> Hunk #4 succeeded at 215 (offset -2 lines).
> Hunk #5 succeeded at 309 (offset -2 lines).
> Hunk #6 succeeded at 332 (offset -2 lines).
> Hunk #7 succeeded at 391 (offset -2 lines).
> Hunk #8 succeeded at 597 (offset -2 lines).
> Hunk #9 succeeded at 686 (offset -2 lines).
> Hunk #10 succeeded at 769 (offset -2 lines).
> Hunk #11 succeeded at 839 (offset -2 lines).
> Hunk #12 succeeded at 903 (offset -2 lines).
> patching file src/java/org/apache/hadoop/hbase/master/HMaster.java
> patching file
> src/java/org/apache/hadoop/hbase/master/ProcessRegionOpen.java
> patching file src/java/org/apache/hadoop/hbase/master/BaseScanner.java
> patching file src/webapps/master/table.jsp
>
> I looked at
> src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> and the only serverInfo.setStartCode() call is at line 1337.
>
> Please advise how to apply the patch to 0.20.5
>
> Thanks
>
> On Fri, Jul 2, 2010 at 7:57 AM, Stanislaw Kogut <sk...@sistyma.net>
> wrote:
>
> > Thanks, I have applied this patch and issue went out.
> >
> > On Thu, Jul 1, 2010 at 9:18 PM, Jean-Daniel Cryans <jdcryans@apache.org
> > >wrote:
> >
> > > (sorry it took so long to answer, we were all busy with the various
> > > meetings around the Bay Area)
> > >
> > > I can see the issue:
> > >
> > > 2010-06-30 13:48:16,135 DEBUG master.BaseScanner
> > > (BaseScanner.java:checkAssigned(580)) - Current assignment of
> > > .META.,,1 is not valid;  serverAddress=, startCode=0 unknown.
> > > ...
> > > 2010-06-30 13:48:26,967 INFO  master.RegionServerOperation
> > > (ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
> > > -ROOT-,,0 with startcode=1277894894819, server=192.168.242.142:60020
> > > 2010-06-30 13:48:26,968 DEBUG master.RegionServerOperation
> > > (ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
> > > {server: 192.168.242.142:60020, regionname: .META.,,1, startKey: <>}
> > > ...
> > > 2010-06-30 13:49:13,050 DEBUG master.BaseScanner
> > > (BaseScanner.java:checkAssigned(580)) - Current assignment of
> > > .META.,,1 is not valid;  serverAddress=192.168.242.142:60020,
> > > startCode=1277894894819 unknown.
> > > ...
> > > 2010-06-30 13:49:13,176 INFO  master.RegionServerOperation
> > > (ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
> > > -ROOT-,,0 with startcode=1277894893999, server=192.168.242.145:60020
> > > 2010-06-30 13:49:13,176 DEBUG master.RegionServerOperation
> > > (ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
> > > {server: 192.168.242.145:60020, regionname: .META.,,1, startKey: <>}
> > > ...
> > >
> > > This is https://issues.apache.org/jira/browse/hbase-2599. We didn't
> > > apply it on branch because it was breaking rolling restarts but you
> > > can safely apply the patch before a cluster startup.
> > >
> > > J-D
> > >
> > > On Thu, Jul 1, 2010 at 5:03 AM, Stanislaw Kogut <sk...@sistyma.net>
> > > wrote:
> > > > Completely changed all hadoop configuration to almost default, PE
> > > completes
> > > > writing for 1000000 rows, but regions still come assigned to multiple
> > > RS's
> > > >
> > > > hbase(main):001:0> status 'detailed'
> > > > version 0.20.5
> > > > 0 regionsInTransition
> > > > 6 live servers
> > > >    uasstse005.ua.sistyma.com:60020 1277985198620
> > > >        requests=0, regions=2, usedHeap=25, maxHeap=1196
> > > >        .META.,,1
> > > >            stores=2, storefiles=0, storefileSizeMB=0,
> memstoreSizeMB=0,
> > > > storefileIndexSizeMB=0
> > > >        -ROOT-,,0
> > > >            stores=1, storefiles=3, storefileSizeMB=0,
> memstoreSizeMB=0,
> > > > storefileIndexSizeMB=0
> > > >    stas-node.ua.sistyma.com:60020 1277985198573
> > > >        requests=0, regions=1, usedHeap=22, maxHeap=1996
> > > >        .META.,,1
> > > >            stores=2, storefiles=0, storefileSizeMB=0,
> memstoreSizeMB=0,
> > > > storefileIndexSizeMB=0
> > > >    uasstse004.ua.sistyma.com:60020 1277985198572
> > > >        requests=0, regions=1, usedHeap=23, maxHeap=1996
> > > >        .META.,,1
> > > >            stores=2, storefiles=0, storefileSizeMB=0,
> memstoreSizeMB=0,
> > > > storefileIndexSizeMB=0
> > > >    uasstse006.ua.sistyma.com:60020 1277985198554
> > > >        requests=0, regions=0, usedHeap=33, maxHeap=1196
> > > >    uasstse002.ua.sistyma.com:60020 1277985198667
> > > >        requests=0, regions=1, usedHeap=34, maxHeap=1996
> > > >        .META.,,1
> > > >            stores=2, storefiles=0, storefileSizeMB=0,
> memstoreSizeMB=0,
> > > > storefileIndexSizeMB=0
> > > >    uasstse003.ua.sistyma.com:60020 1277985198550
> > > >        requests=0, regions=1, usedHeap=22, maxHeap=1996
> > > >        .META.,,1
> > > >            stores=2, storefiles=0, storefileSizeMB=0,
> memstoreSizeMB=0,
> > > > storefileIndexSizeMB=0
> > > > 0 dead servers
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Stanislaw Kogut
> > Sistyma LLC
> >
>

Re: HBase 0.20.5 issues

Posted by Ted Yu <yu...@gmail.com>.
Here is the output from patch:

tyumac:hbase-0.20.5 tyu$ patch -p0 --dry-run < 2599-0.20.txt
patching file src/java/org/apache/hadoop/hbase/ClusterStatus.java
patching file
src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Hunk #1 FAILED at 520.
Hunk #2 succeeded at 616 (offset -28 lines).
Hunk #3 succeeded at 638 (offset -28 lines).
Hunk #4 succeeded at 1037 (offset -11 lines).
Hunk #5 succeeded at 1173 (offset -9 lines).
Hunk #6 succeeded at 1335 (offset -9 lines).
Hunk #7 succeeded at 1358 (offset -9 lines).
1 out of 7 hunks FAILED -- saving rejects to file
src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
patching file src/java/org/apache/hadoop/hbase/HServerInfo.java
patching file src/java/org/apache/hadoop/hbase/master/ServerManager.java
Hunk #2 succeeded at 73 (offset -2 lines).
Hunk #3 succeeded at 105 (offset -2 lines).
Hunk #4 succeeded at 215 (offset -2 lines).
Hunk #5 succeeded at 309 (offset -2 lines).
Hunk #6 succeeded at 332 (offset -2 lines).
Hunk #7 succeeded at 391 (offset -2 lines).
Hunk #8 succeeded at 597 (offset -2 lines).
Hunk #9 succeeded at 686 (offset -2 lines).
Hunk #10 succeeded at 769 (offset -2 lines).
Hunk #11 succeeded at 839 (offset -2 lines).
Hunk #12 succeeded at 903 (offset -2 lines).
patching file src/java/org/apache/hadoop/hbase/master/HMaster.java
patching file src/java/org/apache/hadoop/hbase/master/ProcessRegionOpen.java
patching file src/java/org/apache/hadoop/hbase/master/BaseScanner.java
patching file src/webapps/master/table.jsp

I looked at src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
and the only serverInfo.setStartCode() call is at line 1337.

Please advise how to apply the patch to 0.20.5

Thanks

On Fri, Jul 2, 2010 at 7:57 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:

> Thanks, I have applied this patch and issue went out.
>
> On Thu, Jul 1, 2010 at 9:18 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
>
> > (sorry it took so long to answer, we were all busy with the various
> > meetings around the Bay Area)
> >
> > I can see the issue:
> >
> > 2010-06-30 13:48:16,135 DEBUG master.BaseScanner
> > (BaseScanner.java:checkAssigned(580)) - Current assignment of
> > .META.,,1 is not valid;  serverAddress=, startCode=0 unknown.
> > ...
> > 2010-06-30 13:48:26,967 INFO  master.RegionServerOperation
> > (ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
> > -ROOT-,,0 with startcode=1277894894819, server=192.168.242.142:60020
> > 2010-06-30 13:48:26,968 DEBUG master.RegionServerOperation
> > (ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
> > {server: 192.168.242.142:60020, regionname: .META.,,1, startKey: <>}
> > ...
> > 2010-06-30 13:49:13,050 DEBUG master.BaseScanner
> > (BaseScanner.java:checkAssigned(580)) - Current assignment of
> > .META.,,1 is not valid;  serverAddress=192.168.242.142:60020,
> > startCode=1277894894819 unknown.
> > ...
> > 2010-06-30 13:49:13,176 INFO  master.RegionServerOperation
> > (ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
> > -ROOT-,,0 with startcode=1277894893999, server=192.168.242.145:60020
> > 2010-06-30 13:49:13,176 DEBUG master.RegionServerOperation
> > (ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
> > {server: 192.168.242.145:60020, regionname: .META.,,1, startKey: <>}
> > ...
> >
> > This is https://issues.apache.org/jira/browse/hbase-2599. We didn't
> > apply it on branch because it was breaking rolling restarts but you
> > can safely apply the patch before a cluster startup.
> >
> > J-D
> >
> > On Thu, Jul 1, 2010 at 5:03 AM, Stanislaw Kogut <sk...@sistyma.net>
> > wrote:
> > > Completely changed all hadoop configuration to almost default, PE
> > completes
> > > writing for 1000000 rows, but regions still come assigned to multiple
> > RS's
> > >
> > > hbase(main):001:0> status 'detailed'
> > > version 0.20.5
> > > 0 regionsInTransition
> > > 6 live servers
> > >    uasstse005.ua.sistyma.com:60020 1277985198620
> > >        requests=0, regions=2, usedHeap=25, maxHeap=1196
> > >        .META.,,1
> > >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > > storefileIndexSizeMB=0
> > >        -ROOT-,,0
> > >            stores=1, storefiles=3, storefileSizeMB=0, memstoreSizeMB=0,
> > > storefileIndexSizeMB=0
> > >    stas-node.ua.sistyma.com:60020 1277985198573
> > >        requests=0, regions=1, usedHeap=22, maxHeap=1996
> > >        .META.,,1
> > >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > > storefileIndexSizeMB=0
> > >    uasstse004.ua.sistyma.com:60020 1277985198572
> > >        requests=0, regions=1, usedHeap=23, maxHeap=1996
> > >        .META.,,1
> > >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > > storefileIndexSizeMB=0
> > >    uasstse006.ua.sistyma.com:60020 1277985198554
> > >        requests=0, regions=0, usedHeap=33, maxHeap=1196
> > >    uasstse002.ua.sistyma.com:60020 1277985198667
> > >        requests=0, regions=1, usedHeap=34, maxHeap=1996
> > >        .META.,,1
> > >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > > storefileIndexSizeMB=0
> > >    uasstse003.ua.sistyma.com:60020 1277985198550
> > >        requests=0, regions=1, usedHeap=22, maxHeap=1996
> > >        .META.,,1
> > >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > > storefileIndexSizeMB=0
> > > 0 dead servers
> > >
> > >
> >
>
>
>
> --
> Regards,
> Stanislaw Kogut
> Sistyma LLC
>

Re: HBase 0.20.5 issues

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Awesome!

Have fun HBase'ing!

J-D

On Fri, Jul 2, 2010 at 7:57 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:
> Thanks, I have applied this patch and issue went out.
>
> On Thu, Jul 1, 2010 at 9:18 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> (sorry it took so long to answer, we were all busy with the various
>> meetings around the Bay Area)
>>
>> I can see the issue:
>>
>> 2010-06-30 13:48:16,135 DEBUG master.BaseScanner
>> (BaseScanner.java:checkAssigned(580)) - Current assignment of
>> .META.,,1 is not valid;  serverAddress=, startCode=0 unknown.
>> ...
>> 2010-06-30 13:48:26,967 INFO  master.RegionServerOperation
>> (ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
>> -ROOT-,,0 with startcode=1277894894819, server=192.168.242.142:60020
>> 2010-06-30 13:48:26,968 DEBUG master.RegionServerOperation
>> (ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
>> {server: 192.168.242.142:60020, regionname: .META.,,1, startKey: <>}
>> ...
>> 2010-06-30 13:49:13,050 DEBUG master.BaseScanner
>> (BaseScanner.java:checkAssigned(580)) - Current assignment of
>> .META.,,1 is not valid;  serverAddress=192.168.242.142:60020,
>> startCode=1277894894819 unknown.
>> ...
>> 2010-06-30 13:49:13,176 INFO  master.RegionServerOperation
>> (ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
>> -ROOT-,,0 with startcode=1277894893999, server=192.168.242.145:60020
>> 2010-06-30 13:49:13,176 DEBUG master.RegionServerOperation
>> (ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
>> {server: 192.168.242.145:60020, regionname: .META.,,1, startKey: <>}
>> ...
>>
>> This is https://issues.apache.org/jira/browse/hbase-2599. We didn't
>> apply it on branch because it was breaking rolling restarts but you
>> can safely apply the patch before a cluster startup.
>>
>> J-D
>>
>> On Thu, Jul 1, 2010 at 5:03 AM, Stanislaw Kogut <sk...@sistyma.net>
>> wrote:
>> > Completely changed all hadoop configuration to almost default, PE
>> completes
>> > writing for 1000000 rows, but regions still come assigned to multiple
>> RS's
>> >
>> > hbase(main):001:0> status 'detailed'
>> > version 0.20.5
>> > 0 regionsInTransition
>> > 6 live servers
>> >    uasstse005.ua.sistyma.com:60020 1277985198620
>> >        requests=0, regions=2, usedHeap=25, maxHeap=1196
>> >        .META.,,1
>> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
>> > storefileIndexSizeMB=0
>> >        -ROOT-,,0
>> >            stores=1, storefiles=3, storefileSizeMB=0, memstoreSizeMB=0,
>> > storefileIndexSizeMB=0
>> >    stas-node.ua.sistyma.com:60020 1277985198573
>> >        requests=0, regions=1, usedHeap=22, maxHeap=1996
>> >        .META.,,1
>> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
>> > storefileIndexSizeMB=0
>> >    uasstse004.ua.sistyma.com:60020 1277985198572
>> >        requests=0, regions=1, usedHeap=23, maxHeap=1996
>> >        .META.,,1
>> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
>> > storefileIndexSizeMB=0
>> >    uasstse006.ua.sistyma.com:60020 1277985198554
>> >        requests=0, regions=0, usedHeap=33, maxHeap=1196
>> >    uasstse002.ua.sistyma.com:60020 1277985198667
>> >        requests=0, regions=1, usedHeap=34, maxHeap=1996
>> >        .META.,,1
>> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
>> > storefileIndexSizeMB=0
>> >    uasstse003.ua.sistyma.com:60020 1277985198550
>> >        requests=0, regions=1, usedHeap=22, maxHeap=1996
>> >        .META.,,1
>> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
>> > storefileIndexSizeMB=0
>> > 0 dead servers
>> >
>> >
>>
>
>
>
> --
> Regards,
> Stanislaw Kogut
> Sistyma LLC
>

Re: HBase 0.20.5 issues

Posted by Stanislaw Kogut <sk...@sistyma.net>.
Thanks, I have applied this patch and the issue went away.

On Thu, Jul 1, 2010 at 9:18 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> (sorry it took so long to answer, we were all busy with the various
> meetings around the Bay Area)
>
> I can see the issue:
>
> 2010-06-30 13:48:16,135 DEBUG master.BaseScanner
> (BaseScanner.java:checkAssigned(580)) - Current assignment of
> .META.,,1 is not valid;  serverAddress=, startCode=0 unknown.
> ...
> 2010-06-30 13:48:26,967 INFO  master.RegionServerOperation
> (ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
> -ROOT-,,0 with startcode=1277894894819, server=192.168.242.142:60020
> 2010-06-30 13:48:26,968 DEBUG master.RegionServerOperation
> (ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
> {server: 192.168.242.142:60020, regionname: .META.,,1, startKey: <>}
> ...
> 2010-06-30 13:49:13,050 DEBUG master.BaseScanner
> (BaseScanner.java:checkAssigned(580)) - Current assignment of
> .META.,,1 is not valid;  serverAddress=192.168.242.142:60020,
> startCode=1277894894819 unknown.
> ...
> 2010-06-30 13:49:13,176 INFO  master.RegionServerOperation
> (ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
> -ROOT-,,0 with startcode=1277894893999, server=192.168.242.145:60020
> 2010-06-30 13:49:13,176 DEBUG master.RegionServerOperation
> (ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
> {server: 192.168.242.145:60020, regionname: .META.,,1, startKey: <>}
> ...
>
> This is https://issues.apache.org/jira/browse/hbase-2599. We didn't
> apply it on branch because it was breaking rolling restarts but you
> can safely apply the patch before a cluster startup.
>
> J-D
>
> On Thu, Jul 1, 2010 at 5:03 AM, Stanislaw Kogut <sk...@sistyma.net>
> wrote:
> > Completely changed all hadoop configuration to almost default, PE
> completes
> > writing for 1000000 rows, but regions still come assigned to multiple
> RS's
> >
> > hbase(main):001:0> status 'detailed'
> > version 0.20.5
> > 0 regionsInTransition
> > 6 live servers
> >    uasstse005.ua.sistyma.com:60020 1277985198620
> >        requests=0, regions=2, usedHeap=25, maxHeap=1196
> >        .META.,,1
> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0
> >        -ROOT-,,0
> >            stores=1, storefiles=3, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0
> >    stas-node.ua.sistyma.com:60020 1277985198573
> >        requests=0, regions=1, usedHeap=22, maxHeap=1996
> >        .META.,,1
> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0
> >    uasstse004.ua.sistyma.com:60020 1277985198572
> >        requests=0, regions=1, usedHeap=23, maxHeap=1996
> >        .META.,,1
> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0
> >    uasstse006.ua.sistyma.com:60020 1277985198554
> >        requests=0, regions=0, usedHeap=33, maxHeap=1196
> >    uasstse002.ua.sistyma.com:60020 1277985198667
> >        requests=0, regions=1, usedHeap=34, maxHeap=1996
> >        .META.,,1
> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0
> >    uasstse003.ua.sistyma.com:60020 1277985198550
> >        requests=0, regions=1, usedHeap=22, maxHeap=1996
> >        .META.,,1
> >            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0
> > 0 dead servers
> >
> >
>



-- 
Regards,
Stanislaw Kogut
Sistyma LLC

Re: HBase 0.20.5 issues

Posted by Jean-Daniel Cryans <jd...@apache.org>.
(sorry it took so long to answer, we were all busy with the various
meetings around the Bay Area)

I can see the issue:

2010-06-30 13:48:16,135 DEBUG master.BaseScanner
(BaseScanner.java:checkAssigned(580)) - Current assignment of
.META.,,1 is not valid;  serverAddress=, startCode=0 unknown.
...
2010-06-30 13:48:26,967 INFO  master.RegionServerOperation
(ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
-ROOT-,,0 with startcode=1277894894819, server=192.168.242.142:60020
2010-06-30 13:48:26,968 DEBUG master.RegionServerOperation
(ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
{server: 192.168.242.142:60020, regionname: .META.,,1, startKey: <>}
...
2010-06-30 13:49:13,050 DEBUG master.BaseScanner
(BaseScanner.java:checkAssigned(580)) - Current assignment of
.META.,,1 is not valid;  serverAddress=192.168.242.142:60020,
startCode=1277894894819 unknown.
...
2010-06-30 13:49:13,176 INFO  master.RegionServerOperation
(ProcessRegionOpen.java:process(80)) - Updated row .META.,,1 in region
-ROOT-,,0 with startcode=1277894893999, server=192.168.242.145:60020
2010-06-30 13:49:13,176 DEBUG master.RegionServerOperation
(ProcessRegionOpen.java:process(98)) - Adding to onlineMetaRegions:
{server: 192.168.242.145:60020, regionname: .META.,,1, startKey: <>}
...

This is https://issues.apache.org/jira/browse/hbase-2599. We didn't
apply it on the branch because it was breaking rolling restarts, but
you can safely apply the patch before a cluster startup.
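
A rough sketch of the procedure (the patch file name is the one used
elsewhere in this thread, and the build step assumes the ant-based 0.20
source tree):

  cd hbase-0.20.5
  patch -p0 < 2599-0.20.txt
  ant jar    # rebuild the hbase jar, redeploy it, then start the cluster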

J-D

On Thu, Jul 1, 2010 at 5:03 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:
> Completely changed all hadoop configuration to almost default, PE completes
> writing for 1000000 rows, but regions still come assigned to multiple RS's
>
> hbase(main):001:0> status 'detailed'
> version 0.20.5
> 0 regionsInTransition
> 6 live servers
>    uasstse005.ua.sistyma.com:60020 1277985198620
>        requests=0, regions=2, usedHeap=25, maxHeap=1196
>        .META.,,1
>            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> storefileIndexSizeMB=0
>        -ROOT-,,0
>            stores=1, storefiles=3, storefileSizeMB=0, memstoreSizeMB=0,
> storefileIndexSizeMB=0
>    stas-node.ua.sistyma.com:60020 1277985198573
>        requests=0, regions=1, usedHeap=22, maxHeap=1996
>        .META.,,1
>            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> storefileIndexSizeMB=0
>    uasstse004.ua.sistyma.com:60020 1277985198572
>        requests=0, regions=1, usedHeap=23, maxHeap=1996
>        .META.,,1
>            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> storefileIndexSizeMB=0
>    uasstse006.ua.sistyma.com:60020 1277985198554
>        requests=0, regions=0, usedHeap=33, maxHeap=1196
>    uasstse002.ua.sistyma.com:60020 1277985198667
>        requests=0, regions=1, usedHeap=34, maxHeap=1996
>        .META.,,1
>            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> storefileIndexSizeMB=0
>    uasstse003.ua.sistyma.com:60020 1277985198550
>        requests=0, regions=1, usedHeap=22, maxHeap=1996
>        .META.,,1
>            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
> storefileIndexSizeMB=0
> 0 dead servers
>
>

Re: HBase 0.20.5 issues

Posted by Stanislaw Kogut <sk...@sistyma.net>.
I completely changed the whole Hadoop configuration back to almost
default; PE now completes writing 1000000 rows, but regions still come
up assigned to multiple RS's:

hbase(main):001:0> status 'detailed'
version 0.20.5
0 regionsInTransition
6 live servers
    uasstse005.ua.sistyma.com:60020 1277985198620
        requests=0, regions=2, usedHeap=25, maxHeap=1196
        .META.,,1
            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0
        -ROOT-,,0
            stores=1, storefiles=3, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0
    stas-node.ua.sistyma.com:60020 1277985198573
        requests=0, regions=1, usedHeap=22, maxHeap=1996
        .META.,,1
            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0
    uasstse004.ua.sistyma.com:60020 1277985198572
        requests=0, regions=1, usedHeap=23, maxHeap=1996
        .META.,,1
            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0
    uasstse006.ua.sistyma.com:60020 1277985198554
        requests=0, regions=0, usedHeap=33, maxHeap=1196
    uasstse002.ua.sistyma.com:60020 1277985198667
        requests=0, regions=1, usedHeap=34, maxHeap=1996
        .META.,,1
            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0
    uasstse003.ua.sistyma.com:60020 1277985198550
        requests=0, regions=1, usedHeap=22, maxHeap=1996
        .META.,,1
            stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0
0 dead servers


On Wed, Jun 30, 2010 at 2:49 PM, Stanislaw Kogut <sk...@sistyma.net> wrote:

> See clean logs from scratch for hadoop and hbase after start with clean
> hbase rootdir.
>
> http://sp.sistyma.com/hbase_logs.tar.gz
>
>
> On Tue, Jun 29, 2010 at 8:46 PM, Stack <sa...@gmail.com> wrote:
>
>> Something is seriously wrong with your setup.  Please put your master logs
>> somewhere we can pull from.   Enable debug too.  Thanks
>>
>>
>>
>> On Jun 29, 2010, at 10:29 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:
>>
>> > 1. Stopping hbase
>> > 2. Removing hbase.root.dir from hdfs
>> > 3. Starting hbase
>> > 4. Doing major_compact on .META.
>> > 5. Starting PE
>> >
>> > 10/06/29 20:17:30 INFO hbase.PerformanceEvaluation: Table {NAME =>
>> > 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE',
>> VERSIONS
>> > => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>> > BLOCKCACHE => 'true'}]} created
>> > 10/06/29 20:17:30 INFO hbase.PerformanceEvaluation: Start class
>> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at
>> offset
>> > 0 for 1048576 rows
>> > 10/06/29 20:17:42 INFO hbase.PerformanceEvaluation: 0/104857/1048576
>> > 10/06/29 20:17:55 INFO hbase.PerformanceEvaluation: 0/209714/1048576
>> > 10/06/29 20:18:13 INFO hbase.PerformanceEvaluation: 0/314571/1048576
>> > 10/06/29 20:18:29 INFO hbase.PerformanceEvaluation: 0/419428/1048576
>> > 10/06/29 20:22:37 ERROR hbase.PerformanceEvaluation: Failed
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> contact
>> > region server  -- nothing found, no 'location' returned,
>> > tableName=TestTable, reload=true -- for region , row '0000511450', but
>> > failed after 11 attempts.
>> > Exceptions:
>> > java.io.IOException: HRegionInfo was null or empty in .META.
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
>> >
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1087)
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.access$200(HConnectionManager.java:240)
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.getRegionName(HConnectionManager.java:1183)
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1160)
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
>> >    at
>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:621)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:637)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:889)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.runNIsOne(PerformanceEvaluation.java:907)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:939)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(PerformanceEvaluation.java:1036)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:1061)
>> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >    at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >    at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >    at java.lang.reflect.Method.invoke(Method.java:597)
>> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> >
>> >
>> > On Tue, Jun 29, 2010 at 8:03 PM, Stack <st...@duboce.net> wrote:
>> >
>> >> For sure you are removing the hbase dir in hdfs?
>> >>
>> >> Try major compaction of your .META. table?
>> >>
>> >> hbase> major_compact ".META."
>> >>
>> >> You seem to be suffering  HBASE-1880 but if you are removing the hbase
>> >> dir, you shouldn't be running into this.
>> >>
>> >> St.Ack
>> >>
>> >>
>> > --
>> > Regards,
>> > Stanislaw Kogut
>> > Sistyma LLC
>>
>
>
>
> --
> Regards,
> Stanislaw Kogut
> Sistyma LLC
>



-- 
Regards,
Stanislaw Kogut
Sistyma LLC

Re: HBase 0.20.5 issues

Posted by Stanislaw Kogut <sk...@sistyma.net>.
See the from-scratch logs for Hadoop and HBase, taken after a start
with a clean HBase rootdir:

http://sp.sistyma.com/hbase_logs.tar.gz

On Tue, Jun 29, 2010 at 8:46 PM, Stack <sa...@gmail.com> wrote:

> Something is seriously wrong with your setup.  Please put your master logs
> somewhere we can pull from.   Enable debug too.  Thanks
>
>
>
> On Jun 29, 2010, at 10:29 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:
>
> > 1. Stopping hbase
> > 2. Removing hbase.root.dir from hdfs
> > 3. Starting hbase
> > 4. Doing major_compact on .META.
> > 5. Starting PE
> >
> > 10/06/29 20:17:30 INFO hbase.PerformanceEvaluation: Table {NAME =>
> > 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE',
> VERSIONS
> > => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> > BLOCKCACHE => 'true'}]} created
> > 10/06/29 20:17:30 INFO hbase.PerformanceEvaluation: Start class
> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at
> offset
> > 0 for 1048576 rows
> > 10/06/29 20:17:42 INFO hbase.PerformanceEvaluation: 0/104857/1048576
> > 10/06/29 20:17:55 INFO hbase.PerformanceEvaluation: 0/209714/1048576
> > 10/06/29 20:18:13 INFO hbase.PerformanceEvaluation: 0/314571/1048576
> > 10/06/29 20:18:29 INFO hbase.PerformanceEvaluation: 0/419428/1048576
> > 10/06/29 20:22:37 ERROR hbase.PerformanceEvaluation: Failed
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact
> > region server  -- nothing found, no 'location' returned,
> > tableName=TestTable, reload=true -- for region , row '0000511450', but
> > failed after 11 attempts.
> > Exceptions:
> > java.io.IOException: HRegionInfo was null or empty in .META.
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> > org.apache.hadoop.hbase.TableNotFoundException: TestTable
> >
> >    at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1087)
> >    at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.access$200(HConnectionManager.java:240)
> >    at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.getRegionName(HConnectionManager.java:1183)
> >    at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1160)
> >    at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
> >    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:621)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:637)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:889)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.runNIsOne(PerformanceEvaluation.java:907)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:939)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(PerformanceEvaluation.java:1036)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:1061)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >    at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >    at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> >
> >
> > On Tue, Jun 29, 2010 at 8:03 PM, Stack <st...@duboce.net> wrote:
> >
> >> For sure you are removing the hbase dir in hdfs?
> >>
> >> Try major compaction of your .META. table?
> >>
> >> hbase> major_compact ".META."
> >>
> >> You seem to be suffering HBASE-1880, but if you are removing the hbase
> >> dir, you shouldn't be running into this.
> >>
> >> St.Ack
> >>
> >>
> > --
> > Regards,
> > Stanislaw Kogut
> > Sistyma LLC
>



-- 
Regards,
Stanislaw Kogut
Sistyma LLC

Re: HBase 0.20.5 issues

Posted by Stack <sa...@gmail.com>.
Something is seriously wrong with your setup.  Please put your master logs somewhere we can pull from.   Enable debug too.  Thanks
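
(The usual way to get debug output is to set
log4j.logger.org.apache.hadoop.hbase=DEBUG in conf/log4j.properties on the
master and regionservers and restart them.)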



On Jun 29, 2010, at 10:29 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:

> 1. Stopping hbase
> 2. Removing hbase.root.dir from hdfs
> 3. Starting hbase
> 4. Doing major_compact on .META.
> 5. Starting PE
> 
> 10/06/29 20:17:30 INFO hbase.PerformanceEvaluation: Table {NAME =>
> 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
> => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> BLOCKCACHE => 'true'}]} created
> 10/06/29 20:17:30 INFO hbase.PerformanceEvaluation: Start class
> org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at offset
> 0 for 1048576 rows
> 10/06/29 20:17:42 INFO hbase.PerformanceEvaluation: 0/104857/1048576
> 10/06/29 20:17:55 INFO hbase.PerformanceEvaluation: 0/209714/1048576
> 10/06/29 20:18:13 INFO hbase.PerformanceEvaluation: 0/314571/1048576
> 10/06/29 20:18:29 INFO hbase.PerformanceEvaluation: 0/419428/1048576
> 10/06/29 20:22:37 ERROR hbase.PerformanceEvaluation: Failed
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server  -- nothing found, no 'location' returned,
> tableName=TestTable, reload=true -- for region , row '0000511450', but
> failed after 11 attempts.
> Exceptions:
> java.io.IOException: HRegionInfo was null or empty in .META.
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> org.apache.hadoop.hbase.TableNotFoundException: TestTable
> 
>    at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1087)
>    at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.access$200(HConnectionManager.java:240)
>    at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.getRegionName(HConnectionManager.java:1183)
>    at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1160)
>    at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
>    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:621)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:637)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:889)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.runNIsOne(PerformanceEvaluation.java:907)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:939)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(PerformanceEvaluation.java:1036)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:1061)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> 
> 
> On Tue, Jun 29, 2010 at 8:03 PM, Stack <st...@duboce.net> wrote:
> 
>> For sure you are removing the hbase dir in hdfs?
>> 
>> Try major compaction of your .META. table?
>> 
>> hbase> major_compact ".META."
>> 
>> You seem to be suffering HBASE-1880, but if you are removing the hbase
>> dir, you shouldn't be running into this.
>> 
>> St.Ack
>> 
>> 
> -- 
> Regards,
> Stanislaw Kogut
> Sistyma LLC

Re: HBase 0.20.5 issues

Posted by Stanislaw Kogut <sk...@sistyma.net>.
1. Stopping hbase
2. Removing hbase.root.dir from hdfs
3. Starting hbase
4. Doing major_compact on .META.
5. Starting PE

10/06/29 20:17:30 INFO hbase.PerformanceEvaluation: Table {NAME =>
'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
=> '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]} created
10/06/29 20:17:30 INFO hbase.PerformanceEvaluation: Start class
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at offset
0 for 1048576 rows
10/06/29 20:17:42 INFO hbase.PerformanceEvaluation: 0/104857/1048576
10/06/29 20:17:55 INFO hbase.PerformanceEvaluation: 0/209714/1048576
10/06/29 20:18:13 INFO hbase.PerformanceEvaluation: 0/314571/1048576
10/06/29 20:18:29 INFO hbase.PerformanceEvaluation: 0/419428/1048576
10/06/29 20:22:37 ERROR hbase.PerformanceEvaluation: Failed
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server  -- nothing found, no 'location' returned,
tableName=TestTable, reload=true -- for region , row '0000511450', but
failed after 11 attempts.
Exceptions:
java.io.IOException: HRegionInfo was null or empty in .META.
org.apache.hadoop.hbase.TableNotFoundException: TestTable
org.apache.hadoop.hbase.TableNotFoundException: TestTable
org.apache.hadoop.hbase.TableNotFoundException: TestTable
org.apache.hadoop.hbase.TableNotFoundException: TestTable
org.apache.hadoop.hbase.TableNotFoundException: TestTable
org.apache.hadoop.hbase.TableNotFoundException: TestTable
org.apache.hadoop.hbase.TableNotFoundException: TestTable
org.apache.hadoop.hbase.TableNotFoundException: TestTable
org.apache.hadoop.hbase.TableNotFoundException: TestTable

    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1087)
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.access$200(HConnectionManager.java:240)
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.getRegionName(HConnectionManager.java:1183)
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1160)
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
    at
org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:621)
    at
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:637)
    at
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:889)
    at
org.apache.hadoop.hbase.PerformanceEvaluation.runNIsOne(PerformanceEvaluation.java:907)
    at
org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:939)
    at
org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(PerformanceEvaluation.java:1036)
    at
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:1061)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
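
For context, the trace above dies in HTable.flushCommits, i.e. when the client
flushes its buffered puts to the regionservers. A schematic, untested sketch
of that batched-write pattern against the 0.20 client API (the class name,
buffer size and payload below are illustrative, not PE's exact values):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedWriter {
      public static void main(String[] args) throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "TestTable");
        table.setAutoFlush(false);                  // buffer puts client-side
        table.setWriteBufferSize(12 * 1024 * 1024); // illustrative size
        for (int i = 0; i < 1048576; i++) {
          // PE-style zero-padded row keys, e.g. '0000511450'
          Put put = new Put(Bytes.toBytes(String.format("%010d", i)));
          put.add(Bytes.toBytes("info"), Bytes.toBytes("data"), new byte[1024]);
          table.put(put);
        }
        // The batch flush; this is the call that raises
        // RetriesExhaustedException in the trace above.
        table.flushCommits();
      }
    }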


On Tue, Jun 29, 2010 at 8:03 PM, Stack <st...@duboce.net> wrote:

> For sure you are removing the hbase dir in hdfs?
>
> Try major compaction of your .META. table?
>
> hbase> major_compact ".META."
>
> You seem to be suffering HBASE-1880, but if you are removing the hbase
> dir, you shouldn't be running into this.
>
> St.Ack
>
>
-- 
Regards,
Stanislaw Kogut
Sistyma LLC

Re: HBase 0.20.5 issues

Posted by Stack <st...@duboce.net>.
For sure you are removing the hbase dir in hdfs?

Try major compaction of your .META. table?

hbase> major_compact ".META."

You seem to be suffering HBASE-1880, but if you are removing the hbase
dir, you shouldn't be running into this.

St.Ack
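
In code, the same wipe-and-compact sequence looks roughly like this; a
minimal, untested sketch against the 0.20-era Hadoop and HBase client APIs,
assuming the hadoop and hbase configs are on the classpath, that the rootdir
is /hbase as elsewhere in this thread, and that HBase is stopped before the
delete and restarted before the compaction (the class name is made up):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class WipeAndCompact {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        // Equivalent of: hadoop fs -rmr /hbase  (run with HBase down)
        FileSystem fs = FileSystem.get(conf);
        fs.delete(new Path("/hbase"), true);
        // After HBase is back up: programmatic major_compact ".META."
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.majorCompact(".META.");
      }
    }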

On Tue, Jun 29, 2010 at 9:26 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:
> Yes, I'm doing hadoop fs -rmr /hbase each time.
>
> So, here are some messages from the master logs:
>
> 2010-06-29 19:15:11,309 INFO org.apache.hadoop.hbase.master.ServerManager: 5
> region servers, 0 dead, average load 0.8
> 2010-06-29 19:15:12,146 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scanning meta region {server:
> 192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>}
> 2010-06-29 19:15:12,151 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of .META.,,1 is not valid;  serverAddress=
> 192.168.242.146:60020, startCode=1277827873580 unknown.
> 2010-06-29 19:15:12,152 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> 192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>} complete
> 2010-06-29 19:15:12,304 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Assigning for address: 192.168.242.144:60020, startcode: 1277827873598,
> load: (requests=5, regions=1, usedHeap=33, maxHeap=1996): total nregions to
> assign=0, regions to give other servers than this=1, isMetaAssign=true
> 2010-06-29 19:15:12,304 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Assigning address: 192.168.242.144:60020, startcode: 1277827873598, load:
> (requests=5, regions=1, usedHeap=33, maxHeap=1996) 0 regions
> 2010-06-29 19:15:12,304 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region .META.,,1 to uasstse004.ua.sistyma.com.,60020,1277827873598
> 2010-06-29 19:15:12,329 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_OPEN: .META.,,1 from
> uasstse004.ua.sistyma.com.,60020,1277827873598;
> 1 of 1
> 2010-06-29 19:15:12,329 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Processing todo: PendingOpenOperation from uasstse004.ua.sistyma.com
> .,60020,1277827873598
> 2010-06-29 19:15:12,329 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: .META.,,1 open on
> 192.168.242.144:60020
> 2010-06-29 19:15:12,331 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row .META.,,1
> in region -ROOT-,,0 with startcode=1277827873598, server=
> 192.168.242.144:60020
> 2010-06-29 19:15:12,331 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: Adding to
> onlineMetaRegions: {server: 192.168.242.144:60020, regionname: .META.,,1,
> startKey: <>}
> 2010-06-29 19:15:12,331 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {server:
> 192.168.242.144:60020, regionname: .META.,,1, startKey: <>}
> 2010-06-29 19:15:12,333 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 0 row(s) of meta region {server:
> 192.168.242.144:60020, regionname: .META.,,1, startKey: <>} complete
> 2010-06-29 19:15:12,333 INFO org.apache.hadoop.hbase.master.BaseScanner: All
> 1 .META. region(s) scanned
>
> Then, after some time of inactivity (no regionserver failures, no
> stop/starts, nothing):
>
> 2010-06-29 19:16:11,317 INFO org.apache.hadoop.hbase.master.ServerManager: 5
> region servers, 0 dead, average load 1.0
> 2010-06-29 19:16:12,153 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scanning meta region {server:
> 192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>}
> 2010-06-29 19:16:12,162 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of .META.,,1 is not valid;  serverAddress=
> 192.168.242.144:60020, startCode=1277827873598 unknown.
> 2010-06-29 19:16:12,164 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> 192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>} complete
> 2010-06-29 19:16:12,303 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Assigning for address: 192.168.242.142:60020, startcode: 1277827873624,
> load: (requests=0, regions=0, usedHeap=30, maxHeap=1996): total nregions to
> assign=1, regions to give other servers than this=0, isMetaAssign=true
> 2010-06-29 19:16:12,303 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Assigning address: 192.168.242.142:60020, startcode: 1277827873624, load:
> (requests=0, regions=0, usedHeap=30, maxHeap=1996) 1 regions
> 2010-06-29 19:16:12,303 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region .META.,,1 to uasstse002.ua.sistyma.com.,60020,1277827873624
> 2010-06-29 19:16:12,340 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {server:
> 192.168.242.144:60020, regionname: .META.,,1, startKey: <>}
> 2010-06-29 19:16:12,342 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 0 row(s) of meta region {server:
> 192.168.242.144:60020, regionname: .META.,,1, startKey: <>} complete
> 2010-06-29 19:16:12,342 INFO org.apache.hadoop.hbase.master.BaseScanner: All
> 1 .META. region(s) scanned
> 2010-06-29 19:16:12,369 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_OPEN: .META.,,1 from
> uasstse002.ua.sistyma.com.,60020,1277827873624;
> 1 of 1
> 2010-06-29 19:16:12,369 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Processing todo: PendingOpenOperation from uasstse002.ua.sistyma.com
> .,60020,1277827873624
> 2010-06-29 19:16:12,369 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: .META.,,1 open on
> 192.168.242.142:60020
> 2010-06-29 19:16:12,370 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row .META.,,1
> in region -ROOT-,,0 with startcode=1277827873624, server=
> 192.168.242.142:60020
> 2010-06-29 19:16:12,370 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: Adding to
> onlineMetaRegions: {server: 192.168.242.142:60020, regionname: .META.,,1,
> startKey: <>}
> 2010-06-29 19:16:12,371 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {server:
> 192.168.242.142:60020, regionname: .META.,,1, startKey: <>}
> 2010-06-29 19:16:12,401 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 0 row(s) of meta region {server:
> 192.168.242.142:60020, regionname: .META.,,1, startKey: <>} complete
> 2010-06-29 19:16:12,401 INFO org.apache.hadoop.hbase.master.BaseScanner: All
> 1 .META. region(s) scanned
>
> and so on.
>
> Same operations for TestTable after it was created:
> 2010-06-29 19:22:12,199 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of .META.,,1 is not valid;  serverAddress=
> 192.168.242.145:60020, startCode=1277827873614 unknown.
> 2010-06-29 19:22:12,200 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> 192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>} complete
> 2010-06-29 19:22:12,216 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {server:
> 192.168.242.145:60020, regionname: .META.,,1, startKey: <>}
> 2010-06-29 19:22:12,224 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of TestTable,,1277828486199 is not valid;  serverAddress=
> 192.168.242.142:60020, startCode=1277827873624 unknown.
> 2010-06-29 19:22:12,227 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of TestTable,0000149568,1277828511285 is not valid;
> serverAddress=192.168.242.142:60020, startCode=1277827873624 unknown.
> 2010-06-29 19:22:12,230 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> Current assignment of TestTable,0000332096,1277828511285 is not valid;
> serverAddress=192.168.242.142:60020, startCode=1277827873624 unknown.
> 2010-06-29 19:22:12,233 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> TestTable,,1277828486199/1681284195 no longer has references to
> TestTable,,1277828317329
> 2010-06-29 19:22:12,239 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> TestTable,0000149568,1277828486199/1743937205 no longer has references to
> TestTable,,1277828317329
> 2010-06-29 19:22:12,239 INFO org.apache.hadoop.hbase.master.BaseScanner:
> Deleting region TestTable,,1277828317329 (encoded=1736533012) because
> daughter splits no longer hold references
> 2010-06-29 19:22:12,240 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> DELETING region hdfs://
> uasstse002.ua.sistyma.com:8020/hbase/TestTable/1736533012
> 2010-06-29 19:22:12,266 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Assigning for address: 192.168.242.146:60020, startcode: 1277827873580,
> load: (requests=0, regions=2, usedHeap=34, maxHeap=1196): total nregions to
> assign=4, regions to give other servers than this=0, isMetaAssign=true
> 2010-06-29 19:22:12,266 DEBUG org.apache.hadoop.hbase.master.RegionManager:
> Assigning address: 192.168.242.146:60020, startcode: 1277827873580, load:
> (requests=0, regions=2, usedHeap=34, maxHeap=1196) 4 regions
> 2010-06-29 19:22:12,266 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region TestTable,,1277828486199 to uasstse006.ua.sistyma.com
> .,60020,1277827873580
> 2010-06-29 19:22:12,266 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region TestTable,0000149568,1277828511285 to
> uasstse006.ua.sistyma.com.,60020,1277827873580
> 2010-06-29 19:22:12,266 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region TestTable,0000332096,1277828511285 to
> uasstse006.ua.sistyma.com.,60020,1277827873580
> 2010-06-29 19:22:12,266 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region .META.,,1 to uasstse006.ua.sistyma.com.,60020,1277827873580
> 2010-06-29 19:22:12,287 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_OPEN: TestTable,,1277828486199 from
> uasstse006.ua.sistyma.com.,60020,1277827873580; 1 of 3
> 2010-06-29 19:22:12,287 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_PROCESS_OPEN: TestTable,0000332096,1277828511285 from
> uasstse006.ua.sistyma.com.,60020,1277827873580; 2 of 3
> 2010-06-29 19:22:12,287 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_PROCESS_OPEN: .META.,,1 from
> uasstse006.ua.sistyma.com.,60020,1277827873580;
> 3 of 3
> 2010-06-29 19:22:12,287 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Processing todo: PendingOpenOperation from uasstse006.ua.sistyma.com
> .,60020,1277827873580
> 2010-06-29 19:22:12,288 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: numberOfMetaRegions:
> 1, onlineMetaRegions.size(): 1
> 2010-06-29 19:22:12,288 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: Requeuing because not
> all meta regions are online
> 2010-06-29 19:22:12,291 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> TestTable,0000149568,1277828511285/436162763 no longer has references to
> TestTable,0000149568,1277828486199
> 2010-06-29 19:22:12,295 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
> TestTable,0000332096,1277828511285/315170953 no longer has references to
> TestTable,0000149568,1277828486199
> 2010-06-29 19:22:12,295 INFO org.apache.hadoop.hbase.master.BaseScanner:
> Deleting region TestTable,0000149568,1277828486199 (encoded=1743937205)
> because daughter splits no longer hold references
> 2010-06-29 19:22:12,297 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> DELETING region hdfs://
> uasstse002.ua.sistyma.com:8020/hbase/TestTable/1743937205
> 2010-06-29 19:22:12,310 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_OPEN: TestTable,0000149568,1277828511285 from
> uasstse006.ua.sistyma.com.,60020,1277827873580; 1 of 2
> 2010-06-29 19:22:12,310 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_PROCESS_OPEN: .META.,,1 from
> uasstse006.ua.sistyma.com.,60020,1277827873580;
> 2 of 2
> 2010-06-29 19:22:12,310 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Processing todo: PendingOpenOperation from uasstse006.ua.sistyma.com
> .,60020,1277827873580
> 2010-06-29 19:22:12,310 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: numberOfMetaRegions:
> 1, onlineMetaRegions.size(): 1
> 2010-06-29 19:22:12,310 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperation: Requeuing because not
> all meta regions are online
>
>
>
> On Tue, Jun 29, 2010 at 5:57 PM, Stack <sa...@gmail.com> wrote:
>
>> Is this a testing install?  If so, remove the hbase dir in hdfs and start
>> over.
>>
>> Else, on PE failure, what does the master log say?
>>
>> In 0.20.5 we made it so some more messages show at info level, which could
>> explain some of the differences you are seeing.
>>
>> Stack
>>
>>
>>
>> On Jun 29, 2010, at 6:21 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:
>>
>> > Hi everyone!
>> >
>> > Has someone noticed same behaviour of hbase-0.20.5 after upgrade from
>> > 0.20.3?
>> >
>> > $hadoop jar hbase/hbase-0.20.5-test.jar sequentialWrite 1
>> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
>> > environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
>> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:host.name
>> > =se002.cluster.local
>> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
>> > environment:java.version=1.6.0_20
>> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
>> > environment:java.vendor=Sun Microsystems Inc.
>> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
>> > environment:java.home=/usr/java/jdk1.6.0_20/jre
>> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
>> >
>> environment:java.class.path=/opt/hadoop/common/bin/../conf:/usr/java/latest/lib/tools.jar:/opt/hadoop/common/bin/..:/opt/hadoop/common/bin/../hadoop-0.20.2-core.jar:/opt/hadoop/common/bin/../lib/commons-cli-1.2.jar:/opt/hadoop/common/bin/../lib/commons-codec-1.3.jar:/opt/hadoop/common/bin/../lib/commons-el-1.0.jar:/opt/hadoop/common/bin/../lib/commons-httpclient-3.0.1.jar:/opt/hadoop/common/bin/../lib/commons-logging-1.0.4.jar:/opt/hadoop/common/bin/../lib/commons-logging-api-1.0.4.jar:/opt/hadoop/common/bin/../lib/commons-net-1.4.1.jar:/opt/hadoop/common/bin/../lib/core-3.1.1.jar:/opt/hadoop/common/bin/../lib/hsqldb-1.8.0.10.jar:/opt/hadoop/common/bin/../lib/jasper-compiler-5.5.12.jar:/opt/hadoop/common/bin/../lib/jasper-runtime-5.5.12.jar:/opt/hadoop/common/bin/../lib/jets3t-0.6.1.jar:/opt/hadoop/common/bin/../lib/jetty-6.1.14.jar:/opt/hadoop/common/bin/../lib/jetty-util-6.1.14.jar:/opt/hadoop/common/bin/../lib/junit-3.8.1.jar:/opt/hadoop/common/bin/../lib/kfs-0.2.2.jar:/opt/hadoop/common/bin/../lib/log4j-1.2.15.jar:/opt/hadoop/common/bin/../lib/mockito-all-1.8.0.jar:/opt/hadoop/common/bin/../lib/oro-2.0.8.jar:/opt/hadoop/common/bin/../lib/servlet-api-2.5-6.1.14.jar:/opt/hadoop/common/bin/../lib/slf4j-api-1.4.3.jar:/opt/hadoop/common/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/hadoop/common/bin/../lib/xmlenc-0.52.jar:/opt/hadoop/common/bin/../lib/jsp-2.1/jsp-2.1.jar:/opt/hadoop/common/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/opt/hadoop/hbase/lib/zookeeper-3.2.2.jar:/opt/hadoop/hbase/conf:/opt/hadoop/hbase/hbase-0.20.5.jar
>> > 10/06/29 16:03:22 INFO hbase.PerformanceEvaluation: Table {NAME =>
>> > 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE',
>> VERSIONS
>> > => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>> > BLOCKCACHE => 'true'}]} created
>> > 10/06/29 16:03:22 INFO hbase.PerformanceEvaluation: Start class
>> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at
>> offset
>> > 0 for 1048576 rows
>> > 10/06/29 16:03:37 INFO hbase.PerformanceEvaluation: 0/104857/1048576
>> > 10/06/29 16:03:52 INFO hbase.PerformanceEvaluation: 0/209714/1048576
>> > 10/06/29 16:04:09 INFO hbase.PerformanceEvaluation: 0/314571/1048576
>> > 10/06/29 16:04:27 INFO hbase.PerformanceEvaluation: 0/419428/1048576
>> > 10/06/29 16:06:06 ERROR hbase.PerformanceEvaluation: Failed
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>> contact
>> > region server Some server, retryOnlyOne=true, index=0, islastrow=false,
>> > tries=9, numtries=10, i=0, listsize=9650, region=TestTable,,1277816601856
>> > for region TestTable,,1277816601856, row '0000511450', but failed after
>> 10
>> > attempts.
>> > Exceptions:
>> >
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1149)
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
>> >    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:621)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:637)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:889)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.runNIsOne(PerformanceEvaluation.java:907)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:939)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(PerformanceEvaluation.java:1036)
>> >    at
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:1061)
>> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >    at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >    at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >    at java.lang.reflect.Method.invoke(Method.java:597)
>> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> >
>> > Looks like it happens on region splits.
>> >
>> > Also, some other strange things:
>> > 1. After writing something to TestTable, some regionservers log this:
>> > 2010-06-29 16:05:06,458 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> > .META.,,1
>> >
>> > After that, another .META. region shows up on this server in
>> > 'status detailed' output.
>> > What's more, sometimes it opens not only .META. but also other regions
>> > from data tables, such as the TestTable regions from
>> > PerformanceEvaluation.
>> >
>> > 2. After disabling and removing a table, its regions still show as
>> > assigned to regionservers in 'status detailed'.
>> >
>> > 3. After 3-4 tries to write into TestTable from PerformanceEvaluation,
>> > there is another strange thing:
>> > 10/06/29 16:01:59 ERROR hbase.PerformanceEvaluation: Failed
>> > org.apache.hadoop.hbase.TableExistsException:
>> > org.apache.hadoop.hbase.TableExistsException: TestTable
>> >
>> > but the table does not exist. You cannot disable and drop it, and the
>> > hbase shell does not show it in 'list' output. But you also cannot
>> > create it, because it "exists" and its regions are assigned to
>> > regionservers. Note that nobody dropped this table.
>> >
>> > I spent some time trying to find out why this all happens, playing with
>> > different hadoop cluster versions (first Cloudera, then "vanilla"
>> > 0.20.2), but I am still hitting this issue. I hope someone can help find
>> > the cause of this.
>> >
>> > --
>> > Regards,
>> > Stanislaw Kogut
>>
>
>
>
> --
> Regards,
> Stanislaw Kogut
> Sistyma LLC
>

Re: HBase 0.20.5 issues

Posted by Stanislaw Kogut <sk...@sistyma.net>.
Yes, I'm doing hadoop fs -rmr /hbase each time.

So, here are some messages from the master logs:

2010-06-29 19:15:11,309 INFO org.apache.hadoop.hbase.master.ServerManager: 5
region servers, 0 dead, average load 0.8
2010-06-29 19:15:12,146 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {server:
192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>}
2010-06-29 19:15:12,151 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of .META.,,1 is not valid;  serverAddress=
192.168.242.146:60020, startCode=1277827873580 unknown.
2010-06-29 19:15:12,152 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {server:
192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>} complete
2010-06-29 19:15:12,304 DEBUG org.apache.hadoop.hbase.master.RegionManager:
Assigning for address: 192.168.242.144:60020, startcode: 1277827873598,
load: (requests=5, regions=1, usedHeap=33, maxHeap=1996): total nregions to
assign=0, regions to give other servers than this=1, isMetaAssign=true
2010-06-29 19:15:12,304 DEBUG org.apache.hadoop.hbase.master.RegionManager:
Assigning address: 192.168.242.144:60020, startcode: 1277827873598, load:
(requests=5, regions=1, usedHeap=33, maxHeap=1996) 0 regions
2010-06-29 19:15:12,304 INFO org.apache.hadoop.hbase.master.RegionManager:
Assigning region .META.,,1 to uasstse004.ua.sistyma.com.,60020,1277827873598
2010-06-29 19:15:12,329 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_OPEN: .META.,,1 from
uasstse004.ua.sistyma.com.,60020,1277827873598;
1 of 1
2010-06-29 19:15:12,329 DEBUG org.apache.hadoop.hbase.master.HMaster:
Processing todo: PendingOpenOperation from uasstse004.ua.sistyma.com
.,60020,1277827873598
2010-06-29 19:15:12,329 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: .META.,,1 open on
192.168.242.144:60020
2010-06-29 19:15:12,331 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row .META.,,1
in region -ROOT-,,0 with startcode=1277827873598, server=
192.168.242.144:60020
2010-06-29 19:15:12,331 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperation: Adding to
onlineMetaRegions: {server: 192.168.242.144:60020, regionname: .META.,,1,
startKey: <>}
2010-06-29 19:15:12,331 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server:
192.168.242.144:60020, regionname: .META.,,1, startKey: <>}
2010-06-29 19:15:12,333 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 0 row(s) of meta region {server:
192.168.242.144:60020, regionname: .META.,,1, startKey: <>} complete
2010-06-29 19:15:12,333 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned

Then, after some time of inactivity (no regionserver failures, no
stop/starts, nothing):

2010-06-29 19:16:11,317 INFO org.apache.hadoop.hbase.master.ServerManager: 5
region servers, 0 dead, average load 1.0
2010-06-29 19:16:12,153 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {server:
192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>}
2010-06-29 19:16:12,162 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of .META.,,1 is not valid;  serverAddress=
192.168.242.144:60020, startCode=1277827873598 unknown.
2010-06-29 19:16:12,164 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {server:
192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>} complete
2010-06-29 19:16:12,303 DEBUG org.apache.hadoop.hbase.master.RegionManager:
Assigning for address: 192.168.242.142:60020, startcode: 1277827873624,
load: (requests=0, regions=0, usedHeap=30, maxHeap=1996): total nregions to
assign=1, regions to give other servers than this=0, isMetaAssign=true
2010-06-29 19:16:12,303 DEBUG org.apache.hadoop.hbase.master.RegionManager:
Assigning address: 192.168.242.142:60020, startcode: 1277827873624, load:
(requests=0, regions=0, usedHeap=30, maxHeap=1996) 1 regions
2010-06-29 19:16:12,303 INFO org.apache.hadoop.hbase.master.RegionManager:
Assigning region .META.,,1 to uasstse002.ua.sistyma.com.,60020,1277827873624
2010-06-29 19:16:12,340 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server:
192.168.242.144:60020, regionname: .META.,,1, startKey: <>}
2010-06-29 19:16:12,342 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 0 row(s) of meta region {server:
192.168.242.144:60020, regionname: .META.,,1, startKey: <>} complete
2010-06-29 19:16:12,342 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2010-06-29 19:16:12,369 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_OPEN: .META.,,1 from
uasstse002.ua.sistyma.com.,60020,1277827873624;
1 of 1
2010-06-29 19:16:12,369 DEBUG org.apache.hadoop.hbase.master.HMaster:
Processing todo: PendingOpenOperation from uasstse002.ua.sistyma.com
.,60020,1277827873624
2010-06-29 19:16:12,369 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: .META.,,1 open on
192.168.242.142:60020
2010-06-29 19:16:12,370 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row .META.,,1
in region -ROOT-,,0 with startcode=1277827873624, server=
192.168.242.142:60020
2010-06-29 19:16:12,370 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperation: Adding to
onlineMetaRegions: {server: 192.168.242.142:60020, regionname: .META.,,1,
startKey: <>}
2010-06-29 19:16:12,371 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server:
192.168.242.142:60020, regionname: .META.,,1, startKey: <>}
2010-06-29 19:16:12,401 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 0 row(s) of meta region {server:
192.168.242.142:60020, regionname: .META.,,1, startKey: <>} complete
2010-06-29 19:16:12,401 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned

and so on.

Same operations for TestTable after it was created:
2010-06-29 19:22:12,199 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of .META.,,1 is not valid;  serverAddress=
192.168.242.145:60020, startCode=1277827873614 unknown.
2010-06-29 19:22:12,200 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {server:
192.168.242.144:60020, regionname: -ROOT-,,0, startKey: <>} complete
2010-06-29 19:22:12,216 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server:
192.168.242.145:60020, regionname: .META.,,1, startKey: <>}
2010-06-29 19:22:12,224 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of TestTable,,1277828486199 is not valid;  serverAddress=
192.168.242.142:60020, startCode=1277827873624 unknown.
2010-06-29 19:22:12,227 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of TestTable,0000149568,1277828511285 is not valid;
serverAddress=192.168.242.142:60020, startCode=1277827873624 unknown.
2010-06-29 19:22:12,230 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of TestTable,0000332096,1277828511285 is not valid;
serverAddress=192.168.242.142:60020, startCode=1277827873624 unknown.
2010-06-29 19:22:12,233 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
TestTable,,1277828486199/1681284195 no longer has references to
TestTable,,1277828317329
2010-06-29 19:22:12,239 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
TestTable,0000149568,1277828486199/1743937205 no longer has references to
TestTable,,1277828317329
2010-06-29 19:22:12,239 INFO org.apache.hadoop.hbase.master.BaseScanner:
Deleting region TestTable,,1277828317329 (encoded=1736533012) because
daughter splits no longer hold references
2010-06-29 19:22:12,240 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
DELETING region hdfs://
uasstse002.ua.sistyma.com:8020/hbase/TestTable/1736533012
2010-06-29 19:22:12,266 DEBUG org.apache.hadoop.hbase.master.RegionManager:
Assigning for address: 192.168.242.146:60020, startcode: 1277827873580,
load: (requests=0, regions=2, usedHeap=34, maxHeap=1196): total nregions to
assign=4, regions to give other servers than this=0, isMetaAssign=true
2010-06-29 19:22:12,266 DEBUG org.apache.hadoop.hbase.master.RegionManager:
Assigning address: 192.168.242.146:60020, startcode: 1277827873580, load:
(requests=0, regions=2, usedHeap=34, maxHeap=1196) 4 regions
2010-06-29 19:22:12,266 INFO org.apache.hadoop.hbase.master.RegionManager:
Assigning region TestTable,,1277828486199 to uasstse006.ua.sistyma.com
.,60020,1277827873580
2010-06-29 19:22:12,266 INFO org.apache.hadoop.hbase.master.RegionManager:
Assigning region TestTable,0000149568,1277828511285 to
uasstse006.ua.sistyma.com.,60020,1277827873580
2010-06-29 19:22:12,266 INFO org.apache.hadoop.hbase.master.RegionManager:
Assigning region TestTable,0000332096,1277828511285 to
uasstse006.ua.sistyma.com.,60020,1277827873580
2010-06-29 19:22:12,266 INFO org.apache.hadoop.hbase.master.RegionManager:
Assigning region .META.,,1 to uasstse006.ua.sistyma.com.,60020,1277827873580
2010-06-29 19:22:12,287 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_OPEN: TestTable,,1277828486199 from
uasstse006.ua.sistyma.com.,60020,1277827873580; 1 of 3
2010-06-29 19:22:12,287 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_PROCESS_OPEN: TestTable,0000332096,1277828511285 from
uasstse006.ua.sistyma.com.,60020,1277827873580; 2 of 3
2010-06-29 19:22:12,287 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_PROCESS_OPEN: .META.,,1 from
uasstse006.ua.sistyma.com.,60020,1277827873580;
3 of 3
2010-06-29 19:22:12,287 DEBUG org.apache.hadoop.hbase.master.HMaster:
Processing todo: PendingOpenOperation from uasstse006.ua.sistyma.com
.,60020,1277827873580
2010-06-29 19:22:12,288 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperation: numberOfMetaRegions:
1, onlineMetaRegions.size(): 1
2010-06-29 19:22:12,288 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperation: Requeuing because not
all meta regions are online
2010-06-29 19:22:12,291 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
TestTable,0000149568,1277828511285/436162763 no longer has references to
TestTable,0000149568,1277828486199
2010-06-29 19:22:12,295 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
TestTable,0000332096,1277828511285/315170953 no longer has references to
TestTable,0000149568,1277828486199
2010-06-29 19:22:12,295 INFO org.apache.hadoop.hbase.master.BaseScanner:
Deleting region TestTable,0000149568,1277828486199 (encoded=1743937205)
because daughter splits no longer hold references
2010-06-29 19:22:12,297 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
DELETING region hdfs://
uasstse002.ua.sistyma.com:8020/hbase/TestTable/1743937205
2010-06-29 19:22:12,310 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_OPEN: TestTable,0000149568,1277828511285 from
uasstse006.ua.sistyma.com.,60020,1277827873580; 1 of 2
2010-06-29 19:22:12,310 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_PROCESS_OPEN: .META.,,1 from
uasstse006.ua.sistyma.com.,60020,1277827873580;
2 of 2
2010-06-29 19:22:12,310 DEBUG org.apache.hadoop.hbase.master.HMaster:
Processing todo: PendingOpenOperation from uasstse006.ua.sistyma.com
.,60020,1277827873580
2010-06-29 19:22:12,310 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperation: numberOfMetaRegions:
1, onlineMetaRegions.size(): 1
2010-06-29 19:22:12,310 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperation: Requeuing because not
all meta regions are online



On Tue, Jun 29, 2010 at 5:57 PM, Stack <sa...@gmail.com> wrote:

> Is this a testing install?  If so, remove the hbase dir in hdfs and start
> over.
>
> Else, on PE failure, what does the master log say?
>
> In 0.20.5 we made it so some more messages show at info level, which could
> explain some of the differences you are seeing.
>
> Stack
>
>
>
> On Jun 29, 2010, at 6:21 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:
>
> > Hi everyone!
> >
> > Has someone noticed same behaviour of hbase-0.20.5 after upgrade from
> > 0.20.3?
> >
> > $hadoop jar hbase/hbase-0.20.5-test.jar sequentialWrite 1
> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> > environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:host.name
> > =se002.cluster.local
> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> > environment:java.version=1.6.0_20
> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> > environment:java.vendor=Sun Microsystems Inc.
> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> > environment:java.home=/usr/java/jdk1.6.0_20/jre
> > 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> >
> environment:java.class.path=/opt/hadoop/common/bin/../conf:/usr/java/latest/lib/tools.jar:/opt/hadoop/common/bin/..:/opt/hadoop/common/bin/../hadoop-0.20.2-core.jar:/opt/hadoop/common/bin/../lib/commons-cli-1.2.jar:/opt/hadoop/common/bin/../lib/commons-codec-1.3.jar:/opt/hadoop/common/bin/../lib/commons-el-1.0.jar:/opt/hadoop/common/bin/../lib/commons-httpclient-3.0.1.jar:/opt/hadoop/common/bin/../lib/commons-logging-1.0.4.jar:/opt/hadoop/common/bin/../lib/commons-logging-api-1.0.4.jar:/opt/hadoop/common/bin/../lib/commons-net-1.4.1.jar:/opt/hadoop/common/bin/../lib/core-3.1.1.jar:/opt/hadoop/common/bin/../lib/hsqldb-1.8.0.10.jar:/opt/hadoop/common/bin/../lib/jasper-compiler-5.5.12.jar:/opt/hadoop/common/bin/../lib/jasper-runtime-5.5.12.jar:/opt/hadoop/common/bin/../lib/jets3t-0.6.1.jar:/opt/hadoop/common/bin/../lib/jetty-6.1.14.jar:/opt/hadoop/common/bin/../lib/jetty-util-6.1.14.jar:/opt/hadoop/common/bin/../lib/junit-3.8.1.jar:/opt/hadoop/common/bin/../lib/kfs-0.2.2.jar:/opt/hadoop/common/bin/../lib/log4j-1.2.15.jar:/opt/hadoop/common/bin/../lib/mockito-all-1.8.0.jar:/opt/hadoop/common/bin/../lib/oro-2.0.8.jar:/opt/hadoop/common/bin/../lib/servlet-api-2.5-6.1.14.jar:/opt/hadoop/common/bin/../lib/slf4j-api-1.4.3.jar:/opt/hadoop/common/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/hadoop/common/bin/../lib/xmlenc-0.52.jar:/opt/hadoop/common/bin/../lib/jsp-2.1/jsp-2.1.jar:/opt/hadoop/common/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/opt/hadoop/hbase/lib/zookeeper-3.2.2.jar:/opt/hadoop/hbase/conf:/opt/hadoop/hbase/hbase-0.20.5.jar
> > 10/06/29 16:03:22 INFO hbase.PerformanceEvaluation: Table {NAME =>
> > 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE',
> VERSIONS
> > => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> > BLOCKCACHE => 'true'}]} created
> > 10/06/29 16:03:22 INFO hbase.PerformanceEvaluation: Start class
> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at
> offset
> > 0 for 1048576 rows
> > 10/06/29 16:03:37 INFO hbase.PerformanceEvaluation: 0/104857/1048576
> > 10/06/29 16:03:52 INFO hbase.PerformanceEvaluation: 0/209714/1048576
> > 10/06/29 16:04:09 INFO hbase.PerformanceEvaluation: 0/314571/1048576
> > 10/06/29 16:04:27 INFO hbase.PerformanceEvaluation: 0/419428/1048576
> > 10/06/29 16:06:06 ERROR hbase.PerformanceEvaluation: Failed
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact
> > region server Some server, retryOnlyOne=true, index=0, islastrow=false,
> > tries=9, numtries=10, i=0, listsize=9650, region=TestTable,,1277816601856
> > for region TestTable,,1277816601856, row '0000511450', but failed after
> 10
> > attempts.
> > Exceptions:
> >
> >    at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1149)
> >    at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
> >    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:621)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:637)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:889)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.runNIsOne(PerformanceEvaluation.java:907)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:939)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(PerformanceEvaluation.java:1036)
> >    at
> >
> org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:1061)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >    at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >    at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> > Looks like it happens on region splits.
> >
> > Also, some other strange things:
> > 1. After writing something to TestTable, some regionservers log this:
> > 2010-06-29 16:05:06,458 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> > .META.,,1
> >
> > After that, another .META. region shows up on this server in
> > 'status detailed' output.
> > What's more, sometimes it opens not only .META. but also other regions
> > from data tables, such as the TestTable regions from
> > PerformanceEvaluation.
> >
> > 2. After disabling and removing a table, its regions still show as
> > assigned to regionservers in 'status detailed'.
> >
> > 3. After 3-4 tries to write into TestTable from PerformanceEvaluation,
> > there is another strange thing:
> > 10/06/29 16:01:59 ERROR hbase.PerformanceEvaluation: Failed
> > org.apache.hadoop.hbase.TableExistsException:
> > org.apache.hadoop.hbase.TableExistsException: TestTable
> >
> > but the table does not exist. You cannot disable and drop it, and the
> > hbase shell does not show it in 'list' output. But you also cannot
> > create it, because it "exists" and its regions are assigned to
> > regionservers. Note that nobody dropped this table.
> >
> > I spent some time trying to find out why this all happens, playing with
> > different hadoop cluster versions (first Cloudera, then "vanilla"
> > 0.20.2), but I am still hitting this issue. I hope someone can help find
> > the cause of this.
> >
> > --
> > Regards,
> > Stanislaw Kogut
>



-- 
Regards,
Stanislaw Kogut
Sistyma LLC

Re: HBase 0.20.5 issues

Posted by Stack <sa...@gmail.com>.
Is this a testing install?  If so, remove the hbase dir in hdfs and start over.

Else, on PE failure, what does the master log say?

In 0.20.5 we made it so some more messages show at info level, which could
explain some of the differences you are seeing.

Stack



On Jun 29, 2010, at 6:21 AM, Stanislaw Kogut <sk...@sistyma.net> wrote:

> Hi everyone!
> 
> Has someone noticed same behaviour of hbase-0.20.5 after upgrade from
> 0.20.3?
> 
> $hadoop jar hbase/hbase-0.20.5-test.jar sequentialWrite 1
> 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
> 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client environment:host.name
> =se002.cluster.local
> 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> environment:java.version=1.6.0_20
> 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> environment:java.vendor=Sun Microsystems Inc.
> 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> environment:java.home=/usr/java/jdk1.6.0_20/jre
> 10/06/29 16:03:21 INFO zookeeper.ZooKeeper: Client
> environment:java.class.path=/opt/hadoop/common/bin/../conf:/usr/java/latest/lib/tools.jar:/opt/hadoop/common/bin/..:/opt/hadoop/common/bin/../hadoop-0.20.2-core.jar:/opt/hadoop/common/bin/../lib/commons-cli-1.2.jar:/opt/hadoop/common/bin/../lib/commons-codec-1.3.jar:/opt/hadoop/common/bin/../lib/commons-el-1.0.jar:/opt/hadoop/common/bin/../lib/commons-httpclient-3.0.1.jar:/opt/hadoop/common/bin/../lib/commons-logging-1.0.4.jar:/opt/hadoop/common/bin/../lib/commons-logging-api-1.0.4.jar:/opt/hadoop/common/bin/../lib/commons-net-1.4.1.jar:/opt/hadoop/common/bin/../lib/core-3.1.1.jar:/opt/hadoop/common/bin/../lib/hsqldb-1.8.0.10.jar:/opt/hadoop/common/bin/../lib/jasper-compiler-5.5.12.jar:/opt/hadoop/common/bin/../lib/jasper-runtime-5.5.12.jar:/opt/hadoop/common/bin/../lib/jets3t-0.6.1.jar:/opt/hadoop/common/bin/../lib/jetty-6.1.14.jar:/opt/hadoop/common/bin/../lib/jetty-util-6.1.14.jar:/opt/hadoop/common/bin/../lib/junit-3.8.1.jar:/opt/hadoop/common/bin/../lib/kfs-0.2.2.jar:/opt/hadoop/common/bin/../lib/log4j-1.2.15.jar:/opt/hadoop/common/bin/../lib/mockito-all-1.8.0.jar:/opt/hadoop/common/bin/../lib/oro-2.0.8.jar:/opt/hadoop/common/bin/../lib/servlet-api-2.5-6.1.14.jar:/opt/hadoop/common/bin/../lib/slf4j-api-1.4.3.jar:/opt/hadoop/common/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/hadoop/common/bin/../lib/xmlenc-0.52.jar:/opt/hadoop/common/bin/../lib/jsp-2.1/jsp-2.1.jar:/opt/hadoop/common/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/opt/hadoop/hbase/lib/zookeeper-3.2.2.jar:/opt/hadoop/hbase/conf:/opt/hadoop/hbase/hbase-0.20.5.jar
> 10/06/29 16:03:22 INFO hbase.PerformanceEvaluation: Table {NAME =>
> 'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
> => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> BLOCKCACHE => 'true'}]} created
> 10/06/29 16:03:22 INFO hbase.PerformanceEvaluation: Start class
> org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at offset
> 0 for 1048576 rows
> 10/06/29 16:03:37 INFO hbase.PerformanceEvaluation: 0/104857/1048576
> 10/06/29 16:03:52 INFO hbase.PerformanceEvaluation: 0/209714/1048576
> 10/06/29 16:04:09 INFO hbase.PerformanceEvaluation: 0/314571/1048576
> 10/06/29 16:04:27 INFO hbase.PerformanceEvaluation: 0/419428/1048576
> 10/06/29 16:06:06 ERROR hbase.PerformanceEvaluation: Failed
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server, retryOnlyOne=true, index=0, islastrow=false,
> tries=9, numtries=10, i=0, listsize=9650, region=TestTable,,1277816601856
> for region TestTable,,1277816601856, row '0000511450', but failed after 10
> attempts.
> Exceptions:
> 
>    at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1149)
>    at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
>    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:621)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:637)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:889)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.runNIsOne(PerformanceEvaluation.java:907)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:939)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.doCommandLine(PerformanceEvaluation.java:1036)
>    at
> org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:1061)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 
> Looks like it happens on region splits.
> 
> Also, some other strange things:
> 1. After writing something to TestTable, some regionservers log this:
> 2010-06-29 16:05:06,458 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> .META.,,1
> 
> After that, another .META. region shows up on this server in
> 'status detailed' output.
> What's more, sometimes it opens not only .META. but also other regions from
> data tables, such as the TestTable regions from PerformanceEvaluation.
> 
> 2. After disabling and removing a table, its regions still show as assigned
> to regionservers in 'status detailed'.
> 
> 3. After 3-4 tries to write into TestTable from PerformanceEvaluation,
> there is another strange thing:
> 10/06/29 16:01:59 ERROR hbase.PerformanceEvaluation: Failed
> org.apache.hadoop.hbase.TableExistsException:
> org.apache.hadoop.hbase.TableExistsException: TestTable
> 
> but the table does not exist. You cannot disable and drop it, and the hbase
> shell does not show it in 'list' output. But you also cannot create it,
> because it "exists" and its regions are assigned to regionservers. Note that
> nobody dropped this table.
> 
> I spent some time trying to find out why this all happens, playing with
> different hadoop cluster versions (first Cloudera, then "vanilla" 0.20.2),
> but I am still hitting this issue. I hope someone can help find the cause
> of this.
> 
> -- 
> Regards,
> Stanislaw Kogut
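
A quick way to inspect the "ghost table" state described above is to ask the
client API what it thinks exists and to scan the .META. catalog directly. An
untested sketch against the 0.20 client API (the class name is made up;
assumes hbase-site.xml is on the classpath):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MetaCheck {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // What the admin API believes about the table:
        System.out.println("tableExists: " + admin.tableExists("TestTable"));
        for (HTableDescriptor d : admin.listTables()) {
          System.out.println("listTables: " + Bytes.toString(d.getName()));
        }
        // The raw region rows that the master and clients act on:
        HTable meta = new HTable(conf, ".META.");
        ResultScanner scanner = meta.getScanner(new Scan());
        try {
          for (Result r : scanner) {
            System.out.println(".META. row: " + Bytes.toString(r.getRow()));
          }
        } finally {
          scanner.close();
        }
      }
    }

If TestTable regions show up in .META. while 'list' shows nothing, that
matches the disable/drop confusion reported above.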