Posted to user@hbase.apache.org by "Marchwiak, Patrick D." <ma...@llnl.gov> on 2010/08/14 01:50:22 UTC

Unable to perform list/create after startup

I am having issues performing any operations (list/create/put) on my hbase
instance once it starts up.

The environment:
Red Hat 5.5
Hadoop 0.20.2
HBase 0.20.4
Java 1.6.0_20
1 running master
23 running regionservers, 3 of which also run ZooKeeper

When attempting to do a list from the hbase shell, it returns this error:
NativeException: org.apache.hadoop.hbase.MasterNotRunningException: null
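
For reference, the shell's list goes through the same Java client code; a
minimal standalone check along these lines (just a sketch against the
0.20-era client API as I understand it, run with the hbase/hadoop jars and
the conf directory on the classpath) should surface the same exception if
the master really is unreachable:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MasterCheck {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-site.xml / hbase-default.xml from the classpath.
    HBaseConfiguration conf = new HBaseConfiguration();
    try {
      // The HBaseAdmin constructor contacts the master and throws
      // MasterNotRunningException if it cannot find one, which is the
      // same exception the shell's "list" is reporting.
      HBaseAdmin admin = new HBaseAdmin(conf);
      System.out.println("Tables visible to the client: " + admin.listTables().length);
    } catch (MasterNotRunningException e) {
      System.err.println("Master not reachable: " + e);
    }
  }
}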

When attempting to perform inserts from a hadoop job I see the following
error in my application:

2010-08-13 14:03:22.207 INFO  [main] JobClient:1317 Task Id :
attempt_201006091333_0031_m_000000_0, Status : FAILED
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:930)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:694)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:594)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127)
...
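
The map tasks themselves do nothing exotic; the failing call is essentially
the HTable constructor, which (as I understand the 0.20 client) has to find
-ROOT- via ZooKeeper and then .META. before any puts go out. A stripped-down
sketch of that path, with placeholder table/column names rather than our
real schema:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    // The constructor below is where "Timed out trying to locate root region"
    // is thrown: the client resolves -ROOT-, then .META., then the table's region.
    HTable table = new HTable(conf, "testtable");   // placeholder table name
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
    table.put(put);
    table.flushCommits();  // make sure nothing is left in the client-side write buffer
  }
}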

Now, contrary to what the shell is reporting, the HMaster process is
definitely running (along with HRegionServer and HQuorumPeer on the
appropriate other nodes in the cluster). I do not see any errors in the
master log, though interestingly I noticed a log message mentioning only 7
region servers, when in fact there are 23 in the cluster.

2010-08-13 14:04:32,018 INFO org.apache.hadoop.hbase.master.ServerManager: 7 region servers, 0 dead, average load 3.142857142857143

The last clue I have is some exceptions in the zookeeper logs:

2010-08-13 13:34:16,041 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x12a6d2847e40000 type:create cxid:0x28 zxid:0xfffffffffffffffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
        at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
        at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /128.115.210.161:35883 lastZxid 0
2010-08-13 14:05:08,782 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x12a6d2847e40001
2010-08-13 14:05:08,800 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x12a6d2847e40001 valid:true
2010-08-13 14:05:08,802 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x12a6d2847e40001 type:create cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
        at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
        at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
2010-08-13 14:05:09,762 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x12a6d2847e40001 due to java.io.IOException: Read error
2010-08-13 14:05:09,763 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x12a6d2847e40001 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/128.115.210.149:2181 remote=/128.115.210.161:35883]
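
As I understand it, that NodeExists warning is just what the ZooKeeper
server logs when a client asks it to create a znode that already exists; on
the client side the same thing surfaces as KeeperException.NodeExistsException
and is often caught and ignored. A minimal self-contained sketch of the kind
of call that produces it (the hostname and path here are placeholders, not
what HBase actually uses):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class NodeExistsSketch {
  public static void main(String[] args) throws Exception {
    // "zkhost" is a placeholder for one of the quorum members.
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() {
      public void process(WatchedEvent event) { /* no-op default watcher */ }
    });
    try {
      // Creating a znode that is already present throws NodeExistsException
      // on the client and produces the WARN seen in the server log above.
      zk.create("/example", new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
      // By itself this is usually harmless: the znode is simply already there.
    } finally {
      zk.close();
    }
  }
}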

HBase was running on this cluster a few months ago, so I doubt a blatant
misconfiguration is at fault. I've tried restarting everything HBase- or
Hadoop-related, as well as wiping out the HBase data directory on HDFS to
start fresh, with no result. Any hints or suggestions as to what the
problem might be would be greatly appreciated. Thanks!




   


Re: Unable to perform list/create after startup

Posted by Ted Yu <yu...@gmail.com>.
My question was answered by J-D in another thread.

Regards


Re: Unable to perform list/create after startup

Posted by Ted Yu <yu...@gmail.com>.
We use HBase 0.20.6 with HBASE-2473 applied.
I think we may have hit HBASE-2599.

I am looking at 2599-0.20.txt
(https://issues.apache.org/jira/secure/attachment/12445536/2599-0.20.txt),
which you attached to the JIRA.

I cannot find how to apply this change for HRegionServer.java:

-                    serverInfo.setStartCode(System.currentTimeMillis());
+                    this.serverInfo =
+                      createServerInfoWithNewStartCode(this.serverInfo);

I only found one call of the following form, at line 776 in protected void
init(final MapWritable c):
this.hlogFlusher.setHLog(hlog);

If someone can help me apply the patch, that would be great.
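
In case it helps frame the question: as I read the hunk, it swaps the
in-place setStartCode(System.currentTimeMillis()) mutation for construction
of a replacement server-info object that carries the new start code. A toy,
self-contained illustration of that pattern (plain Java only, nothing below
is actual HBase API; the createServerInfoWithNewStartCode in the attached
patch is authoritative):

public class StartCodeSketch {
  // Stand-in for HServerInfo: the start code is fixed at construction time.
  static final class ServerInfo {
    final String host;
    final long startCode;
    ServerInfo(String host, long startCode) {
      this.host = host;
      this.startCode = startCode;
    }
  }

  // The pattern the diff introduces: build a fresh object instead of mutating.
  static ServerInfo withNewStartCode(ServerInfo old) {
    return new ServerInfo(old.host, System.currentTimeMillis());
  }

  public static void main(String[] args) {
    ServerInfo info = new ServerInfo("rs1.example.com", 1L);
    info = withNewStartCode(info);
    System.out.println(info.host + " startCode=" + info.startCode);
  }
}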



Re: Unable to perform list/create after startup

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Ah, very helpful. See how .META. is getting reassigned even though it has a
valid assignment? Some environments hit this for some reason, and it is
fixed by https://issues.apache.org/jira/browse/HBASE-2599, which you will
need to apply to your HBase.

J-D


Re: Unable to perform list/create after startup

Posted by "Marchwiak, Patrick D." <ma...@llnl.gov>.
I've attached the log.

One more thing I'll add is that the stop-hbase.sh script hangs on the
"stopping master..." line, so I had to manually kill the HMaster process
before doing a restart.


Re: Unable to perform list/create after startup

Posted by Jean-Daniel Cryans <jd...@apache.org>.
A clean log of a full master startup would be really useful; I can't tell
much more from the info you've provided so far.

J-D
