Posted to user@hbase.apache.org by "yoav.morag" <yo...@corrigon.com> on 2008/09/25 17:09:50 UTC

problem restarting 0.18

I am experiencing problems when restarting a cluster running hadoop/hbase
0.18.0. Hadoop restarts OK, but the hbase regionservers all exit with the
message:
Exception in thread "regionserver/0:0:0:0:0:0:0:0:60020"
java.lang.NullPointerException
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:448)
	at java.lang.Thread.run(Thread.java:619)
Strangely enough, that line appears to indicate that log is null, yet a log
is created and messages are written into it.
The restart scenario is very simple, and it happens even with a clean
database on a newly formatted FS. I have also checked that no ghost processes
exist before the start.

 "$INSTALLDIR/$HBASE/bin/stop-hbase.sh;$INSTALLDIR/$HADOOP/bin/stop-dfs.sh;"

"$INSTALLDIR/$HADOOP/bin/start-dfs.sh;$INSTALLDIR/$HBASE/bin/start-hbase.sh;"

Any ideas?
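
The restart sequence and the ghost-process check described above can be sketched as a small script. This is only an illustration, not the poster's actual script: INSTALLDIR, HBASE and HADOOP are the variables from the post, while RUN, restart_cluster and leftover_daemons are hypothetical names introduced here. The daemon class names in the filter are the usual main classes of the Hadoop and HBase daemons as printed by the JDK's jps tool.

```shell
#!/bin/sh
# Sketch only. RUN is a hypothetical hook: leave it unset to execute the
# scripts, or set RUN=echo to print the commands instead of running them.

restart_cluster() {
    # Stop HBase before HDFS, then start HDFS before HBase.
    # Abort at the first step that fails.
    $RUN "$INSTALLDIR/$HBASE/bin/stop-hbase.sh"  || return 1
    $RUN "$INSTALLDIR/$HADOOP/bin/stop-dfs.sh"   || return 1
    $RUN "$INSTALLDIR/$HADOOP/bin/start-dfs.sh"  || return 1
    $RUN "$INSTALLDIR/$HBASE/bin/start-hbase.sh" || return 1
}

# Reads jps-style lines ("<pid> <MainClass>") on stdin and prints the ones
# that look like leftover Hadoop/HBase daemons.
leftover_daemons() {
    grep -E '^[0-9]+ (HMaster|HRegionServer|NameNode|SecondaryNameNode|DataNode)$'
}
```

Between the stop and start steps, running `jps | leftover_daemons` on each node should print nothing if no ghost processes survived the shutdown.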
-- 
View this message in context: http://www.nabble.com/problem-restarting-0.18-tp19671584p19671584.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: problem restarting 0.18

Posted by "yoav.morag" <yo...@corrigon.com>.
1. The test sequence is: start hadoop, start hbase, stop hbase, stop hadoop.
2. The full context is in the files attached earlier in this thread. Is
there anything else I am missing?
3. fsck does indeed fail (see below), yet there are no errors in the logs!
Exception in thread "main" java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
	at java.net.Socket.connect(Socket.java:519)
	at java.net.Socket.connect(Socket.java:469)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
	at sun.net.www.http.HttpClient.New(HttpClient.java:306)
	at sun.net.www.http.HttpClient.New(HttpClient.java:323)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
	at org.apache.hadoop.dfs.DFSck.run(DFSck.java:116)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.hadoop.dfs.DFSck.main(DFSck.java:137)
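
The HttpURLConnection frames beneath DFSck.run in this trace suggest that fsck reaches the name node over HTTP, so "Connection refused" may mean that the name node's web interface was not up or not reachable at the configured address, rather than that the file system itself is damaged. A minimal sketch of the fsck invocation, to be rerun once the name node is confirmed to be running, follows. HADOOP_HOME is taken from the quoted suggestion; RUN, fsck_hbase and the default path /hbase (the prefix seen in the LeaseManager errors in this thread) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only. RUN is a hypothetical hook: set RUN=echo for a dry run.

fsck_hbase() {
    # $1: HDFS path of the HBase root directory (assumed to be /hbase here).
    # -files and -blocks ask fsck to list the files and blocks it checks.
    $RUN "$HADOOP_HOME/bin/hadoop" fsck "${1:-/hbase}" -files -blocks
}
```

If the command still fails with the same exception, the name node address that fsck uses (the dfs.http.address setting, as far as I know) would be the next thing to check.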




stack-3 wrote:
> 
> Your hdfs looks ill.  Its complaining a data file in -ROOT- catalog 
> table is 'missing'.  What happens if you run '$HADOOP_HOME/bin/hadoop 
> fsck HBASE_HOMDIR'?  More context around the errors would help with 
> analysis.   You've tried restarting your HDFS?
> 
> Thanks,
> St.Ack
> 

-- 
View this message in context: http://www.nabble.com/problem-restarting-0.18-tp19671584p19712887.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: problem restarting 0.18

Posted by stack <st...@duboce.net>.
Your HDFS looks ill.  It's complaining that a data file in the -ROOT- catalog
table is 'missing'.  What happens if you run '$HADOOP_HOME/bin/hadoop
fsck HBASE_HOMDIR'?  More context around the errors would help with
analysis.  Have you tried restarting your HDFS?

Thanks,
St.Ack


yoav.morag wrote:
>  unfortunately, this is not the case :-( . I have installed NTP on the
> cluster , but the problem remains in exactly the same way. it is now clear
> from the logs, however, that the problem occurs in the master first : 
>
> 2008-09-28 11:04:48,412 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/info/mapfiles/2686380382008424762/data not found in
> lease.paths (=[/hbase/-ROOT-/70236052/info/mapfiles
>
> and only then on the regionservers : 
>
> 2008-09-28 11:05:02,848 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception.
> Aborting...
>
> any more ideas will be greatly appreciated ... 


Re: problem restarting 0.18

Posted by "yoav.morag" <yo...@corrigon.com>.
Unfortunately, this is not the case :-(. I have installed NTP on the
cluster, but the problem remains exactly the same. It is now clear
from the logs, however, that the problem occurs on the master first:

2008-09-28 11:04:48,412 ERROR org.apache.hadoop.dfs.LeaseManager:
/hbase/-ROOT-/70236052/info/mapfiles/2686380382008424762/data not found in
lease.paths (=[/hbase/-ROOT-/70236052/info/mapfiles

and only then on the regionservers:

2008-09-28 11:05:02,848 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception.
Aborting...

Any more ideas would be greatly appreciated.
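
Once the clocks are synchronized, the ordering described above can be checked mechanically by merging the logs from several machines. The following is a minimal sketch, assuming each log entry starts with a timestamp of the form shown in the excerpts; errors_by_time is a hypothetical helper name. Because the timestamp has a fixed width, a plain lexical sort puts the entries in chronological order.

```shell
#!/bin/sh
# Sketch only. Reads log lines on stdin, keeps the ERROR and FATAL entries,
# and sorts them by their leading "YYYY-MM-DD HH:MM:SS,mmm" timestamp.
# Continuation lines such as stack frames do not start with a timestamp
# and are dropped.

errors_by_time() {
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]+ (ERROR|FATAL) ' | LC_ALL=C sort
}
```

For example, `cat namenode.log regionserver.log | errors_by_time` (file names are placeholders) lists the errors from both machines in one timeline.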



Jean-Daniel Cryans-2 wrote:
> 
> You maybe just found your problem, the clocks are not synchronized. It is
> a
> requirement when using HBase to have synchronized clocks, see
> http://hadoop.apache.org/hbase/docs/r0.18.0/api/index.html
> 
> Thx for looking at it,
> 
> J-D

-- 
View this message in context: http://www.nabble.com/problem-restarting-0.18-tp19671584p19712318.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: problem restarting 0.18

Posted by Jean-Daniel Cryans <jd...@apache.org>.
You may have just found your problem: the clocks are not synchronized. HBase
requires synchronized clocks; see
http://hadoop.apache.org/hbase/docs/r0.18.0/api/index.html

Thx for looking at it,

J-D
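
A rough way to measure how far apart two machines' clocks are is to compare their epoch times. The sketch below is only an illustration: skew_seconds is a hypothetical helper, and how much skew is acceptable is not stated in this thread.

```shell
#!/bin/sh
# Sketch only. Prints the absolute difference, in seconds, between two
# epoch-second readings passed as $1 and $2.

skew_seconds() {
    d=$(( $1 - $2 ))
    [ "$d" -lt 0 ] && d=$(( -d ))
    echo "$d"
}
```

A possible use is `skew_seconds "$(date +%s)" "$(ssh somehost date +%s)"`, where somehost is a placeholder. The ssh round trip adds some error, and `date +%s` is a common extension rather than strict POSIX, so treat the result as an estimate.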

On Sun, Sep 28, 2008 at 3:47 AM, yoav.morag <yo...@corrigon.com> wrote:

>
> debug  didn't seem to give much, as far as I could tell . i did however
> notice the following errors on hadoop log on the name node :
> I am attaching (
>
> http://www.nabble.com/file/p19709529/hadoop-pm_app-namenode-cl-t072-330cl.privatedns.com.log
> hadoop-pm_app-namenode-cl-t072-330cl.privatedns.com.log
>
> http://www.nabble.com/file/p19709529/hbase-pm_app-regionserver-cl-t072-290cl.privatedns.com.log
> hbase-pm_app-regionserver-cl-t072-290cl.privatedns.com.log ) the full logs
> from the name node and one region servers (there are 4 , all with identical
> errors). note the clocks are not synchronized across the cluster, so the
> times in the logs can not be used to compare order between machines.
>
> suspicious errors :
> 2008-09-28 03:15:47,316 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/data not found in
> lease.paths
> (=[/hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index,
> /hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
> 2008-09-28 03:15:47,317 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index not found in
> lease.paths (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
> 2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/info/info/7031159331294621371 not found in
> lease.paths (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
> 2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/log/hlog.dat.1222585931186 not found in lease.paths
> (=[/hbase/.META./1028785192/log/hlog.dat.1222585931303])
> 2008-09-28 03:15:47,324 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/data not found in
> lease.paths
> (=[/hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index,
> /hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
> 2008-09-28 03:15:47,325 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index not found in
> lease.paths
> (=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
> 2008-09-28 03:15:47,326 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/info/info/8544907469765511915 not found in
> lease.paths
> (=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
> 2

Re: problem restarting 0.18

Posted by "yoav.morag" <yo...@corrigon.com>.
Debug didn't seem to give much, as far as I could tell. I did, however,
notice the following errors in the hadoop log on the name node.
I am attaching (
http://www.nabble.com/file/p19709529/hadoop-pm_app-namenode-cl-t072-330cl.privatedns.com.log
hadoop-pm_app-namenode-cl-t072-330cl.privatedns.com.log
http://www.nabble.com/file/p19709529/hbase-pm_app-regionserver-cl-t072-290cl.privatedns.com.log
hbase-pm_app-regionserver-cl-t072-290cl.privatedns.com.log ) the full logs
from the name node and one of the region servers (there are 4, all with
identical errors). Note that the clocks are not synchronized across the
cluster, so the times in the logs cannot be used to compare the order of
events between machines.

Suspicious errors:
2008-09-28 03:15:47,316 ERROR org.apache.hadoop.dfs.LeaseManager:
/hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/data not found in
lease.paths
(=[/hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index,
/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
/hbase/.META./1028785192/log/hlog.dat.1222585931303])
2008-09-28 03:15:47,317 ERROR org.apache.hadoop.dfs.LeaseManager:
/hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index not found in
lease.paths (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
/hbase/.META./1028785192/log/hlog.dat.1222585931303])
2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
/hbase/-ROOT-/70236052/info/info/7031159331294621371 not found in
lease.paths (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
/hbase/.META./1028785192/log/hlog.dat.1222585931303])
2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186 not found in lease.paths
(=[/hbase/.META./1028785192/log/hlog.dat.1222585931303])
2008-09-28 03:15:47,324 ERROR org.apache.hadoop.dfs.LeaseManager:
/hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/data not found in
lease.paths
(=[/hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index,
/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
2008-09-28 03:15:47,325 ERROR org.apache.hadoop.dfs.LeaseManager:
/hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index not found in
lease.paths
(=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
2008-09-28 03:15:47,326 ERROR org.apache.hadoop.dfs.LeaseManager:
/hbase/-ROOT-/70236052/info/info/8544907469765511915 not found in
lease.paths
(=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
2
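
To see at a glance which files the LeaseManager reports as missing, the paths can be pulled out of the name node log. This is a minimal sketch that assumes each entry occupies a single line in the real log file (the excerpts above are wrapped by the mail client); lease_missing_paths is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only. Reads name node log lines on stdin and prints, once each,
# the paths that LeaseManager reported as "not found in lease.paths".

lease_missing_paths() {
    sed -n 's|.*ERROR org\.apache\.hadoop\.dfs\.LeaseManager: \(/[^ ]*\) not found in lease\.paths.*|\1|p' \
        | LC_ALL=C sort -u
}
```

Running it as `lease_missing_paths < namenode.log` (file name is a placeholder) gives a short list of paths that can then be checked for with `hadoop fs -ls`.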




Jean-Daniel Cryans-2 wrote:
> 
> There is no other exceptions before that? Did you enable DEBUG? Can we see
> a
> whole start/stop log of your region server?
> 
> Thx,
> 
> J-D

-- 
View this message in context: http://www.nabble.com/problem-restarting-0.18-tp19671584p19709529.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: problem restarting 0.18

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Are there no other exceptions before that? Did you enable DEBUG? Can we see a
whole start/stop log of your region server?

Thx,

J-D
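
Whether DEBUG is enabled can be checked from the log4j configuration. The sketch below assumes that HBase logging is controlled by a log4j.properties file and that DEBUG is switched on with a line such as log4j.logger.org.apache.hadoop.hbase=DEBUG; both the logger name and the helper name hbase_debug_enabled are assumptions here, so check them against your own conf directory.

```shell
#!/bin/sh
# Sketch only. Succeeds (exit status 0) if the given log4j.properties file
# sets the assumed HBase logger to DEBUG.

hbase_debug_enabled() {
    # $1: path to a log4j.properties file
    grep -Eq '^[[:space:]]*log4j\.logger\.org\.apache\.hadoop\.hbase[[:space:]]*=[[:space:]]*DEBUG' "$1"
}
```

For example, `hbase_debug_enabled "$INSTALLDIR/$HBASE/conf/log4j.properties" && echo "DEBUG is on"`, using the path variables from the original post.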

On Thu, Sep 25, 2008 at 11:09 AM, yoav.morag <yo...@corrigon.com> wrote:

>
> I am experiencing problems when restarting a cluster with hadoop/hbase
> 0.18.0. hadoop restarts OK, however hbase regionservers all exit with the
> message :
> Exception in thread "regionserver/0:0:0:0:0:0:0:0:60020"
> java.lang.NullPointerException
>        at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:448)
>        at java.lang.Thread.run(Thread.java:619)
> strange enough, the said line appears to indicate log is null, however a
> log
> is created and messages are written into it...
> the restart scenario is very simple, and it happens even with a clean
> database , on a newly formatted FS. I have also checked no ghost processes
> exist before start.
>
>
>  "$INSTALLDIR/$HBASE/bin/stop-hbase.sh;$INSTALLDIR/$HADOOP/bin/stop-dfs.sh;"
>
>
> "$INSTALLDIR/$HADOOP/bin/start-dfs.sh;$INSTALLDIR/$HBASE/bin/start-hbase.sh;"
>
> any ideas ?
> --
> View this message in context:
> http://www.nabble.com/problem-restarting-0.18-tp19671584p19671584.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>