Posted to user@hbase.apache.org by Barry Haddow <bh...@inf.ed.ac.uk> on 2008/09/29 16:18:19 UTC

Region servers shut down with UnknownScannerException

Hi

I recently set up a small HBase cluster (v0.18) running on top of Hadoop
v0.18.1. However, I'm observing that the region servers spontaneously shut
themselves down, usually with an UnknownScannerException. For instance, this
weekend I discovered that all four had shut down, with messages like the
following in the logs:

2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 129.215.197.39:50010
2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-5829206400135277905_3045
2008-09-29 07:29:16,552 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
2008-09-29 07:46:35,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020, call next(-1347145425990165691) from 129.215.197.39:6999: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -1347145425990165691


The underlying HDFS seems fine: fsck reports the hbase directory as healthy.
After a restart HBase seems fine too, but surely the regionservers should
stay up once they're started.
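
When sifting through logs like the excerpt above, it can help to join the hard-wrapped lines back into whole entries and pull out only the ones that mention an exception or an abandoned block. The following is a minimal sketch, assuming the log4j-style layout seen in the excerpt (timestamp, level, logger, message). The function names and the substrings searched for are illustrative choices, not part of HBase or Hadoop.

```python
import re

# Start of a log4j-style entry, e.g. "2008-09-29 05:50:17,203 INFO ..."
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\b")
# A whole entry once wrapped lines are joined: timestamp, level, logger, message.
ENTRY = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<msg>.*)$"
)


def parse_entries(lines):
    """Join hard-wrapped lines into entries and split each into fields."""
    joined = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line  # continuation of the previous entry
    return [m.groupdict() for m in map(ENTRY.match, joined) if m]


def flag(entries, needles=("Exception", "Abandoning block")):
    """Keep entries whose message contains any of the given substrings."""
    return [e for e in entries if any(n in e["msg"] for n in needles)]


# The excerpt from this message, wrapped as it appeared in the mail.
SAMPLE = """\
2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 129.215.197.39:50010
2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block
blk_-5829206400135277905_3045
2008-09-29 07:29:16,552 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
2008-09-29 07:46:35,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler
4 on 60020, call next(-1347145425990165691) from 129.215.197.39:6999: error:
org.apache.hadoop.hbase.UnknownScannerException: Name: -1347145425990165691
""".splitlines()

for entry in flag(parse_entries(SAMPLE)):
    print(entry["ts"], entry["logger"], entry["msg"][:60])
```

Run against a full regionserver or datanode log, the same filter shows how far apart in time the DFSClient errors and the scanner exceptions are, which is easy to miss when reading wrapped output.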

Any suggestions?

regards
Barry

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: Region servers shut down with UnknownScannerException

Posted by Barry Haddow <bh...@inf.ed.ac.uk>.
Thanks for the suggestions - responses inline.

On Monday 29 September 2008 18:15:53 you wrote:
> Barry:
>
> From the below, looks like an issue in HDFS. If regionserver is
> having issues talking to HDFS, it shuts itself down.
>
> Tell us more.  Are there other, heavy-duty processes running on the same
> servers hosting datanodes and regionservers?

Yes, there are heavy-duty processes running on the same servers. This is
unavoidable, as we need the cluster for other tasks.

>
> Enable DEBUG on your cluster and make sure you've set your ulimit file
> descriptors up from default.  See the FAQ in wiki for how to do both.

Which FAQ are you referring to? I've set both Hadoop and HBase to DEBUG and
restarted. The fd limit is 8192. What should I be looking for, and in which
logs?
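
For reference, here is a minimal sketch of checking the open-file limit from Python. It assumes a Unix-like host and only reports the limit of the process that runs it, so it needs to be run as the same user, and from the same kind of shell, that starts the datanode and regionserver daemons. The default of 1024 is an assumption about a typical Linux setup, and the helper names are hypothetical.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raised_from_default(soft, default=1024):
    """True if the soft limit is above the assumed default of 1024."""
    # RLIM_INFINITY means "no limit", which also counts as raised.
    return soft == resource.RLIM_INFINITY or soft > default


soft, hard = nofile_limits()
print("soft limit:", soft, "hard limit:", hard)
print("raised from assumed default:", raised_from_default(soft))
```

With the 8192 mentioned above, `raised_from_default(8192)` is true. Whether the daemons actually inherited that limit depends on how they were started.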

Can I tune HBase so it is more tolerant of HDFS issues?

regards
Barry

>
> Thanks,
> St.Ack
>





Re: Region servers shut down with UnknownScannerException

Posted by stack <st...@duboce.net>.
Barry:

From the log below, this looks like an issue in HDFS. If a regionserver is
having issues talking to HDFS, it shuts itself down.

Tell us more.  Are there other, heavy-duty processes running on the same 
servers hosting datanodes and regionservers? 

Enable DEBUG on your cluster and make sure you've set your ulimit file
descriptors up from default.  See the FAQ in wiki for how to do both.

Thanks,
St.Ack
