You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/12/04 19:51:44 UTC
[jira] Commented: (HADOOP-2343) [hbase] Stuck regionserver?

    [ https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548349 ] 

Jim Kellerman commented on HADOOP-2343:
---------------------------------------

One possibility for this is that the master does not have enough server threads

> [hbase] Stuck regionserver?
> ---------------------------
>
>                 Key: HADOOP-2343
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2343
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>
> Looking in logs, a regionserver went down because it could not contact the master after 60 seconds.  Watching logging, the HRS is repeatedly checking all 150 loaded regions over and over again w/ a pause of about 5 seconds between runs... then there is a suspicious 60+ second gap with no logging as though the regionserver had hung up on something:
> {code}
> 2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635
> 2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region postlog,img247/230/seanpaul4li.jpg,1196615889965
> 2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() determined that there was nothing to do
> 2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to master for 67467 milliseconds - aborting server
> 2007-12-03 13:16:04,455 INFO  hbase.Leases - regionserver/0:0:0:0:0:0:0:0:60020 closing leases
> 2007-12-03 13:16:04,455 INFO  hbase.Leases$LeaseMonitor - regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting
> {code}
> Master seems to be running fine scanning its ~700 regions.  Then you see this in log, before the HRS shuts itself down.
> {code}
> 2007-12-03 13:14:31,416 INFO  hbase.Leases - HMaster.leaseChecker lease expired 153260899/1532608992007-12-03 13:14:31,417 INFO  hbase.HMaster - XX.XX.XX.102:60020 lease expired
> {code}
> ... and we go on to process shutdown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.