Posted to user@hbase.apache.org by Sean Bigdatafun <se...@gmail.com> on 2011/05/29 08:28:13 UTC

0.90.1 HMaster malfunction in pseudo-distributed mode

I am trying out HBase 0.90.1 (hbase-0.90.1-CDH3B4) in pseudo-distributed mode
and have run into a problem with HMaster crashing. Here is what I did.

I. First, I installed a pseudo-distributed Hadoop cluster
(hadoop-0.20.2-CDH3B4) with the following configuration changes.

1) core-site.xml ==>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

2) hdfs-site.xml ==>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

(With the above configuration, I ran start-all.sh and the Hadoop pseudo
cluster started up and ran happily.)


Second, I installed hbase-0.90.1-CDH3B4 with the following configuration
changes.

hbase-site.xml ==>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The replication count for HLog and HFile storage. Should
not be greater than HDFS datanode count.
    </description>
  </property>

(With the above configuration, I ran start-hbase.sh and found that HMaster
was not working properly: I can't access localhost:60010.)
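
One way to double-check settings like the ones above is to load the file and
print the properties it actually defines. The following is only a minimal
sketch: HBASE_SITE is a hypothetical stand-in for a complete hbase-site.xml
(the property values are the ones quoted in this post, the enclosing
<configuration> element is assumed), and read_hadoop_conf and same_namenode
are illustrative helper names, not part of Hadoop or HBase.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical stand-in for a full hbase-site.xml. The values come from the
# post; the <configuration> wrapper is an assumption about the real file.
HBASE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>"""


def read_hadoop_conf(xml_text):
    """Return {name: value} for every <property> element in the XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def same_namenode(rootdir, fs_default_name):
    """True when both URIs use the same scheme and the same host:port."""
    a, b = urlparse(rootdir), urlparse(fs_default_name)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)


conf = read_hadoop_conf(HBASE_SITE)
for key in sorted(conf):
    print(f"{key} = {conf[key]}")

# fs.default.name is taken from the core-site.xml snippet in the post.
print("rootdir matches fs.default.name:",
      same_namenode(conf["hbase.rootdir"], "hdfs://localhost:9000"))
```

With the values from this post the two URIs agree, so the configuration
files themselves do not look like the source of the problem.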


II. Here is the HMaster error log:

2011-05-28 23:22:55,292 WARN
org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable
location to assign region -ROOT-,,0.70236052
2011-05-28 23:23:35,291 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
timed out:  -ROOT-,,0.70236052 state=OFFLINE, ts=1306650175292
2011-05-28 23:23:35,291 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE
for too long, reassigning -ROOT-,,0.70236052 to a random server
2011-05-28 23:23:35,291 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
was=-ROOT-,,0.70236052 state=OFFLINE, ts=1306650175292
2011-05-28 23:23:35,291 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan
for region -ROOT-,,0.70236052; plan=hri=-ROOT-,,0.70236052, src=,
dest=localhost,60020,1306648534687
2011-05-28 23:23:35,291 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
-ROOT-,,0.70236052 to localhost,60020,1306648534687
2011-05-28 23:23:35,291 DEBUG org.apache.hadoop.hbase.master.ServerManager:
New connection to localhost,60020,1306648534687
2011-05-28 23:23:35,292 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /
127.0.0.1:60020 could not be reached after 1 tries, giving up.
2011-05-28 23:23:35,292 WARN
org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of
-ROOT-,,0.70236052 to serverName=localhost,60020,1306648534687,
load=(requests=0, regions=0, usedHeap=22, maxHeap=996), trying to assign
elsewhere instead; retry=0
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to /
127.0.0.1:60020 after attempts=1
        at
org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:355)
        at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
        at
org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:606)
        at
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:541)
        at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
        at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
        at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
        at
org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1605)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
        at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at
org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
        at
org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
        at
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy6.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
        at
org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
        ... 8 more
2011-05-28 23:23:35,292 WARN
org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable
location to assign region -ROOT-,,0.70236052
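
The "Connection refused" in the trace above means nothing accepted a TCP
connection on the address the master dialled. A rough way to see which local
addresses accept connections on the regionserver port is a plain socket
probe. This is a sketch under assumptions: can_connect is a hypothetical
helper name, and the addresses in the loop are simply the loopback names
that appear in this thread.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # 60020 is the regionserver RPC port seen in the logs of this thread.
    for host in ("127.0.0.1", "127.0.1.1", "localhost"):
        print(f"{host}:60020 reachable = {can_connect(host, 60020)}")
```

If one loopback address answers and another does not, the server is bound to
a single address rather than to all of them.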



III. Here is the zk status from http://localhost:60010/zk.jsp

HBase is rooted at /hbase
Master address: sean-PowerEdge:60000
Region server holding ROOT: null
Region servers:
 sean-PowerEdge:60020
Quorum Server Statistics:
 localhost:2181
  Zookeeper version: 3.3.2-CDH3B4--1, built on 02/21/2011 20:16 GMT
  Clients:
   /127.0.0.1:42221[0](queued=0,recved=1,sent=0)
   /127.0.0.1:44071[1](queued=0,recved=39,sent=44)
   /127.0.0.1:44078[1](queued=0,recved=23,sent=24)
   /127.0.0.1:44085[1](queued=0,recved=23,sent=23)
   /127.0.0.1:44077[1](queued=0,recved=19,sent=19)

  Latency min/avg/max: 0/6/164
  Received: 105
  Sent: 110
  Outstanding: 0
  Zxid: 0x148
  Mode: standalone
  Node count: 12


What's the problem causing the above symptom?

Thanks,
-- 
--Sean

Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

Posted by Sean Bigdatafun <se...@gmail.com>.

No, I do not see any suspicious entries in the regionserver log. Here it is
(note that all of my server processes are on the same machine because I am
running in pseudo-distributed mode). Any other hints? Thanks.

regionserver.log ==>

2011-05-28 22:55:38,982 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2011-05-28 22:55:38,984 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2011-05-28 22:55:38,984 INFO
org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics:
Initialized
2011-05-28 22:55:39,008 DEBUG
org.apache.hadoop.hbase.executor.ExecutorService: Starting executor service
name=RS_OPEN_REGION-localhost,60020,1306648534687, corePoolSize=3,
maxPoolSize=3
2011-05-28 22:55:39,009 DEBUG
org.apache.hadoop.hbase.executor.ExecutorService: Starting executor service
name=RS_OPEN_ROOT-localhost,60020,1306648534687, corePoolSize=1,
maxPoolSize=1
2011-05-28 22:55:39,009 DEBUG
org.apache.hadoop.hbase.executor.ExecutorService: Starting executor service
name=RS_OPEN_META-localhost,60020,1306648534687, corePoolSize=1,
maxPoolSize=1
2011-05-28 22:55:39,009 DEBUG
org.apache.hadoop.hbase.executor.ExecutorService: Starting executor service
name=RS_CLOSE_REGION-localhost,60020,1306648534687, corePoolSize=3,
maxPoolSize=3
2011-05-28 22:55:39,009 DEBUG
org.apache.hadoop.hbase.executor.ExecutorService: Starting executor service
name=RS_CLOSE_ROOT-localhost,60020,1306648534687, corePoolSize=1,
maxPoolSize=1
2011-05-28 22:55:39,009 DEBUG
org.apache.hadoop.hbase.executor.ExecutorService: Starting executor service
name=RS_CLOSE_META-localhost,60020,1306648534687, corePoolSize=1,
maxPoolSize=1
2011-05-28 22:55:39,107 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2011-05-28 22:55:39,192 INFO org.apache.hadoop.http.HttpServer: Added global
filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2011-05-28 22:55:39,196 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 60030
2011-05-28 22:55:39,196 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 60030
webServer.getConnectors()[0].getLocalPort() returned 60030
2011-05-28 22:55:39,196 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 60030
2011-05-28 22:55:39,197 INFO org.mortbay.log: jetty-6.1.26
2011-05-28 22:55:39,472 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:60030
2011-05-28 22:55:39,473 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder: starting
2011-05-28 22:55:39,473 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder: starting
2011-05-28 22:55:39,475 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: starting
2011-05-28 22:55:39,475 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
listener on 60020: starting
2011-05-28 22:55:39,476 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: starting
2011-05-28 22:55:39,476 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: starting
2011-05-28 22:55:39,476 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60020: starting
2011-05-28 22:55:39,476 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: starting
2011-05-28 22:55:39,476 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 60020: starting
2011-05-28 22:55:39,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020: starting
2011-05-28 22:55:39,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020: starting
2011-05-28 22:55:39,501 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020: starting
2011-05-28 22:55:39,503 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 1 on 60020: starting
2011-05-28 22:55:39,503 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 0 on 60020: starting
2011-05-28 22:55:39,503 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 2 on 60020: starting
2011-05-28 22:55:39,503 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 3 on 60020: starting
2011-05-28 22:55:39,504 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020: starting
2011-05-28 22:55:39,504 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 4 on 60020: starting
2011-05-28 22:55:39,504 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 5 on 60020: starting
2011-05-28 22:55:39,504 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 6 on 60020: starting
2011-05-28 22:55:39,505 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 7 on 60020: starting
2011-05-28 22:55:39,512 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 8 on 60020: starting
2011-05-28 22:55:39,512 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as
localhost,60020,1306648534687, RPC listening on /127.0.1.1:60020,
sessionid=0x1303a5253dc0002
2011-05-28 22:55:39,513 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 9 on 60020: starting
2011-05-28 22:55:39,520 INFO org.apache.hadoop.hbase.regionserver.StoreFile:
Allocating LruBlockCache with maximum size 199.4m
2011-05-28 23:00:39,529 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB,
free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0,
evicted=0, evictedPerRun=NaN
2011-05-28 23:05:39,528 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB,
free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0,
evicted=0, evictedPerRun=NaN
2011-05-28 23:10:39,528 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB,
free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0,
evicted=0, evictedPerRun=NaN
2011-05-28 23:15:39,528 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB,
free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0,
evicted=0, evictedPerRun=NaN
2011-05-28 23:20:39,528 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB,
free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0,
evicted=0, evictedPerRun=NaN
2011-05-28 23:25:39,528 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB,
free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0,
evicted=0, evictedPerRun=NaN
2011-05-28 23:30:39,528 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB,
free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0,
evicted=0, evictedPerRun=NaN
2011-05-28 23:35:39,528 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB,
free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%,
cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0,
evicted=0, evictedPerRun=NaN
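
One detail that may be worth comparing across the two logs is the address
the regionserver reports in "RPC listening on" versus the address the master
reports in "could not be reached". The sketch below extracts both with
regular expressions and flags a port that is served on one IP but dialled on
another. It is one possible reading of the logs, not a confirmed diagnosis;
find_mismatches and the two regexes are illustrative names, and the sample
strings are the log lines quoted in this thread with the mail line-wrapping
removed.

```python
import re

# Patterns for the two log messages quoted in this thread.
LISTEN_RE = re.compile(r"RPC listening on /?\s*(\d+\.\d+\.\d+\.\d+):(\d+)")
DIAL_RE = re.compile(
    r"Server at /?\s*(\d+\.\d+\.\d+\.\d+):(\d+) could not be reached")

RS_LOG = (
    "2011-05-28 22:55:39,512 INFO "
    "org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as "
    "localhost,60020,1306648534687, RPC listening on /127.0.1.1:60020, "
    "sessionid=0x1303a5253dc0002"
)
MASTER_LOG = (
    "2011-05-28 23:23:35,292 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /"
    "127.0.0.1:60020 could not be reached after 1 tries, giving up."
)


def find_mismatches(regionserver_log, master_log):
    """List dialled ip:port pairs whose port is served on a different IP."""
    listening = set(LISTEN_RE.findall(regionserver_log))
    dialled = set(DIAL_RE.findall(master_log))
    result = []
    for ip, port in sorted(dialled):
        served_on = sorted(l_ip for l_ip, l_port in listening
                           if l_port == port)
        if served_on and ip not in served_on:
            result.append((f"{ip}:{port}",
                           [f"{s}:{port}" for s in served_on]))
    return result


for dialled, served in find_mismatches(RS_LOG, MASTER_LOG):
    print(f"master dialled {dialled}, regionserver listens on {served}")
```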

2011/5/29 Ferdy Galema <fe...@kalooga.com>

> 2011-05-28 23:23:35,292 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /
> 127.0.0.1:60020 could not be reached after 1 tries, giving up.
>
> This means the regionserver could not be reached. Check the regionserver
> logs to see why. Perhaps it failed to start? Is the HDFS fully functional?
>
> Ferdy.

-- 
--Sean

Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

Posted by Ferdy Galema <fe...@kalooga.com>.

2011-05-28 23:23:35,292 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /
127.0.0.1:60020 could not be reached after 1 tries, giving up.

This means the regionserver could not be reached. Check the regionserver
logs to see why. Perhaps it failed to start? Is the HDFS fully functional?

Ferdy.


Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

Posted by Hari Sreekumar <hs...@clickable.com>.

sry.. it is
Changed
127.0.0.1 localhost localhost.localdomain
127.0.1.1 hsreekumar-lt.Clickablecorp.com hsreekumar-lt

to
127.0.0.1 localhost localhost.localdomain hsreekumar-lt.Clickablecorp.com hsreekumar-lt
#127.0.1.1 hsreekumar-lt.Clickablecorp.com hsreekumar-lt
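
The effect of the /etc/hosts change above can be illustrated by parsing the
two versions and listing which names end up on 127.0.1.1. This is a minimal
sketch, assuming the usual hosts-file layout of an address followed by names
with "#" starting a comment; parse_hosts and names_on are hypothetical
helper names, and BEFORE and AFTER are the two variants from this message.

```python
BEFORE = """\
127.0.0.1 localhost localhost.localdomain
127.0.1.1 hsreekumar-lt.Clickablecorp.com hsreekumar-lt
"""

AFTER = """\
127.0.0.1 localhost localhost.localdomain hsreekumar-lt.Clickablecorp.com hsreekumar-lt
#127.0.1.1 hsreekumar-lt.Clickablecorp.com hsreekumar-lt
"""


def parse_hosts(text):
    """Map every hostname or alias to the list of addresses it is listed under."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(address)
    return table


def names_on(table, address):
    """Return the sorted names that are mapped to the given address."""
    return sorted(name for name, addrs in table.items() if address in addrs)


print("before:", names_on(parse_hosts(BEFORE), "127.0.1.1"))
print("after: ", names_on(parse_hosts(AFTER), "127.0.1.1"))
```

On a live machine, socket.gethostbyname(socket.gethostname()) from the
Python standard library shows what the resolver actually returns for the
local hostname, which is the value that matters here.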

On Thu, Jun 2, 2011 at 11:18 AM, Hari Sreekumar <hs...@clickable.com> wrote:

> Hey,
>
> I had the same problem. It seems to be caused by the 127.0.1.1 entry in
> /etc/hosts (which is the default in Ubuntu, I think, but I haven't seen it
> on CentOS systems).
>
> Changed
> 127.0.0.1 localhost localhost.localdomain
> 127.0.1.1 hsreekumar-lt.corp1.com hsreekumar-lt
>
> to
> 127.0.0.1 localhost localhost.localdomain hsreekumar-lt.Clickablecorp.com hsreekumar-lt
> #127.0.1.1 hsreekumar-lt.corp1.com hsreekumar-lt
>
> See if it fixes your problem, though I am not sure what the side effects
> of this are, or whether some other programs will break.
>
> Thanks,
> Hari
>
> On Wed, Jun 1, 2011 at 11:29 PM, Stack <st...@duboce.net> wrote:
>
>> On Tue, May 31, 2011 at 11:45 PM, Sean Bigdatafun
>> <se...@gmail.com> wrote:
>> > Sure. Thanks, St.Ack. Here are the attached HBase logs, plus the
>> > screenshot of the region server. The /etc/hosts should be OK, I think,
>> > because my Hadoop (pseudo-distributed) cluster runs well and healthy.
>>
>> FYI, what works for hadoop may not work for hbase.
>>
>> > But I post it here in
>> > case I missed something :-0
>> >
>> > 127.0.0.1    localhost
>> > 127.0.1.1    sean-PowerEdge
>> >
>> > # The following lines are desirable for IPv6 capable hosts
>> > ::1     ip6-localhost ip6-loopback localhost6
>> > fe00::0 ip6-localnet
>> > ff00::0 ip6-mcastprefix
>> > ff02::1 ip6-allnodes
>> > ff02::2 ip6-allrouters
>> >
>>
>> Try turning off IPv6.  In the past it's been fingered as problem-causing.
>>
>> Looking in your logs:
>>
>> + Make sure you fix this before you put any significant data into
>> hbase 'ulimit -n 1024'
>>
>> So, yeah, it looks like your /etc/hosts needs fixing.  When the
>> regionserver does its lookup, it's finding its hostname to be localhost:
>>
>> 2011-05-31 23:32:44,742 INFO
>> org.apache.hadoop.hbase.master.ServerManager: Registering
>> server=localhost,60020,1306909960650, regionCount=0, userLoad=false
>>
>> But then when the master tries to send it a region, it's trying to send
>> it to:
>>
>> 2011-05-31 23:32:47,671 INFO org.apache.hadoop.ipc.HbaseRPC: Server at
>> /127.0.0.1:60020 could not be reached after 1 tries, giving up.
>>
>> .... notice the 127.0.0.1 above.
>>
>> Fix this discrepancy.
>>
>> St.Ack
>>
>>
>>
>> > Thanks,
>> > Sean
>> >
>> >
>> >
>> >
>> >
>> > On Mon, May 30, 2011 at 7:34 PM, Stack <st...@duboce.net> wrote:
>> >>
>> >> Odd.  I don't see the regionserver checking into the master (maybe
>> >> that's the way it is in pseudo-distributed and I just forgot).  Can you
>> >> paste more master log?   I don't see the regionserver coming in in the
>> >> snippet you've pasted so not sure how it's registering itself (I see
>> >> the timeout when we try to assign it -ROOT-).
>> >>
>> >> What's in your /etc/hosts?  I see lots of localhost and 127.0.0.1.
>> >> Maybe the two are not equated in your resolve setup?
>> >>
>> >> St.Ack
>> >>
>> >
>> >
>> >
>> > --
>> > --Sean
>> >
>> >
>>
>
>

Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

Posted by Hari Sreekumar <hs...@clickable.com>.

Hey,

I had the same problem. It seems to be caused by the 127.0.1.1 entry in
/etc/hosts (which I think is the default on Ubuntu; I haven't seen it on
CentOS systems).

Changed
127.0.0.1 localhost localhost.localdomain
127.0.1.1 hsreekumar-lt.corp1.com hsreekumar-lt

to
127.0.0.1 localhost localhost.localdomain hsreekumar-lt.Clickablecorp.com hsreekumar-lt
#127.0.1.1 hsreekumar-lt.corp1.com hsreekumar-lt

See if it fixes your problem, though I am not sure what the side effects
of this will be, or whether some other programs will break.
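
To make the effect of that change concrete, here is a minimal sketch that
parses the two versions of the file and shows which address the machine name
maps to in each. The parse_hosts helper is hypothetical, and "first entry
wins" is a simplifying assumption about how the resolver reads /etc/hosts;
the entries are copied from above.

```python
def parse_hosts(text):
    """Map each hostname or alias to the first address listed for it."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)  # assumption: first entry wins
    return table


BEFORE = """\
127.0.0.1 localhost localhost.localdomain
127.0.1.1 hsreekumar-lt.corp1.com hsreekumar-lt
"""

AFTER = """\
127.0.0.1 localhost localhost.localdomain hsreekumar-lt.Clickablecorp.com hsreekumar-lt
#127.0.1.1 hsreekumar-lt.corp1.com hsreekumar-lt
"""

for label, text in (("before", BEFORE), ("after", AFTER)):
    table = parse_hosts(text)
    # Print the address for localhost and for the machine's short name.
    print(label, table["localhost"], table["hsreekumar-lt"])
```

Before the change the short name and localhost sit on different loopback
addresses; after it they share 127.0.0.1.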

Thanks,
Hari

On Wed, Jun 1, 2011 at 11:29 PM, Stack <st...@duboce.net> wrote:

> On Tue, May 31, 2011 at 11:45 PM, Sean Bigdatafun
> <se...@gmail.com> wrote:
> > Sure. Thanks, St.Ack. Here are the attached HBase logs, plus the
> screenshot
> > of the region server. The /etc/hosts should be Ok I think because my
> Hadoop
> > (pseudo distributed )cluster runs well and healthy.
>
> FYI, what works for hadoop may not work for hbase.
>
> > But I post it here in
> > case I missed something :-0
> >
> > 127.0.0.1    localhost
> > 127.0.1.1    sean-PowerEdge
> >
> > # The following lines are desirable for IPv6 capable hosts
> > ::1     ip6-localhost ip6-loopback localhost6
> > fe00::0 ip6-localnet
> > ff00::0 ip6-mcastprefix
> > ff02::1 ip6-allnodes
> > ff02::2 ip6-allrouters
> >
>
> Try turning off ipv6.  In the past its been fingered as problem-causing.
>
> Looking in your logs:
>
> + Make sure you fix this before you put any significant data into
> hbase 'ulimit -n 1024'
>
> So, yeah, it looks like your /etc/hosts needs fixing.  When the
> regionserver does its lookup its finding its hostname to be localhost:
>
> 2011-05-31 23:32:44,742 INFO
> org.apache.hadoop.hbase.master.ServerManager: Registering
> server=localhost,60020,1306909960650, regionCount=0, userLoad=false
>
> But then when the master tries to send it a region, its trying to send it
> to
>
> 2011-05-31 23:32:47,671 INFO org.apache.hadoop.ipc.HbaseRPC: Server at
> /127.0.0.1:60020 could not be reached after 1 tries, giving up.
>
> .... notice the 127.0.0.1 above.
>
> Fix this discrepency.
>
> St.Ack
>
>
>
> > Thanks,
> > Sean
> >
> >
> >
> >
> >
> > On Mon, May 30, 2011 at 7:34 PM, Stack <st...@duboce.net> wrote:
> >>
> >> Odd.  I dont' see the regionserver checking into the master (maybe
> >> thats the way it is in pseudo-distributed and I just forgot).  Can you
> >> paste more master log?   I don't see the regionserver coming in in the
> >> snippet you've pasted so not sure how its registering itself (I see
> >> the timeout when we try to assign it -ROOT-).
> >>
> >> Whats in your /etc/hosts?  I see lots of locahost and 127.0.0.1.
> >> Maybe the two are not equated in your resolve setup?
> >>
> >> St.Ack
> >>
> >
> >
> >
> > --
> > --Sean
> >
> >
>

Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

Posted by Stack <st...@duboce.net>.

On Tue, May 31, 2011 at 11:45 PM, Sean Bigdatafun
<se...@gmail.com> wrote:
> Sure. Thanks, St.Ack. Here are the attached HBase logs, plus the screenshot
> of the region server. The /etc/hosts should be Ok I think because my Hadoop
> (pseudo distributed )cluster runs well and healthy.

FYI, what works for hadoop may not work for hbase.

> But I post it here in
> case I missed something :-0
>
> 127.0.0.1    localhost
> 127.0.1.1    sean-PowerEdge
>
> # The following lines are desirable for IPv6 capable hosts
> ::1     ip6-localhost ip6-loopback localhost6
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
>

Try turning off IPv6.  In the past it's been fingered as problem-causing.

Looking in your logs:

+ Make sure you fix this before you put any significant data into
hbase 'ulimit -n 1024'
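
A quick way to see the open-file limit a process inherits is sketched below,
using Python's standard resource module (Unix only). ASSUMED_MINIMUM is a
placeholder threshold picked for illustration, not a value taken from this
thread; use whatever your HBase documentation recommends. Run it as the user
that starts HBase, since limits are per user and per session.

```python
import resource

# Assumed threshold for illustration only; not a value from this thread.
ASSUMED_MINIMUM = 10240


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_too_low(soft, minimum=ASSUMED_MINIMUM):
    """True when the soft limit is below the assumed minimum."""
    # RLIM_INFINITY means "no limit", which is never too low.
    return soft != resource.RLIM_INFINITY and soft < minimum


soft, hard = nofile_limits()
print("soft=%s hard=%s too_low=%s" % (soft, hard, is_too_low(soft)))
```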

So, yeah, it looks like your /etc/hosts needs fixing.  When the
regionserver does its lookup, it's finding its hostname to be localhost:

2011-05-31 23:32:44,742 INFO
org.apache.hadoop.hbase.master.ServerManager: Registering
server=localhost,60020,1306909960650, regionCount=0, userLoad=false

But then when the master tries to send it a region, it's trying to send it to

2011-05-31 23:32:47,671 INFO org.apache.hadoop.ipc.HbaseRPC: Server at
/127.0.0.1:60020 could not be reached after 1 tries, giving up.

.... notice the 127.0.0.1 above.

Fix this discrepancy.
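
For anyone scanning a long master log for the same pair of lines, a rough
sketch follows. It pulls out the name the regionserver registered under and
the address the master failed to reach. The regular expressions are
assumptions fitted to the two log lines quoted in this message, and summarize
is a hypothetical helper, not part of HBase.

```python
import re

# Patterns fitted to the two log lines quoted above (an assumption; other
# HBase versions may word these messages differently).
REGISTERED = re.compile(r"Registering\s+server=([^,\s]+),(\d+),")
UNREACHABLE = re.compile(
    r"Server at\s+/\s*([\d.]+):(\d+)\s+could not be reached")

LOG = """\
2011-05-31 23:32:44,742 INFO
org.apache.hadoop.hbase.master.ServerManager: Registering
server=localhost,60020,1306909960650, regionCount=0, userLoad=false
2011-05-31 23:32:47,671 INFO org.apache.hadoop.ipc.HbaseRPC: Server at
/127.0.0.1:60020 could not be reached after 1 tries, giving up.
"""


def summarize(log_text):
    """Return the registered server name and the unreachable address."""
    flat = " ".join(log_text.split())  # undo mail-client line wrapping
    registered = REGISTERED.search(flat)
    unreachable = UNREACHABLE.search(flat)
    return {
        "registered_as": registered.group(1) if registered else None,
        "unreachable": "%s:%s" % unreachable.groups() if unreachable else None,
    }


print(summarize(LOG))
```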

St.Ack



> Thanks,
> Sean
>
>
>
>
>
> On Mon, May 30, 2011 at 7:34 PM, Stack <st...@duboce.net> wrote:
>>
>> Odd.  I dont' see the regionserver checking into the master (maybe
>> thats the way it is in pseudo-distributed and I just forgot).  Can you
>> paste more master log?   I don't see the regionserver coming in in the
>> snippet you've pasted so not sure how its registering itself (I see
>> the timeout when we try to assign it -ROOT-).
>>
>> Whats in your /etc/hosts?  I see lots of locahost and 127.0.0.1.
>> Maybe the two are not equated in your resolve setup?
>>
>> St.Ack
>>
>
>
>
> --
> --Sean
>
>

Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

Posted by Sean Bigdatafun <se...@gmail.com>.

Sure. Thanks, St.Ack. Here are the attached HBase logs, plus the screenshot
of the region server. The /etc/hosts should be OK, I think, because my Hadoop
(pseudo-distributed) cluster runs well and is healthy. But I am posting it
here in case I missed something :-0

127.0.0.1    localhost
127.0.1.1    sean-PowerEdge

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback localhost6
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Thanks,
Sean





On Mon, May 30, 2011 at 7:34 PM, Stack <st...@duboce.net> wrote:

> Odd.  I dont' see the regionserver checking into the master (maybe
> thats the way it is in pseudo-distributed and I just forgot).  Can you
> paste more master log?   I don't see the regionserver coming in in the
> snippet you've pasted so not sure how its registering itself (I see
> the timeout when we try to assign it -ROOT-).
>
> Whats in your /etc/hosts?  I see lots of locahost and 127.0.0.1.
> Maybe the two are not equated in your resolve setup?
>
> St.Ack
>
>



-- 
--Sean

Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

Posted by Stack <st...@duboce.net>.

Odd.  I don't see the regionserver checking in to the master (maybe
that's the way it is in pseudo-distributed and I just forgot).  Can you
paste more of the master log?  I don't see the regionserver coming in in
the snippet you've pasted, so I'm not sure how it's registering itself (I
do see the timeout when we try to assign it -ROOT-).

What's in your /etc/hosts?  I see lots of localhost and 127.0.0.1.
Maybe the two are not equated in your resolver setup?
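
One quick way to see what the JVM itself makes of those names is a small
standalone class like the one below.  This is only a sketch: the class
name HostCheck is made up for illustration, it uses nothing beyond
java.net.InetAddress, and it does not reproduce HBase's own address
lookup.  It prints what "localhost" and the machine's own hostname resolve
to and whether each is a loopback address.  In the pasted output the
master plans to assign -ROOT- to localhost,60020 while zk.jsp lists the
regionserver as sean-PowerEdge:60020, so it would be interesting to see
whether both names end up at the same address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of HBase: shows how the JVM resolves names.
final class HostCheck {

    // Resolve a name (or literal IP) and report its address and loopback status.
    static String describe(String name) {
        try {
            InetAddress addr = InetAddress.getByName(name);
            return name + " -> " + addr.getHostAddress()
                    + " loopback=" + addr.isLoopbackAddress();
        } catch (UnknownHostException e) {
            return name + " -> unresolved";
        }
    }

    // The hostname the JVM reports for this machine, or "unknown" if lookup fails.
    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("localhost"));
        System.out.println(describe(localHostName()));
    }
}
```

Compile it with javac HostCheck.java, run java HostCheck on the box that
hosts the master and regionserver, and compare the two lines against what
is in /etc/hosts.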

St.Ack

On Sat, May 28, 2011 at 11:28 PM, Sean Bigdatafun
<se...@gmail.com> wrote:
> I am trying for 0.90.1 (hbase-0.90.1-CDH3B4) under pseudo-dist mode, and met
> the problem of HMaster crashing. Here is how I did.
>
> I. First I installed Hadoop pseudo cluster (hadoop-0.20.2-CDH3B4) with the
> following conf edited.
>
> 1) core-site.xml ==>
> <property>
>  <name>fs.default.name</name>
>  <value>hdfs://localhost:9000</value>
> </property>
>
> 2) hdfs-site.xml ==>
>  <property>
>    <name>dfs.replication</name>
>    <value>1</value>
>  </property>
>
> (with above confs, start-all.sh was run, and the hadoop pseudo cluster
> started to run happily)
>
>
> Secondly, I installed hbase-0.90.1-CDH3B4 with the following conf edited.
>
> hbase-site.xml ==>
>  <property>
>    <name>hbase.rootdir</name>
>    <value>hdfs://localhost:9000/hbase</value>
>  </property>
>
>  <property>
>    <name>hbase.cluster.distributed</name>
>    <value>true</value>
>  </property>
>
>  <property>
>    <name>hbase.zookeeper.quorum</name>
>    <value>localhost</value>
>  </property>
>
>  <property>
>    <name>dfs.replication</name>
>    <value>1</value>
>    <description>The replication count for HLog and HFile storage. Should
> not be greater than HDFS datanode count.
>    </description>
>  </property>
>
> (with the above conf, I run the command of hbase-start.sh, and I realised
> that HMaster did not function well -- i can't access localhost:60010)
>
>
> II. Here is the HMaster error log:
>
> 2011-05-28 23:22:55,292 WARN
> org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable
> location to assign region -ROOT-,,0.70236052
> 2011-05-28 23:23:35,291 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
> timed out:  -ROOT-,,0.70236052 state=OFFLINE, ts=1306650175292
> 2011-05-28 23:23:35,291 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE
> for too long, reassigning -ROOT-,,0.70236052 to a random server
> 2011-05-28 23:23:35,291 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> was=-ROOT-,,0.70236052 state=OFFLINE, ts=1306650175292
> 2011-05-28 23:23:35,291 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan
> for region -ROOT-,,0.70236052; plan=hri=-ROOT-,,0.70236052, src=,
> dest=localhost,60020,1306648534687
> 2011-05-28 23:23:35,291 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> -ROOT-,,0.70236052 to localhost,60020,1306648534687
> 2011-05-28 23:23:35,291 DEBUG org.apache.hadoop.hbase.master.ServerManager:
> New connection to localhost,60020,1306648534687
> 2011-05-28 23:23:35,292 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /
> 127.0.0.1:60020 could not be reached after 1 tries, giving up.
> 2011-05-28 23:23:35,292 WARN
> org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of
> -ROOT-,,0.70236052 to serverName=localhost,60020,1306648534687,
> load=(requests=0, regions=0, usedHeap=22, maxHeap=996), trying to assign
> elsewhere instead; retry=0
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
> proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to /
> 127.0.0.1:60020 after attempts=1
>        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:355)
>        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
>        at org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:606)
>        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:541)
>        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
>        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
>        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
>        at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1605)
>        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy6.getProtocolVersion(Unknown Source)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>        ... 8 more
> 2011-05-28 23:23:35,292 WARN
> org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable
> location to assign region -ROOT-,,0.70236052
>
>
>
> III. Here is the zk status from http://localhost:60010/zk.jsp
>
> HBase is rooted at /hbase
> Master address: sean-PowerEdge:60000
> Region server holding ROOT: null
> Region servers:
>  sean-PowerEdge:60020
> Quorum Server Statistics:
>  localhost:2181
>  Zookeeper version: 3.3.2-CDH3B4--1, built on 02/21/2011 20:16 GMT
>  Clients:
>   /127.0.0.1:42221[0](queued=0,recved=1,sent=0)
>   /127.0.0.1:44071[1](queued=0,recved=39,sent=44)
>   /127.0.0.1:44078[1](queued=0,recved=23,sent=24)
>   /127.0.0.1:44085[1](queued=0,recved=23,sent=23)
>   /127.0.0.1:44077[1](queued=0,recved=19,sent=19)
>
>  Latency min/avg/max: 0/6/164
>  Received: 105
>  Sent: 110
>  Outstanding: 0
>  Zxid: 0x148
>  Mode: standalone
>  Node count: 12
>
>
> What's the problem causing the above symptom?
>
> Thanks,
> --
> --Sean
>