Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2011/03/08 20:38:59 UTC

[jira] Updated: (HBASE-1960) Master should wait for DFS to come up when creating hbase.version

     [ https://issues.apache.org/jira/browse/HBASE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1960:
----------------------------------

    Attachment: HBASE-1960-redux.patch

DFS immediately leaves safe mode with 0 DNs when there are 0 blocks. This is inconvenient. It is an edge case, but it happens, for example, when setting up EC2 clusters: the master instance running both the NN and HMaster comes up first, and the slaves with DNs and RegionServers come up at some later time.

We used to handle this by checking the current DN count and waiting until it is nonzero. With security, the check for the datanode count doesn't work: it is a privileged op, we swallow the IOE, and continue. The attached -redux patch removes the DN count check and instead adopts the strategy of the jobtracker: we simply retry the creation of hbase.version indefinitely. This handles both the secure and nonsecure cases.
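
For illustration only, here is a minimal sketch of the retry-indefinitely idea. It is not the code in the attached patch. The class name RetryUntilDfsReady, the IoAction interface, and the retryIndefinitely helper are all hypothetical, and the lambda in main merely simulates a write that fails until datanodes appear rather than calling any real HDFS API.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryUntilDfsReady {

    /** Hypothetical stand-in for "create hbase.version"; any action that may throw IOException. */
    @FunctionalInterface
    public interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the action until it succeeds, sleeping waitMillis between failed attempts.
     * There is deliberately no upper bound on attempts. Returns the number of attempts made.
     */
    public static int retryIndefinitely(IoAction action, long waitMillis)
            throws InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                action.run();
                return attempts;
            } catch (IOException e) {
                // Assumption: any IOException is treated as "DFS not ready yet".
                System.err.println("Attempt " + attempts + " failed, will retry: " + e.getMessage());
                Thread.sleep(waitMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated write: fails twice, as if no datanodes had registered yet, then succeeds.
        AtomicInteger calls = new AtomicInteger();
        int attempts = retryIndefinitely(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("could only be replicated to 0 nodes, instead of 1");
            }
        }, 10L);
        System.out.println("Succeeded after " + attempts + " attempts");
    }
}
```

Because the loop never inspects the datanode count, it needs no privileged call, which is why the same approach would cover both the secure and nonsecure cases.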

> Master should wait for DFS to come up when creating hbase.version
> -----------------------------------------------------------------
>
>                 Key: HBASE-1960
>                 URL: https://issues.apache.org/jira/browse/HBASE-1960
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>             Fix For: 0.20.3, 0.90.0
>
>         Attachments: HBASE-1960-redux.patch, HBASE-1960.patch
>
>
> The master does not wait for DFS to come up in the circumstance where the DFS master is started for the first time after format and no datanodes have been started yet. 
> {noformat}
> 2009-11-07 11:47:28,115 INFO org.apache.hadoop.hbase.master.HMaster: vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems Inc., vmVersion=14.2-b01
> 2009-11-07 11:47:28,116 INFO org.apache.hadoop.hbase.master.HMaster: vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Dhbase.log.dir=/mnt/hbase/logs, -Dhbase.log.file=hbase-root-master-ip-10-242-15-159.log, -Dhbase.home.dir=/usr/local/hbase-0.20.1/bin/.., -Dhbase.id.str=root, -Dhbase.root.logger=INFO,DRFA, -Djava.library.path=/usr/local/hbase-0.20.1/bin/../lib/native/Linux-amd64-64]
> 2009-11-07 11:47:28,247 INFO org.apache.hadoop.hbase.master.HMaster: My address is ip-10-242-15-159.ec2.internal:60000
> 2009-11-07 11:47:28,728 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> [...]
> 2009-11-07 11:47:28,728 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
> 2009-11-07 11:47:28,728 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/hbase.version" - Aborting...
> 2009-11-07 11:47:28,729 FATAL org.apache.hadoop.hbase.master.HMaster: Not starting HMaster because:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> {noformat}
> Should probably sleep and retry the write a few times.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira