Posted to mapreduce-user@hadoop.apache.org by Colin Kincaid Williams <di...@uw.edu> on 2014/07/31 18:37:37 UTC

Juggling or swapping out the standby NameNode in a QJM / HA configuration

Hello,

I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
believe the steps to achieve this would be something similar to:

Use the bootstrapStandby command to prep the replacement standby, or rsync
the metadata over if the command fails.

Somehow update the datanodes, so they push the heartbeat / journal to the
new standby

Update the xml configuration on all nodes to reflect the replacement standby.

Start the replacement standby

Use some hadoop command to refresh the datanodes to the new NameNode
configuration.
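
The steps above might look roughly like the following, run on the
replacement machine. This is only a hedged sketch: the host name
rhel1.local and the /data/dfs/nn path are placeholders, and exact
flags depend on the Hadoop version in use.

```shell
# On the replacement standby: pull the namespace from the current active,
# falling back to an rsync of the name directory if bootstrapStandby fails.
# (rhel1.local and /data/dfs/nn are placeholder values.)
hdfs namenode -bootstrapStandby || \
    rsync -a rhel1.local:/data/dfs/nn/ /data/dfs/nn/

# Start the replacement standby NameNode.
service hadoop-hdfs-namenode start

# Ask the NameNode to re-read its datanode host lists. Note that
# -refreshNodes only re-reads the include/exclude files; datanodes may
# still need a restart to pick up new NameNode addresses from the xml.
hdfs dfsadmin -refreshNodes
```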

I am not sure how to deal with the Journal switch, or if I am going about
this the right way. Can anybody give me some suggestions here?


Regards,

Colin Williams

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
Another error after stopping the zkfc. Do I have to take the cluster down
to format ZK?

[root@rhel1 conf]# sudo service hadoop-hdfs-zkfc stop
Stopping Hadoop zkfc:                                      [  OK  ]
stopping zkfc
[root@rhel1 conf]# sudo -u hdfs zkfc -formatZK
sudo: zkfc: command not found
[root@rhel1 conf]# hdfs zkfc -formatZK
2014-07-31 17:49:56,792 INFO  [main] tools.DFSZKFailoverController
(DFSZKFailoverController.java:<init>(140)) - Failover controller configured
for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 17:49:57,002 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
GMT
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
Corporation
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-ja
va-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1
.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 17:49:57,015 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 17:49:57,040 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel6.local/10.120.5.247:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 17:49:57,047 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel6.local/10.120.5.247:2181, initiating session
2014-07-31 17:49:57,050 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:run(1065)) - Unable to read
additional data from server sessionid 0x0, likely server has closed socket,
closing socket connection and attempting reconnect
2014-07-31 17:49:57,989 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x0 closed
2014-07-31 17:49:57,989 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
Exception in thread "main" java.io.IOException: Couldn't determine
existence of znode '/hadoop-ha/golden-apple'
at
org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
at
org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:258)
at
org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
at
org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
at
org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
at
org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hadoop-ha/golden-apple
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
at
org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
... 8 more
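
The ConnectionLoss above means the client never established a working
session with the ensemble; -formatZK needs the ZooKeeper servers
themselves to stay up, even though the zkfc daemons are stopped. As a
rough check (host names taken from the connect string above; assuming
nc and the CDH zookeeper-client wrapper are installed), each server can
be probed directly:

```shell
# Confirm each ZooKeeper server is serving requests; a healthy server
# answers the four-letter-word command "ruok" with "imok".
for h in rhel1.local rhel2.local rhel6.local; do
    echo "ruok" | nc "$h" 2181
done

# Inspect the HA parent znode with the ZooKeeper CLI.
zookeeper-client -server rhel1.local:2181 ls /hadoop-ha
```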



On Thu, Jul 31, 2014 at 5:56 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> On 3) Run hdfs zkfc -formatZK in my test environment, I get a warning
> and then an error
>
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
>
>
> the complete output:
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 17:43:07,952 INFO  [main] tools.DFSZKFailoverController
> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
> for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 17:43:08,128 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
> GMT
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
> Corporation
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-
java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl
-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6
.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-
1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 17:43:08,130 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 17:43:08,139 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 17:43:08,149 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(433)) - Initiating client connection,
> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
> sessionTimeout=5000 watcher=null
> 2014-07-31 17:43:08,170 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
> socket connection to server rhel2.local/10.120.5.25:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-07-31 17:43:08,184 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
> connection established to rhel2.local/10.120.5.25:2181, initiating session
> 2014-07-31 17:43:08,262 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
> establishment complete on server rhel2.local/10.120.5.25:2181, sessionid
> = 0x3478900fbb40019, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
> 17:43:08,268 INFO  [main-EventThread] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 17:43:45,025 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
> /hadoop-ha/golden-apple from ZK...
> 2014-07-31 17:43:45,098 ERROR [main] ha.ZKFailoverController
> (ZKFailoverController.java:formatZK(266)) - Unable to clear zk parent znode
> java.io.IOException: Couldn't clear parent znode /hadoop-ha/golden-apple
> at
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:324)
>  at
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:264)
> at
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
>  at
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
> at
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
>  at
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
> at
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
>  at
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
> at
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
> Caused by: org.apache.zookeeper.KeeperException$NotEmptyException:
> KeeperErrorCode = Directory not empty for /hadoop-ha/golden-apple
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
>  at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
> at
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:319)
>  at
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:316)
> at
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:934)
>  at
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:316)
> ... 8 more
> 2014-07-31 17:43:45,119 INFO  [main-EventThread] zookeeper.ClientCnxn
> (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 17:43:45,119 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:close(679)) - Session: 0x3478900fbb40019 closed
>
>
>
> On Thu, Jul 31, 2014 at 1:56 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> Thanks! I will give this a shot.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 1:12 PM, Bryan Beaudreault <
>> bbeaudreault@hubspot.com> wrote:
>>
>>> We've done this a number of times without issue.  Here's the general
>>> flow:
>>>
>>> 1) Shut down the namenode and zkfc on the SNN
>>> 2) Stop zkfc on the ANN (the ANN will remain active because there is no
>>> other zkfc instance running to fail over to)
>>> 3) Run hdfs zkfc -formatZK on the ANN
>>> 4) Start zkfc on the ANN (it will sync up with the ANN and write state to zk)
>>> 5) Push new configs to the new SNN, bootstrap the namenode there
>>> 6) Start the namenode and zkfc on the SNN
>>> 7) Push updated configs to all other hdfs services (datanodes, etc.)
>>> 8) Restart the hbase master if you are running hbase, the jobtracker for MR
>>> 9) Rolling-restart the datanodes
>>> 10) Done
>>>
>>> You'll have to handle any other consumers of DFSClient, like your own
>>> code or other apache projects.
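
The flow above could be scripted roughly as follows. This is a sketch, not a tested runbook: the hostnames (ann.local, old-sbn.local, new-sbn.local) are hypothetical, and the init-script names are assumed from the CDH packaging seen elsewhere in this thread.

```shell
# Hypothetical hosts: ann.local (current active NN), old-sbn.local (standby
# being removed), new-sbn.local (its replacement).

# 1) Shut down the namenode and zkfc on the old standby
ssh old-sbn.local 'service hadoop-hdfs-zkfc stop; service hadoop-hdfs-namenode stop'

# 2) Stop zkfc on the active NN (it stays active; nothing left to fail over to)
ssh ann.local 'service hadoop-hdfs-zkfc stop'

# 3-4) Re-initialize the HA state in ZooKeeper, then bring zkfc back up
ssh ann.local 'sudo -u hdfs hdfs zkfc -formatZK'
ssh ann.local 'service hadoop-hdfs-zkfc start'

# 5-6) Push the updated configs to the new standby, bootstrap it, start it
ssh new-sbn.local 'sudo -u hdfs hdfs namenode -bootstrapStandby'
ssh new-sbn.local 'service hadoop-hdfs-namenode start; service hadoop-hdfs-zkfc start'

# 7-9) Push the updated configs to the remaining nodes, restart any masters
# (hbase master, jobtracker), then rolling-restart the datanodes.
```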
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> Hi Jing,
>>>>
>>>> Thanks for the response. I will try this out, and file an Apache jira.
>>>>
>>>> Best,
>>>>
>>>> Colin Williams
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>> wrote:
>>>>
>>>>> Hi Colin,
>>>>>
>>>>>     I guess currently we may have to restart almost all the
>>>>> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>
>>>>> 1. The current active NameNode (ANN) needs to know the new SBN, since
>>>>> in the current implementation the SBN periodically sends a rollEditLog RPC
>>>>> request to the ANN (thus if a NN failover happens later, the original ANN
>>>>> needs to send this RPC to the correct NN).
>>>>> 2. Looks like the DataNode currently cannot truly refresh its NN
>>>>> list. See the code in BPOfferService:
>>>>>
>>>>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>> IOException {
>>>>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>     for (BPServiceActor actor : bpServices) {
>>>>>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>     }
>>>>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>
>>>>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>       // Keep things simple for now -- we can implement this at a
>>>>> later date.
>>>>>       throw new IOException(
>>>>>           "HA does not currently support adding a new standby to a
>>>>> running DN. " +
>>>>>           "Please do a rolling restart of DNs to reconfigure the list
>>>>> of NNs.");
>>>>>     }
>>>>>   }
>>>>>
>>>>> 3. If you're using automatic failover, you also need to update the
>>>>> configuration of the ZKFC on the current ANN machine, since the ZKFC does
>>>>> graceful fencing by sending an RPC to the other NN.
>>>>> 4. Looks like we do not need to restart the JournalNodes for the new SBN,
>>>>> but I have not tried it before.
>>>>>
>>>>>     Thus in general we may still have to restart all the services
>>>>> (except the JNs) and update their configurations. But I guess this can be
>>>>> done as a rolling restart process:
>>>>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>>>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>> restart of all the DN to update their configurations
>>>>> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>> their configuration. The new SBN should become active.
>>>>>
>>>>>     I have not tried the above steps, so please let me know if this
>>>>> works or not. And I think we should also document the correct steps in
>>>>> Apache. Could you please file an Apache jira?
>>>>>
>>>>> Thanks,
>>>>> -Jing
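
Step 2 of this outline, the rolling DataNode restart, might look like the following. A sketch only: the hostnames are hypothetical, and the service name is assumed from the CDH packaging used in this thread.

```shell
# Restart each DataNode in turn so it re-reads hdfs-site.xml and registers
# with the new NN pair. Push the updated configs to every node first.
for dn in dn1.local dn2.local dn3.local; do   # hypothetical hostnames
  ssh "$dn" 'service hadoop-hdfs-datanode restart'
  sleep 30   # crude pause; better: poll the active NN web UI until the DN re-registers
done
```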
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>> configuration. I believe the steps to achieve this would be something
>>>>>> similar to:
>>>>>>
>>>>>> Use the bootstrapStandby command to prep the replacement standby, or
>>>>>> rsync if the command fails.
>>>>>>
>>>>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>>> the new standby.
>>>>>>
>>>>>> Update the xml configuration on all nodes to reflect the replacement
>>>>>> standby.
>>>>>>
>>>>>> Start the replacement standby.
>>>>>>
>>>>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>>> configuration.
>>>>>>
>>>>>> I am not sure how to deal with the Journal switch, or if I am going
>>>>>> about this the right way. Can anybody give me some suggestions here?
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Colin Williams
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
Another error after stopping the zkfc. Do I have to take the cluster down
to format ZK?

[root@rhel1 conf]# sudo service hadoop-hdfs-zkfc stop
Stopping Hadoop zkfc:                                      [  OK  ]
stopping zkfc
[root@rhel1 conf]# sudo -u hdfs zkfc -formatZK
sudo: zkfc: command not found
[root@rhel1 conf]# hdfs zkfc -formatZK
2014-07-31 17:49:56,792 INFO  [main] tools.DFSZKFailoverController
(DFSZKFailoverController.java:<init>(140)) - Failover controller configured
for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 17:49:57,002 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
GMT
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
Corporation
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-ja
va-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1
.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 17:49:57,015 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 17:49:57,040 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel6.local/10.120.5.247:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 17:49:57,047 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel6.local/10.120.5.247:2181, initiating session
2014-07-31 17:49:57,050 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:run(1065)) - Unable to read
additional data from server sessionid 0x0, likely server has closed socket,
closing socket connection and attempting reconnect
2014-07-31 17:49:57,989 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x0 closed
2014-07-31 17:49:57,989 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
Exception in thread "main" java.io.IOException: Couldn't determine
existence of znode '/hadoop-ha/golden-apple'
at
org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
at
org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:258)
at
org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
at
org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
at
org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
at
org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hadoop-ha/golden-apple
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
at
org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
... 8 more
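
Since this failure is a ConnectionLoss rather than the earlier NotEmptyException, it may be worth checking ZooKeeper directly with zkCli.sh before retrying. A sketch; the ensemble string is taken from the log above:

```shell
# Connect to the same ensemble the ZKFC is configured for
zkCli.sh -server rhel1.local:2181,rhel6.local:2181,rhel2.local:2181

# Then, inside the zkCli shell:
#   ls /hadoop-ha                  # list the HA parent znodes
#   ls /hadoop-ha/golden-apple     # leftover children (e.g. an
#                                  # ActiveStandbyElectorLock) mean a ZKFC
#                                  # somewhere is still holding the lock --
#                                  # stop every ZKFC instance before -formatZK
```

If all three servers refuse the connection, the ZooKeeper quorum itself is down or unreachable from this host, which would explain the ConnectionLoss above without needing to take HDFS down.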



On Thu, Jul 31, 2014 at 5:56 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> On step 3), running hdfs zkfc -formatZK in my test environment, I get a
> warning and then an error:
>
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
>
>
> the complete output:
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 17:43:07,952 INFO  [main] tools.DFSZKFailoverController
> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
> for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 17:43:08,128 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
> GMT
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
> Corporation
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-
java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl
-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6
.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-
1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 17:43:08,130 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 17:43:08,139 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 17:43:08,149 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(433)) - Initiating client connection,
> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
> sessionTimeout=5000 watcher=null
> 2014-07-31 17:43:08,170 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
> socket connection to server rhel2.local/10.120.5.25:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-07-31 17:43:08,184 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
> connection established to rhel2.local/10.120.5.25:2181, initiating session
> 2014-07-31 17:43:08,262 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
> establishment complete on server rhel2.local/10.120.5.25:2181, sessionid
> = 0x3478900fbb40019, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
> 17:43:08,268 INFO  [main-EventThread] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 17:43:45,025 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
> /hadoop-ha/golden-apple from ZK...
> 2014-07-31 17:43:45,098 ERROR [main] ha.ZKFailoverController
> (ZKFailoverController.java:formatZK(266)) - Unable to clear zk parent znode
> java.io.IOException: Couldn't clear parent znode /hadoop-ha/golden-apple
> at
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:324)
>  at
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:264)
> at
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
>  at
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
> at
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
>  at
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
> at
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
>  at
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
> at
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
> Caused by: org.apache.zookeeper.KeeperException$NotEmptyException:
> KeeperErrorCode = Directory not empty for /hadoop-ha/golden-apple
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
>  at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
> at
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:319)
>  at
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:316)
> at
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:934)
>  at
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:316)
> ... 8 more
> 2014-07-31 17:43:45,119 INFO  [main-EventThread] zookeeper.ClientCnxn
> (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 17:43:45,119 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:close(679)) - Session: 0x3478900fbb40019 closed
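One possible explanation of the NotEmptyException above (an assumption, not confirmed in this thread): a zkfc on the other NameNode was still creating ephemeral children under /hadoop-ha/golden-apple while the recursive delete ran, so stopping every zkfc first (the NameNodes and DataNodes can stay up) should be enough before re-running -formatZK. The leftover children can also be inspected with the ZooKeeper CLI; a dry-run sketch (commands are echoed rather than executed, and `zookeeper-client` is the CDH wrapper script, also an assumption):

```shell
# Dry-run: each command is printed, not executed. Drop the 'echo' to run for real.
# Quorum string taken from the connectString in the log above.
ZK="rhel1.local:2181,rhel6.local:2181,rhel2.local:2181"
run() { echo "zookeeper-client -server $ZK $*"; }

run ls  /hadoop-ha/golden-apple    # inspect leftover children first
run rmr /hadoop-ha/golden-apple    # recursive delete; only safe with all zkfcs stopped
```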
>
>
>
> On Thu, Jul 31, 2014 at 1:56 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> Thanks! I will give this a shot.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 1:12 PM, Bryan Beaudreault <
>> bbeaudreault@hubspot.com> wrote:
>>
>>> We've done this a number of times without issue.  Here's the general
>>> flow:
>>>
>>> 1) Shutdown namenode and zkfc on SNN
>>> 2) Stop zkfc on ANN  (ANN will remain active because there is no other
>>> zkfc instance running to fail over to)
>>> 3) Run hdfs zkfc -formatZK on ANN
>>> 4) Start zkfc on ANN (will sync up with ANN and write state to zk)
>>> 5) Push new configs to the new SNN, bootstrap namenode there
>>> 6) Start namenode and zkfc on SNN
>>> 7) Push updated configs to all other hdfs services (datanodes, etc)
>>> 8) Restart hbasemaster if you are running hbase, jobtracker for MR
>>> 9) Rolling restart datanodes
>>> 10) Done
>>>
>>> You'll have to handle any other consumers of DFSClient, like your own
>>> code or other apache projects.
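Steps 1-6 of the flow above, sketched as a dry-run script. Host names are placeholders, the CDH-style init scripts are assumptions, and configs still have to be pushed to the new standby before step 5; commands are echoed rather than executed.

```shell
#!/bin/sh
# Dry-run sketch of the standby-swap flow. Hypothetical hosts:
# ann = current active NN, old_snn = standby being retired, new_snn = replacement.
ann=ann.example.com; old_snn=old-snn.example.com; new_snn=new-snn.example.com
run() { echo "ssh $*"; }   # swap the echo for a real ssh to execute

run "$old_snn" service hadoop-hdfs-zkfc stop          # 1) stop zkfc on old standby
run "$old_snn" service hadoop-hdfs-namenode stop      # 1) stop NN on old standby
run "$ann"     service hadoop-hdfs-zkfc stop          # 2) ANN keeps running, only zkfc stops
run "$ann"     sudo -u hdfs hdfs zkfc -formatZK       # 3) reset failover state (interactive Y/N prompt)
run "$ann"     service hadoop-hdfs-zkfc start         # 4) zkfc rewrites state for the ANN
run "$new_snn" sudo -u hdfs hdfs namenode -bootstrapStandby   # 5) after pushing configs there
run "$new_snn" service hadoop-hdfs-namenode start     # 6) bring up the new standby
run "$new_snn" service hadoop-hdfs-zkfc start         # 6)
```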
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> Hi Jing,
>>>>
>>>> Thanks for the response. I will try this out, and file an Apache jira.
>>>>
>>>> Best,
>>>>
>>>> Colin Williams
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>> wrote:
>>>>
>>>>> Hi Colin,
>>>>>
>>>>>     I guess currently we may have to restart almost all the
>>>>> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>
>>>>> 1. The current active NameNode (ANN) needs to know the new SBN, since
>>>>> in the current implementation the SBN periodically sends rollEditLog RPC
>>>>> requests to the ANN (thus if a NN failover happens later, the original ANN
>>>>> needs to send this RPC to the correct NN).
>>>>> 2. Looks like the DataNode currently cannot do a real refresh of its
>>>>> NN list. Look at the code in BPOfferService:
>>>>>
>>>>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>     for (BPServiceActor actor : bpServices) {
>>>>>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>     }
>>>>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>
>>>>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>       // Keep things simple for now -- we can implement this at a later date.
>>>>>       throw new IOException(
>>>>>           "HA does not currently support adding a new standby to a running DN. " +
>>>>>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>>     }
>>>>>   }
>>>>>
>>>>> 3. If you're using automatic failover, you also need to update the
>>>>> configuration of the ZKFC on the current ANN machine, since the ZKFC does
>>>>> graceful fencing by sending an RPC to the other NN.
>>>>> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>>>> but I have not tried before.
>>>>>
>>>>>     Thus in general we may still have to restart all the services
>>>>> (except JNs) and update their configurations. But I guess this can be
>>>>> done as a rolling restart:
>>>>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>>>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>> restart of all the DN to update their configurations
>>>>> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>> their configuration. The new SBN should become active.
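The "update their configurations" part above mostly means repointing the nn2 entries in hdfs-site.xml. As an illustration only: "golden-apple" and port 8020 come from the logs in this thread, rhel9.local is a hypothetical replacement host, and 50070 is the stock HTTP port. The entries that change look like:

```xml
<!-- hdfs-site.xml fragment: only the entries naming the standby's host change;
     the logical NameNode IDs (nn1, nn2) stay the same. -->
<property>
  <name>dfs.ha.namenodes.golden-apple</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.golden-apple.nn2</name>
  <value>rhel9.local:8020</value>   <!-- was the old standby's host -->
</property>
<property>
  <name>dfs.namenode.http-address.golden-apple.nn2</name>
  <value>rhel9.local:50070</value>
</property>
```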
>>>>>
>>>>>     I have not tried the above steps, so please let me know whether
>>>>> this works. I think we should also document the correct steps in
>>>>> Apache. Could you please file an Apache jira?
>>>>>
>>>>> Thanks,
>>>>> -Jing
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>> configuration. I believe the steps to achieve this would be something
>>>>>> similar to:
>>>>>>
>>>>>> Use the Bootstrap standby command to prep the replacement standby. Or
>>>>>> rsync if the command fails.
>>>>>>
>>>>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>>> the new standby.
>>>>>>
>>>>>> Update the XML configuration on all nodes to reflect the replacement
>>>>>> standby.
>>>>>>
>>>>>> Start the replacement standby.
>>>>>>
>>>>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>>> configuration.
>>>>>>
>>>>>> I am not sure how to deal with the Journal switch, or if I am going
>>>>>> about this the right way. Can anybody give me some suggestions here?
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Colin Williams
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>

1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 17:43:08,130 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 17:43:08,139 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 17:43:08,149 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(433)) - Initiating client connection,
> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
> sessionTimeout=5000 watcher=null
> 2014-07-31 17:43:08,170 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
> socket connection to server rhel2.local/10.120.5.25:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-07-31 17:43:08,184 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
> connection established to rhel2.local/10.120.5.25:2181, initiating session
> 2014-07-31 17:43:08,262 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
> establishment complete on server rhel2.local/10.120.5.25:2181, sessionid
> = 0x3478900fbb40019, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
> 17:43:08,268 INFO  [main-EventThread] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 17:43:45,025 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
> /hadoop-ha/golden-apple from ZK...
> 2014-07-31 17:43:45,098 ERROR [main] ha.ZKFailoverController
> (ZKFailoverController.java:formatZK(266)) - Unable to clear zk parent znode
> java.io.IOException: Couldn't clear parent znode /hadoop-ha/golden-apple
> at
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:324)
>  at
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:264)
> at
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
>  at
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
> at
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
>  at
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
> at
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
>  at
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
> at
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
> Caused by: org.apache.zookeeper.KeeperException$NotEmptyException:
> KeeperErrorCode = Directory not empty for /hadoop-ha/golden-apple
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
>  at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
> at
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:319)
>  at
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:316)
> at
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:934)
>  at
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:316)
> ... 8 more
> 2014-07-31 17:43:45,119 INFO  [main-EventThread] zookeeper.ClientCnxn
> (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 17:43:45,119 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:close(679)) - Session: 0x3478900fbb40019 closed
>
>
>
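For the record, the NotEmptyException above means /hadoop-ha/golden-apple still had children when formatZK tried to recursively delete it. That usually happens when a failover controller somewhere is still connected and re-creating its ephemeral lock znode. One way to check from the ZooKeeper side (a hypothetical session; the CLI is `zookeeper-client` on CDH, `bin/zkCli.sh` in plain Apache ZooKeeper):

```text
$ zookeeper-client -server rhel2.local:2181
[zk] ls /hadoop-ha/golden-apple
# If ActiveStandbyElectorLock appears in the listing, it is an ephemeral
# node: some zkfc is still connected, so stop that zkfc first. Leftover
# persistent children (e.g. ActiveBreadcrumb) can be removed by hand
# before re-running hdfs zkfc -formatZK:
[zk] rmr /hadoop-ha/golden-apple
```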
> On Thu, Jul 31, 2014 at 1:56 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> Thanks! I will give this a shot.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 1:12 PM, Bryan Beaudreault <
>> bbeaudreault@hubspot.com> wrote:
>>
>>> We've done this a number of times without issue.  Here's the general
>>> flow:
>>>
>>> 1) Shutdown namenode and zkfc on SNN
>>> 2) Stop zkfc on ANN  (ANN will remain active because there is no other
>>> zkfc instance running to fail over to)
>>> 3) Run hdfs zkfc -formatZK on ANN
>>> 4) Start zkfc on ANN (will sync up with ANN and write state to zk)
>>> 5) Push new configs to the new SNN, bootstrap namenode there
>>> 6) Start namenode and zkfc on SNN
>>> 7) Push updated configs to all other hdfs services (datanodes, etc)
>>> 8) Restart hbasemaster if you are running hbase, jobtracker for MR
>>> 9) Rolling restart datanodes
>>> 10) Done
>>>
>>> You'll have to handle any other consumers of DFSClient, like your own
>>> code or other apache projects.
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> Hi Jing,
>>>>
>>>> Thanks for the response. I will try this out, and file an Apache jira.
>>>>
>>>> Best,
>>>>
>>>> Colin Williams
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>> wrote:
>>>>
>>>>> Hi Colin,
>>>>>
>>>>>     I guess currently we may have to restart almost all the
>>>>> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>
>>>>> 1. The current active NameNode (ANN) needs to know the new SBN since
>>>>> in the current implementation the SBN tries to send rollEditLog RPC request
>>>>> to ANN periodically (thus if a NN failover happens later, the original ANN
>>>>> needs to send this RPC to the correct NN).
>>>>> 2. Looks like the DataNode currently cannot really refresh its NN
>>>>> list. Look at the code in BPOfferService:
>>>>>
>>>>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>> IOException {
>>>>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>     for (BPServiceActor actor : bpServices) {
>>>>>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>     }
>>>>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>
>>>>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>       // Keep things simple for now -- we can implement this at a
>>>>> later date.
>>>>>       throw new IOException(
>>>>>           "HA does not currently support adding a new standby to a
>>>>> running DN. " +
>>>>>           "Please do a rolling restart of DNs to reconfigure the list
>>>>> of NNs.");
>>>>>     }
>>>>>   }
>>>>>
>>>>> 3. If you're using automatic failover, you also need to update the
>>>>> configuration of the ZKFC on the current ANN machine, since ZKFC will do
>>>>> gracefully fencing by sending RPC to the other NN.
>>>>> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>>>> but I have not tried before.
>>>>>
>>>>>     Thus in general we may still have to restart all the services
>>>>> (except JNs) and update their configurations. But this may be a rolling
>>>>> restart process I guess:
>>>>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>>>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>> restart of all the DN to update their configurations
>>>>> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>> their configuration. The new SBN should become active.
>>>>>
>>>>>     I have not tried the upper steps, thus please let me know if this
>>>>> works or not. And I think we should also document the correct steps in
>>>>> Apache. Could you please file an Apache jira?
>>>>>
>>>>> Thanks,
>>>>> -Jing
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>> configuration. I believe the steps to achieve this would be something
>>>>>> similar to:
>>>>>>
>>>>>> Use the Bootstrap standby command to prep the replacement standby, or
>>>>>> rsync if the command fails.
>>>>>>
>>>>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>>> the new standby
>>>>>>
>>>>>> Update the XML configuration on all nodes to reflect the replacement
>>>>>> standby.
>>>>>>
>>>>>> Start the replacement standby
>>>>>>
>>>>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>>> configuration.
>>>>>>
>>>>>> I am not sure how to deal with the Journal switch, or if I am going
>>>>>> about this the right way. Can anybody give me some suggestions here?
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Colin Williams
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
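The sequence Bryan describes above lends itself to a small runbook script. A minimal dry-run sketch follows; the host names (`ann.local`, `old-sbn.local`, `new-sbn.local`, `dn1.local`), root ssh access, and the CDH-style init-script names are all assumptions, and with `DRY_RUN=1` (the default) each step is only printed for review:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the standby-NameNode swap. Nothing touches the
# cluster unless DRY_RUN=0; by default every step is just printed.
set -euo pipefail

ANN=ann.local          # current active NameNode (assumed host name)
OLD_SBN=old-sbn.local  # standby being retired (assumed host name)
NEW_SBN=new-sbn.local  # replacement standby (assumed host name)
DRY_RUN=${DRY_RUN:-1}

run() {                # print the step, or run it remotely via ssh
  if [ "$DRY_RUN" = 1 ]; then echo "WOULD RUN: $*"; else ssh "$1" "${*:2}"; fi
}

run "$OLD_SBN" service hadoop-hdfs-namenode stop      # 1) stop old SBN
run "$OLD_SBN" service hadoop-hdfs-zkfc stop
run "$ANN"     service hadoop-hdfs-zkfc stop          # 2) ANN stays active
run "$ANN"     sudo -u hdfs hdfs zkfc -formatZK       # 3) answers a Y/N prompt
run "$ANN"     service hadoop-hdfs-zkfc start         # 4) rewrite state to ZK
# 5) push configs to the new SBN (scp/puppet/etc.), then bootstrap it:
run "$NEW_SBN" sudo -u hdfs hdfs namenode -bootstrapStandby
run "$NEW_SBN" service hadoop-hdfs-namenode start     # 6) start new SBN
run "$NEW_SBN" service hadoop-hdfs-zkfc start
# 7-9) push configs to the remaining nodes, rolling-restart datanodes
run dn1.local  service hadoop-hdfs-datanode restart   # repeat per datanode
```

Running it once with the default `DRY_RUN=1` shows the full ordered plan before anything is executed for real.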

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
Another error after stopping the zkfc. Do I have to take the cluster down
to format ZK?

[root@rhel1 conf]# sudo service hadoop-hdfs-zkfc stop
Stopping Hadoop zkfc:                                      [  OK  ]
stopping zkfc
[root@rhel1 conf]# sudo -u hdfs zkfc -formatZK
sudo: zkfc: command not found
[root@rhel1 conf]# hdfs zkfc -formatZK
2014-07-31 17:49:56,792 INFO  [main] tools.DFSZKFailoverController
(DFSZKFailoverController.java:<init>(140)) - Failover controller configured
for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 17:49:57,002 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
GMT
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
Corporation
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 17:49:57,003 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-ja
va-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1
.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 17:49:57,004 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 17:49:57,005 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 17:49:57,015 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 17:49:57,040 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel6.local/10.120.5.247:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 17:49:57,047 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel6.local/10.120.5.247:2181, initiating session
2014-07-31 17:49:57,050 INFO  [main-SendThread(rhel6.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:run(1065)) - Unable to read
additional data from server sessionid 0x0, likely server has closed socket,
closing socket connection and attempting reconnect
2014-07-31 17:49:57,989 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x0 closed
2014-07-31 17:49:57,989 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
Exception in thread "main" java.io.IOException: Couldn't determine
existence of znode '/hadoop-ha/golden-apple'
at
org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
at
org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:258)
at
org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
at
org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
at
org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
at
org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /hadoop-ha/golden-apple
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
at
org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
... 8 more
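The ConnectionLoss above came from rhel6.local closing the socket right after connect, which usually means that ZooKeeper server is down or not in quorum rather than anything wrong with the cluster as a whole. A quick health probe before retrying formatZK (a sketch using bash's built-in /dev/tcp and the standard ZooKeeper `ruok` four-letter command; `imok` is the healthy reply, and the host list is taken from the connectString in the log):

```shell
#!/usr/bin/env bash
# Probe each ZooKeeper server with the 'ruok' four-letter command.
# Uses bash's /dev/tcp pseudo-device so no extra tools are needed.

zk_ruok() {
  local host=$1 port=${2:-2181}
  # Subshell: if the connect, write, or 2s read times out or fails,
  # the whole attempt collapses to "no answer".
  ( exec 3<>"/dev/tcp/$host/$port" &&
    printf 'ruok' >&3 &&
    IFS= read -r -t 2 -n 4 reply <&3 &&
    printf '%s\n' "$reply"
  ) 2>/dev/null || echo "no answer"
}

# Ensemble hosts from the connectString in the log above
for h in rhel1.local rhel6.local rhel2.local; do
  printf '%s: %s\n' "$h" "$(zk_ruok "$h")"
done
```

Any member not answering `imok` is the one to fix (or remove from the formatZK run) before retrying; `echo stat` against the same port would additionally show which member is the leader.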



On Thu, Jul 31, 2014 at 5:56 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> On 3) Run hdfs zkfc -formatZK in my test environment, I get a warning
> and then an error:
>
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
>
>
> the complete output:
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 17:43:07,952 INFO  [main] tools.DFSZKFailoverController
> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
> for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 17:43:08,128 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
> GMT
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
> Corporation
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-
java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl
-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6
.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-
1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 17:43:08,130 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 17:43:08,139 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 17:43:08,149 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(433)) - Initiating client connection,
> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
> sessionTimeout=5000 watcher=null
> 2014-07-31 17:43:08,170 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
> socket connection to server rhel2.local/10.120.5.25:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-07-31 17:43:08,184 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
> connection established to rhel2.local/10.120.5.25:2181, initiating session
> 2014-07-31 17:43:08,262 INFO  [main-SendThread(rhel2.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
> establishment complete on server rhel2.local/10.120.5.25:2181, sessionid
> = 0x3478900fbb40019, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
> 17:43:08,268 INFO  [main-EventThread] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 17:43:45,025 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
> /hadoop-ha/golden-apple from ZK...
> 2014-07-31 17:43:45,098 ERROR [main] ha.ZKFailoverController
> (ZKFailoverController.java:formatZK(266)) - Unable to clear zk parent znode
> java.io.IOException: Couldn't clear parent znode /hadoop-ha/golden-apple
> at
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:324)
>  at
> org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:264)
> at
> org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
>  at
> org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
> at
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
>  at
> org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
> at
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
>  at
> org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
> at
> org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
> Caused by: org.apache.zookeeper.KeeperException$NotEmptyException:
> KeeperErrorCode = Directory not empty for /hadoop-ha/golden-apple
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
>  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
>  at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
> at
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:319)
>  at
> org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:316)
> at
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:934)
>  at
> org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:316)
> ... 8 more
> 2014-07-31 17:43:45,119 INFO  [main-EventThread] zookeeper.ClientCnxn
> (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 17:43:45,119 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:close(679)) - Session: 0x3478900fbb40019 closed
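For what it's worth, the NotEmptyException in that log means the parent znode still had children when the final delete ran: ZooKeeper's delete is per-znode, so ZKUtil.deleteRecursive removes children first and then the parent, and anything that recreates a child in between (most likely a zkfc that is still running and re-taking the election lock) makes the last step fail. A toy model of that race, with the znode "tree" as a plain dict (all names here are hypothetical, not ZooKeeper API):

```python
class NotEmpty(Exception):
    """Stand-in for ZooKeeper's KeeperException.NotEmptyException."""

def delete(tree, path):
    # Like ZooKeeper's delete(): refuses to remove a znode with children.
    if tree.get(path):
        raise NotEmpty(path)
    for children in tree.values():
        children.discard(path)
    tree.pop(path, None)

def delete_recursive(tree, path, racer=None):
    # Like ZKUtil.deleteRecursive: children first, then the parent.
    for child in sorted(tree.get(path, ())):
        delete_recursive(tree, child)
    if racer:
        racer(tree, path)  # a still-live ZKFC recreating its lock znode
    delete(tree, path)
```

So the prompt's warning is the fix: stop every failover controller (and, as the tool itself advises, the other HDFS HA daemons) before running -formatZK; the DataNodes should not matter for the format itself.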
>
>
>
> On Thu, Jul 31, 2014 at 1:56 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> Thanks! I will give this a shot.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 1:12 PM, Bryan Beaudreault <
>> bbeaudreault@hubspot.com> wrote:
>>
>>> We've done this a number of times without issue.  Here's the general
>>> flow:
>>>
>>> 1) Shutdown namenode and zkfc on SNN
>>> 2) Stop zkfc on ANN  (ANN will remain active because there is no other
>>> zkfc instance running to fail over to)
>>> 3) Run hdfs zkfc -formatZK on ANN
>>> 4) Start zkfc on ANN (will sync up with ANN and write state to zk)
>>> 5) Push new configs to the new SNN, bootstrap namenode there
>>> 6) Start namenode and zkfc on SNN
>>> 7) Push updated configs to all other hdfs services (datanodes, etc)
>>> 8) Restart hbasemaster if you are running hbase, jobtracker for MR
>>> 9) Rolling restart datanodes
>>> 10) Done
>>>
>>> You'll have to handle any other consumers of DFSClient, like your own
>>> code or other apache projects.
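The flow above can be sketched as a tiny Python driver. The service names mirror the CDH4 init scripts seen later in this thread; `run`, the hostnames, and the helper itself are illustrative stand-ins for however you execute commands on each host (ssh, fabric, etc.), not a drop-in tool:

```python
def swap_standby(run, old_snn, new_snn, ann, datanodes):
    """Replace old_snn with new_snn while ann stays active throughout."""
    run(old_snn, "service hadoop-hdfs-zkfc stop")       # 1) stop zkfc and NN on old SNN
    run(old_snn, "service hadoop-hdfs-namenode stop")
    run(ann, "service hadoop-hdfs-zkfc stop")           # 2) ANN stays active: no other
                                                        #    zkfc left to fail over to
    run(ann, "hdfs zkfc -formatZK")                     # 3) clear stale HA state in ZK
    run(ann, "service hadoop-hdfs-zkfc start")          # 4) re-register ANN state in ZK
    run(new_snn, "hdfs namenode -bootstrapStandby")     # 5) new configs + bootstrap
    run(new_snn, "service hadoop-hdfs-namenode start")  # 6) bring up new SNN and zkfc
    run(new_snn, "service hadoop-hdfs-zkfc start")
    # 7/8) push updated configs everywhere; restart hbase master / jobtracker here
    for dn in datanodes:                                # 9) rolling DN restart
        run(dn, "service hadoop-hdfs-datanode restart")
```

The ordering is the point: formatZK only happens while no zkfc is running, and the DataNode restarts come last, after every config push.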
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> Hi Jing,
>>>>
>>>> Thanks for the response. I will try this out, and file an Apache jira.
>>>>
>>>> Best,
>>>>
>>>> Colin Williams
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>> wrote:
>>>>
>>>>> Hi Colin,
>>>>>
>>>>>     I guess currently we may have to restart almost all the
>>>>> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>
>>>>> 1. The current active NameNode (ANN) needs to know the new SBN since
>>>>> in the current implementation the SBN tries to send rollEditLog RPC request
>>>>> to ANN periodically (thus if a NN failover happens later, the original ANN
>>>>> needs to send this RPC to the correct NN).
>>>>> 2. Looks like the DataNode currently cannot do real refreshment for
>>>>> NN. Look at the code in BPOfferService:
>>>>>
>>>>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>> IOException {
>>>>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>     for (BPServiceActor actor : bpServices) {
>>>>>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>     }
>>>>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>
>>>>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>       // Keep things simple for now -- we can implement this at a
>>>>> later date.
>>>>>       throw new IOException(
>>>>>           "HA does not currently support adding a new standby to a
>>>>> running DN. " +
>>>>>           "Please do a rolling restart of DNs to reconfigure the list
>>>>> of NNs.");
>>>>>     }
>>>>>   }
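The guard in that Java snippet is a plain set symmetric difference; restated in Python purely as an illustration (this is not Hadoop code), it shows why a DataNode will accept a reordered NameNode list but rejects any added or removed address, forcing the rolling restart:

```python
def refresh_nn_list(old_addrs, new_addrs):
    # Hypothetical restatement of BPOfferService.refreshNNList's check:
    # only a no-op "refresh" (same set of NN addresses) is allowed.
    if set(old_addrs) ^ set(new_addrs):  # symmetric difference non-empty
        raise IOError(
            "HA does not currently support adding a new standby to a "
            "running DN. Please do a rolling restart of DNs to "
            "reconfigure the list of NNs.")
```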
>>>>>
>>>>> 3. If you're using automatic failover, you also need to update the
>>>>> configuration of the ZKFC on the current ANN machine, since ZKFC will do
>>>>> gracefully fencing by sending RPC to the other NN.
>>>>> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>>>> but I have not tried before.
>>>>>
>>>>>     Thus in general we may still have to restart all the services
>>>>> (except JNs) and update their configurations. But this may be a rolling
>>>>> restart process I guess:
>>>>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>>>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>> restart of all the DN to update their configurations
>>>>> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>> their configuration. The new SBN should become active.
>>>>>
>>>>>     I have not tried the upper steps, thus please let me know if this
>>>>> works or not. And I think we should also document the correct steps in
>>>>> Apache. Could you please file an Apache jira?
>>>>>
>>>>> Thanks,
>>>>> -Jing
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>> configuration. I believe the steps to achieve this would be something
>>>>>> similar to:
>>>>>>
>>>>>> Use the bootstrapStandby command to prep the replacement standby,
>>>>>> or rsync if the command fails.
>>>>>>
>>>>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>>> the new standby.
>>>>>>
>>>>>> Update the xml configuration on all nodes to reflect the replacement
>>>>>> standby.
>>>>>>
>>>>>> Start the replacement standby.
>>>>>>
>>>>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>>> configuration.
>>>>>>
>>>>>> I am not sure how to deal with the Journal switch, or if I am going
>>>>>> about this the right way. Can anybody give me some suggestions here?
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Colin Williams
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
On step 3) Run hdfs zkfc -formatZK, in my test environment I get a warning
and then an error

WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!


the complete output:

sudo hdfs zkfc -formatZK
2014-07-31 17:43:07,952 INFO  [main] tools.DFSZKFailoverController
(DFSZKFailoverController.java:<init>(140)) - Failover controller configured
for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 17:43:08,128 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
GMT
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
Corporation
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-ja
va-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1
.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 17:43:08,130 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 17:43:08,139 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 17:43:08,149 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 17:43:08,170 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel2.local/10.120.5.25:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 17:43:08,184 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel2.local/10.120.5.25:2181, initiating session
2014-07-31 17:43:08,262 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
establishment complete on server rhel2.local/10.120.5.25:2181, sessionid =
0x3478900fbb40019, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
17:43:08,268 INFO  [main-EventThread] ha.ActiveStandbyElector
(ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 17:43:45,025 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
/hadoop-ha/golden-apple from ZK...
2014-07-31 17:43:45,098 ERROR [main] ha.ZKFailoverController
(ZKFailoverController.java:formatZK(266)) - Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/golden-apple
at
org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:324)
at
org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:264)
at
org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
at
org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
at
org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
at
org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException:
KeeperErrorCode = Directory not empty for /hadoop-ha/golden-apple
at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
at
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:319)
at
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:316)
at
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:934)
at
org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:316)
... 8 more
2014-07-31 17:43:45,119 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 17:43:45,119 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x3478900fbb40019 closed
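
The NotEmptyException above indicates that /hadoop-ha/golden-apple still had
child znodes when formatZK attempted the recursive delete, which is consistent
with a still-running failover controller holding (and re-creating) its
ephemeral ActiveStandbyElectorLock node. The sketch below is illustrative
only, not the real ZooKeeper client: a toy znode tree where a live lock holder
re-creates its node between the child listing and the parent delete.

```python
# Toy model of why "hdfs zkfc -formatZK" can fail with
# KeeperException$NotEmptyException: a live ZKFC re-creates its ephemeral
# lock znode between the recursive child deletion and the parent delete.

class NotEmptyError(Exception):
    pass

class FakeZK:
    def __init__(self):
        # parent path -> set of child znode names
        self.nodes = {"/hadoop-ha/golden-apple": {"ActiveStandbyElectorLock"}}

    def get_children(self, path):
        return sorted(self.nodes.get(path, set()))

    def delete(self, path):
        parent, _, name = path.rpartition("/")
        if self.nodes.get(path):
            # mirrors KeeperErrorCode = Directory not empty
            raise NotEmptyError(path)
        self.nodes.pop(path, None)
        if parent in self.nodes:
            self.nodes[parent].discard(name)

    def delete_recursive(self, path, live_zkfc=False):
        for child in self.get_children(path):
            self.delete(path + "/" + child)
        if live_zkfc:
            # A running failover controller immediately re-acquires its lock.
            self.nodes[path].add("ActiveStandbyElectorLock")
        self.delete(path)  # raises NotEmptyError if the lock reappeared

zk = FakeZK()
try:
    zk.delete_recursive("/hadoop-ha/golden-apple", live_zkfc=True)
except NotEmptyError as e:
    print("NotEmpty:", e)  # prints NotEmpty: /hadoop-ha/golden-apple
```

With the ZKFC genuinely stopped (live_zkfc=False in the toy model), the same
recursive delete succeeds, which matches the advice to verify every failover
controller is down before re-running the format.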



On Thu, Jul 31, 2014 at 1:56 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> Thanks! I will give this a shot.
>
>
>
> On Thu, Jul 31, 2014 at 1:12 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> We've done this a number of times without issue.  Here's the general flow:
>>
>> 1) Shutdown namenode and zkfc on SNN
>> 2) Stop zkfc on ANN  (ANN will remain active because there is no other
>> zkfc instance running to fail over to)
>> 3) Run hdfs zkfc -formatZK on ANN
>> 4) Start zkfc on ANN (will sync up with ANN and write state to zk)
>> 5) Push new configs to the new SNN, bootstrap namenode there
>> 6) Start namenode and zkfc on SNN
>> 7) Push updated configs to all other hdfs services (datanodes, etc)
>> 8) Restart the HBase master if you are running HBase, and the JobTracker
>> for MR
>> 9) Rolling restart datanodes
>> 10) Done
>>
>> You'll have to handle any other consumers of DFSClient, like your own
>> code or other Apache projects.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> Hi Jing,
>>>
>>> Thanks for the response. I will try this out, and file an Apache jira.
>>>
>>> Best,
>>>
>>> Colin Williams
>>>
>>>
>>> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>> wrote:
>>>
>>>> Hi Colin,
>>>>
>>>>     I guess currently we may have to restart almost all the
>>>> daemons/services in order to swap out a standby NameNode (SBN):
>>>>
>>>> 1. The current active NameNode (ANN) needs to know the new SBN since in
>>>> the current implementation the SBN tries to send rollEditLog RPC request to
>>>> ANN periodically (thus if a NN failover happens later, the original ANN
>>>> needs to send this RPC to the correct NN).
>>>> 2. Looks like the DataNode currently cannot do a real refresh of its
>>>> NN list. Look at the code in BPOfferService:
>>>>
>>>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>     for (BPServiceActor actor : bpServices) {
>>>>       oldAddrs.add(actor.getNNSocketAddress());
>>>>     }
>>>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>
>>>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>       // Keep things simple for now -- we can implement this at a later date.
>>>>       throw new IOException(
>>>>           "HA does not currently support adding a new standby to a running DN. " +
>>>>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>     }
>>>>   }
>>>>
>>>> 3. If you're using automatic failover, you also need to update the
>>>> configuration of the ZKFC on the current ANN machine, since ZKFC will do
>>>> graceful fencing by sending an RPC to the other NN.
>>>> 4. Looks like we do not need to restart JournalNodes for the new SBN,
>>>> but I have not tried this before.
>>>>
>>>>     Thus in general we may still have to restart all the services
>>>> (except JNs) and update their configurations. But this may be a rolling
>>>> restart process I guess:
>>>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>> restart of all the DN to update their configurations
>>>> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>>>> configuration. The new SBN should become active.
>>>>
>>>>     I have not tried the steps above, so please let me know if this
>>>> works or not. And I think we should also document the correct steps in
>>>> Apache. Could you please file an Apache jira?
>>>>
>>>> Thanks,
>>>> -Jing
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
>>>>> I believe the steps to achieve this would be something similar to:
>>>>>
>>>>> Use the Bootstrap standby command to prep the replacement standby, or
>>>>> rsync if the command fails.
>>>>>
>>>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>> the new standby.
>>>>>
>>>>> Update the XML configuration on all nodes to reflect the replacement
>>>>> standby.
>>>>>
>>>>> Start the replacement standby.
>>>>>
>>>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>> configuration.
>>>>>
>>>>> I am not sure how to deal with the Journal switch, or if I am going
>>>>> about this the right way. Can anybody give me some suggestions here?
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Colin Williams
>>>>>
>>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
>>>
>>>
>>>
>>
>
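
The rolling restart of DataNodes in Bryan's steps matches the refreshNNList()
check quoted earlier in the thread: a DataNode only accepts a configuration
refresh when the set of NameNode addresses is unchanged. Below is a minimal
Python rendering of that check; the host names are hypothetical and this is a
sketch of the logic, not the DataNode's actual code.

```python
# Sketch of the DataNode's refreshNNList() guard: any change to the set of
# NameNode addresses (such as swapping in a new standby) makes the symmetric
# difference non-empty, so the DN refuses the refresh and must be restarted.

def refresh_nn_list(old_addrs, new_addrs):
    if set(old_addrs) ^ set(new_addrs):  # Sets.symmetricDifference(...)
        raise IOError(
            "HA does not currently support adding a new standby to a "
            "running DN. Please do a rolling restart of DNs.")

current = ["rhel1.local:8020", "rhel2.local:8020"]   # ANN + old SBN
refresh_nn_list(current, current)                    # same set: accepted

replaced = ["rhel1.local:8020", "rhel3.local:8020"]  # hypothetical new SBN
try:
    refresh_nn_list(current, replaced)
except IOError as e:
    print("DN refresh rejected:", e)
```

This is why the swap procedure keeps the ANN running but rolls the DataNodes:
the address set changes, so each DN has to pick up the new list at startup.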

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
On 3) Run hdfs zkfc -formatZK: in my test environment, I get a warning and
then an error

WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!


the complete output:

sudo hdfs zkfc -formatZK
2014-07-31 17:43:07,952 INFO  [main] tools.DFSZKFailoverController
(DFSZKFailoverController.java:<init>(140)) - Failover controller configured
for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 17:43:08,128 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
GMT
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
Corporation
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-ja
va-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1
.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 17:43:08,130 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 17:43:08,139 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 17:43:08,149 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 17:43:08,170 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel2.local/10.120.5.25:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 17:43:08,184 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel2.local/10.120.5.25:2181, initiating session
2014-07-31 17:43:08,262 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
establishment complete on server rhel2.local/10.120.5.25:2181, sessionid =
0x3478900fbb40019, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
17:43:08,268 INFO  [main-EventThread] ha.ActiveStandbyElector
(ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 17:43:45,025 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
/hadoop-ha/golden-apple from ZK...
2014-07-31 17:43:45,098 ERROR [main] ha.ZKFailoverController
(ZKFailoverController.java:formatZK(266)) - Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/golden-apple
at
org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:324)
at
org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:264)
at
org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
at
org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
at
org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
at
org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException:
KeeperErrorCode = Directory not empty for /hadoop-ha/golden-apple
at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
at
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:319)
at
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:316)
at
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:934)
at
org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:316)
... 8 more
2014-07-31 17:43:45,119 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 17:43:45,119 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x3478900fbb40019 closed




.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 17:43:08,130 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 17:43:08,139 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 17:43:08,149 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 17:43:08,170 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel2.local/10.120.5.25:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 17:43:08,184 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel2.local/10.120.5.25:2181, initiating session
2014-07-31 17:43:08,262 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
establishment complete on server rhel2.local/10.120.5.25:2181, sessionid =
0x3478900fbb40019, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
17:43:08,268 INFO  [main-EventThread] ha.ActiveStandbyElector
(ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 17:43:45,025 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
/hadoop-ha/golden-apple from ZK...
2014-07-31 17:43:45,098 ERROR [main] ha.ZKFailoverController
(ZKFailoverController.java:formatZK(266)) - Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/golden-apple
at
org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:324)
at
org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:264)
at
org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
at
org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
at
org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
at
org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
at
org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException:
KeeperErrorCode = Directory not empty for /hadoop-ha/golden-apple
at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
at
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:319)
at
org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:316)
at
org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:934)
at
org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:316)
... 8 more
2014-07-31 17:43:45,119 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 17:43:45,119 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x3478900fbb40019 closed
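For reference, one way to see what is blocking the recursive delete is to inspect the znode directly with the ZooKeeper CLI (a sketch only: the `/usr/lib/zookeeper/bin` path is an assumption based on the CDH-style layout above, and the znode path and ensemble host are taken from the log):

```shell
# List what is still under the parent znode -- the children shown here are
# what made the recursive delete fail with NotEmptyException
/usr/lib/zookeeper/bin/zkCli.sh -server rhel2.local:2181 \
    ls /hadoop-ha/golden-apple

# Once every ZKFC is confirmed stopped (a live ZKFC can recreate lock nodes
# mid-delete), remove the znode recursively.
# "rmr" is the ZooKeeper 3.4 CLI syntax; newer CLIs call this "deleteall".
/usr/lib/zookeeper/bin/zkCli.sh -server rhel2.local:2181 \
    rmr /hadoop-ha/golden-apple
```

After a clean `rmr`, rerunning `hdfs zkfc -formatZK` should no longer hit the NotEmptyException.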



On Thu, Jul 31, 2014 at 1:56 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> Thanks! I will give this a shot.
>
>
>
> On Thu, Jul 31, 2014 at 1:12 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> We've done this a number of times without issue.  Here's the general flow:
>>
>> 1) Shutdown namenode and zkfc on SNN
>> 2) Stop zkfc on ANN  (ANN will remain active because there is no other
>> zkfc instance running to fail over to)
>> 3) Run hdfs zkfc -formatZK on ANN
>> 4) Start zkfc on ANN (will sync up with ANN and write state to zk)
>> 5) Push new configs to the new SNN, bootstrap namenode there
>> 6) Start namenode and zkfc on SNN
>> 7) Push updated configs to all other hdfs services (datanodes, etc)
>> 8) Restart hbasemaster if you are running hbase, jobtracker for MR
>> 9) Rolling restart datanodes
>> 10) Done
>>
>> You'll have to handle any other consumers of DFSClient, like your own
>> code or other apache projects.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> Hi Jing,
>>>
>>> Thanks for the response. I will try this out, and file an Apache jira.
>>>
>>> Best,
>>>
>>> Colin Williams
>>>
>>>
>>> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>> wrote:
>>>
>>>> Hi Colin,
>>>>
>>>>     I guess currently we may have to restart almost all the
>>>> daemons/services in order to swap out a standby NameNode (SBN):
>>>>
>>>> 1. The current active NameNode (ANN) needs to know the new SBN since in
>>>> the current implementation the SBN tries to send rollEditLog RPC request to
>>>> ANN periodically (thus if a NN failover happens later, the original ANN
>>>> needs to send this RPC to the correct NN).
>>>> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>>>> Look at the code in BPOfferService:
>>>>
>>>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>> IOException {
>>>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>     for (BPServiceActor actor : bpServices) {
>>>>       oldAddrs.add(actor.getNNSocketAddress());
>>>>     }
>>>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>
>>>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>       // Keep things simple for now -- we can implement this at a later
>>>> date.
>>>>       throw new IOException(
>>>>           "HA does not currently support adding a new standby to a
>>>> running DN. " +
>>>>           "Please do a rolling restart of DNs to reconfigure the list
>>>> of NNs.");
>>>>     }
>>>>   }
>>>>
>>>> 3. If you're using automatic failover, you also need to update the
>>>> configuration of the ZKFC on the current ANN machine, since ZKFC will do
>>>> gracefully fencing by sending RPC to the other NN.
>>>> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>>> but I have not tried before.
>>>>
>>>>     Thus in general we may still have to restart all the services
>>>> (except JNs) and update their configurations. But this may be a rolling
>>>> restart process I guess:
>>>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>> restart of all the DN to update their configurations
>>>> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>>>> configuration. The new SBN should become active.
>>>>
>>>>     I have not tried the upper steps, thus please let me know if this
>>>> works or not. And I think we should also document the correct steps in
>>>> Apache. Could you please file an Apache jira?
>>>>
>>>> Thanks,
>>>> -Jing
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
>>>>> I believe the steps to achieve this would be something similar to:
>>>>>
>>>>> Use the Bootstrap standby command to prep the replacement standby. Or
>>>>> rsync if the command fails.
>>>>>
>>>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>> the new standby
>>>>>
>>>>> Update the xml configuration on all nodes to reflect the replacement
>>>>> standby.
>>>>>
>>>>> Start the replacement standby
>>>>>
>>>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>> configuration.
>>>>>
>>>>> I am not sure how to deal with the Journal switch, or if I am going
>>>>> about this the right way. Can anybody give me some suggestions here?
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Colin Williams
>>>>>
>>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
>>>
>>>
>>>
>>
>
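The refreshNNList guard quoted above reduces to a set symmetric-difference test: any change at all in the DataNode's list of NameNode addresses is rejected, which is why a rolling DN restart is unavoidable when swapping the standby. A standalone sketch of that check (illustrative only, not Hadoop code; host names are placeholders):

```shell
#!/usr/bin/env bash
# Symmetric difference of the old and new NameNode address sets, via comm(1).
# Non-empty output corresponds to the case where the DN throws
# "HA does not currently support adding a new standby to a running DN."
old_nns="rhel1.local:8020 rhel2.local:8020"
new_nns="rhel1.local:8020 rhel3.local:8020"   # rhel3 swapped in as standby

sym_diff=$(comm -3 <(tr ' ' '\n' <<<"$old_nns" | sort) \
                   <(tr ' ' '\n' <<<"$new_nns" | sort))

if [ -n "$sym_diff" ]; then
  echo "NN set changed -- rolling restart of DNs required"
else
  echo "NN set unchanged -- refresh would be a no-op"
fi
```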

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
On 3) Run hdfs zkfc -formatZK: in my test environment, I get a warning and
then an error.

WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!


the complete output:

sudo hdfs zkfc -formatZK
2014-07-31 17:43:07,952 INFO  [main] tools.DFSZKFailoverController
(DFSZKFailoverController.java:<init>(140)) - Failover controller configured
for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 17:43:08,128 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
GMT
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
Corporation
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 17:43:08,129 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-ja
va-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1
.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 17:43:08,130 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 17:43:08,138 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 17:43:08,139 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 17:43:08,149 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 17:43:08,170 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel2.local/10.120.5.25:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 17:43:08,184 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel2.local/10.120.5.25:2181, initiating session
2014-07-31 17:43:08,262 INFO  [main-SendThread(rhel2.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
establishment complete on server rhel2.local/10.120.5.25:2181, sessionid =
0x3478900fbb40019, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
17:43:08,268 INFO  [main-EventThread] ha.ActiveStandbyElector
(ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 17:43:45,025 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
/hadoop-ha/golden-apple from ZK...
2014-07-31 17:43:45,098 ERROR [main] ha.ZKFailoverController
(ZKFailoverController.java:formatZK(266)) - Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/golden-apple
at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:324)
at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:264)
at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:197)
at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:165)
at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:161)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:161)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:175)
Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /hadoop-ha/golden-apple
at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:868)
at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:319)
at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:316)
at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:934)
at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:316)
... 8 more
2014-07-31 17:43:45,119 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 17:43:45,119 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x3478900fbb40019 closed
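The NotEmptyException above is consistent with a failover controller still running somewhere: formatZK lists the children of /hadoop-ha/golden-apple and deletes them, but a live ZKFC can recreate its ephemeral lock znode in between, so the final delete of the parent finds it non-empty. A toy model of that race (plain Python dicts standing in for the znode tree, not the real ZooKeeper client API):

```python
# Toy znode tree: parent path -> set of child names. Purely illustrative;
# it only shows why a concurrent writer breaks a recursive delete.
tree = {"/hadoop-ha/golden-apple": {"ActiveStandbyElectorLock"}}

def delete_recursive(tree, parent, concurrent_writer=None):
    """Mimic ZKUtil.deleteRecursive: delete the children, then the parent."""
    for child in list(tree[parent]):
        tree[parent].discard(child)
    if concurrent_writer:
        concurrent_writer(tree, parent)  # a live ZKFC re-acquires its lock mid-delete
    if tree[parent]:
        # ZooKeeper refuses to delete a znode that still has children
        raise IOError("NotEmptyException: Directory not empty for " + parent)
    del tree[parent]

def zkfc_reacquires(tree, parent):
    tree[parent].add("ActiveStandbyElectorLock")  # ephemeral node comes back

try:
    delete_recursive(tree, "/hadoop-ha/golden-apple", zkfc_reacquires)
except IOError as e:
    print(e)  # NotEmptyException: Directory not empty for /hadoop-ha/golden-apple
```

If this reading is right, the whole cluster does not need to come down: stopping every ZKFC (on both NameNode hosts) before rerunning hdfs zkfc -formatZK removes the concurrent writer.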



On Thu, Jul 31, 2014 at 1:56 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> Thanks! I will give this a shot.
>
>
>
> On Thu, Jul 31, 2014 at 1:12 PM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> We've done this a number of times without issue.  Here's the general flow:
>>
>> 1) Shutdown namenode and zkfc on SNN
>> 2) Stop zkfc on ANN  (ANN will remain active because there is no other
>> zkfc instance running to fail over to)
>> 3) Run hdfs zkfc -formatZK on ANN
>> 4) Start zkfc on ANN (will sync up with ANN and write state to zk)
>> 5) Push new configs to the new SNN, bootstrap namenode there
>> 6) Start namenode and zkfc on SNN
>> 7) Push updated configs to all other hdfs services (datanodes, etc)
>> 8) Restart hbasemaster if you are running hbase, jobtracker for MR
>> 9) Rolling restart datanodes
>> 10) Done
>>
>> You'll have to handle any other consumers of DFSClient, like your own
>> code or other apache projects.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> Hi Jing,
>>>
>>> Thanks for the response. I will try this out, and file an Apache jira.
>>>
>>> Best,
>>>
>>> Colin Williams
>>>
>>>
>>> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>> wrote:
>>>
>>>> Hi Colin,
>>>>
>>>>     I guess currently we may have to restart almost all the
>>>> daemons/services in order to swap out a standby NameNode (SBN):
>>>>
>>>> 1. The current active NameNode (ANN) needs to know the new SBN since in
>>>> the current implementation the SBN tries to send rollEditLog RPC request to
>>>> ANN periodically (thus if a NN failover happens later, the original ANN
>>>> needs to send this RPC to the correct NN).
>>>> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>>>> Look at the code in BPOfferService:
>>>>
>>>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>> IOException {
>>>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>     for (BPServiceActor actor : bpServices) {
>>>>       oldAddrs.add(actor.getNNSocketAddress());
>>>>     }
>>>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>
>>>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>       // Keep things simple for now -- we can implement this at a later
>>>> date.
>>>>       throw new IOException(
>>>>           "HA does not currently support adding a new standby to a
>>>> running DN. " +
>>>>           "Please do a rolling restart of DNs to reconfigure the list
>>>> of NNs.");
>>>>     }
>>>>   }
>>>>
>>>> 3. If you're using automatic failover, you also need to update the
>>>> configuration of the ZKFC on the current ANN machine, since ZKFC will do
>>>> graceful fencing by sending RPC to the other NN.
>>>> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>>> but I have not tried it before.
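The check in point 2 is worth restating: any change at all to the NameNode address set makes the symmetric difference non-empty, so the DataNode throws instead of refreshing, and a rolling restart is the only way for a DN to pick up the replacement standby. A minimal sketch of that logic (Python sets standing in for Guava's Sets.symmetricDifference; the hostnames are made up):

```python
def refresh_nn_list(old_addrs, new_addrs):
    """Sketch of BPOfferService.refreshNNList: refuse any change to the NN set."""
    if old_addrs.symmetric_difference(new_addrs):
        raise IOError(
            "HA does not currently support adding a new standby to a running DN. "
            "Please do a rolling restart of DNs to reconfigure the list of NNs.")

old = {("nn1.example.com", 8020), ("nn2.example.com", 8020)}  # ANN + old SBN
new = {("nn1.example.com", 8020), ("nn3.example.com", 8020)}  # ANN + replacement SBN

refresh_nn_list(old, set(old))  # identical set: no-op, DN keeps running
try:
    refresh_nn_list(old, new)
except IOError as e:
    print("DN would refuse:", e)
```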
>>>>
>>>>     Thus in general we may still have to restart all the services
>>>> (except JNs) and update their configurations. But this may be a rolling
>>>> restart process I guess:
>>>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>> restart of all the DN to update their configurations
>>>> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>>>> configuration. The new SBN should become active.
>>>>
>>>>     I have not tried the steps above, so please let me know if this
>>>> works or not. And I think we should also document the correct steps in
>>>> Apache. Could you please file an Apache jira?
>>>>
>>>> Thanks,
>>>> -Jing
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
>>>>> I believe the steps to achieve this would be something similar to:
>>>>>
>>>>> Use the Bootstrap standby command to prep the replacement standby. Or
>>>>> rsync if the command fails.
>>>>>
>>>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>> the new standby
>>>>>
>>>>> Update the XML configuration on all nodes to reflect the replacement
>>>>> standby.
>>>>>
>>>>> Start the replacement standby
>>>>>
>>>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>> configuration.
>>>>>
>>>>> I am not sure how to deal with the Journal switch, or if I am going
>>>>> about this the right way. Can anybody give me some suggestions here?
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Colin Williams
>>>>>
>>>>>
>>>>
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or
>>>> entity to which it is addressed and may contain information that is
>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>> If the reader of this message is not the intended recipient, you are hereby
>>>> notified that any printing, copying, dissemination, distribution,
>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>> you have received this communication in error, please contact the sender
>>>> immediately and delete it from your system. Thank You.
>>>
>>>
>>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
Thanks! I will give this a shot.



On Thu, Jul 31, 2014 at 1:12 PM, Bryan Beaudreault <bbeaudreault@hubspot.com
> wrote:

> We've done this a number of times without issue.  Here's the general flow:
>
> 1) Shutdown namenode and zkfc on SNN
> 2) Stop zkfc on ANN  (ANN will remain active because there is no other
> zkfc instance running to fail over to)
> 3) Run hdfs zkfc -formatZK on ANN
> 4) Start zkfc on ANN (will sync up with ANN and write state to zk)
> 5) Push new configs to the new SNN, bootstrap namenode there
> 6) Start namenode and zkfc on SNN
> 7) Push updated configs to all other hdfs services (datanodes, etc)
> 8) Restart hbasemaster if you are running hbase, jobtracker for MR
> 9) Rolling restart datanodes
> 10) Done
>
> You'll have to handle any other consumers of DFSClient, like your own code
> or other apache projects.
>
>
>
> On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> Hi Jing,
>>
>> Thanks for the response. I will try this out, and file an Apache jira.
>>
>> Best,
>>
>> Colin Williams
>>
>>
>> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com> wrote:
>>
>>> Hi Colin,
>>>
>>>     I guess currently we may have to restart almost all the
>>> daemons/services in order to swap out a standby NameNode (SBN):
>>>
>>> 1. The current active NameNode (ANN) needs to know the new SBN since in
>>> the current implementation the SBN tries to send rollEditLog RPC request to
>>> ANN periodically (thus if a NN failover happens later, the original ANN
>>> needs to send this RPC to the correct NN).
>>> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>>> Look at the code in BPOfferService:
>>>
>>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>> IOException {
>>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>     for (BPServiceActor actor : bpServices) {
>>>       oldAddrs.add(actor.getNNSocketAddress());
>>>     }
>>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>
>>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>       // Keep things simple for now -- we can implement this at a later
>>> date.
>>>       throw new IOException(
>>>           "HA does not currently support adding a new standby to a
>>> running DN. " +
>>>           "Please do a rolling restart of DNs to reconfigure the list of
>>> NNs.");
>>>     }
>>>   }
>>>
>>> 3. If you're using automatic failover, you also need to update the
>>> configuration of the ZKFC on the current ANN machine, since ZKFC will do
>>> graceful fencing by sending RPC to the other NN.
>>> 4. Looks like we do not need to restart JournalNodes for the new SBN but
>>> I have not tried it before.
>>>
>>>     Thus in general we may still have to restart all the services
>>> (except JNs) and update their configurations. But this may be a rolling
>>> restart process I guess:
>>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling restart
>>> of all the DN to update their configurations
>>> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>>> configuration. The new SBN should become active.
>>>
>>>     I have not tried the steps above, so please let me know if this
>>> works or not. And I think we should also document the correct steps in
>>> Apache. Could you please file an Apache jira?
>>>
>>> Thanks,
>>> -Jing
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
>>>> I believe the steps to achieve this would be something similar to:
>>>>
>>>> Use the Bootstrap standby command to prep the replacement standby. Or
>>>> rsync if the command fails.
>>>>
>>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>> the new standby
>>>>
>>>> Update the XML configuration on all nodes to reflect the replacement
>>>> standby.
>>>>
>>>> Start the replacement standby
>>>>
>>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>> configuration.
>>>>
>>>> I am not sure how to deal with the Journal switch, or if I am going
>>>> about this the right way. Can anybody give me some suggestions here?
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Colin Williams
>>>>
>>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
We've done this a number of times without issue.  Here's the general flow:

1) Shutdown namenode and zkfc on SNN
2) Stop zkfc on ANN  (ANN will remain active because there is no other zkfc
instance running to fail over to)
3) Run hdfs zkfc -formatZK on ANN
4) Start zkfc on ANN (will sync up with ANN and write state to zk)
5) Push new configs to the new SNN, bootstrap namenode there
6) Start namenode and zkfc on SNN
7) Push updated configs to all other hdfs services (datanodes, etc)
8) Restart hbasemaster if you are running hbase, jobtracker for MR
9) Rolling restart datanodes
10) Done

You'll have to handle any other consumers of DFSClient, like your own code
or other apache projects.
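The flow above boils down to a few ordering constraints: both ZKFCs must be down before -formatZK runs, the ANN-side ZKFC comes back before the new standby's ZKFC starts, and DataNodes restart last. One way to sanity-check that ordering as data (role names and actions are placeholders; nothing here touches a real cluster):

```python
# Ordered plan for swapping out the standby NN, as (host_role, action) pairs.
PLAN = [
    ("old-SBN", "stop namenode"),
    ("old-SBN", "stop zkfc"),
    ("ANN",     "stop zkfc"),
    ("ANN",     "hdfs zkfc -formatZK"),
    ("ANN",     "start zkfc"),
    ("new-SBN", "push configs"),
    ("new-SBN", "hdfs namenode -bootstrapStandby"),
    ("new-SBN", "start namenode"),
    ("new-SBN", "start zkfc"),
    ("all-DNs", "push configs"),
    ("masters", "restart hbase master / jobtracker"),
    ("all-DNs", "rolling restart"),
]

def check_plan(plan):
    """Assert the invariants the flow relies on; returns True if they all hold."""
    idx = {step: i for i, step in enumerate(plan)}
    fmt = idx[("ANN", "hdfs zkfc -formatZK")]
    # no ZKFC may be running when the ZK state is formatted
    assert idx[("old-SBN", "stop zkfc")] < fmt and idx[("ANN", "stop zkfc")] < fmt
    # the ANN-side ZKFC is back before the new standby's ZKFC starts
    assert idx[("ANN", "start zkfc")] < idx[("new-SBN", "start zkfc")]
    # DataNodes pick up the new NN pair only after it is up
    assert idx[("new-SBN", "start namenode")] < idx[("all-DNs", "rolling restart")]
    return True

print(check_plan(PLAN))  # True
```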



On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> Hi Jing,
>
> Thanks for the response. I will try this out, and file an Apache jira.
>
> Best,
>
> Colin Williams
>
>
> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com> wrote:
>
>> Hi Colin,
>>
>>     I guess currently we may have to restart almost all the
>> daemons/services in order to swap out a standby NameNode (SBN):
>>
>> 1. The current active NameNode (ANN) needs to know the new SBN since in
>> the current implementation the SBN tries to send rollEditLog RPC request to
>> ANN periodically (thus if a NN failover happens later, the original ANN
>> needs to send this RPC to the correct NN).
>> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>> Look at the code in BPOfferService:
>>
>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>> IOException {
>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>     for (BPServiceActor actor : bpServices) {
>>       oldAddrs.add(actor.getNNSocketAddress());
>>     }
>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>
>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>       // Keep things simple for now -- we can implement this at a later
>> date.
>>       throw new IOException(
>>           "HA does not currently support adding a new standby to a
>> running DN. " +
>>           "Please do a rolling restart of DNs to reconfigure the list of
>> NNs.");
>>     }
>>   }
>>
>> 3. If you're using automatic failover, you also need to update the
>> configuration of the ZKFC on the current ANN machine, since ZKFC will do
>> graceful fencing by sending RPC to the other NN.
>> 4. Looks like we do not need to restart JournalNodes for the new SBN but
>> I have not tried it before.
>>
>>     Thus in general we may still have to restart all the services (except
>> JNs) and update their configurations. But this may be a rolling restart
>> process I guess:
>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling restart
>> of all the DN to update their configurations
>> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>> configuration. The new SBN should become active.
>>
>>     I have not tried the steps above, so please let me know if this
>> works or not. And I think we should also document the correct steps in
>> Apache. Could you please file an Apache jira?
>>
>> Thanks,
>> -Jing
>>
>>
>>
>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>>> believe the steps to achieve this would be something similar to:
>>>
>>> Use the Bootstrap standby command to prep the replacement standby. Or
>>> rsync if the command fails.
>>>
>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>> the new standby
>>>
>>> Update the XML configuration on all nodes to reflect the replacement
>>> standby.
>>>
>>> Start the replacement standby
>>>
>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> configuration.
>>>
>>> I am not sure how to deal with the Journal switch, or if I am going
>>> about this the right way. Can anybody give me some suggestions here?
>>>
>>>
>>> Regards,
>>>
>>> Colin Williams
>>>
>>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>
>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
We've done this a number of times without issue.  Here's the general flow:

1) Shutdown namenode and zkfc on SNN
2) Stop zkfc on ANN  (ANN will remain active because there is no other zkfc
instance running to fail over to)
3) Run hdfs zkfc -formatZK on ANN
4) Start zkfc on ANN (will sync up with ANN and write state to zk)
5) Push new configs to the new SNN, bootstrap namenode there
6) Start namenode and zkfc on SNN
7) Push updated configs to all other hdfs services (datanodes, etc)
8) Restart hbasemaster if you are running hbase, jobtracker for MR
9) Rolling restart datanodes
10) Done

You'll have to handle any other consumers of DFSClient, like your own code
or other Apache projects.
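
The flow above can be sketched as a shell script. This is a hedged sketch for a CDH4-era packaged install: the hostnames (`ann`, `old-snn`, `new-snn`) and init-script names are assumptions, so adapt them to your cluster. With DRY_RUN=1 (the default here) the script only records and prints each step instead of executing it.

```shell
#!/usr/bin/env bash
# Dry-run-friendly sketch of the standby-swap flow. Hostnames and service
# names are assumptions; DRY_RUN=1 (default) prints steps instead of running.
PLAN=()
run() {
  PLAN+=("$1")
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $1"; else eval "$1"; fi
}

# 1) stop namenode and zkfc on the old standby
run "ssh old-snn 'service hadoop-hdfs-namenode stop; service hadoop-hdfs-zkfc stop'"
# 2) stop zkfc on the active NN (it stays active: no other zkfc to fail over to)
run "ssh ann 'service hadoop-hdfs-zkfc stop'"
# 3) reformat the failover state in ZooKeeper
run "ssh ann 'sudo -u hdfs hdfs zkfc -formatZK -force'"
# 4) restart zkfc on the active NN so it re-writes its state to ZK
run "ssh ann 'service hadoop-hdfs-zkfc start'"
# 5) after pushing configs, bootstrap the new standby from the active fsimage
run "ssh new-snn 'sudo -u hdfs hdfs namenode -bootstrapStandby'"
# 6) start namenode and zkfc on the new standby
run "ssh new-snn 'service hadoop-hdfs-namenode start; service hadoop-hdfs-zkfc start'"
```

Steps 7-9 (pushing hdfs-site.xml to all other HDFS services and rolling-restarting datanodes and any HBase/MapReduce masters) are left out because they depend on your deployment tooling.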



On Thu, Jul 31, 2014 at 3:35 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> Hi Jing,
>
> Thanks for the response. I will try this out, and file an Apache jira.
>
> Best,
>
> Colin Williams
>
>
> On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com> wrote:
>
>> Hi Colin,
>>
>>     I guess currently we may have to restart almost all the
>> daemons/services in order to swap out a standby NameNode (SBN):
>>
>> 1. The current active NameNode (ANN) needs to know the new SBN since in
>> the current implementation the SBN tries to send rollEditLog RPC request to
>> ANN periodically (thus if a NN failover happens later, the original ANN
>> needs to send this RPC to the correct NN).
>> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>> Look at the code in BPOfferService:
>>
>>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>> IOException {
>>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>     for (BPServiceActor actor : bpServices) {
>>       oldAddrs.add(actor.getNNSocketAddress());
>>     }
>>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>
>>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>       // Keep things simple for now -- we can implement this at a later
>> date.
>>       throw new IOException(
>>           "HA does not currently support adding a new standby to a
>> running DN. " +
>>           "Please do a rolling restart of DNs to reconfigure the list of
>> NNs.");
>>     }
>>   }
>>
>> 3. If you're using automatic failover, you also need to update the
>> configuration of the ZKFC on the current ANN machine, since the ZKFC
>> performs graceful fencing by sending an RPC to the other NN.
>> 4. It looks like we do not need to restart the JournalNodes for the new
>> SBN, but I have not tried this before.
>>
>>     Thus in general we may still have to restart all the services (except
>> JNs) and update their configurations. But this may be a rolling restart
>> process I guess:
>> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>> 2. Keep the ANN and its corresponding ZKFC running, do a rolling restart
>> of all the DN to update their configurations
>> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>> configuration. The new SBN should become active.
>>
>>     I have not tried the above steps, so please let me know whether they
>> work. And I think we should also document the correct steps in
>> Apache. Could you please file an Apache jira?
>>
>> Thanks,
>> -Jing
>>
>>
>>
>> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>>> believe the steps to achieve this would be something similar to:
>>>
>>> Use the Bootstrap standby command to prep the replacement standby, or
>>> rsync if the command fails.
>>>
>>> Somehow update the datanodes, so they push the heartbeat / journal to
>>> the new standby
>>>
>>> Update the xml configuration on all nodes to reflect the replacement
>>> standby.
>>>
>>> Start the replacement standby
>>>
>>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> configuration.
>>>
>>> I am not sure how to deal with the Journal switch, or if I am going
>>> about this the right way. Can anybody give me some suggestions here?
>>>
>>>
>>> Regards,
>>>
>>> Colin Williams
>>>
>>>
>>
>
>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
The test environment is a 6-node VirtualBox cluster run on 2 desktops :] 7
nodes with the extra namenode.


On Fri, Aug 1, 2014 at 7:26 AM, Bryan Beaudreault <bb...@hubspot.com>
wrote:

> No worries!  Glad you had a test environment to play with this in.  Also,
> above I meant "If bootstrap fails...", not format of course :)
>
>
> On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> I realize that this was a foolish error made late in the day. I am no
>> Hadoop expert and have much to learn. This is why I set up a test
>> environment.
>> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
>> wrote:
>>
>>> Also you shouldn't format the new standby. You only format a namenode
>>> for a brand new cluster. Once a cluster is live you should just use the
>>> bootstrap on the new namenodes and never format again. Bootstrap is
>>> basically a special format that just creates the dirs and copies an active
>>> fsimage to the host.
>>>
>>> If format fails (it's buggy imo) just rsync from the active namenode. It
>>> will catch up by replaying the edits from the QJM when it is started.
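
The bootstrap-or-rsync fallback described above might look like this. Hedged sketch: the name-directory path and the `ann` hostname are assumptions (check `dfs.namenode.name.dir` in your hdfs-site.xml), and DRY_RUN=1 (the default) only prints the command instead of running it.

```shell
# Bootstrap the new standby; if that fails, fall back to rsync from the
# active NN. NAME_DIR is an assumed default -- verify dfs.namenode.name.dir.
NAME_DIR="/var/lib/hadoop-hdfs/cache/hdfs/dfs/name"
CMD="sudo -u hdfs hdfs namenode -bootstrapStandby || rsync -a ann:${NAME_DIR}/ ${NAME_DIR}/"
if [ "${DRY_RUN:-1}" = "1" ]; then echo "$CMD"; else eval "$CMD"; fi
```

Either way, once the namenode starts it replays any missing edits from the JournalNodes, so the copied fsimage does not need to be perfectly current.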
>>>
>>> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
>>> wrote:
>>>
>>>> You should first replace the namenode, then when that is completely
>>>> finished move on to replacing any journal nodes. That part is easy:
>>>>
>>>> 1) bootstrap new JN (rsync from an existing)
>>>> 2) Start new JN
>>>> 3) push hdfs-site.xml to both namenodes
>>>> 4) restart standby namenode
>>>> 5) verify logs and admin ui show new JN
>>>> 6) restart active namenode
>>>> 7) verify both namenodes (failover should have happened and old standby
>>>> should be writing to the new JN)
>>>>
>>>> You can remove an existing JN at the same time if you want, just be
>>>> careful to preserve the majority of the quorum during the whole operation
>>>> (i.e. only replace 1 at a time).
>>>>
>>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>>> journal nodes not being safe unless you roll edits. So that would go for
>>>> replacing too.
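
The JournalNode replacement steps quoted above can be sketched the same way. Hedged: hostnames (`existing-jn`, `new-jn`, `ann`, `snn`), init-script names, and the journal directory are assumptions (check `dfs.journalnode.edits.dir`); DRY_RUN=1, the default, only records and prints the plan.

```shell
#!/usr/bin/env bash
# Dry-run sketch of replacing one JournalNode while preserving quorum.
PLAN=()
run() { PLAN+=("$1"); if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $1"; else eval "$1"; fi; }

JOURNAL_DIR="/var/lib/hadoop-hdfs/journal"   # assumed; check dfs.journalnode.edits.dir
# 1) bootstrap the new JN by pulling the journal dir from an existing JN
#    (rsync cannot copy remote-to-remote, so run the pull on new-jn itself)
run "ssh new-jn 'rsync -a existing-jn:${JOURNAL_DIR}/ ${JOURNAL_DIR}/'"
# 2) start the new JN
run "ssh new-jn 'service hadoop-hdfs-journalnode start'"
# 3) push hdfs-site.xml to both namenodes (site-specific), then
# 4) restart the standby namenode first
run "ssh snn 'service hadoop-hdfs-namenode restart'"
# 6) restart the active namenode (expect a failover to the old standby)
run "ssh ann 'service hadoop-hdfs-namenode restart'"
# roll edits so the replaced JN participates in a fresh log segment
run "sudo -u hdfs hdfs dfsadmin -rollEdits"
```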
>>>>
>>>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu>
>>>> wrote:
>>>>
>>>>> I will run through the procedure again tomorrow. It was late in the
>>>>> day before I had a chance to test the procedure.
>>>>>
>>>>> If I recall correctly, I had an issue formatting the new standby
>>>>> before bootstrapping. I think either at that point, or during the
>>>>> ZooKeeper format command, I was queried to format the journal on the 3
>>>>> hosts in the quorum. I was unable to proceed without an exception
>>>>> unless I chose this option.
>>>>>
>>>>> Are there any concerns adding another journal node to the new standby?
>>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>>>> wrote:
>>>>>
>>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>>> them, or something else)
>>>>>>
>>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>>> -formatZK:
>>>>>>
>>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>>> argument to get around that.
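
For scripted use, the reformat command discussed above can skip the prompt. Sketch only (the `hdfs` sudo user is an assumption): `-force` overwrites existing failover state, and `-nonInteractive` makes the command fail rather than prompt. DRY_RUN=1 (the default here) only prints the command.

```shell
# Build the zkfc reformat invocation; DRY_RUN=1 (default) just prints it.
CMD="sudo -u hdfs hdfs zkfc -formatZK -force"
if [ "${DRY_RUN:-1}" = "1" ]; then echo "$CMD"; else $CMD; fi
```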
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> However continuing with the process my QJM eventually error'd out
>>>>>>> and my Active NameNode went down.
>>>>>>>
>>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>>> QuorumOutputStream starting at txid 9634
>>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>>
>>>>>>>> I tried a third time and it just worked?
>>>>>>>>
>>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>>> GMT
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>>> =rhel1.local
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>>> Corporation
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pr
otobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-c
ore-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspect
jrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bea
nutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will
>>>>>>>> not attempt to authenticate using SASL (unknown error)
>>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>>> connection established to rhel1.local/10.120.5.203:2181,
>>>>>>>> initiating session
>>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>>  ===============================================
>>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>>> Are you sure you want to clear all failover information from
>>>>>>>> ZooKeeper?
>>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>>> failover controllers are stopped!
>>>>>>>> ===============================================
>>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>>> Y
>>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu> wrote:
>>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>>> earlier today.
>>>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>>>> NameNode in a
>>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>>>> Seattle, feel
>>>>>>>>> > free to give me a shout out.
>>>>>>>>> >
>>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a
>>>>>>>>> QJM / HA
>>>>>>>>> > configuration
>>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Hi Jing,
>>>>>>>>> >
>>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>>> jira.
>>>>>>>>> >
>>>>>>>>> > Best,
>>>>>>>>> >
>>>>>>>>> > Colin Williams
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <
>>>>>>>>> jing@hortonworks.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi Colin,
>>>>>>>>> >>
>>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>>> >>
>>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>>>> since in
>>>>>>>>> >> the current implementation the SBN tries to send rollEditLog
>>>>>>>>> RPC request to
>>>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>>>> original ANN
>>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>>>> for NN.
>>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>>> >>
>>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>>>> >> IOException {
>>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>>> >>     }
>>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>>> >>
>>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs,
>>>>>>>>> newAddrs).isEmpty()) {
>>>>>>>>> >>       // Keep things simple for now -- we can implement this at
>>>>>>>>> a later
>>>>>>>>> >> date.
>>>>>>>>> >>       throw new IOException(
>>>>>>>>> >>           "HA does not currently support adding a new standby
>>>>>>>>> to a running
>>>>>>>>> >> DN. " +
>>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure
>>>>>>>>> the list of
>>>>>>>>> >> NNs.");
>>>>>>>>> >>     }
>>>>>>>>> >>   }
>>>>>>>>> >>
>>>>>>>>> >> 3. If you're using automatic failover, you also need to update
>>>>>>>>> the
>>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since
>>>>>>>>> ZKFC will do
>>>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the
>>>>>>>>> new SBN but I
>>>>>>>>> >> have not tried before.
>>>>>>>>> >>
>>>>>>>>> >>     Thus in general we may still have to restart all the
>>>>>>>>> services (except
>>>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>>>> restart
>>>>>>>>> >> process I guess:
>>>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the
>>>>>>>>> new SBN.
>>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a
>>>>>>>>> rolling restart
>>>>>>>>> >> of all the DN to update their configurations
>>>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and
>>>>>>>>> update their
>>>>>>>>> >> configuration. The new SBN should become active.
>>>>>>>>> >>
>>>>>>>>> >>     I have not tried the upper steps, thus please let me know
>>>>>>>>> if this
>>>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>>>> steps in
>>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> -Jing
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hello,
>>>>>>>>> >>>
>>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>>> configuration. I
>>>>>>>>> >>> believe the steps to achieve this would be something similar
>>>>>>>>> to:
>>>>>>>>> >>>
>>>>>>>>> >>> Use the Bootstrap standby command to prep the replacment
>>>>>>>>> standby. Or
>>>>>>>>> >>> rsync if the command fails.
>>>>>>>>> >>>
>>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>>> journal to the
>>>>>>>>> >>> new standby
>>>>>>>>> >>>
>>>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>>>> replacment
>>>>>>>>> >>> standby.
>>>>>>>>> >>>
>>>>>>>>> >>> Start the replacment standby
>>>>>>>>> >>>
>>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>>> NameNode
>>>>>>>>> >>> configuration.
>>>>>>>>> >>>
>>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>>> going about
>>>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Regards,
>>>>>>>>> >>>
>>>>>>>>> >>> Colin Williams
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>>>> >> NOTICE: This message is intended for the use of the individual
>>>>>>>>> or entity
>>>>>>>>> >> to which it is addressed and may contain information that is
>>>>>>>>> confidential,
>>>>>>>>> >> privileged and exempt from disclosure under applicable law. If
>>>>>>>>> the reader of
>>>>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any
>>>>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>>>>> forwarding of
>>>>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>>>>> this
>>>>>>>>> >> communication in error, please contact the sender immediately
>>>>>>>>> and delete it
>>>>>>>>> >> from your system. Thank You.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
The test environment is a 6-node VirtualBox cluster running on 2 desktops :] 7
nodes with the extra namenode.


On Fri, Aug 1, 2014 at 7:26 AM, Bryan Beaudreault <bb...@hubspot.com>
wrote:

> No worries!  Glad you had a test environment to play with this in.  Also,
> above I meant "If bootstrap fails...", not format of course :)
>
>
> On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> I realize that this was a foolish error made late in the day. I am no
>> hadoop expert,  and have much to learn. This is why I setup a test
>> environment.
>> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
>> wrote:
>>
>>> Also you shouldn't format the new standby. You only format a namenode
>>> for a brand new cluster. Once a cluster is live you should just use the
>>> bootstrap on the new namenodes and never format again. Bootstrap is
>>> basically a special format that just creates the dirs and copies an active
>>> fsimage to the host.
>>>
>>> If format fails (it's buggy imo) just rsync from the active namenode. It
>>> will catch up by replaying the edits from the QJM when it is started.
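The bootstrap-then-rsync-fallback advice above can be sketched as a short runbook. Hostnames and the metadata directory path below are illustrative, not from the thread:

```shell
# On the replacement standby host: copy the fsimage/edits metadata
# from the current active NameNode.
sudo -u hdfs hdfs namenode -bootstrapStandby

# If bootstrapStandby fails, fall back to rsync'ing the NameNode
# metadata directory (dfs.namenode.name.dir; path is illustrative)
# from the active NameNode while the new standby's NN is stopped:
rsync -a --delete active-nn.example.com:/data/dfs/name/ /data/dfs/name/

# Start the new standby; it catches up by replaying newer edits
# from the JournalNode quorum.
sudo service hadoop-hdfs-namenode start
```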
>>>
>>> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
>>> wrote:
>>>
>>>> You should first replace the namenode, then when that is completely
>>>> finished move on to replacing any journal nodes. That part is easy:
>>>>
>>>> 1) bootstrap new JN (rsync from an existing)
>>>> 2) Start new JN
>>>> 3) push hdfs-site.xml to both namenodes
>>>> 4) restart standby namenode
>>>> 5) verify logs and admin ui show new JN
>>>> 6) restart active namenode
>>>> 7) verify both namenodes (failover should have happened and old standby
>>>> should be writing to the new JN)
>>>>
>>>> You can remove an existing JN at the same time if you want, just be
>>>> careful to preserve the majority of the quorum during the whole operation
>>>> (i.e. only replace 1 at a time).
>>>>
>>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>>> journal nodes not being safe unless you roll edits. So that would go for
>>>> replacing too.
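The seven steps above might look like the following runbook sketch. Hostnames and the edits directory path are illustrative; only one JournalNode is replaced at a time so the quorum majority is preserved throughout:

```shell
# 1) Bootstrap the new JN's edits dir (dfs.journalnode.edits.dir;
#    path illustrative) from an existing JournalNode:
rsync -a existing-jn.example.com:/data/dfs/journal/ /data/dfs/journal/

# 2) Start the new JournalNode:
sudo service hadoop-hdfs-journalnode start

# 3) Push the updated hdfs-site.xml (new qjournal:// URI) to both NNs.

# 4-5) Restart the standby NN, then verify its logs / admin UI show
#      the new JN:
sudo service hadoop-hdfs-namenode restart    # on the standby NN host

# 6-7) Restart the active NN (failover occurs; the old standby should
#      now be active and writing to the new JN), then roll edits so
#      every JN starts a fresh log segment:
sudo service hadoop-hdfs-namenode restart    # on the active NN host
sudo -u hdfs hdfs dfsadmin -rollEdits
```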
>>>>
>>>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu>
>>>> wrote:
>>>>
>>>>> I will run through the procedure again tomorrow. It was late in the
>>>>> day before I had a chance to test the procedure.
>>>>>
>>>>> If I recall correctly, I had an issue formatting the new standby
>>>>> before bootstrapping. I think either at that point, or during the
>>>>> ZooKeeper format command, I was queried to format the journal to the 3
>>>>> hosts in the quorum. I was unable to proceed without exception unless
>>>>> choosing this option.
>>>>>
>>>>> Are there any concerns about adding another journal node to the new standby?
>>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>>>> wrote:
>>>>>
>>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>>> them, or something else)
>>>>>>
>>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>>> -formatZK:
>>>>>>
>>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>>> argument to get around that.
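With only one namenode alive, the ZKFC state can be reformatted non-interactively. A minimal sketch, assuming the CDH service scripts used earlier in the thread (the -force flag bypasses the sanity check discussed above):

```shell
# Stop the ZKFC daemon on each NameNode host first:
sudo service hadoop-hdfs-zkfc stop

# Re-initialize the HA znode in ZooKeeper; -force skips the
# "cluster may be running" sanity check (note: the subcommand is
# "hdfs zkfc", not a bare "zkfc" binary):
sudo -u hdfs hdfs zkfc -formatZK -force

# Restart the ZKFC daemons:
sudo service hadoop-hdfs-zkfc start
```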
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> However continuing with the process my QJM eventually error'd out
>>>>>>> and my Active NameNode went down.
>>>>>>>
>>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>>> QuorumOutputStream starting at txid 9634
>>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>>
>>>>>>>
>>>>>>>
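[Editor's note] The "IPC's epoch 5 is not the current writer epoch  0" failures above are QJM's epoch-based fencing at work: each JournalNode tracks a promised epoch and a writer epoch, rejects journal() calls whose epoch does not match its writer epoch, and the NameNode needs a majority of JournalNodes (2 of 3 here) to accept each flush. A toy sketch of that check follows; it is not Hadoop's actual implementation, and the class and method names are illustrative:

```python
class JournalNode:
    """Toy model of a JournalNode's epoch checks (illustrative, not Hadoop code)."""
    def __init__(self):
        self.promised_epoch = 0   # highest epoch this JN has promised to
        self.writer_epoch = 0     # epoch of the writer that last started a segment

    def new_epoch(self, epoch):
        # A new writer bumps the promised epoch; writer_epoch only changes
        # when that writer actually starts a log segment (a "log roll").
        if epoch <= self.promised_epoch:
            raise IOError("epoch %d <= promised epoch %d" % (epoch, self.promised_epoch))
        self.promised_epoch = epoch

    def start_segment(self, epoch):
        if epoch != self.promised_epoch:
            raise IOError("bad epoch %d" % epoch)
        self.writer_epoch = epoch

    def journal(self, epoch, txns):
        # The check that produced the errors above: a JN whose writer_epoch
        # is still 0 rejects writes stamped with epoch 5.
        if epoch != self.writer_epoch:
            raise IOError("IPC's epoch %d is not the current writer epoch  %d"
                          % (epoch, self.writer_epoch))

def quorum_write(journals, epoch, txns):
    """A write succeeds only on a majority (2 of 3) of JournalNodes."""
    successes = 0
    for jn in journals:
        try:
            jn.journal(epoch, txns)
            successes += 1
        except IOError:
            pass
    needed = len(journals) // 2 + 1
    if successes < needed:
        raise IOError("Got too many exceptions to achieve quorum size %d/%d"
                      % (needed, len(journals)))

# Demo: three JNs have promised epoch 5, but no segment was started at
# that epoch, so every write is rejected -- the quorum failure in the log.
jns = [JournalNode() for _ in range(3)]
for jn in jns:
    jn.new_epoch(5)
try:
    quorum_write(jns, 5, ["txn 9635"])
except IOError as e:
    print(e)
```

In this toy model, once each JN runs start_segment(5) (the "next log roll" the warnings mention), writer_epoch catches up to the promised epoch and writes go through again.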
>>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>>
>>>>>>>> I tried a third time and it just worked?
>>>>>>>>
>>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>>> GMT
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>>> =rhel1.local
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>>> Corporation
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pr
otobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-c
ore-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspect
jrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bea
nutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will
>>>>>>>> not attempt to authenticate using SASL (unknown error)
>>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>>> connection established to rhel1.local/10.120.5.203:2181,
>>>>>>>> initiating session
>>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>>  ===============================================
>>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>>> Are you sure you want to clear all failover information from
>>>>>>>> ZooKeeper?
>>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>>> failover controllers are stopped!
>>>>>>>> ===============================================
>>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>>> Y
>>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>>
>>>>>>>>
>>>>>>>>
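[Editor's note] What the successful -formatZK run above does against ZooKeeper is a recursive delete of the parent znode (/hadoop-ha/golden-apple) followed by recreating it empty, as the ActiveStandbyElector log lines show. A toy in-memory model of that recursion follows; the real work happens against a live ZooKeeper ensemble, and the child znode names below are illustrative:

```python
def clear_parent_znode(tree, path):
    """Recursively delete `path` and everything under it, then recreate the
    parent empty -- a toy model of the clearParentZNode/ensureParentZNode
    steps in the log above. `tree` maps znode paths to their data; children
    are simply paths prefixed with `path + "/"`."""
    for znode in list(tree):
        if znode == path or znode.startswith(path + "/"):
            del tree[znode]
    tree[path] = b""   # recreate the now-empty parent znode

# Illustrative contents before formatting (child names are assumptions)
tree = {
    "/hadoop-ha/golden-apple": b"",
    "/hadoop-ha/golden-apple/ActiveStandbyElectorLock": b"nn1",
    "/hadoop-ha/golden-apple/ActiveBreadCrumb": b"nn1",
}
clear_parent_znode(tree, "/hadoop-ha/golden-apple")
print(tree)   # only the empty parent znode remains
```

This is also why the tool warns that all HDFS services and failover controllers should be stopped first: any failover state stored under the parent znode is wiped.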
>>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu> wrote:
>>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>>> earlier today.
>>>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>>>> NameNode in a
>>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>>>> Seattle, feel
>>>>>>>>> > free to give me a shout out.
>>>>>>>>> >
>>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a
>>>>>>>>> QJM / HA
>>>>>>>>> > configuration
>>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Hi Jing,
>>>>>>>>> >
>>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>>> jira.
>>>>>>>>> >
>>>>>>>>> > Best,
>>>>>>>>> >
>>>>>>>>> > Colin Williams
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <
>>>>>>>>> jing@hortonworks.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi Colin,
>>>>>>>>> >>
>>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>>> >>
>>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>>>> since in
>>>>>>>>> >> the current implementation the SBN tries to send rollEditLog
>>>>>>>>> RPC request to
>>>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>>>> original ANN
>>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>>>> for NN.
>>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>>> >>
>>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>>>> >> IOException {
>>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>>> >>     }
>>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>>> >>
>>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs,
>>>>>>>>> newAddrs).isEmpty()) {
>>>>>>>>> >>       // Keep things simple for now -- we can implement this at
>>>>>>>>> a later
>>>>>>>>> >> date.
>>>>>>>>> >>       throw new IOException(
>>>>>>>>> >>           "HA does not currently support adding a new standby
>>>>>>>>> to a running
>>>>>>>>> >> DN. " +
>>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure
>>>>>>>>> the list of
>>>>>>>>> >> NNs.");
>>>>>>>>> >>     }
>>>>>>>>> >>   }
>>>>>>>>> >>
>>>>>>>>> >> 3. If you're using automatic failover, you also need to update
>>>>>>>>> the
>>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since
>>>>>>>>> ZKFC will do
>>>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the
>>>>>>>>> new SBN but I
>>>>>>>>> >> have not tried before.
>>>>>>>>> >>
>>>>>>>>> >>     Thus in general we may still have to restart all the
>>>>>>>>> services (except
>>>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>>>> restart
>>>>>>>>> >> process I guess:
>>>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the
>>>>>>>>> new SBN.
>>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a
>>>>>>>>> rolling restart
>>>>>>>>> >> of all the DN to update their configurations
>>>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and
>>>>>>>>> update their
>>>>>>>>> >> configuration. The new SBN should become active.
>>>>>>>>> >>
>>>>>>>>> >>     I have not tried the upper steps, thus please let me know
>>>>>>>>> if this
>>>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>>>> steps in
>>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> -Jing
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hello,
>>>>>>>>> >>>
>>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>>> configuration. I
>>>>>>>>> >>> believe the steps to achieve this would be something similar
>>>>>>>>> to:
>>>>>>>>> >>>
>>>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>>>> standby. Or
>>>>>>>>> >>> rsync if the command fails.
>>>>>>>>> >>>
>>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>>> journal to the
>>>>>>>>> >>> new standby
>>>>>>>>> >>>
>>>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>>>> replacement
>>>>>>>>> >>> standby.
>>>>>>>>> >>>
>>>>>>>>> >>> Start the replacement standby
>>>>>>>>> >>>
>>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>>> NameNode
>>>>>>>>> >>> configuration.
>>>>>>>>> >>>
>>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>>> going about
>>>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Regards,
>>>>>>>>> >>>
>>>>>>>>> >>> Colin Williams
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
The test environment is a 6 node virtualbox cluster run on 2 desktops :] 7
with the extra namenode.


On Fri, Aug 1, 2014 at 7:26 AM, Bryan Beaudreault <bb...@hubspot.com>
wrote:

> No worries!  Glad you had a test environment to play with this in.  Also,
> above I meant "If bootstrap fails...", not format of course :)
>
>
> On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> I realize that this was a foolish error made late in the day. I am no
>> hadoop expert,  and have much to learn. This is why I setup a test
>> environment.
>> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
>> wrote:
>>
>>> Also you shouldn't format the new standby. You only format a namenode
>>> for a brand new cluster. Once a cluster is live you should just use the
>>> bootstrap on the new namenodes and never format again. Bootstrap is
>>> basically a special format that just creates the dirs and copies an active
>>> fsimage to the host.
>>>
>>> If format fails (it's buggy imo) just rsync from the active namenode. It
>>> will catch up by replaying the edits from the QJM when it is started.
>>>
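[Editor's note: the bootstrap-with-rsync-fallback approach Bryan describes might look like the sketch below. `active-nn` is a placeholder hostname and `/data/dfs/nn` is an assumed value for `dfs.namenode.name.dir`; adjust both for your deployment.]

```shell
# Run on the replacement standby. Try the supported bootstrap first.
if ! hdfs namenode -bootstrapStandby; then
  # Fallback: rsync the metadata directory from the active namenode.
  rsync -av --delete active-nn:/data/dfs/nn/ /data/dfs/nn/
fi
# On start, the standby catches up by replaying edits from the QJM.
service hadoop-hdfs-namenode start
```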
>>> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
>>> wrote:
>>>
>>>> You should first replace the namenode, then when that is completely
>>>> finished move on to replacing any journal nodes. That part is easy:
>>>>
>>>> 1) bootstrap new JN (rsync from an existing)
>>>> 2) Start new JN
>>>> 3) push hdfs-site.xml to both namenodes
>>>> 4) restart standby namenode
>>>> 5) verify logs and admin ui show new JN
>>>> 6) restart active namenode
>>>> 7) verify both namenodes (failover should have happened and old standby
>>>> should be writing to the new JN)
>>>>
>>>> You can remove an existing JN at the same time if you want, just be
>>>> careful to preserve the majority of the quorum during the whole operation
>>>> (i.e. only replace one at a time).
>>>>
>>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>>> journal nodes not being safe unless you roll edits. So that would go for
>>>> replacing too.
>>>>
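[Editor's note: Bryan's JournalNode replacement steps, as a shell outline. Hostnames are placeholders and `/data/dfs/jn` is an assumed value for `dfs.journalnode.edits.dir`; replace one JN at a time so the quorum keeps its majority throughout.]

```shell
# 1-2. Bootstrap the new JN from an existing one, then start it
rsync -av existing-jn:/data/dfs/jn/ new-jn:/data/dfs/jn/
ssh new-jn 'service hadoop-hdfs-journalnode start'

# 3. Push the hdfs-site.xml with the updated qjournal:// URI to both namenodes
scp hdfs-site.xml nn1:/etc/hadoop/conf/
scp hdfs-site.xml nn2:/etc/hadoop/conf/

# 4-6. Restart the standby, verify logs/admin UI show the new JN, then
#      restart the active (failover should occur)
ssh standby-nn 'service hadoop-hdfs-namenode restart'
ssh active-nn  'service hadoop-hdfs-namenode restart'

# Roll edits after each replaced JournalNode, per the JIRA caveat above
hdfs dfsadmin -rollEdits
```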
>>>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu>
>>>> wrote:
>>>>
>>>>> I will run through the procedure again tomorrow. It was late in the
>>>>> day before I had a chance to test the procedure.
>>>>>
>>>>> If I recall correctly I had an issue formatting the new standby
>>>>> before bootstrapping. I think either at that point, or during the
>>>>> ZooKeeper format command, I was queried to format the journal to the 3
>>>>> hosts in the quorum. I was unable to proceed without exception unless
>>>>> choosing this option.
>>>>>
>>>>> Are there any concerns adding another journal node to the new standby?
>>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>>>> wrote:
>>>>>
>>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>>> them, or something else)
>>>>>>
>>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>>> -formatZK:
>>>>>>
>>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>>> argument to get around that.
>>>>>>
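[Editor's note: the safe ordering implied above, using the `-force` flag Bryan mentions, might look like the following. This assumes only one namenode (and its ZKFC) is involved on the host where you run the format; stop the ZKFCs first so nothing is contending for the znode.]

```shell
# Stop the failover controller before reformatting the ZK state
service hadoop-hdfs-zkfc stop

# -force skips the sanity check that aborts when the cluster looks live
hdfs zkfc -formatZK -force

# Restart the failover controller once the znode has been recreated
service hadoop-hdfs-zkfc start
```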
>>>>>>
>>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> However, continuing with the process, my QJM eventually errored out
>>>>>>> and my Active NameNode went down.
>>>>>>>
>>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>>> QuorumOutputStream starting at txid 9634
>>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>>
>>>>>>>> I tried a third time and it just worked?
>>>>>>>>
>>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>>> GMT
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>>> =rhel1.local
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>>> Corporation
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pr
otobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-c
ore-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspect
jrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bea
nutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will
>>>>>>>> not attempt to authenticate using SASL (unknown error)
>>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>>> connection established to rhel1.local/10.120.5.203:2181,
>>>>>>>> initiating session
>>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>>  ===============================================
>>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>>> Are you sure you want to clear all failover information from
>>>>>>>> ZooKeeper?
>>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>>> failover controllers are stopped!
>>>>>>>> ===============================================
>>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>>> Y
>>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu> wrote:
>>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>>> earlier today.
>>>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>>>> NameNode in a
>>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>>>> Seattle, feel
>>>>>>>>> > free to give me a shout out.
>>>>>>>>> >
>>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a
>>>>>>>>> QJM / HA
>>>>>>>>> > configuration
>>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Hi Jing,
>>>>>>>>> >
>>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>>> jira.
>>>>>>>>> >
>>>>>>>>> > Best,
>>>>>>>>> >
>>>>>>>>> > Colin Williams
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <
>>>>>>>>> jing@hortonworks.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi Colin,
>>>>>>>>> >>
>>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>>> >>
>>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>>>> since in
>>>>>>>>> >> the current implementation the SBN tries to send rollEditLog
>>>>>>>>> RPC request to
>>>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>>>> original ANN
>>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>>>> for NN.
>>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>>> >>
>>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>>>> >> IOException {
>>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>>> >>     }
>>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>>> >>
>>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs,
>>>>>>>>> newAddrs).isEmpty()) {
>>>>>>>>> >>       // Keep things simple for now -- we can implement this at
>>>>>>>>> a later
>>>>>>>>> >> date.
>>>>>>>>> >>       throw new IOException(
>>>>>>>>> >>           "HA does not currently support adding a new standby
>>>>>>>>> to a running
>>>>>>>>> >> DN. " +
>>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure
>>>>>>>>> the list of
>>>>>>>>> >> NNs.");
>>>>>>>>> >>     }
>>>>>>>>> >>   }
>>>>>>>>> >>
>>>>>>>>> >> 3. If you're using automatic failover, you also need to update
>>>>>>>>> the
>>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since
>>>>>>>>> ZKFC will do
>>>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the
>>>>>>>>> new SBN but I
>>>>>>>>> >> have not tried before.
>>>>>>>>> >>
>>>>>>>>> >>     Thus in general we may still have to restart all the
>>>>>>>>> services (except
>>>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>>>> restart
>>>>>>>>> >> process I guess:
>>>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the
>>>>>>>>> new SBN.
>>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a
>>>>>>>>> rolling restart
>>>>>>>>> >> of all the DN to update their configurations
>>>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and
>>>>>>>>> update their
>>>>>>>>> >> configuration. The new SBN should become active.
>>>>>>>>> >>
>>>>>>>>> >>     I have not tried the upper steps, thus please let me know
>>>>>>>>> if this
>>>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>>>> steps in
>>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> -Jing
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hello,
>>>>>>>>> >>>
>>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>>> configuration. I
>>>>>>>>> >>> believe the steps to achieve this would be something similar
>>>>>>>>> to:
>>>>>>>>> >>>
>>>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>>>> standby. Or
>>>>>>>>> >>> rsync if the command fails.
>>>>>>>>> >>>
>>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>>> journal to the
>>>>>>>>> >>> new standby
>>>>>>>>> >>>
>>>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>>>> replacement
>>>>>>>>> >>> standby.
>>>>>>>>> >>>
>>>>>>>>> >>> Start the replacement standby
>>>>>>>>> >>>
>>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>>> NameNode
>>>>>>>>> >>> configuration.
>>>>>>>>> >>>
>>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>>> going about
>>>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Regards,
>>>>>>>>> >>>
>>>>>>>>> >>> Colin Williams
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>>>> >> NOTICE: This message is intended for the use of the individual
>>>>>>>>> or entity
>>>>>>>>> >> to which it is addressed and may contain information that is
>>>>>>>>> confidential,
>>>>>>>>> >> privileged and exempt from disclosure under applicable law. If
>>>>>>>>> the reader of
>>>>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any
>>>>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>>>>> forwarding of
>>>>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>>>>> this
>>>>>>>>> >> communication in error, please contact the sender immediately
>>>>>>>>> and delete it
>>>>>>>>> >> from your system. Thank You.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>

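For reference, the rolling procedure Jing outlines in the quoted thread (bootstrap the new standby, rolling-restart the DataNodes, then restart the old active and its ZKFC) could be scripted roughly like this. This is an untested sketch: the hostnames (nn-new, nn-active, dn1..dn3) are placeholders, and the CDH-style init scripts are assumptions based on the commands shown elsewhere in the thread.

```shell
# Untested sketch of the standby-NameNode swap (CDH4-era commands).
# Hostnames nn-new, nn-active, dn1..dn3 are placeholders.

# 1. Prep and start the replacement standby (rsync the name dir from
#    the active NameNode instead if bootstrapStandby fails).
ssh nn-new 'hdfs namenode -bootstrapStandby'
ssh nn-new 'sudo service hadoop-hdfs-namenode start'

# 2. Rolling restart of the DataNodes with the updated hdfs-site.xml;
#    a live DN cannot refresh its NameNode list (see BPOfferService).
for dn in dn1 dn2 dn3; do
  ssh "$dn" 'sudo service hadoop-hdfs-datanode restart'
done

# 3. Restart the current active NN and its ZKFC with the new config;
#    the replacement standby should then become active.
ssh nn-active 'sudo service hadoop-hdfs-zkfc stop'
ssh nn-active 'sudo service hadoop-hdfs-namenode restart'
ssh nn-active 'sudo service hadoop-hdfs-zkfc start'
```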
Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
The test environment is a 6 node VirtualBox cluster run on 2 desktops :] (7
nodes with the extra namenode).


On Fri, Aug 1, 2014 at 7:26 AM, Bryan Beaudreault <bb...@hubspot.com>
wrote:

> No worries!  Glad you had a test environment to play with this in.  Also,
> above I meant "If bootstrap fails...", not format of course :)
>
>
> On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> I realize that this was a foolish error made late in the day. I am no
>> hadoop expert,  and have much to learn. This is why I setup a test
>> environment.
>> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
>> wrote:
>>
>>> Also you shouldn't format the new standby. You only format a namenode
>>> for a brand new cluster. Once a cluster is live you should just use the
>>> bootstrap on the new namenodes and never format again. Bootstrap is
>>> basically a special format that just creates the dirs and copies an active
>>> fsimage to the host.
>>>
>>> If format fails (it's buggy imo) just rsync from the active namenode. It
>>> will catch up by replaying the edits from the QJM when it is started.
>>>
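A minimal sketch of that bootstrap-then-rsync fallback, run on the replacement standby. The name-directory path here is an assumption; substitute the dfs.namenode.name.dir value from your own hdfs-site.xml.

```shell
# On the replacement standby: bootstrap, never format, an existing cluster.
# NAME_DIR is an assumed path; use your dfs.namenode.name.dir value.
NAME_DIR=/var/lib/hadoop-hdfs/cache/hdfs/dfs/name
hdfs namenode -bootstrapStandby \
  || rsync -a --delete nn-active:"$NAME_DIR"/ "$NAME_DIR"/
# On startup the standby replays any edits it missed from the QJM.
sudo service hadoop-hdfs-namenode start
```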
>>> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
>>> wrote:
>>>
>>>> You should first replace the namenode, then when that is completely
>>>> finished move on to replacing any journal nodes. That part is easy:
>>>>
>>>> 1) bootstrap new JN (rsync from an existing)
>>>> 2) Start new JN
>>>> 3) push hdfs-site.xml to both namenodes
>>>> 4) restart standby namenode
>>>> 5) verify logs and admin ui show new JN
>>>> 6) restart active namenode
>>>> 7) verify both namenodes (failover should have happened and old standby
>>>> should be writing to the new JN)
>>>>
>>>> You can remove an existing JN at the same time if you want, just be
>>>> careful to preserve the majority of the quorum during the whole operation
>>>> (i.e. only replace one at a time).
>>>>
>>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>>> journal nodes not being safe unless you roll edits. So that would go for
>>>> replacing too.
>>>>
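The JournalNode replacement steps above, as an untested shell sketch. The hostnames (jn-old, jn-new, nn-standby, nn-active) and the journal directory path are assumptions, not values from this cluster.

```shell
# Replace one JournalNode at a time so the quorum keeps its majority.

# 1-2. Bootstrap the new JN from an existing one and start it.
rsync -a jn-old:/var/lib/hadoop-hdfs/journal/ jn-new:/var/lib/hadoop-hdfs/journal/
ssh jn-new 'sudo service hadoop-hdfs-journalnode start'

# 3-7. Push hdfs-site.xml (with the new qjournal:// URI) to both NNs,
#      restart the standby, verify, then restart the active.
ssh nn-standby 'sudo service hadoop-hdfs-namenode restart'
ssh nn-active  'sudo service hadoop-hdfs-namenode restart'   # triggers failover

# Roll the edit log so every JN begins a fresh segment.
hdfs dfsadmin -rollEdits
```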
>>>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu>
>>>> wrote:
>>>>
>>>>> I will run through the procedure again tomorrow. It was late in the
>>>>> day before I had a chance to test the procedure.
>>>>>
>>>>> If I recall correctly I had an issue formatting the new standby
>>>>> before bootstrapping. I think either at that point, or during the
>>>>> ZooKeeper format command, I was queried to format the journal to the 3
>>>>> hosts in the quorum. I was unable to proceed without exception unless
>>>>> choosing this option.
>>>>>
>>>>> Are there any concerns adding another journal node to the new standby?
>>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>>>> wrote:
>>>>>
>>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>>> them, or something else)
>>>>>>
>>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>>> -formatZK:
>>>>>>
>>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>>> argument to get around that.
>>>>>>
>>>>>>
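With only one NameNode up and its local ZKFC stopped, the sanity check described above can be bypassed with the -force flag, roughly like so:

```shell
# Stop the local ZKFC, wipe the HA znode, and restart the ZKFC.
sudo service hadoop-hdfs-zkfc stop
hdfs zkfc -formatZK -force     # skips the interactive confirmation
sudo service hadoop-hdfs-zkfc start
```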
>>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> However continuing with the process my QJM eventually error'd out
>>>>>>> and my Active NameNode went down.
>>>>>>>
>>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>>> the next log roll.
>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>>> epoch 5 is not the current writer epoch  0
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>>  at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>  at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> at
>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>  at
>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>>> at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>>  at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>>> at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at
>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>>> QuorumOutputStream starting at txid 9634
>>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>>
>>>>>>>> I tried a third time and it just worked?
>>>>>>>>
>>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>>> GMT
>>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>>> =rhel1.local
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>>> Corporation
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pr
otobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-c
ore-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspect
jrt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bea
nutils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will
>>>>>>>> not attempt to authenticate using SASL (unknown error)
>>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>>> connection established to rhel1.local/10.120.5.203:2181,
>>>>>>>> initiating session
>>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>>  ===============================================
>>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>>> Are you sure you want to clear all failover information from
>>>>>>>> ZooKeeper?
>>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>>> failover controllers are stopped!
>>>>>>>> ===============================================
>>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>>> Y
>>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>>
>>>>>>>>
>>>>>>>>
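The "IPC's epoch N is not the current writer epoch  0" failures further down this thread trace back to reformatting journal state while a writer was live. A toy sketch of the QJM fencing rule (plain bash, not Hadoop code; the epoch and txid values are illustrative):

```shell
#!/usr/bin/env bash
# Toy model of the QJM fencing rule: a JournalNode accepts writes only
# when the request's IPC epoch matches the writer epoch it last promised.
# A reformat resets the journal's writer epoch to 0, so an active NN that
# still holds epoch 5 has every subsequent write rejected.
writer_epoch=0

journal_write() {  # usage: journal_write <ipc_epoch> <txid>
  if [ "$1" -ne "$writer_epoch" ]; then
    echo "IPC's epoch $1 is not the current writer epoch  $writer_epoch"
    return 1
  fi
  echo "accepted txn $2"
}

journal_write 5 9635 || echo "NN aborts: flush failed for required journal"
```

The real recovery is for the NN to re-negotiate an epoch with the quorum; this sketch only shows why the writes start bouncing.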
>>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu> wrote:
>>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>>> earlier today.
>>>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>>>> NameNode in a
>>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>>>> Seattle, feel
>>>>>>>>> > free to give me a shout out.
>>>>>>>>> >
>>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a
>>>>>>>>> QJM / HA
>>>>>>>>> > configuration
>>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Hi Jing,
>>>>>>>>> >
>>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>>> jira.
>>>>>>>>> >
>>>>>>>>> > Best,
>>>>>>>>> >
>>>>>>>>> > Colin Williams
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <
>>>>>>>>> jing@hortonworks.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi Colin,
>>>>>>>>> >>
>>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>>> >>
>>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>>>> since in
>>>>>>>>> >> the current implementation the SBN tries to send rollEditLog
>>>>>>>>> RPC request to
>>>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>>>> original ANN
>>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>>>> for NN.
>>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>>> >>
>>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>>>> >> IOException {
>>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>>> >>     }
>>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>>> >>
>>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs,
>>>>>>>>> newAddrs).isEmpty()) {
>>>>>>>>> >>       // Keep things simple for now -- we can implement this at
>>>>>>>>> a later
>>>>>>>>> >> date.
>>>>>>>>> >>       throw new IOException(
>>>>>>>>> >>           "HA does not currently support adding a new standby
>>>>>>>>> to a running
>>>>>>>>> >> DN. " +
>>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure
>>>>>>>>> the list of
>>>>>>>>> >> NNs.");
>>>>>>>>> >>     }
>>>>>>>>> >>   }
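The check above is just a set comparison; a bash sketch of the same logic (my own illustration, not a Hadoop API) shows why a new standby address forces a rolling DN restart:

```shell
#!/usr/bin/env bash
# Mirrors BPOfferService.refreshNNList: the DN compares the old and new
# NameNode address lists as sets and refuses to refresh if they differ.
nn_sets_equal() {  # usage: nn_sets_equal "<old addrs>" "<new addrs>"
  # Intentionally unquoted expansion: split each list on whitespace,
  # sort to canonical order, then compare as strings.
  [ "$(printf '%s\n' $1 | sort -u)" = "$(printf '%s\n' $2 | sort -u)" ]
}

nn_sets_equal "nn1:8020 nn2:8020" "nn2:8020 nn1:8020" \
  && echo "same set: refresh is a no-op"
nn_sets_equal "nn1:8020 nn2:8020" "nn1:8020 nn3:8020" \
  || echo "set changed: DN throws IOException, rolling restart required"
```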
>>>>>>>>> >>
>>>>>>>>> >> 3. If you're using automatic failover, you also need to update
>>>>>>>>> the
>>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since
>>>>>>>>> ZKFC will do
>>>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the
>>>>>>>>> new SBN but I
>>>>>>>>> >> have not tried before.
>>>>>>>>> >>
>>>>>>>>> >>     Thus in general we may still have to restart all the
>>>>>>>>> services (except
>>>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>>>> restart
>>>>>>>>> >> process I guess:
>>>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the
>>>>>>>>> new SBN.
>>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a
>>>>>>>>> rolling restart
>>>>>>>>> >> of all the DN to update their configurations
>>>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and
>>>>>>>>> update their
>>>>>>>>> >> configuration. The new SBN should become active.
>>>>>>>>> >>
>>>>>>>>> >>     I have not tried the upper steps, thus please let me know
>>>>>>>>> if this
>>>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>>>> steps in
>>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> -Jing
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>>> discord@uw.edu>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hello,
>>>>>>>>> >>>
>>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>>> configuration. I
>>>>>>>>> >>> believe the steps to achieve this would be something similar
>>>>>>>>> to:
>>>>>>>>> >>>
>>>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>>>> standby. Or
>>>>>>>>> >>> rsync if the command fails.
>>>>>>>>> >>>
>>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>>> journal to the
>>>>>>>>> >>> new standby
>>>>>>>>> >>>
>>>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>>>> >>> replacement
>>>>>>>>> >>> standby.
>>>>>>>>> >>>
>>>>>>>>> >>> Start the replacement standby
>>>>>>>>> >>>
>>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>>> NameNode
>>>>>>>>> >>> configuration.
>>>>>>>>> >>>
>>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>>> going about
>>>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> Regards,
>>>>>>>>> >>>
>>>>>>>>> >>> Colin Williams
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>>>> >> NOTICE: This message is intended for the use of the individual
>>>>>>>>> or entity
>>>>>>>>> >> to which it is addressed and may contain information that is
>>>>>>>>> confidential,
>>>>>>>>> >> privileged and exempt from disclosure under applicable law. If
>>>>>>>>> the reader of
>>>>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any
>>>>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>>>>> forwarding of
>>>>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>>>>> this
>>>>>>>>> >> communication in error, please contact the sender immediately
>>>>>>>>> and delete it
>>>>>>>>> >> from your system. Thank You.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
No worries!  Glad you had a test environment to play with this in.  Also,
above I meant "If bootstrap fails...", not format of course :)


On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> I realize that this was a foolish error made late in the day. I am no
> Hadoop expert, and have much to learn. This is why I set up a test
> environment.
> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
> wrote:
>
>> Also you shouldn't format the new standby. You only format a namenode for
>> a brand new cluster. Once a cluster is live you should just use the
>> bootstrap on the new namenodes and never format again. Bootstrap is
>> basically a special format that just creates the dirs and copies an active
>> fsimage to the host.
>>
>> If format fails (it's buggy imo) just rsync from the active namenode. It
>> will catch up by replaying the edits from the QJM when it is started.
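That bootstrap-then-rsync fallback can be sketched as a tiny wrapper. On a real cluster the two steps would be `hdfs namenode -bootstrapStandby` and an rsync of the active NN's name directory; here they are passed in as stubs (true/false) purely so the control flow is runnable:

```shell
#!/usr/bin/env bash
# Sketch of the fallback described above: try bootstrapStandby first,
# fall back to rsync from the active NN only if it fails. The command
# arguments are hypothetical stand-ins, not real Hadoop invocations.
prepare_standby() {  # usage: prepare_standby <bootstrap_cmd> <rsync_cmd>
  if $1; then
    echo "bootstrapped"
  elif $2; then
    echo "rsynced; missing edits replay from the QJM on startup"
  else
    echo "failed" >&2
    return 1
  fi
}

prepare_standby true  false
prepare_standby false true
```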
>>
>> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
>> wrote:
>>
>>> You should first replace the namenode, then when that is completely
>>> finished move on to replacing any journal nodes. That part is easy:
>>>
>>> 1) bootstrap new JN (rsync from an existing)
>>> 2) Start new JN
>>> 3) push hdfs-site.xml to both namenodes
>>> 4) restart standby namenode
>>> 5) verify logs and admin ui show new JN
>>> 6) restart active namenode
>>> 7) verify both namenodes (failover should have happened and old standby
>>> should be writing to the new JN)
>>>
>>> You can remove an existing JN at the same time if you want, just be
>>> careful to preserve the majority of the quorum during the whole operation
>>> (i.e. only replace one at a time).
>>>
>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>> journal nodes not being safe unless you roll edits. So that would go for
>>> replacing too.
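The "preserve the majority" rule is plain integer arithmetic; a small bash sketch of why only one JN can be out at a time:

```shell
#!/usr/bin/env bash
# QJM writes need floor(N/2)+1 acks, so a 3-node quorum tolerates the
# loss of exactly one JournalNode during a replacement.
quorum_majority() {  # usage: quorum_majority <total_jns>
  echo $(( $1 / 2 + 1 ))
}

can_write() {  # usage: can_write <total_jns> <jns_up>
  [ "$2" -ge "$(quorum_majority "$1")" ]
}

can_write 3 2 && echo "3 JNs, 1 being replaced: quorum (2/3) intact"
can_write 3 1 || echo "3 JNs, 2 down at once: quorum lost, NN aborts"
```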
>>>
>>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> I will run through the procedure again tomorrow. It was late in the day
>>>> before I had a chance to test the procedure.
>>>>
>>>> If I recall correctly I had an issue formatting the new standby before
>>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>>> format command, I was queried to format the journal to the 3 hosts in the
>>>> quorum. I was unable to proceed without exception unless choosing this
>>>> option.
>>>>
>>>> Are there any concerns adding another journal node to the new standby?
>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>>> wrote:
>>>>
>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>> them, or something else)
>>>>>
>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>> -formatZK:
>>>>>
>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>> argument to get around that.
>>>>>
>>>>>
>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> However continuing with the process my QJM eventually error'd out and
>>>>>> my Active NameNode went down.
>>>>>>
>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>> QuorumOutputStream starting at txid 9634
>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> I tried a third time and it just worked?
>>>>>>>
>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>> GMT
>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>> =rhel1.local
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>> Corporation
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pro
tobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-co
re-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectj
rt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bean
utils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>>>> attempt to authenticate using SASL (unknown error)
>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>>>> session
>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>  ===============================================
>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>> Are you sure you want to clear all failover information from
>>>>>>> ZooKeeper?
>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>> failover controllers are stopped!
>>>>>>> ===============================================
>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>> Y
>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>> discord@uw.edu> wrote:
>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>> earlier today.
>>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>>> NameNode in a
>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>>> Seattle, feel
>>>>>>>> > free to give me a shout out.
>>>>>>>> >
>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a
>>>>>>>> QJM / HA
>>>>>>>> > configuration
>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Hi Jing,
>>>>>>>> >
>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>> jira.
>>>>>>>> >
>>>>>>>> > Best,
>>>>>>>> >
>>>>>>>> > Colin Williams
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> Hi Colin,
>>>>>>>> >>
>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>> >>
>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>>> since in
>>>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>>>> request to
>>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>>> original ANN
>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>>> for NN.
>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>> >>
>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>>> >> IOException {
>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>> >>     }
>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>> >>
>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty())
>>>>>>>> {
>>>>>>>> >>       // Keep things simple for now -- we can implement this at
>>>>>>>> a later
>>>>>>>> >> date.
>>>>>>>> >>       throw new IOException(
>>>>>>>> >>           "HA does not currently support adding a new standby to
>>>>>>>> a running
>>>>>>>> >> DN. " +
>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>>>> list of
>>>>>>>> >> NNs.");
>>>>>>>> >>     }
>>>>>>>> >>   }
>>>>>>>> >>
>>>>>>>> >> 3. If you're using automatic failover, you also need to update
>>>>>>>> the
>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>>>> will do
>>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>>>> SBN but I
>>>>>>>> >> have not tried before.
>>>>>>>> >>
>>>>>>>> >>     Thus in general we may still have to restart all the
>>>>>>>> services (except
>>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>>> restart
>>>>>>>> >> process I guess:
>>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the
>>>>>>>> new SBN.
>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>>>> restart
>>>>>>>> >> of all the DN to update their configurations
>>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and
>>>>>>>> update their
>>>>>>>> >> configuration. The new SBN should become active.
>>>>>>>> >>
>>>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>>>> this
>>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>>> steps in
>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >> -Jing
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>> discord@uw.edu>
>>>>>>>> >> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hello,
>>>>>>>> >>>
>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>> configuration. I
>>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>>> >>>
>>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>>> standby. Or
>>>>>>>> >>> rsync if the command fails.
>>>>>>>> >>>
>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>> journal to the
>>>>>>>> >>> new standby
>>>>>>>> >>>
>>>>>>>> >>> Update the XML configuration on all nodes to reflect the
>>>>>>>> >>> replacement
>>>>>>>> >>> standby.
>>>>>>>> >>>
>>>>>>>> >>> Start the replacement standby
>>>>>>>> >>>
>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>> NameNode
>>>>>>>> >>> configuration.
>>>>>>>> >>>
>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>> going about
>>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Regards,
>>>>>>>> >>>
>>>>>>>> >>> Colin Williams
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>>> >> NOTICE: This message is intended for the use of the individual
>>>>>>>> or entity
>>>>>>>> >> to which it is addressed and may contain information that is
>>>>>>>> confidential,
>>>>>>>> >> privileged and exempt from disclosure under applicable law. If
>>>>>>>> the reader of
>>>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>>>> notified that any
>>>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>>>> forwarding of
>>>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>>>> this
>>>>>>>> >> communication in error, please contact the sender immediately
>>>>>>>> and delete it
>>>>>>>> >> from your system. Thank You.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
No worries!  Glad you had a test environment to play with this in.  Also,
above I meant "If bootstrap fails...", not format of course :)


On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> I realize that this was a foolish error made late in the day. I am no
> Hadoop expert, and have much to learn. This is why I set up a test
> environment.
> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
> wrote:
>
>> Also you shouldn't format the new standby. You only format a namenode for
>> a brand new cluster. Once a cluster is live you should just use the
>> bootstrap on the new namenodes and never format again. Bootstrap is
>> basically a special format that just creates the dirs and copies an active
>> fsimage to the host.
>>
>> If format fails (it's buggy imo) just rsync from the active namenode. It
>> will catch up by replaying the edits from the QJM when it is started.
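The bootstrap-then-rsync fallback described above can be sketched as a dry-run script. The hostnames and name directory below are placeholders (not values from this thread), and the `run` helper only prints the commands; swap it for `eval` to actually execute them on a real cluster.

```shell
# Dry-run sketch: bring up a replacement standby NN without formatting.
# nn-new / nn-active and the name dir are illustrative placeholders.
NEW_STANDBY=nn-new.example.com
ACTIVE=nn-active.example.com
NAME_DIR=/data/dfs/name

run() { printf '+ %s\n' "$*"; }   # swap for: eval "$*" on a real cluster

# 1) Try the supported bootstrap first (copies the active fsimage for you)
run "ssh $NEW_STANDBY hdfs namenode -bootstrapStandby"

# 2) If bootstrap fails, rsync the name directory from the active instead;
#    the new standby catches up by replaying edits from the QJM at startup
run "rsync -a --delete $ACTIVE:$NAME_DIR/ $NEW_STANDBY:$NAME_DIR/"
run "ssh $NEW_STANDBY service hadoop-hdfs-namenode start"
```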
>>
>> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
>> wrote:
>>
>>> You should first replace the namenode, then when that is completely
>>> finished move on to replacing any journal nodes. That part is easy:
>>>
>>> 1) bootstrap new JN (rsync from an existing)
>>> 2) Start new JN
>>> 3) push hdfs-site.xml to both namenodes
>>> 4) restart standby namenode
>>> 5) verify logs and admin ui show new JN
>>> 6) restart active namenode
>>> 7) verify both namenodes (failover should have happened and old standby
>>> should be writing to the new JN)
>>>
>>> You can remove an existing JN at the same time if you want, just be
>>> careful to preserve the majority of the quorum during the whole operation
>>> (i.e. only replace one at a time).
>>>
>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>> journal nodes not being safe unless you roll edits. So that would go for
>>> replacing too.
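The JN-replacement steps above can be sketched as a dry-run script. Hostnames and the journal directory are placeholders, and step 3 (pushing the updated hdfs-site.xml) is left as a comment since it depends on how configs are distributed; replace the `run` helper with `eval` to execute for real.

```shell
# Dry-run sketch of replacing one JournalNode (steps 1-7 above).
OLD_JN=jn-old.example.com
NEW_JN=jn-new.example.com
JN_DIR=/data/dfs/journal

run() { printf '+ %s\n' "$*"; }   # swap for: eval "$*" to really execute

run "rsync -a $OLD_JN:$JN_DIR/ $NEW_JN:$JN_DIR/"            # 1) bootstrap new JN
run "ssh $NEW_JN service hadoop-hdfs-journalnode start"     # 2) start it
# 3) push hdfs-site.xml (with the new qjournal:// URI) to both namenodes, then:
run "ssh nn-standby service hadoop-hdfs-namenode restart"   # 4) restart standby
run "ssh nn-active service hadoop-hdfs-namenode restart"    # 6) restart active
run "hdfs dfsadmin -rollEdits"    # roll edits after each replaced JN
```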
>>>
>>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> I will run through the procedure again tomorrow. It was late in the day
>>>> before I had a chance to test the procedure.
>>>>
>>>> If I recall correctly, I had an issue formatting the new standby before
>>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>>> format command, I was prompted to format the journal on the 3 hosts in
>>>> the quorum. I was unable to proceed without an exception unless I chose
>>>> that option.
>>>>
>>>> Are there any concerns adding another journal node to the new standby?
>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>>> wrote:
>>>>
>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>> them, or something else)
>>>>>
>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>> -formatZK:
>>>>>
>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>> argument to get around that.
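A minimal sketch of the non-interactive form, echoed rather than executed here since -force skips the "parent znode already exists" sanity check and should only be run with both ZKFCs (and all but one NN) stopped, as described above:

```shell
# Re-initialize the HA failover state in ZooKeeper without the Y/N prompt.
CMD="hdfs zkfc -formatZK -force"
echo "would run: sudo -u hdfs $CMD"   # drop the echo to actually run it
```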
>>>>>
>>>>>
>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> However continuing with the process my QJM eventually error'd out and
>>>>>> my Active NameNode went down.
>>>>>>
>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>> QuorumOutputStream starting at txid 9634
>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
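[The repeated "IPC's epoch 5 is not the current writer epoch  0" lines above are QJM's fencing check firing: a JournalNode only accepts journal() writes from the writer whose epoch matches the one that last started a log segment, so a NameNode fenced in with epoch 5 that never (re)opened a segment on that JournalNode gets rejected. A rough sketch of that rule, using made-up names rather than the real Journal class:]

```java
// Minimal sketch of QJM-style epoch fencing (hypothetical names, not the
// actual org.apache.hadoop.hdfs.qjournal.server.Journal implementation).
// A journal remembers the highest epoch it has promised to (lastPromisedEpoch)
// and the epoch of the writer that last started a log segment (writerEpoch).
class JournalSketch {
    private long lastPromisedEpoch = 0;
    private long writerEpoch = 0;

    // A NameNode becoming active proposes a strictly higher epoch.
    long newEpoch(long proposed) {
        if (proposed <= lastPromisedEpoch) {
            throw new IllegalStateException("Proposed epoch " + proposed
                + " is not greater than last promised epoch " + lastPromisedEpoch);
        }
        lastPromisedEpoch = proposed;
        return lastPromisedEpoch;
    }

    // Starting a log segment establishes the current writer epoch.
    void startLogSegment(long epoch) {
        if (epoch < lastPromisedEpoch) {
            throw new IllegalStateException("Stale writer epoch " + epoch);
        }
        writerEpoch = epoch;
    }

    // This is the check that produced the errors in the log above.
    void journal(long ipcEpoch) {
        if (ipcEpoch != writerEpoch) {
            throw new IllegalStateException("IPC's epoch " + ipcEpoch
                + " is not the current writer epoch " + writerEpoch);
        }
    }

    public static void main(String[] args) {
        JournalSketch jn = new JournalSketch();
        jn.newEpoch(5);        // NN fenced in with epoch 5...
        try {
            jn.journal(5);     // ...but no segment was started under it,
        } catch (IllegalStateException e) {
            // so writerEpoch is still 0 and the write is rejected:
            System.out.println(e.getMessage());
        }
    }
}
```

[In this sketch the failure mode matches the log: the JournalNodes granted epoch 5 but never saw a successful startLogSegment under it, so every subsequent write from that NameNode is fenced off until a new segment is rolled.]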
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> I tried a third time and it just worked?
>>>>>>>
>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>> GMT
>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>> =rhel1.local
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>> Corporation
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pro
tobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-co
re-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectj
rt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bean
utils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>>>> attempt to authenticate using SASL (unknown error)
>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>>>> session
>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>  ===============================================
>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>> Are you sure you want to clear all failover information from
>>>>>>> ZooKeeper?
>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>> failover controllers are stopped!
>>>>>>> ===============================================
>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>> Y
>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>> discord@uw.edu> wrote:
>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>> earlier today.
>>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>>> NameNode in a
>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>>> Seattle, feel
>>>>>>>> > free to give me a shout out.
>>>>>>>> >
>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a
>>>>>>>> QJM / HA
>>>>>>>> > configuration
>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Hi Jing,
>>>>>>>> >
>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>> jira.
>>>>>>>> >
>>>>>>>> > Best,
>>>>>>>> >
>>>>>>>> > Colin Williams
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> Hi Colin,
>>>>>>>> >>
>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>> >>
>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>>> since in
>>>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>>>> request to
>>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>>> original ANN
>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>>> for NN.
>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>> >>
>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>> >>     }
>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>> >>
>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>>>>>>> >>       throw new IOException(
>>>>>>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>>>>> >>     }
>>>>>>>> >>   }
>>>>>>>> >>
>>>>>>>> >> 3. If you're using automatic failover, you also need to update
>>>>>>>> the
>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>>>> will do
>>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>>>> SBN but I
>>>>>>>> >> have not tried before.
>>>>>>>> >>
>>>>>>>> >>     Thus in general we may still have to restart all the
>>>>>>>> services (except
>>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>>> restart
>>>>>>>> >> process I guess:
>>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the
>>>>>>>> new SBN.
>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>>>> restart
>>>>>>>> >> of all the DN to update their configurations
>>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and
>>>>>>>> update their
>>>>>>>> >> configuration. The new SBN should become active.
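Jing's three-step rolling swap above could be sketched as shell commands. This is only an illustrative sketch, not a tested procedure: the CDH-style init-script names match the ones used elsewhere in this thread, but the DataNode hostnames (`dn1`…`dn3`) are assumptions.

```shell
# 1. Shut down the old SBN, then bootstrap and start the new SBN
service hadoop-hdfs-namenode stop      # on the old standby host
hdfs namenode -bootstrapStandby        # on the new standby host
service hadoop-hdfs-namenode start     # on the new standby host

# 2. Keep the ANN and its ZKFC running; rolling-restart every DataNode
#    so each picks up the updated NN addresses from hdfs-site.xml
for dn in dn1 dn2 dn3; do              # hostnames are placeholders
  ssh "$dn" 'service hadoop-hdfs-datanode restart'
done

# 3. Finally stop the ANN and its ZKFC and restart them with the new
#    config; the new SBN should transition to active
service hadoop-hdfs-zkfc stop          # on the current active host
service hadoop-hdfs-namenode restart
service hadoop-hdfs-zkfc start
```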
>>>>>>>> >>
>>>>>>>> >>     I have not tried the above steps, so please let me know if
>>>>>>>> this
>>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>>> steps in
>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >> -Jing
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>> discord@uw.edu>
>>>>>>>> >> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hello,
>>>>>>>> >>>
>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>> configuration. I
>>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>>> >>>
>>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>>> standby. Or
>>>>>>>> >>> rsync if the command fails.
>>>>>>>> >>>
>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>> journal to the
>>>>>>>> >>> new standby
>>>>>>>> >>>
>>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>>> replacement
>>>>>>>> >>> standby.
>>>>>>>> >>>
>>>>>>>> >>> Start the replacement standby
>>>>>>>> >>>
>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>> NameNode
>>>>>>>> >>> configuration.
>>>>>>>> >>>
>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>> going about
>>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Regards,
>>>>>>>> >>>
>>>>>>>> >>> Colin Williams
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
No worries!  Glad you had a test environment to play with this in.  Also,
above I meant "If bootstrap fails...", not format of course :)


On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> I realize that this was a foolish error made late in the day. I am no
> Hadoop expert, and have much to learn. This is why I set up a test
> environment.
> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
> wrote:
>
>> Also you shouldn't format the new standby. You only format a namenode for
>> a brand new cluster. Once a cluster is live you should just use the
>> bootstrap on the new namenodes and never format again. Bootstrap is
>> basically a special format that just creates the dirs and copies an active
>> fsimage to the host.
>>
>> If format fails (it's buggy imo) just rsync from the active namenode. It
>> will catch up by replaying the edits from the QJM when it is started.
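Bryan's bootstrap-or-rsync advice might look like the following sketch. The name directory path `/data/dfs/nn` and the host alias `active-nn` are assumptions; substitute your own `dfs.namenode.name.dir` and active NameNode host.

```shell
# On the replacement standby: try the supported bootstrap path first
if ! hdfs namenode -bootstrapStandby; then
  # Fallback if bootstrap fails: copy the name directory from the
  # active NN. /data/dfs/nn is an assumed dfs.namenode.name.dir.
  rsync -a --delete active-nn:/data/dfs/nn/ /data/dfs/nn/
fi

# On start, the standby replays any outstanding edits from the QJM
# to catch up with the active namespace
service hadoop-hdfs-namenode start
```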
>>
>> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
>> wrote:
>>
>>> You should first replace the namenode, then when that is completely
>>> finished move on to replacing any journal nodes. That part is easy:
>>>
>>> 1) bootstrap new JN (rsync from an existing)
>>> 2) Start new JN
>>> 3) push hdfs-site.xml to both namenodes
>>> 4) restart standby namenode
>>> 5) verify logs and admin ui show new JN
>>> 6) restart active namenode
>>> 7) verify both namenodes (failover should have happened and old standby
>>> should be writing to the new JN)
>>>
>>> You can remove an existing JN at the same time if you want, just be
>>> careful to preserve the majority of the quorum during the whole operation
>>> (i.e. only replace one at a time).
>>>
>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>> journal nodes not being safe unless you roll edits. So that would go for
>>> replacing too.
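The seven JournalNode-replacement steps above can be condensed into a sketch. Treat it as an outline under assumptions: `/data/dfs/jn` stands in for `dfs.journalnode.edits.dir`, and `existing-jn`, `standby-nn`, `active-nn` are placeholder hostnames.

```shell
# 1-2. Bootstrap the new JournalNode from an existing one, then start it
rsync -a existing-jn:/data/dfs/jn/ /data/dfs/jn/   # assumed edits dir
service hadoop-hdfs-journalnode start

# 3. Push the updated hdfs-site.xml (new qjournal:// URI) to both NNs
scp hdfs-site.xml standby-nn:/etc/hadoop/conf/
scp hdfs-site.xml active-nn:/etc/hadoop/conf/

# 4-7. Restart standby first, verify, then restart the active
#      (a failover is expected; the new active should write to the new JN)
ssh standby-nn 'service hadoop-hdfs-namenode restart'
ssh active-nn  'service hadoop-hdfs-namenode restart'

# Roll edits after each replaced journal node, per the JIRA caveat
hdfs dfsadmin -rollEdits
```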
>>>
>>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> I will run through the procedure again tomorrow. It was late in the day
>>>> before I had a chance to test the procedure.
>>>>
>>>> If I recall correctly, I had an issue formatting the new standby before
>>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>>> format command, I was queried to format the journal to the 3 hosts in the
>>>> quorum. I was unable to proceed without exception unless choosing this
>>>> option.
>>>>
>>>> Are there any concerns adding another journal node to the new standby?
>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>>> wrote:
>>>>
>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>> them, or something else)
>>>>>
>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>> -formatZK:
>>>>>
>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>> argument to get around that.
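So with only one NameNode alive, the non-interactive invocation would look like this (sketch; run as the HDFS superuser):

```shell
# -force skips the "are you sure" sanity check that otherwise aborts
# -formatZK against a running cluster; this is safe here only because
# a single live NameNode rules out a split-brain / standby-standby state
sudo -u hdfs hdfs zkfc -formatZK -force
```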
>>>>>
>>>>>
>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> However continuing with the process my QJM eventually error'd out and
>>>>>> my Active NameNode went down.
>>>>>>
>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>> QuorumOutputStream starting at txid 9634
>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>
>>>>>>
>>>>>>
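The "IPC's epoch 5 is not the current writer epoch 0" errors in the log above come from QJM's fencing protocol: each writer NameNode holds an epoch number, and a JournalNode rejects `journal()` calls whose epoch does not match the epoch of the writer that opened the current log segment. A freshly reformatted JN is back at writer epoch 0, so the running NN's writes fail until the next log roll re-establishes the epoch; that is why the WARN says it will retry "after the next log roll". The toy Python model below illustrates that check; it is a deliberate simplification, not the actual `Journal.java` logic.

```python
class ToyJournalNode:
    """Simplified model of a QJM JournalNode's epoch fencing checks."""

    def __init__(self):
        self.promised_epoch = 0  # highest epoch granted via new_epoch()
        self.writer_epoch = 0    # epoch of the writer of the current segment

    def new_epoch(self, epoch):
        # A would-be writer must propose a strictly higher epoch
        if epoch <= self.promised_epoch:
            raise IOError("Proposed epoch %d <= last promised %d"
                          % (epoch, self.promised_epoch))
        self.promised_epoch = epoch

    def start_log_segment(self, epoch, txid):
        # Rolling the edit log binds the writer epoch to the promised one
        if epoch != self.promised_epoch:
            raise IOError("Epoch %d is not the promised epoch %d"
                          % (epoch, self.promised_epoch))
        self.writer_epoch = epoch

    def journal(self, epoch, txid):
        # The check that produced the errors in the log above
        if epoch != self.writer_epoch:
            raise IOError("IPC's epoch %d is not the current writer epoch %d"
                          % (epoch, self.writer_epoch))


jn = ToyJournalNode()        # e.g. a JN whose storage was just reformatted
jn.new_epoch(5)              # the active NN (epoch 5) reconnects...
try:
    jn.journal(5, 9635)      # ...but writes fail: writer epoch is still 0
except IOError as e:
    print(e)
jn.start_log_segment(5, 9636)  # the next log roll fixes the writer epoch
jn.journal(5, 9636)            # now accepted
```

Under this model, a write is only accepted once a log roll has carried the promised epoch into the writer epoch, which matches the observed recovery behavior.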
>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> I tried a third time and it just worked?
>>>>>>>
>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>> GMT
>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>> =rhel1.local
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>> Corporation
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pro
tobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-co
re-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectj
rt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bean
utils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>>>> attempt to authenticate using SASL (unknown error)
>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>>>> session
>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>  ===============================================
>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>> Are you sure you want to clear all failover information from
>>>>>>> ZooKeeper?
>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>> failover controllers are stopped!
>>>>>>> ===============================================
>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>> Y
>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>> discord@uw.edu> wrote:
>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>> earlier today.
>>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>>> NameNode in a
>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>>> Seattle, feel
>>>>>>>> > free to give me a shout out.
>>>>>>>> >
>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a

>>>>>>>> QJM / HA
>>>>>>>> > configuration
>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Hi Jing,
>>>>>>>> >
>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>> jira.
>>>>>>>> >
>>>>>>>> > Best,
>>>>>>>> >
>>>>>>>> > Colin Williams
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> Hi Colin,
>>>>>>>> >>
>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>> >>
>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>>> since in
>>>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>>>> request to
>>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>>> original ANN
>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>>> for NN.
>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>> >>
>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>>> >> IOException {
>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>> >>     }
>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>> >>
>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty())
>>>>>>>> {
>>>>>>>> >>       // Keep things simple for now -- we can implement this at
>>>>>>>> a later
>>>>>>>> >> date.
>>>>>>>> >>       throw new IOException(
>>>>>>>> >>           "HA does not currently support adding a new standby to
>>>>>>>> a running
>>>>>>>> >> DN. " +
>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>>>> list of
>>>>>>>> >> NNs.");
>>>>>>>> >>     }
>>>>>>>> >>   }
>>>>>>>> >>
>>>>>>>> >> 3. If you're using automatic failover, you also need to update
>>>>>>>> the
>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>>>> will do
>>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>>>> SBN but I
>>>>>>>> >> have not tried before.
>>>>>>>> >>
>>>>>>>> >>     Thus in general we may still have to restart all the
>>>>>>>> services (except
>>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>>> restart
>>>>>>>> >> process I guess:
>>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the
>>>>>>>> new SBN.
>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>>>> restart
>>>>>>>> >> of all the DN to update their configurations
>>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and
>>>>>>>> update their
>>>>>>>> >> configuration. The new SBN should become active.
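[Editor's note: the three-step rolling replacement above can be sketched as the following command sequence. This is an untested outline, not a verified procedure; service names follow the CDH4 init scripts used elsewhere in this thread, and all of it assumes the updated hdfs-site.xml has already been distributed.]

```shell
# Step 1: retire the old SBN, bring up the new one.
# On the old standby:
sudo service hadoop-hdfs-namenode stop
# On the new standby (falls back to rsync if bootstrap fails, per below):
sudo -u hdfs hdfs namenode -bootstrapStandby
sudo service hadoop-hdfs-namenode start

# Step 2: rolling restart of every DataNode so each picks up the
# new NN list from its refreshed configuration (one host at a time):
sudo service hadoop-hdfs-datanode restart

# Step 3: stop the ANN and its ZKFC, update their configuration,
# and restart them; the new SBN should become active:
sudo service hadoop-hdfs-zkfc stop
sudo service hadoop-hdfs-namenode stop
```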
>>>>>>>> >>
>>>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>>>> this
>>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>>> steps in
>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >> -Jing
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>> discord@uw.edu>
>>>>>>>> >> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hello,
>>>>>>>> >>>
>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>> configuration. I
>>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>>> >>>
>>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>>> standby. Or
>>>>>>>> >>> rsync if the command fails.
>>>>>>>> >>>
>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>> journal to the
>>>>>>>> >>> new standby
>>>>>>>> >>>
>>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>>> replacement
>>>>>>>> >>> standby.
>>>>>>>> >>>
>>>>>>>> >>> Start the replacement standby
>>>>>>>> >>>
>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>> NameNode
>>>>>>>> >>> configuration.
>>>>>>>> >>>
>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>> going about
>>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Regards,
>>>>>>>> >>>
>>>>>>>> >>> Colin Williams
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>>> >> NOTICE: This message is intended for the use of the individual
>>>>>>>> or entity
>>>>>>>> >> to which it is addressed and may contain information that is
>>>>>>>> confidential,
>>>>>>>> >> privileged and exempt from disclosure under applicable law. If
>>>>>>>> the reader of
>>>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>>>> notified that any
>>>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>>>> forwarding of
>>>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>>>> this
>>>>>>>> >> communication in error, please contact the sender immediately
>>>>>>>> and delete it
>>>>>>>> >> from your system. Thank You.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
No worries!  Glad you had a test environment to play with this in.  Also,
above I meant "If bootstrap fails...", not format of course :)


On Fri, Aug 1, 2014 at 10:24 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> I realize that this was a foolish error made late in the day. I am no
> Hadoop expert and have much to learn. This is why I set up a test
> environment.
> On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
> wrote:
>
>> Also you shouldn't format the new standby. You only format a namenode for
>> a brand new cluster. Once a cluster is live you should just use the
>> bootstrap on the new namenodes and never format again. Bootstrap is
>> basically a special format that just creates the dirs and copies an active
>> fsimage to the host.
>>
>> If format fails (it's buggy imo) just rsync from the active namenode. It
>> will catch up by replaying the edits from the QJM when it is started.
>>
>> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
>> wrote:
>>
>>> You should first replace the namenode, then when that is completely
>>> finished move on to replacing any journal nodes. That part is easy:
>>>
>>> 1) bootstrap new JN (rsync from an existing)
>>> 2) Start new JN
>>> 3) push hdfs-site.xml to both namenodes
>>> 4) restart standby namenode
>>> 5) verify logs and admin ui show new JN
>>> 6) restart active namenode
>>> 7) verify both namenodes (failover should have happened and old standby
>>> should be writing to the new JN)
>>>
>>> You can remove an existing JN at the same time if you want, just be
>>> careful to preserve the majority of the quorum during the whole operation
>>> (i.e. only replace one at a time).
>>>
>>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>>> journal nodes not being safe unless you roll edits. So that would go for
>>> replacing too.
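[Editor's note: the seven-step journalnode swap above, as a rough command outline. Untested; hostnames and the journal directory /data/dfs/jn are placeholders, and service names follow the CDH4 packaging used in this thread.]

```shell
# 1) Bootstrap the new JN from an existing one:
rsync -a old-jn.example.com:/data/dfs/jn/ /data/dfs/jn/
# 2) Start the new JN:
sudo service hadoop-hdfs-journalnode start
# 3) Push the updated hdfs-site.xml to both namenodes, then
# 4) restart the standby namenode:
sudo service hadoop-hdfs-namenode restart
# 5) Verify logs and the admin UI show the new JN, then
# 6) restart the active namenode (failover occurs) and
# 7) confirm the old standby is writing to the new JN.
# Per the note above, roll edits after each replaced journalnode:
sudo -u hdfs hdfs dfsadmin -rollEdits
```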
>>>
>>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> I will run through the procedure again tomorrow. It was late in the day
>>>> before I had a chance to test the procedure.
>>>>
>>>> If I recall correctly, I had an issue formatting the new standby before
>>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>>> format command, I was queried to format the journal on the 3 hosts in the
>>>> quorum. I was unable to proceed without an exception unless I chose that
>>>> option.
>>>>
>>>> Are there any concerns adding another journal node to the new standby?
>>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>>> wrote:
>>>>
>>>>> This shouldn't have affected the journalnodes at all -- they are
>>>>> mostly unaware of the zkfc and active/standby state.  Did you do something
>>>>> else that may have impacted the journalnodes? (i.e. shut down 1 or more of
>>>>> them, or something else)
>>>>>
>>>>> For your previous 2 emails, reporting errors/warns when doing
>>>>> -formatZK:
>>>>>
>>>>> The WARN is fine.  It's true that you could get in a weird state if
>>>>> you had multiple namenodes up.  But with just 1 namenode up, you should be
>>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>>> a running cluster.  I should have mentioned you need to use the -force
>>>>> argument to get around that.
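[Editor's note: concretely, the -force flag mentioned above makes the ZK format non-interactive; -nonInteractive is the variant that aborts instead of prompting.]

```shell
# Format the HA state znode without the interactive sanity prompt
# (only safe with all namenodes and failover controllers stopped):
sudo -u hdfs hdfs zkfc -formatZK -force
```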
>>>>>
>>>>>
>>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> However continuing with the process my QJM eventually error'd out and
>>>>>> my Active NameNode went down.
>>>>>>
>>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>>> the next log roll.
>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>>> epoch 5 is not the current writer epoch  0
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>>  at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> at
>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at
>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>>> at
>>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>>  at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>>> at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>>> QuorumOutputStream starting at txid 9634
>>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>>
>>>>>>> I tried a third time and it just worked?
>>>>>>>
>>>>>>> sudo hdfs zkfc -formatZK
>>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>>> GMT
>>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>>> =rhel1.local
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>>> Corporation
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/pro
tobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-co
re-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectj
rt-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-bean
utils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>>> sessionTimeout=5000 watcher=null
>>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>>>> attempt to authenticate using SASL (unknown error)
>>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>>>> session
>>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>>  ===============================================
>>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>>> Are you sure you want to clear all failover information from
>>>>>>> ZooKeeper?
>>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>>> failover controllers are stopped!
>>>>>>> ===============================================
>>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>>> Y
>>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread]
>>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>>
>>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>>> discord@uw.edu> wrote:
>>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>>> earlier today.
>>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>>> NameNode in a
>>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>>> Seattle, feel
>>>>>>>> > free to give me a shout out.
>>>>>>>> >
>>>>>>>> > ---------- Forwarded message ----------
>>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a
>>>>>>>> QJM / HA
>>>>>>>> > configuration
>>>>>>>> > To: user@hadoop.apache.org
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Hi Jing,
>>>>>>>> >
>>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>>> jira.
>>>>>>>> >
>>>>>>>> > Best,
>>>>>>>> >
>>>>>>>> > Colin Williams
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> Hi Colin,
>>>>>>>> >>
>>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>>> >>
>>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>>> since in
>>>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>>>> request to
>>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>>> original ANN
>>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>>> for NN.
>>>>>>>> >> Look at the code in BPOfferService:
>>>>>>>> >>
>>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>>> >>     }
>>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>>> >>
>>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>>>>>>> >>       throw new IOException(
>>>>>>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>>>>> >>     }
>>>>>>>> >>   }
>>>>>>>> >>
>>>>>>>> >> 3. If you're using automatic failover, you also need to update
>>>>>>>> the
>>>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>>>> will do
>>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>>>> SBN but I
>>>>>>>> >> have not tried before.
>>>>>>>> >>
>>>>>>>> >>     Thus in general we may still have to restart all the
>>>>>>>> services (except
>>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>>> restart
>>>>>>>> >> process I guess:
>>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the
>>>>>>>> new SBN.
>>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>>>> restart
>>>>>>>> >> of all the DN to update their configurations
>>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and
>>>>>>>> update their
>>>>>>>> >> configuration. The new SBN should become active.
>>>>>>>> >>
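[Editor's note] Jing's three rolling-restart steps above can be sketched as a dry-run shell script. Everything beyond the steps themselves is an assumption: the hostnames, the CDH-style service names, and the single example DataNode. The script only prints the plan; it executes nothing.

```shell
#!/usr/bin/env bash
# Dry-run sketch of swapping the standby NameNode (SBN) per Jing's steps.
# All hostnames and service names are assumptions -- substitute your own.
ANN=rhel1.local      # current active NameNode
OLD_SBN=rhel2.local  # standby being retired
NEW_SBN=rhel7.local  # replacement standby (hypothetical host)

swap_plan() {
  # Step 1: shut down the old SBN, bootstrap and start the new one.
  echo "ssh $OLD_SBN service hadoop-hdfs-namenode stop"
  echo "ssh $NEW_SBN hdfs namenode -bootstrapStandby"
  echo "ssh $NEW_SBN service hadoop-hdfs-namenode start"
  # Step 2: with the ANN and its ZKFC still running, push the updated
  # hdfs-site.xml and rolling-restart each DataNode (one shown here).
  echo "ssh dn1.local service hadoop-hdfs-datanode restart"
  # Step 3: stop the ANN and its ZKFC; the new SBN should become active.
  echo "ssh $ANN service hadoop-hdfs-zkfc stop"
  echo "ssh $ANN service hadoop-hdfs-namenode stop"
}

swap_plan   # prints the plan; nothing is executed
```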
>>>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>>>> this
>>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>>> steps in
>>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >> -Jing
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>>> discord@uw.edu>
>>>>>>>> >> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hello,
>>>>>>>> >>>
>>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>>> configuration. I
>>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>>> >>>
>>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>>> standby. Or
>>>>>>>> >>> rsync if the command fails.
>>>>>>>> >>>
>>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>>> journal to the
>>>>>>>> >>> new standby
>>>>>>>> >>>
>>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>>> replacement
>>>>>>>> >>> standby.
>>>>>>>> >>>
>>>>>>>> >>> Start the replacement standby
>>>>>>>> >>>
>>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>>> NameNode
>>>>>>>> >>> configuration.
>>>>>>>> >>>
>>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>>> going about
>>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> Regards,
>>>>>>>> >>>
>>>>>>>> >>> Colin Williams
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>>> >> NOTICE: This message is intended for the use of the individual
>>>>>>>> or entity
>>>>>>>> >> to which it is addressed and may contain information that is
>>>>>>>> confidential,
>>>>>>>> >> privileged and exempt from disclosure under applicable law. If
>>>>>>>> the reader of
>>>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>>>> notified that any
>>>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>>>> forwarding of
>>>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>>>> this
>>>>>>>> >> communication in error, please contact the sender immediately
>>>>>>>> and delete it
>>>>>>>> >> from your system. Thank You.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I realize that this was a foolish error made late in the day. I am no
Hadoop expert, and have much to learn. This is why I set up a test
environment.
On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
wrote:

> Also you shouldn't format the new standby. You only format a namenode for
> a brand new cluster. Once a cluster is live you should just use the
> bootstrap on the new namenodes and never format again. Bootstrap is
> basically a special format that just creates the dirs and copies an active
> fsimage to the host.
>
> If format fails (it's buggy imo) just rsync from the active namenode. It
> will catch up by replaying the edits from the QJM when it is started.
>
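[Editor's note] The bootstrap-first, rsync-fallback order Bryan describes can be sketched as below. `hdfs namenode -bootstrapStandby` is the real command; the active-NN hostname and the metadata path are assumptions, and the script only prints what it would do.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prepare a replacement standby without re-formatting.
ACTIVE_NN=rhel1.local     # assumption: current active NameNode
NAME_DIR=/data/dfs/nn     # assumption: value of dfs.namenode.name.dir

prep_standby_plan() {
  # Preferred: bootstrapStandby creates the dirs and copies the active
  # fsimage to this host.
  echo "hdfs namenode -bootstrapStandby"
  # Fallback if bootstrap fails: rsync the metadata dir from the active
  # NN; the standby replays the remaining edits from the QJM on startup.
  echo "rsync -a root@${ACTIVE_NN}:${NAME_DIR}/ ${NAME_DIR}/"
  # Note: 'hdfs namenode -format' is never part of this -- formatting is
  # only for a brand-new cluster.
}

prep_standby_plan   # prints the two candidate commands only
```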
> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
> wrote:
>
>> You should first replace the namenode, then when that is completely
>> finished, move on to replacing any journal nodes. That part is easy:
>>
>> 1) bootstrap new JN (rsync from an existing)
>> 2) Start new JN
>> 3) push hdfs-site.xml to both namenodes
>> 4) restart standby namenode
>> 5) verify logs and admin ui show new JN
>> 6) restart active namenode
>> 7) verify both namenodes (failover should have happened and old standby
>> should be writing to the new JN)
>>
>> You can remove an existing JN at the same time if you want, just be
>> careful to preserve the majority of the quorum during the whole operation
>> (i.e. only replace one at a time).
>>
>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>> journal nodes not being safe unless you roll edits. So that would go for
>> replacing too.
>>
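[Editor's note] Bryan's seven JournalNode-replacement steps plus the rollEdits advice, condensed into a dry-run plan. Hostnames and paths are assumptions; steps 5 and 7 (verification) stay as comments because they are manual checks. Nothing is executed.

```shell
#!/usr/bin/env bash
# Dry-run sketch: replace one JournalNode, preserving the quorum majority.
ANN=rhel1.local          # assumption: active NameNode
SBN=rhel2.local          # assumption: standby NameNode
OLD_JN=rhel6.local       # assumption: JournalNode being replaced
NEW_JN=rhel7.local       # assumption: its replacement
JN_DIR=/data/dfs/jn      # assumption: dfs.journalnode.edits.dir

jn_swap_plan() {
  echo "rsync -a root@${OLD_JN}:${JN_DIR}/ root@${NEW_JN}:${JN_DIR}/"  # 1. bootstrap new JN from an existing one
  echo "ssh $NEW_JN service hadoop-hdfs-journalnode start"             # 2. start new JN
  for nn in "$ANN" "$SBN"; do                                          # 3. push hdfs-site.xml to both NNs
    echo "scp hdfs-site.xml root@${nn}:/etc/hadoop/conf/"
  done
  echo "ssh $SBN service hadoop-hdfs-namenode restart"                 # 4. restart standby (5. verify logs/admin UI)
  echo "ssh $ANN service hadoop-hdfs-namenode restart"                 # 6. restart active; expect failover (7. verify)
  echo "hdfs dfsadmin -rollEdits"                                      # roll edits after each replaced JN
}

jn_swap_plan   # replace only one JN at a time to keep the quorum majority
```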
>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu> wrote:
>>
>>> I will run through the procedure again tomorrow. It was late in the day
>>> before I had a chance to test the procedure.
>>>
>>> If I recall correctly, I had an issue formatting the new standby before
>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>> format command, I was queried to format the journal to the 3 hosts in the
>>> quorum. I was unable to proceed without exception unless choosing this
>>> option.
>>>
>>> Are there any concerns adding another journal node to the new standby?
>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>> wrote:
>>>
>>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>>> unaware of the zkfc and active/standby state.  Did you do something else
>>>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>>>> or something else)
>>>>
>>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>>
>>>> The WARN is fine.  It's true that you could get in a weird state if you
>>>> had multiple namenodes up.  But with just 1 namenode up, you should be
>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>> a running cluster.  I should have mentioned you need to use the -force
>>>> argument to get around that.
>>>>
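[Editor's note] For the -force path, a sketch of the whole non-interactive sequence. `-formatZK` and `-force` are real flags of `hdfs zkfc`; the service names and running the stop/start on both NameNode hosts are assumptions. Printed only, not executed.

```shell
#!/usr/bin/env bash
# Dry-run sketch: clear the /hadoop-ha znode state without the Y/N prompt.
formatzk_plan() {
  echo "service hadoop-hdfs-zkfc stop"    # on both NameNode hosts first
  echo "hdfs zkfc -formatZK -force"       # -force bypasses the 'parent znode already exists' check
  echo "service hadoop-hdfs-zkfc start"   # on both NameNode hosts afterwards
}

formatzk_plan
```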
>>>>
>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> However, continuing with the process, my QJM eventually errored out and
>>>>> my active NameNode went down.
>>>>>
>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>> at
>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>> QuorumOutputStream starting at txid 9634
>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> I tried a third time and it just worked?
>>>>>>
>>>>>> sudo hdfs zkfc -formatZK
>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>> GMT
>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>> =rhel1.local
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>> Corporation
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/prot
obuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-cor
e-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjr
t-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanu
tils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>> sessionTimeout=5000 watcher=null
>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>>> attempt to authenticate using SASL (unknown error)
>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>>> session
>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>  ===============================================
>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>> Are you sure you want to clear all failover information from
>>>>>> ZooKeeper?
>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>> failover controllers are stopped!
>>>>>> ===============================================
>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>> Y
>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>> earlier today.
>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>> NameNode in a
>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>> Seattle, feel
>>>>>>> > free to give me a shout out.
>>>>>>> >
>>>>>>> > ---------- Forwarded message ----------
>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM
>>>>>>> / HA
>>>>>>> > configuration
>>>>>>> > To: user@hadoop.apache.org
>>>>>>> >
>>>>>>> >
>>>>>>> > Hi Jing,
>>>>>>> >
>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>> jira.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> >
>>>>>>> > Colin Williams
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Hi Colin,
>>>>>>> >>
>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>> >>
>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>> since in
>>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>>> request to
>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>> original ANN
>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>> for NN.
>>>>>>> >> Look at the code in BPOfferService:
>>>>>>> >>
>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>> >> IOException {
>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>> >>     }
>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>> >>
>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>>>> later
>>>>>>> >> date.
>>>>>>> >>       throw new IOException(
>>>>>>> >>           "HA does not currently support adding a new standby to
>>>>>>> a running
>>>>>>> >> DN. " +
>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>>> list of
>>>>>>> >> NNs.");
>>>>>>> >>     }
>>>>>>> >>   }
>>>>>>> >>
>>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>>> will do
>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>>> SBN but I
>>>>>>> >> have not tried before.
>>>>>>> >>
>>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>>> (except
>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>> restart
>>>>>>> >> process I guess:
>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>>>> SBN.
>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>>> restart
>>>>>>> >> of all the DN to update their configurations
>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>>>> their
>>>>>>> >> configuration. The new SBN should become active.
>>>>>>> >>
>>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>>> this
>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>> steps in
>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>> >>
>>>>>>> >> Thanks,
>>>>>>> >> -Jing
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> Hello,
>>>>>>> >>>
>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>> configuration. I
>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>> >>>
>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>> standby. Or
>>>>>>> >>> rsync if the command fails.
>>>>>>> >>>
>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>> journal to the
>>>>>>> >>> new standby
>>>>>>> >>>
>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>> replacement
>>>>>>> >>> standby.
>>>>>>> >>>
>>>>>>> >>> Start the replacement standby
>>>>>>> >>>
>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>> NameNode
>>>>>>> >>> configuration.
>>>>>>> >>>
>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>> going about
>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Regards,
>>>>>>> >>>
>>>>>>> >>> Colin Williams
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I realize that this was a foolish error made late in the day. I am no
Hadoop expert, and have much to learn. This is why I set up a test
environment.
On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
wrote:

> Also you shouldn't format the new standby. You only format a namenode for
> a brand new cluster. Once a cluster is live you should just use the
> bootstrap on the new namenodes and never format again. Bootstrap is
> basically a special format that just creates the dirs and copies an active
> fsimage to the host.
>
> If the bootstrap fails (it's buggy imo) just rsync from the active namenode. It
> will catch up by replaying the edits from the QJM when it is started.
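
The bootstrap-then-rsync fallback described above can be sketched roughly as
follows. The host name and metadata path are illustrative assumptions;
substitute your own active NameNode host and dfs.namenode.name.dir value:

```shell
# On the replacement standby host, with its NameNode stopped.
# 1) Try the supported bootstrap first: it creates the name dirs and
#    copies the active NameNode's current fsimage.
hdfs namenode -bootstrapStandby

# 2) If bootstrapStandby fails, copy the metadata directory from the
#    active NameNode instead (example path -- use your own
#    dfs.namenode.name.dir):
rsync -a active-nn.example.com:/data/dfs/nn/ /data/dfs/nn/

# Either way, the standby catches up on start by replaying any newer
# edits from the JournalNode quorum.
```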
>
> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
> wrote:
>
>> You should first replace the namenode, then when that is completely
>> finished move on to replacing any journal nodes. That part is easy:
>>
>> 1) bootstrap new JN (rsync from an existing)
>> 2) Start new JN
>> 3) push hdfs-site.xml to both namenodes
>> 4) restart standby namenode
>> 5) verify logs and admin ui show new JN
>> 6) restart active namenode
>> 7) verify both namenodes (failover should have happened and old standby
>> should be writing to the new JN)
>>
>> You can remove an existing JN at the same time if you want, just be
>> careful to preserve the majority of the quorum during the whole operation
>> (I.e only replace 1 at a time).
>>
>> Also I think it is best to do hdfs dfsadmin -rollEdits after each
>> replaced journalnode. IIRC there is a JIRA open about rolling restarting
>> journal nodes not being safe unless you roll edits. So that would go for
>> replacing too.
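
The journalnode-replacement steps above, including the rollEdits caveat,
might look roughly like this in practice. Host names, the edits directory,
and the service command are illustrative assumptions:

```shell
# 1) Bootstrap the new JournalNode from a surviving one (example path --
#    substitute your dfs.journalnode.edits.dir):
rsync -a old-jn.example.com:/data/dfs/jn/ /data/dfs/jn/

# 2) Start the new JournalNode.
service hadoop-hdfs-journalnode start

# 3) Push the hdfs-site.xml with the updated qjournal:// URI to both
#    NameNodes, restart the standby, and check its logs and admin UI
#    before restarting the active (expect a failover).

# 4) Roll the edit log after each replaced JournalNode:
hdfs dfsadmin -rollEdits
```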
>>
>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu> wrote:
>>
>>> I will run through the procedure again tomorrow. It was late in the day
>>> before I had a chance to test the procedure.
>>>
>>> If I recall correctly I had an issue formatting the new standby before
>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>> format command, I was queried to format the journal to the 3 hosts in the
>>> quorum. I was unable to proceed without exception unless choosing this
>>> option.
>>>
>>> Are there any concerns adding another journal node to the new standby?
>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>> wrote:
>>>
>>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>>> unaware of the zkfc and active/standby state.  Did you do something else
>>>> that may have impacted the journalnodes? (e.g. shut down 1 or more of them,
>>>> or something else)
>>>>
>>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>>
>>>> The WARN is fine.  It's true that you could get in a weird state if you
>>>> had multiple namenodes up.  But with just 1 namenode up, you should be
>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>> a running cluster.  I should have mentioned you need to use the -force
>>>> argument to get around that.
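
In other words, both the interactive prompt seen earlier and the sanity
check can be bypassed explicitly. A sketch, assuming (as advised) that the
NameNodes and failover controllers are stopped:

```shell
# Clear and re-create /hadoop-ha/<nameservice> without the Y/N prompt.
# -force overwrites an existing parent znode; -nonInteractive aborts
# instead of prompting.
hdfs zkfc -formatZK -force
```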
>>>>
>>>>
>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> However continuing with the process my QJM eventually error'd out and
>>>>> my Active NameNode went down.
>>>>>
>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>> at
>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>> QuorumOutputStream starting at txid 9634
>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> I tried a third time and it just worked?
>>>>>>
>>>>>> sudo hdfs zkfc -formatZK
>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>> GMT
>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>> =rhel1.local
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>> Corporation
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/prot
obuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-cor
e-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjr
t-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanu
tils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>> sessionTimeout=5000 watcher=null
>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>>> attempt to authenticate using SASL (unknown error)
>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>>> session
>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>  ===============================================
>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>> Are you sure you want to clear all failover information from
>>>>>> ZooKeeper?
>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>> failover controllers are stopped!
>>>>>> ===============================================
>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>> Y
>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>> earlier today.
>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>> NameNode in a
>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>> Seattle, feel
>>>>>>> > free to give me a shout out.
>>>>>>> >
>>>>>>> > ---------- Forwarded message ----------
>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM
>>>>>>> / HA
>>>>>>> > configuration
>>>>>>> > To: user@hadoop.apache.org
>>>>>>> >
>>>>>>> >
>>>>>>> > Hi Jing,
>>>>>>> >
>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>> jira.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> >
>>>>>>> > Colin Williams
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Hi Colin,
>>>>>>> >>
>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>> >>
>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>> since in
>>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>>> request to
>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>> original ANN
>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>> for NN.
>>>>>>> >> Look at the code in BPOfferService:
>>>>>>> >>
>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>> >> IOException {
>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>> >>     }
>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>> >>
>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>>>> later
>>>>>>> >> date.
>>>>>>> >>       throw new IOException(
>>>>>>> >>           "HA does not currently support adding a new standby to
>>>>>>> a running
>>>>>>> >> DN. " +
>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>>> list of
>>>>>>> >> NNs.");
>>>>>>> >>     }
>>>>>>> >>   }
>>>>>>> >>
>>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>>> will do
>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>>> SBN but I
>>>>>>> >> have not tried before.
>>>>>>> >>
>>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>>> (except
>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>> restart
>>>>>>> >> process I guess:
>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>>>> SBN.
>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>>> restart
>>>>>>> >> of all the DN to update their configurations
>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>>>> their
>>>>>>> >> configuration. The new SBN should become active.
>>>>>>> >>
>>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>>> this
>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>> steps in
>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>> >>
>>>>>>> >> Thanks,
>>>>>>> >> -Jing
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> Hello,
>>>>>>> >>>
>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>> configuration. I
>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>> >>>
>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>> >>> standby. Or rsync if the command fails.
>>>>>>> >>>
>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>> >>> journal to the new standby
>>>>>>> >>>
>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>> >>> replacement standby.
>>>>>>> >>>
>>>>>>> >>> Start the replacement standby.
>>>>>>> >>>
>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>> >>> NameNode configuration.
>>>>>>> >>>
>>>>>>> >>> I am not sure how to deal with the journal switch, or if I am
>>>>>>> >>> going about this the right way. Can anybody give me some
>>>>>>> >>> suggestions here?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Regards,
>>>>>>> >>>
>>>>>>> >>> Colin Williams
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I realize that this was a foolish error made late in the day. I am no
Hadoop expert, and have much to learn. This is why I set up a test
environment.
On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
wrote:

> Also, you shouldn't format the new standby. You only format a namenode for
> a brand-new cluster. Once a cluster is live, you should just use the
> bootstrap on the new namenodes and never format again. Bootstrap is
> basically a special format that just creates the dirs and copies an active
> fsimage to the host.
>
> If the format fails (it's buggy, IMO), just rsync from the active namenode.
> The new standby will catch up by replaying the edits from the QJM when it
> is started.
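The bootstrap-then-rsync fallback described above can be sketched as a small shell helper. This is an illustrative sketch, not part of any Hadoop release: the function name bootstrap_or_rsync is made up, and the active-NN host and name directory are placeholders you would fill in from dfs.namenode.name.dir.

```shell
# Hypothetical helper: try bootstrapStandby first; if it fails, fall back
# to rsync'ing the metadata directory from the active namenode.
bootstrap_or_rsync() {
  local active_nn="$1" name_dir="$2"
  if hdfs namenode -bootstrapStandby; then
    echo "bootstrapped"
  else
    # bootstrapStandby can be flaky; copy the fsimage/edits dirs directly.
    rsync -av "${active_nn}:${name_dir}/" "${name_dir}/"
    echo "rsynced"
  fi
}
```

Either way the standby only needs a reasonably recent fsimage; it replays the remaining edits from the JournalNodes when it starts.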
>
> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
> wrote:
>
>> You should first replace the namenode, then when that is completely
>> finished move on to replacing any journal nodes. That part is easy:
>>
>> 1) bootstrap new JN (rsync from an existing)
>> 2) Start new JN
>> 3) push hdfs-site.xml to both namenodes
>> 4) restart standby namenode
>> 5) verify logs and admin ui show new JN
>> 6) restart active namenode
>> 7) verify both namenodes (failover should have happened and old standby
>> should be writing to the new JN)
>>
>> You can remove an existing JN at the same time if you want; just be
>> careful to preserve a majority of the quorum during the whole operation
>> (i.e., only replace one at a time).
>>
>> Also, I think it is best to run hdfs dfsadmin -rollEdits after each
>> replaced journalnode. IIRC there is a JIRA open about rolling restarts of
>> journalnodes not being safe unless you roll edits, so that would apply to
>> replacement too.
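The hdfs-site.xml change pushed in step 3 of the list above is the qjournal URI, which enumerates the quorum members. A sketch with hypothetical hosts, where jn4.example.com replaces jn3 (names and nameservice are placeholders):

```xml
<!-- Hypothetical: both namenodes must see the new quorum membership. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn4.example.com:8485/mycluster</value>
</property>
```

Restarting one namenode at a time against the updated list is what keeps a 2/3 majority alive throughout the swap.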
>>
>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu> wrote:
>>
>>> I will run through the procedure again tomorrow. It was late in the day
>>> before I had a chance to test the procedure.
>>>
>>> If I recall correctly, I had an issue formatting the new standby before
>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>> format command, I was asked whether to format the journal on the 3 hosts
>>> in the quorum. I was unable to proceed without an exception unless I
>>> chose that option.
>>>
>>> Are there any concerns about adding another journal node on the new
>>> standby?
>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>> wrote:
>>>
>>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>>> unaware of the zkfc and active/standby state. Did you do something else
>>>> that may have impacted the journalnodes (e.g. shut down one or more of
>>>> them, or something else)?
>>>>
>>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>>
>>>> The WARN is fine.  It's true that you could get in a weird state if you
>>>> had multiple namenodes up.  But with just 1 namenode up, you should be
>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>> a running cluster.  I should have mentioned you need to use the -force
>>>> argument to get around that.
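The -force path mentioned above can be sketched as the following sequence. This is an assumption-laden sketch: the init-script name matches the CDH4-style service shown earlier in the thread, and it presumes exactly one namenode is up, so adjust for your distro before using it.

```shell
# Sketch of a safe re-format of the ZKFC znode: stop the local failover
# controller first, format with -force to skip the interactive prompt,
# then bring the controller back up.
reformat_zk() {
  service hadoop-hdfs-zkfc stop &&
  hdfs zkfc -formatZK -force &&
  service hadoop-hdfs-zkfc start
}
```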
>>>>
>>>>
>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> However continuing with the process my QJM eventually error'd out and
>>>>> my Active NameNode went down.
>>>>>
>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>> at
>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>> QuorumOutputStream starting at txid 9634
>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> I tried a third time and it just worked?
>>>>>>
>>>>>> sudo hdfs zkfc -formatZK
>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>> GMT
>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>> =rhel1.local
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>> Corporation
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/prot
obuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-cor
e-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjr
t-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanu
tils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>> sessionTimeout=5000 watcher=null
>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>>> attempt to authenticate using SASL (unknown error)
>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>>> session
>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>  ===============================================
>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>> Are you sure you want to clear all failover information from
>>>>>> ZooKeeper?
>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>> failover controllers are stopped!
>>>>>> ===============================================
>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>> Y
>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>> earlier today.
>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>> NameNode in a
>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>> Seattle, feel
>>>>>>> > free to give me a shout out.
>>>>>>> >
>>>>>>> > ---------- Forwarded message ----------
>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM
>>>>>>> > / HA configuration
>>>>>>> > To: user@hadoop.apache.org
>>>>>>> >
>>>>>>> >
>>>>>>> > Hi Jing,
>>>>>>> >
>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>> jira.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> >
>>>>>>> > Colin Williams
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Hi Colin,
>>>>>>> >>
>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>> >>
>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>> since in
>>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>>> request to
>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>> original ANN
>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>> for NN.
>>>>>>> >> Look at the code in BPOfferService:
>>>>>>> >>
>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>> >> IOException {
>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>> >>     }
>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>> >>
>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>>>> later
>>>>>>> >> date.
>>>>>>> >>       throw new IOException(
>>>>>>> >>           "HA does not currently support adding a new standby to
>>>>>>> a running
>>>>>>> >> DN. " +
>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>>> list of
>>>>>>> >> NNs.");
>>>>>>> >>     }
>>>>>>> >>   }
>>>>>>> >>
>>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>>> will do
>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>>> SBN but I
>>>>>>> >> have not tried before.
>>>>>>> >>
>>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>>> (except
>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>> restart
>>>>>>> >> process I guess:
>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>>>> SBN.
>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>>> restart
>>>>>>> >> of all the DN to update their configurations
>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>>>> their
>>>>>>> >> configuration. The new SBN should become active.
>>>>>>> >>
>>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>>> this
>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>> steps in
>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>> >>
>>>>>>> >> Thanks,
>>>>>>> >> -Jing
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> Hello,
>>>>>>> >>>
>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>> configuration. I
>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>> >>>
>>>>>>> >>> Use the bootstrap standby command to prep the replacement
>>>>>>> >>> standby, or rsync if the command fails.
>>>>>>> >>>
>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>> journal to the
>>>>>>> >>> new standby
>>>>>>> >>>
>>>>>>> >>> Update the XML configuration on all nodes to reflect the
>>>>>>> >>> replacement standby.
>>>>>>> >>>
>>>>>>> >>> Start the replacement standby
>>>>>>> >>>
>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>> NameNode
>>>>>>> >>> configuration.
>>>>>>> >>>
>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>> going about
>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Regards,
>>>>>>> >>>
>>>>>>> >>> Colin Williams
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>> >> NOTICE: This message is intended for the use of the individual or
>>>>>>> entity
>>>>>>> >> to which it is addressed and may contain information that is
>>>>>>> confidential,
>>>>>>> >> privileged and exempt from disclosure under applicable law. If
>>>>>>> the reader of
>>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>>> notified that any
>>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>>> forwarding of
>>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>>> this
>>>>>>> >> communication in error, please contact the sender immediately and
>>>>>>> delete it
>>>>>>> >> from your system. Thank You.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I realize that this was a foolish error made late in the day. I am no
Hadoop expert and have much to learn. This is why I set up a test
environment.
On Aug 1, 2014 6:47 AM, "Bryan Beaudreault" <bb...@hubspot.com>
wrote:

> Also, you shouldn't format the new standby. You only format a NameNode for
> a brand-new cluster. Once a cluster is live, you should just use the
> bootstrap on the new NameNodes and never format again. Bootstrap is
> basically a special format that just creates the directories and copies an
> active fsimage to the host.
>
> If the format fails (it's buggy, IMO), just rsync from the active NameNode.
> It will catch up by replaying the edits from the QJM when it is started.
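The bootstrap-or-rsync step described above could be sketched roughly as follows. This is a hypothetical sequence, not from the thread: the host name rhel1.local and the /data/dfs/nn metadata path are example values; substitute your active NameNode and your dfs.name.dir, and run these on the replacement standby while the active NameNode stays up.

```shell
# On the replacement standby: try the official bootstrap first, which
# copies the current fsimage from the active NameNode.
sudo -u hdfs hdfs namenode -bootstrapStandby

# If bootstrapStandby fails, fall back to rsync'ing the metadata directory
# from the active NameNode (example path; use your configured dfs.name.dir).
rsync -av --delete rhel1.local:/data/dfs/nn/ /data/dfs/nn/

# Start the standby; it replays any missing edits from the QJM on startup.
sudo service hadoop-hdfs-namenode start
```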
>
> On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
> wrote:
>
>> You should first replace the NameNode; then, when that is completely
>> finished, move on to replacing any JournalNodes. That part is easy:
>>
>> 1) bootstrap new JN (rsync from an existing)
>> 2) Start new JN
>> 3) push hdfs-site.xml to both namenodes
>> 4) restart standby namenode
>> 5) verify logs and admin ui show new JN
>> 6) restart active namenode
>> 7) verify both namenodes (failover should have happened and old standby
>> should be writing to the new JN)
>>
>> You can remove an existing JN at the same time if you want; just be
>> careful to preserve the majority of the quorum during the whole operation
>> (i.e., only replace one at a time).
>>
>> Also, I think it is best to do hdfs dfsadmin -rollEdits after each
>> replaced JournalNode. IIRC there is a JIRA open about rolling restarts of
>> JournalNodes not being safe unless you roll edits, so that would apply to
>> replacing them too.
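The seven JournalNode-replacement steps above might look roughly like this. The host names (rhel5.local as the new JN, rhel2.local as an existing JN) and the /data/dfs/jn journal directory are hypothetical; adjust them to match your hdfs-site.xml and dfs.journalnode.edits.dir.

```shell
# 1-2) Bootstrap the new JournalNode from an existing one, then start it.
rsync -av rhel2.local:/data/dfs/jn/ /data/dfs/jn/   # run on rhel5.local
sudo service hadoop-hdfs-journalnode start

# 3) Push the updated hdfs-site.xml (new qjournal:// URI) to both NameNodes.

# 4-5) Restart the standby NameNode, then check its logs and admin UI
#      to confirm it is writing to the new JournalNode.
sudo service hadoop-hdfs-namenode restart           # on the standby

# 6-7) Restart the active NameNode; a failover should occur, and the old
#      standby (now active) should be writing to the new JournalNode.
sudo service hadoop-hdfs-namenode restart           # on the old active

# Per the JIRA caveat above, roll edits after each replaced JournalNode.
sudo -u hdfs hdfs dfsadmin -rollEdits
```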
>>
>> On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu> wrote:
>>
>>> I will run through the procedure again tomorrow. It was late in the day
>>> before I had a chance to test it.
>>>
>>> If I recall correctly, I had an issue formatting the new standby before
>>> bootstrapping. I think either at that point, or during the ZooKeeper
>>> format command, I was queried to format the journal on the 3 hosts in the
>>> quorum. I was unable to proceed without an exception unless I chose that
>>> option.
>>>
>>> Are there any concerns with adding another JournalNode on the new standby?
>>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>>> wrote:
>>>
>>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>>> unaware of the zkfc and active/standby state.  Did you do something else
>>>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>>>> or something else)
>>>>
>>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>>
>>>> The WARN is fine.  It's true that you could get in a weird state if you
>>>> had multiple namenodes up.  But with just 1 namenode up, you should be
>>>> safe. What you are trying to avoid is a split brain or standby/standby
>>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>>> a running cluster.  I should have mentioned you need to use the -force
>>>> argument to get around that.
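With only the single remaining NameNode's ZKFC stopped, the non-interactive variant of the format discussed above might be sketched as below. The -force flag is part of Apache Hadoop's ZKFC tooling (it suppresses the "already exists" prompt); verify it is supported in your CDH version before relying on it.

```shell
# Stop the failover controller on this host before touching its znode.
sudo service hadoop-hdfs-zkfc stop

# Recreate /hadoop-ha/<nameservice> without the interactive Y/N prompt;
# -force answers the "parent znode already exists" sanity check for you.
sudo -u hdfs hdfs zkfc -formatZK -force

# Restart the failover controller once the znode has been recreated.
sudo service hadoop-hdfs-zkfc start
```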
>>>>
>>>>
>>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> However continuing with the process my QJM eventually error'd out and
>>>>> my Active NameNode went down.
>>>>>
>>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>>> the next log roll.
>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's
>>>>> epoch 5 is not the current writer epoch  0
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>>  at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>  at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>>> stream=QuorumOutputStream starting at txid 9634))
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>>
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>>> at
>>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>>> at
>>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>>  at
>>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>>> at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>>> QuorumOutputStream starting at txid 9634
>>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>>
>>>>>> I tried a third time and it just worked?
>>>>>>
>>>>>> sudo hdfs zkfc -formatZK
>>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>>> GMT
>>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>>> =rhel1.local
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>>> Corporation
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/prot
obuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar: [... remainder of long java.class.path listing snipped ...]
>>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>>> (Environment.java:logEnv(100)) - Client
>>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>>> sessionTimeout=5000 watcher=null
>>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>>> attempt to authenticate using SASL (unknown error)
>>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>>> session
>>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>>  ===============================================
>>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>>> Are you sure you want to clear all failover information from
>>>>>> ZooKeeper?
>>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>>> failover controllers are stopped!
>>>>>> ===============================================
>>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>>> Y
>>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>>> /hadoop-ha/golden-apple from ZK...
>>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>>> /hadoop-ha/golden-apple from ZK.
>>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>>> /hadoop-ha/golden-apple in ZK.
>>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>>
>>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu> wrote:
>>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>>> earlier today.
>>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>>> NameNode in a
>>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>>> Seattle, feel
>>>>>>> > free to give me a shout out.
>>>>>>> >
>>>>>>> > ---------- Forwarded message ----------
>>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM
>>>>>>> / HA
>>>>>>> > configuration
>>>>>>> > To: user@hadoop.apache.org
>>>>>>> >
>>>>>>> >
>>>>>>> > Hi Jing,
>>>>>>> >
>>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>>> jira.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> >
>>>>>>> > Colin Williams
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Hi Colin,
>>>>>>> >>
>>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>>> >>
>>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>>> since in
>>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>>> request to
>>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>>> original ANN
>>>>>>> >> needs to send this RPC to the correct NN).
>>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>>> for NN.
>>>>>>> >> Look at the code in BPOfferService:
>>>>>>> >>
>>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>>> >> IOException {
>>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>>> >>     }
>>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>>> >>
>>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>>>> later
>>>>>>> >> date.
>>>>>>> >>       throw new IOException(
>>>>>>> >>           "HA does not currently support adding a new standby to
>>>>>>> a running
>>>>>>> >> DN. " +
>>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>>> list of
>>>>>>> >> NNs.");
>>>>>>> >>     }
>>>>>>> >>   }
>>>>>>> >>
>>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>>> will do
>>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>>> SBN but I
>>>>>>> >> have not tried before.
>>>>>>> >>
>>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>>> (except
>>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>>> restart
>>>>>>> >> process I guess:
>>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>>>> SBN.
>>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>>> restart
>>>>>>> >> of all the DN to update their configurations
>>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>>>> their
>>>>>>> >> configuration. The new SBN should become active.
>>>>>>> >>
>>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>>> this
>>>>>>> >> works or not. And I think we should also document the correct
>>>>>>> steps in
>>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>>> >>
>>>>>>> >> Thanks,
>>>>>>> >> -Jing
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>>> discord@uw.edu>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> Hello,
>>>>>>> >>>
>>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>>> configuration. I
>>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>>> >>>
>>>>>>> >>> Use the Bootstrap standby command to prep the replacement
>>>>>>> standby. Or
>>>>>>> >>> rsync if the command fails.
>>>>>>> >>>
>>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>>> journal to the
>>>>>>> >>> new standby
>>>>>>> >>>
>>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>>> >>> replacement standby.
>>>>>>> >>>
>>>>>>> >>> Start the replacement standby
>>>>>>> >>>
>>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>>> NameNode
>>>>>>> >>> configuration.
>>>>>>> >>>
>>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>>> going about
>>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Regards,
>>>>>>> >>>
>>>>>>> >>> Colin Williams
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>>> >> NOTICE: This message is intended for the use of the individual or
>>>>>>> entity
>>>>>>> >> to which it is addressed and may contain information that is
>>>>>>> confidential,
>>>>>>> >> privileged and exempt from disclosure under applicable law. If
>>>>>>> the reader of
>>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>>> notified that any
>>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>>> forwarding of
>>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>>> this
>>>>>>> >> communication in error, please contact the sender immediately and
>>>>>>> delete it
>>>>>>> >> from your system. Thank You.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Also you shouldn't format the new standby. You only format a namenode for a
brand new cluster. Once a cluster is live you should just run bootstrapStandby
on any new namenode and never format again. bootstrapStandby is basically a
special format that just creates the dirs and copies an active fsimage to
the host.
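That bootstrap step might look something like the following dry-run sketch
(the NN service ID "nn2", the hostnames, and the init-script names are
assumptions for illustration; the `run` wrapper only echoes each command, so
nothing is actually executed):

```shell
# Dry-run sketch of bringing up a replacement standby NameNode.
# Hostnames, the service ID "nn2", and init-script names are placeholders.
run() { echo "+ $*"; }   # swap for: run() { "$@"; } to actually execute

# On the replacement standby host, with hdfs-site.xml already in place:
run hdfs namenode -bootstrapStandby      # creates dirs, copies the active fsimage
run service hadoop-hdfs-namenode start   # start the new standby
run hdfs haadmin -getServiceState nn2    # should eventually report "standby"
```

Swap the `run` definition once you have verified the echoed sequence against
your own configuration.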

If bootstrapStandby fails (it's buggy imo) just rsync the name directory from
the active namenode. The new standby will catch up by replaying the edits from
the QJM when it is started.
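The rsync fallback could be sketched roughly as below (the host
active-nn.example.com and the /data/dfs/nn directory are assumptions; check
dfs.namenode.name.dir in your hdfs-site.xml, and note that trailing slashes
matter to rsync). Again the `run` wrapper only echoes:

```shell
# Echo-only sketch of the rsync fallback for a failed bootstrapStandby.
# active-nn.example.com and /data/dfs/nn are placeholder host/dir names.
run() { echo "+ $*"; }

run service hadoop-hdfs-namenode stop    # ensure the new standby is down first
run rsync -av --delete active-nn.example.com:/data/dfs/nn/ /data/dfs/nn/
run service hadoop-hdfs-namenode start   # then it replays newer edits from the QJM
```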

On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
wrote:

> You should first replace the namenode, then when that is completely
> finished move on to replacing any journal nodes. That part is easy:
>
> 1) bootstrap new JN (rsync from an existing)
> 2) Start new JN
> 3) push hdfs-site.xml to both namenodes
> 4) restart standby namenode
> 5) verify logs and admin ui show new JN
> 6) restart active namenode
> 7) verify both namenodes (failover should have happened and old standby
> should be writing to the new JN)
>
> You can remove an existing JN at the same time if you want; just be
> careful to preserve a majority of the quorum during the whole operation
> (i.e., only replace one at a time).
>
> Also I think it is best to do hdfs dfsadmin -rollEdits after each
> journalnode you replace. IIRC there is a JIRA open about rolling restarts
> of journal nodes not being safe unless you roll edits, so that would apply
> to replacing them too.
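Put concretely, steps 1-7 above plus the edit roll might look like this
echo-only sketch (the old-jn/new-jn hostnames, the /data/dfs/jn journal
directory, and the init-script names are all placeholders, not verified
against any particular install):

```shell
# Echo-only sketch of swapping one journal node; repeat per JN, keeping a
# quorum majority alive at all times. Hostnames and paths are placeholders.
run() { echo "+ $*"; }

run rsync -av old-jn.example.com:/data/dfs/jn/ /data/dfs/jn/  # 1) bootstrap new JN
run service hadoop-hdfs-journalnode start                     # 2) start new JN
# 3) push the updated hdfs-site.xml to both namenodes, then:
run service hadoop-hdfs-namenode restart                      # 4) restart standby NN
run hdfs dfsadmin -rollEdits                                  # roll edits after the swap
run service hadoop-hdfs-namenode restart                      # 6) restart active NN (on its host)
```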
>
> On Friday, August 1, 2014, Colin Kincaid Williams <discord@uw.edu> wrote:
>
>> I will run through the procedure again tomorrow. It was late in the day
>> before I had a chance to test the procedure.
>>
>> If I recall correctly, I had an issue formatting the new standby before
>> bootstrapping. I think either at that point, or during the ZooKeeper
>> format command, I was asked to format the journal on the 3 hosts in the
>> quorum. I was unable to proceed without an exception unless I chose that
>> option.
>>
>> Are there any concerns adding another journal node to the new standby?
>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>> wrote:
>>
>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>> unaware of the zkfc and active/standby state.  Did you do something else
>>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>>> or something else)
>>>
>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>
>>> The WARN is fine.  It's true that you could get in a weird state if you
>>> had multiple namenodes up.  But with just 1 namenode up, you should be
>>> safe. What you are trying to avoid is a split brain or standby/standby
>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>> a running cluster.  I should have mentioned you need to use the -force
>>> argument to get around that.
>>>
>>>
>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> However, continuing with the process, my QJM eventually errored out and
>>>> my active NameNode went down.
>>>>
>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>> stream=QuorumOutputStream starting at txid 9634))
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>> at
>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>  at
>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>> QuorumOutputStream starting at txid 9634
>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> I tried a third time and it just worked?
>>>>>
>>>>> sudo hdfs zkfc -formatZK
>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>> GMT
>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>> =rhel1.local
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>> Corporation
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/proto
buf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core
-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt
-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanut
ils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>> sessionTimeout=5000 watcher=null
>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>> attempt to authenticate using SASL (unknown error)
>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>> session
>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>  ===============================================
>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>> Are you sure you want to clear all failover information from
>>>>> ZooKeeper?
>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>> failover controllers are stopped!
>>>>> ===============================================
>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>> Y
>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>> /hadoop-ha/golden-apple from ZK...
>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>> /hadoop-ha/golden-apple from ZK.
>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>> /hadoop-ha/golden-apple in ZK.
>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>> earlier today.
>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>> NameNode in a
>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>> Seattle, feel
>>>>>> > free to give me a shout out.
>>>>>> >
>>>>>> > ---------- Forwarded message ----------
>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM
>>>>>> / HA
>>>>>> > configuration
>>>>>> > To: user@hadoop.apache.org
>>>>>> >
>>>>>> >
>>>>>> > Hi Jing,
>>>>>> >
>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>> jira.
>>>>>> >
>>>>>> > Best,
>>>>>> >
>>>>>> > Colin Williams
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Hi Colin,
>>>>>> >>
>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>> >>
>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>> since in
>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>> request to
>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>> original ANN
>>>>>> >> needs to send this RPC to the correct NN).
>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>> for NN.
>>>>>> >> Look at the code in BPOfferService:
>>>>>> >>
>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>> >> IOException {
>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>> >>     }
>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>> >>
>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>>> later
>>>>>> >> date.
>>>>>> >>       throw new IOException(
>>>>>> >>           "HA does not currently support adding a new standby to a
>>>>>> running
>>>>>> >> DN. " +
>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>> list of
>>>>>> >> NNs.");
>>>>>> >>     }
>>>>>> >>   }
>>>>>> >>
>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>> will do
>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>> SBN but I
>>>>>> >> have not tried before.
>>>>>> >>
>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>> (except
>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>> restart
>>>>>> >> process I guess:
>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>>> SBN.
>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>> restart
>>>>>> >> of all the DN to update their configurations
>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>>> their
>>>>>> >> configuration. The new SBN should become active.
>>>>>> >>
>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>> this
>>>>>> >> works or not. And I think we should also document the correct
>>>>>> steps in
>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> -Jing
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Hello,
>>>>>> >>>
>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>> configuration. I
>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>> >>>
>>>>>> >>> Use the Bootstrap standby command to prep the replacement standby.
>>>>>> >>> Or rsync if the command fails.
>>>>>> >>>
>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>> journal to the
>>>>>> >>> new standby
>>>>>> >>>
>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>> >>> replacement standby.
>>>>>> >>>
>>>>>> >>> Start the replacement standby
>>>>>> >>>
>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>> NameNode
>>>>>> >>> configuration.
>>>>>> >>>
>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>> going about
>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Regards,
>>>>>> >>>
>>>>>> >>> Colin Williams
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Also you shouldn't format the new standby. You only format a namenode for a
brand new cluster. Once a cluster is live you should just use the bootstrap
on the new namenodes and never format again. Bootstrap is basically a
special format that just creates the dirs and copies an active fsimage to
the host.

If format fails (it's buggy imo) just rsync from the active namenode. It
will catch up by replaying the edits from the QJM when it is started.
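The bootstrap-then-rsync fallback described above can be sketched roughly as follows (hostnames, data directories, and the service name are illustrative, not from the thread):

```shell
# On the replacement standby host:

# 1) Try the bootstrap first -- it creates the name dirs and copies
#    the active NameNode's fsimage over.
hdfs namenode -bootstrapStandby

# 2) If bootstrap fails, copy the name directory from the active instead.
#    "active-nn" and the paths below are placeholders.
rsync -a --delete active-nn:/data/dfs/name/ /data/dfs/name/

# 3) Start the standby; it replays any newer edits from the JournalNodes.
service hadoop-hdfs-namenode start
```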

On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
wrote:

> You should first replace the namenode, then when that is completely
> finished move on to replacing any journal nodes. That part is easy:
>
> 1) bootstrap new JN (rsync from an existing)
> 2) Start new JN
> 3) push hdfs-site.xml to both namenodes
> 4) restart standby namenode
> 5) verify logs and admin ui show new JN
> 6) restart active namenode
> 7) verify both namenodes (failover should have happened and old standby
> should be writing to the new JN)
>
> You can remove an existing JN at the same time if you want, just be
> careful to preserve the majority of the quorum during the whole operation
> (i.e. only replace one at a time).
>
> Also I think it is best to do hdfs dfsadmin -rollEdits after each replaced
> journalnode. IIRC there is a JIRA open about rolling restarting journal
> nodes not being safe unless you roll edits. So that would go for replacing
> too.
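For step 3 of the list above, the piece of hdfs-site.xml that changes is the shared edits URI. A sketch with placeholder hostnames and journal ID (none of these names come from the thread):

```xml
<!-- Illustrative only: swap old-jn3 for new-jn3 in the QJM quorum URI.
     "jn1", "jn2", "new-jn3", and the journal ID "mycluster" are placeholders. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;new-jn3:8485/mycluster</value>
</property>
```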
>
> On Friday, August 1, 2014, Colin Kincaid Williams <discord@uw.edu> wrote:
>
>> I will run through the procedure again tomorrow. It was late in the day
>> before I had a chance to test the procedure.
>>
>> If I recall correctly, I had an issue formatting the new standby before
>> bootstrapping. I think either at that point, or during the ZooKeeper
>> format command, I was queried to format the journal to the 3 hosts in the
>> quorum. I was unable to proceed without an exception unless I chose that
>> option.
>>
>> Are there any concerns adding another journal node to the new standby?
>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>> wrote:
>>
>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>> unaware of the zkfc and active/standby state.  Did you do something else
>>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>>> or something else)
>>>
>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>
>>> The WARN is fine.  It's true that you could get in a weird state if you
>>> had multiple namenodes up.  But with just 1 namenode up, you should be
>>> safe. What you are trying to avoid is a split brain or standby/standby
>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>> a running cluster.  I should have mentioned you need to use the -force
>>> argument to get around that.
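The -force variant mentioned above would look like this (run it with only one NameNode up, as cautioned in the thread; the sudo user is an assumption):

```shell
# Re-create the HA parent znode without the interactive confirmation.
# -force bypasses the "parent znode already exists" sanity check.
sudo -u hdfs hdfs zkfc -formatZK -force
```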
>>>
>>>
>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> However continuing with the process my QJM eventually error'd out and
>>>> my Active NameNode went down.
>>>>
>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>> stream=QuorumOutputStream starting at txid 9634))
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>> at
>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>  at
>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>> QuorumOutputStream starting at txid 9634
>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> I tried a third time and it just worked?
>>>>>
>>>>> sudo hdfs zkfc -formatZK
>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>> GMT
>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>> =rhel1.local
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>> Corporation
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/proto
buf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core
-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt
-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanut
ils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>> sessionTimeout=5000 watcher=null
>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>> attempt to authenticate using SASL (unknown error)
>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>> session
>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>  ===============================================
>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>> Are you sure you want to clear all failover information from
>>>>> ZooKeeper?
>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>> failover controllers are stopped!
>>>>> ===============================================
>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>> Y
>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>> /hadoop-ha/golden-apple from ZK...
>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>> /hadoop-ha/golden-apple from ZK.
>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>> /hadoop-ha/golden-apple in ZK.
>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>> earlier today.
>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>> NameNode in a
>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>> Seattle, feel
>>>>>> > free to give me a shout out.
>>>>>> >
>>>>>> > ---------- Forwarded message ----------
>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM
>>>>>> / HA
>>>>>> > configuration
>>>>>> > To: user@hadoop.apache.org
>>>>>> >
>>>>>> >
>>>>>> > Hi Jing,
>>>>>> >
>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>> jira.
>>>>>> >
>>>>>> > Best,
>>>>>> >
>>>>>> > Colin Williams
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Hi Colin,
>>>>>> >>
>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>> >>
>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>> since in
>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>> request to
>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>> original ANN
>>>>>> >> needs to send this RPC to the correct NN).
>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>> for NN.
>>>>>> >> Look at the code in BPOfferService:
>>>>>> >>
>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>> >>     }
>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>> >>
>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>>>>> >>       throw new IOException(
>>>>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>>> >>     }
>>>>>> >>   }
>>>>>> >>
>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>> will do
>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>> SBN but I
>>>>>> >> have not tried before.
>>>>>> >>
>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>> (except
>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>> restart
>>>>>> >> process I guess:
>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>>> SBN.
>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>> restart
>>>>>> >> of all the DN to update their configurations
>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>>> their
>>>>>> >> configuration. The new SBN should become active.
>>>>>> >>
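A rough shell sketch of Jing's three phases (the daemon names match the packaged CDH-style init scripts used elsewhere in this thread; the ordering within each phase is an assumption, not something Jing confirmed):

```shell
# Phase 1: swap in the new standby
sudo service hadoop-hdfs-namenode stop        # on the old standby
sudo -u hdfs hdfs namenode -bootstrapStandby  # on the new standby
sudo service hadoop-hdfs-namenode start       # on the new standby

# Phase 2: rolling restart of DataNodes with the updated hdfs-site.xml,
# one DataNode at a time
sudo service hadoop-hdfs-datanode restart

# Phase 3: stop the current active NN and its ZKFC, update their
# configuration, then restart them; the new standby should become active
sudo service hadoop-hdfs-zkfc stop
sudo service hadoop-hdfs-namenode stop
# ...push the updated hdfs-site.xml / core-site.xml here...
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-zkfc start
```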
>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>> this
>>>>>> >> works or not. And I think we should also document the correct
>>>>>> steps in
>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> -Jing
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Hello,
>>>>>> >>>
>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>> configuration. I
>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>> >>>
>>>>>> >>> Use the bootstrapStandby command to prep the replacement standby. Or
>>>>>> >>> rsync if the command fails.
>>>>>> >>>
>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>> journal to the
>>>>>> >>> new standby
>>>>>> >>>
>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>> >>> replacement standby.
>>>>>> >>>
>>>>>> >>> Start the replacement standby
>>>>>> >>>
>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>> NameNode
>>>>>> >>> configuration.
>>>>>> >>>
>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>> going about
>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Regards,
>>>>>> >>>
>>>>>> >>> Colin Williams
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Also you shouldn't format the new standby. You only format a namenode for a
brand new cluster. Once a cluster is live you should just use the bootstrap
on the new namenodes and never format again. Bootstrap is basically a
special format that just creates the dirs and copies an active fsimage to
the host.

If the bootstrap fails (it's buggy, IMO) just rsync the metadata directory from
the active namenode. The new standby will catch up by replaying the edits from
the QJM when it is started.
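As a rough sketch, the bootstrap-or-rsync fallback Bryan describes might look like this (the metadata path /data/dfs/nn and the host name active-nn are illustrative assumptions, not taken from the thread; substitute your own dfs.namenode.name.dir):

```shell
# On the replacement standby: bootstrap copies an active fsimage into place
sudo -u hdfs hdfs namenode -bootstrapStandby

# If bootstrapStandby fails, rsync the metadata directory from the active
# NameNode instead (path is illustrative -- use your dfs.namenode.name.dir)
sudo -u hdfs rsync -av active-nn:/data/dfs/nn/ /data/dfs/nn/

# Start the standby; it replays any outstanding edits from the QJM
sudo service hadoop-hdfs-namenode start
```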

On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
wrote:

> You should first replace the namenode, then when that is completely
> finished move on to replacing any journal nodes. That part is easy:
>
> 1) bootstrap new JN (rsync from an existing)
> 2) Start new JN
> 3) push hdfs-site.xml to both namenodes
> 4) restart standby namenode
> 5) verify logs and admin ui show new JN
> 6) restart active namenode
> 7) verify both namenodes (failover should have happened and old standby
> should be writing to the new JN)
>
> You can remove an existing JN at the same time if you want; just be
> careful to preserve a majority of the quorum during the whole operation
> (i.e. only replace one at a time).
>
> Also I think it is best to do hdfs dfsadmin -rollEdits after each replaced
> journalnode. IIRC there is a JIRA open about rolling restarts of journal
> nodes not being safe unless you roll edits, so that would apply to
> replacing them too.
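Bryan's seven JN-replacement steps, sketched as commands (the journal directory /data/dfs/jn and the host name existing-jn are illustrative assumptions; use your own dfs.journalnode.edits.dir and hosts):

```shell
# 1-2) Bootstrap the new JournalNode from an existing one, then start it
rsync -av existing-jn:/data/dfs/jn/ /data/dfs/jn/   # assumed dfs.journalnode.edits.dir
sudo service hadoop-hdfs-journalnode start

# 3) Push hdfs-site.xml with the new qjournal:// URI to both NameNodes

# 4-5) Restart the standby NN, then check its log / admin UI for the new JN
sudo service hadoop-hdfs-namenode restart   # on the standby

# 6-7) Restart the active NN (expect a failover) and verify both NNs
sudo service hadoop-hdfs-namenode restart   # on the old active

# After each replaced JournalNode, roll the edit log
sudo -u hdfs hdfs dfsadmin -rollEdits
```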
>
> On Friday, August 1, 2014, Colin Kincaid Williams <discord@uw.edu> wrote:
>
>> I will run through the procedure again tomorrow. It was late in the day
>> before I had a chance to test the procedure.
>>
>> If I recall correctly, I had an issue formatting the new standby before
>> bootstrapping. I think either at that point, or during the ZooKeeper
>> format command, I was asked whether to format the journal on the 3 hosts
>> in the quorum. I was unable to proceed without an exception unless I chose
>> that option.
>>
>> Are there any concerns adding another journal node to the new standby?
>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>> wrote:
>>
>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>> unaware of the zkfc and active/standby state.  Did you do something else
>>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>>> or something else)
>>>
>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>
>>> The WARN is fine.  It's true that you could get in a weird state if you
>>> had multiple namenodes up.  But with just 1 namenode up, you should be
>>> safe. What you are trying to avoid is a split brain or standby/standby
>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>> a running cluster.  I should have mentioned you need to use the -force
>>> argument to get around that.
>>>
>>>
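Putting that note together, reformatting the ZKFC znode with a single live
namenode might look like this sketch. It only echoes the commands; the service
names follow the CDH4 init scripts used earlier in the thread, and whether you
need -force depends on your cluster state, as described above.

```shell
# Dry-run sketch of re-running zkfc -formatZK past the sanity check.
run() { echo "+ $*"; }               # swap the echo for "$@" to actually execute

run service hadoop-hdfs-zkfc stop                # stop the ZKFC(s) first
run hdfs zkfc -formatZK -force                   # -force bypasses the running-cluster check
run service hadoop-hdfs-zkfc start
```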
>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> However, continuing with the process, my QJM eventually errored out
>>>> and my active NameNode went down.
>>>>
>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>> stream=QuorumOutputStream starting at txid 9634))
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>> at
>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>  at
>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>> QuorumOutputStream starting at txid 9634
>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> I tried a third time and it just worked?
>>>>>
>>>>> sudo hdfs zkfc -formatZK
>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>> GMT
>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>> =rhel1.local
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>> Corporation
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/proto
buf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core
-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt
-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanut
ils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>> sessionTimeout=5000 watcher=null
>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>> attempt to authenticate using SASL (unknown error)
>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>> session
>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>  ===============================================
>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>> Are you sure you want to clear all failover information from
>>>>> ZooKeeper?
>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>> failover controllers are stopped!
>>>>> ===============================================
>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>> Y
>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>> /hadoop-ha/golden-apple from ZK...
>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>> /hadoop-ha/golden-apple from ZK.
>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>> /hadoop-ha/golden-apple in ZK.
>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>> earlier today.
>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>> NameNode in a
>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>> Seattle, feel
>>>>>> > free to give me a shout out.
>>>>>> >
>>>>>> > ---------- Forwarded message ----------
>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM
>>>>>> / HA
>>>>>> > configuration
>>>>>> > To: user@hadoop.apache.org
>>>>>> >
>>>>>> >
>>>>>> > Hi Jing,
>>>>>> >
>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>> jira.
>>>>>> >
>>>>>> > Best,
>>>>>> >
>>>>>> > Colin Williams
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Hi Colin,
>>>>>> >>
>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>> >>
>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>> since in
>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>> request to
>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>> original ANN
>>>>>> >> needs to send this RPC to the correct NN).
>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>> for NN.
>>>>>> >> Look at the code in BPOfferService:
>>>>>> >>
>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>> >> IOException {
>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>> >>     }
>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>> >>
>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>>> later
>>>>>> >> date.
>>>>>> >>       throw new IOException(
>>>>>> >>           "HA does not currently support adding a new standby to a
>>>>>> running
>>>>>> >> DN. " +
>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>> list of
>>>>>> >> NNs.");
>>>>>> >>     }
>>>>>> >>   }
>>>>>> >>
>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>> will do
>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>> SBN but I
>>>>>> >> have not tried before.
>>>>>> >>
>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>> (except
>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>> restart
>>>>>> >> process I guess:
>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>>> SBN.
>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>> restart
>>>>>> >> of all the DN to update their configurations
>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>>> their
>>>>>> >> configuration. The new SBN should become active.
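[Editor's note: the three quoted steps above might be sketched as the following command sequence. The hostnames old-sbn, new-sbn, ann, and dn1..dn3 are placeholders, and the service names assume a CDH-style packaged install like the one used elsewhere in this thread; treat it as an outline, not a verified procedure.]

```shell
# Step 1: stop the old standby, then bootstrap and start the replacement
ssh old-sbn 'service hadoop-hdfs-namenode stop'
ssh new-sbn 'hdfs namenode -bootstrapStandby && service hadoop-hdfs-namenode start'

# Step 2: rolling restart of every DataNode after pushing the updated
# hdfs-site.xml, so each DN picks up the new standby's address
for dn in dn1 dn2 dn3; do
  ssh "$dn" 'service hadoop-hdfs-datanode restart'
done

# Step 3: stop the active NN and its ZKFC, update their configuration;
# the new standby should then transition to active
ssh ann 'service hadoop-hdfs-zkfc stop && service hadoop-hdfs-namenode stop'
```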
>>>>>> >>
>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>> this
>>>>>> >> works or not. And I think we should also document the correct
>>>>>> steps in
>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> -Jing
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Hello,
>>>>>> >>>
>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>> configuration. I
>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>> >>>
>>>>>> >>> Use the Bootstrap standby command to prep the replacement standby.
>>>>>> Or
>>>>>> >>> rsync if the command fails.
>>>>>> >>>
>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>> journal to the
>>>>>> >>> new standby
>>>>>> >>>
>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>> replacement
>>>>>> >>> standby.
>>>>>> >>>
>>>>>> >>> Start the replacement standby
>>>>>> >>>
>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>> NameNode
>>>>>> >>> configuration.
>>>>>> >>>
>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>> going about
>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Regards,
>>>>>> >>>
>>>>>> >>> Colin Williams
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>> >> NOTICE: This message is intended for the use of the individual or
>>>>>> entity
>>>>>> >> to which it is addressed and may contain information that is
>>>>>> confidential,
>>>>>> >> privileged and exempt from disclosure under applicable law. If the
>>>>>> reader of
>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>> notified that any
>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>> forwarding of
>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>> this
>>>>>> >> communication in error, please contact the sender immediately and
>>>>>> delete it
>>>>>> >> from your system. Thank You.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Also you shouldn't format the new standby. You only format a namenode for a
brand new cluster. Once a cluster is live you should just use the bootstrap
on the new namenodes and never format again. Bootstrap is basically a
special format that just creates the dirs and copies an active fsimage to
the host.

If the bootstrap fails (it's buggy imo), just rsync from the active namenode. It
will catch up by replaying the edits from the QJM when it is started.
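A hedged sketch of this bootstrap-or-rsync step. The hostname active-nn.example.com and the metadata path /data/dfs/nn are illustrative assumptions (substitute your dfs.namenode.name.dir), and the service name assumes a packaged install like the one in this thread:

```shell
# Run on the NEW standby host. bootstrapStandby creates the name dirs
# and copies the active NN's current fsimage over.
if ! hdfs namenode -bootstrapStandby; then
  # Fallback when bootstrapStandby fails: copy the metadata directory
  # from the active NN instead (placeholder host and path).
  rsync -av --delete active-nn.example.com:/data/dfs/nn/ /data/dfs/nn/
fi

# On start, the standby replays any newer edits from the JournalNodes.
service hadoop-hdfs-namenode start
```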

On Friday, August 1, 2014, Bryan Beaudreault <bb...@hubspot.com>
wrote:

> You should first replace the namenode, then when that is completely
> finished move on to replacing any journal nodes. That part is easy:
>
> 1) bootstrap new JN (rsync from an existing)
> 2) Start new JN
> 3) push hdfs-site.xml to both namenodes
> 4) restart standby namenode
> 5) verify logs and admin ui show new JN
> 6) restart active namenode
> 7) verify both namenodes (failover should have happened and old standby
> should be writing to the new JN)
>
> You can remove an existing JN at the same time if you want, just be
> careful to preserve the majority of the quorum during the whole operation
> (i.e. only replace 1 at a time).
>
> Also I think it is best to do hdfs dfsadmin -rollEdits after each replaced
> journalnode. IIRC there is a JIRA open about rolling restarting journal
> nodes not being safe unless you roll edits. So that would go for replacing
> too.
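[Editor's note: steps 1-7 above, plus the rollEdits caveat, might look roughly like this. All hostnames and the journal path /data/dfs/jn are illustrative assumptions, not values from the thread.]

```shell
# 1-2) Bootstrap the new JournalNode from an existing one, then start it.
#      Replace only one JN at a time so a majority of the quorum stays up.
rsync -av existing-jn.example.com:/data/dfs/jn/ new-jn.example.com:/data/dfs/jn/
ssh new-jn.example.com 'service hadoop-hdfs-journalnode start'

# 3-6) Push the updated dfs.namenode.shared.edits.dir in hdfs-site.xml to
#      both NNs, restart the standby, verify logs/UI, then restart the
#      active (triggering a failover).
ssh standby-nn.example.com 'service hadoop-hdfs-namenode restart'
ssh active-nn.example.com  'service hadoop-hdfs-namenode restart'

# 7 + caveat) Verify both NNs see the new JN, then roll the edit log
#             after each replaced JournalNode.
hdfs dfsadmin -rollEdits
```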
>
> On Friday, August 1, 2014, Colin Kincaid Williams <discord@uw.edu> wrote:
>
>> I will run through the procedure again tomorrow. It was late in the day
>> before I had a chance to test the procedure.
>>
>> If I recall correctly I had an issue formatting the new standby, before
>> bootstrapping. I think either at that point, or during the Zookeeper
>> format command, I was queried to format the journal on the 3 hosts in the
>> quorum. I was unable to proceed without exception unless choosing this
>> option.
>>
>> Are there any concerns about adding another journal node to the new standby?
>> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
>> wrote:
>>
>>> This shouldn't have affected the journalnodes at all -- they are mostly
>>> unaware of the zkfc and active/standby state.  Did you do something else
>>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>>> or something else)
>>>
>>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>>
>>> The WARN is fine.  It's true that you could get in a weird state if you
>>> had multiple namenodes up.  But with just 1 namenode up, you should be
>>> safe. What you are trying to avoid is a split brain or standby/standby
>>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>>> a running cluster.  I should have mentioned you need to use the -force
>>> argument to get around that.
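So with only one namenode alive, the sanity check can be bypassed like this (command shape as used earlier in the thread; -force is the override Bryan refers to):

```shell
# Stop the ZKFC first, as done earlier in the thread, then reformat the
# HA state in ZooKeeper non-interactively. Run as the hdfs user.
service hadoop-hdfs-zkfc stop
sudo -u hdfs hdfs zkfc -formatZK -force
```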
>>>
>>>
>>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>>
>>>> However continuing with the process my QJM eventually error'd out and
>>>> my Active NameNode went down.
>>>>
>>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>>> 10.120.5.247:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>>> 10.120.5.203:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>>> 10.120.5.25:8485] client.QuorumJournalManager
>>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>>> the next log roll.
>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>>> 5 is not the current writer epoch  0
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>>  at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>  at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>>> stream=QuorumOutputStream starting at txid 9634))
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>>
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>>> at
>>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>>  at
>>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>>> at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>>  at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>>> at
>>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>>  at
>>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>>> at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>>> QuorumOutputStream starting at txid 9634
>>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu
>>>> > wrote:
>>>>
>>>>> I tried a third time and it just worked?
>>>>>
>>>>> sudo hdfs zkfc -formatZK
>>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>>> GMT
>>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>>> =rhel1.local
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>>> Corporation
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/proto
buf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core
-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt
-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanut
ils-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>>> (Environment.java:logEnv(100)) - Client
>>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>>> sessionTimeout=5000 watcher=null
>>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>>> attempt to authenticate using SASL (unknown error)
>>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>>> session
>>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>>  ===============================================
>>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>>> Are you sure you want to clear all failover information from
>>>>> ZooKeeper?
>>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>>> failover controllers are stopped!
>>>>> ===============================================
>>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>>> Y
>>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>>> /hadoop-ha/golden-apple from ZK...
>>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>>> /hadoop-ha/golden-apple from ZK.
>>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>>> /hadoop-ha/golden-apple in ZK.
>>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>>
>>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>>> discord@uw.edu> wrote:
>>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>>> earlier today.
>>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>>> NameNode in a
>>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>>> Seattle, feel
>>>>>> > free to give me a shout out.
>>>>>> >
>>>>>> > ---------- Forwarded message ----------
>>>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM
>>>>>> / HA
>>>>>> > configuration
>>>>>> > To: user@hadoop.apache.org
>>>>>> >
>>>>>> >
>>>>>> > Hi Jing,
>>>>>> >
>>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>>> jira.
>>>>>> >
>>>>>> > Best,
>>>>>> >
>>>>>> > Colin Williams
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Hi Colin,
>>>>>> >>
>>>>>> >>     I guess currently we may have to restart almost all the
>>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>>> >>
>>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>>> since in
>>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>>> request to
>>>>>> >> ANN periodically (thus if a NN failover happens later, the
>>>>>> original ANN
>>>>>> >> needs to send this RPC to the correct NN).
>>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment
>>>>>> for NN.
>>>>>> >> Look at the code in BPOfferService:
>>>>>> >>
>>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>>> >> IOException {
>>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>>> >>     }
>>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>>> >>
>>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>>> later
>>>>>> >> date.
>>>>>> >>       throw new IOException(
>>>>>> >>           "HA does not currently support adding a new standby to a
>>>>>> running
>>>>>> >> DN. " +
>>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>>> list of
>>>>>> >> NNs.");
>>>>>> >>     }
>>>>>> >>   }
>>>>>> >>
>>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>>> will do
>>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>>> SBN but I
>>>>>> >> have not tried before.
>>>>>> >>
>>>>>> >>     Thus in general we may still have to restart all the services
>>>>>> (except
>>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>>> restart
>>>>>> >> process I guess:
>>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>>> SBN.
>>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>>> restart
>>>>>> >> of all the DN to update their configurations
>>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>>> their
>>>>>> >> configuration. The new SBN should become active.
>>>>>> >>
>>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>>> this
>>>>>> >> works or not. And I think we should also document the correct
>>>>>> steps in
>>>>>> >> Apache. Could you please file an Apache jira?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> -Jing
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>>> discord@uw.edu>
>>>>>> >> wrote:
>>>>>> >>>
>>>>>> >>> Hello,
>>>>>> >>>
>>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>>> configuration. I
>>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>>> >>>
>>>>>> >>> Use the Bootstrap standby command to prep the replacement standby.
>>>>>> Or
>>>>>> >>> rsync if the command fails.
>>>>>> >>>
>>>>>> >>> Somehow update the datanodes, so they push the heartbeat /
>>>>>> journal to the
>>>>>> >>> new standby
>>>>>> >>>
>>>>>> >>> Update the xml configuration on all nodes to reflect the
>>>>>> replacement
>>>>>> >>> standby.
>>>>>> >>>
>>>>>> >>> Start the replacement standby
>>>>>> >>>
>>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>>> NameNode
>>>>>> >>> configuration.
>>>>>> >>>
>>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>>> going about
>>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Regards,
>>>>>> >>>
>>>>>> >>> Colin Williams
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >> CONFIDENTIALITY NOTICE
>>>>>> >> NOTICE: This message is intended for the use of the individual or
>>>>>> entity
>>>>>> >> to which it is addressed and may contain information that is
>>>>>> confidential,
>>>>>> >> privileged and exempt from disclosure under applicable law. If the
>>>>>> reader of
>>>>>> >> this message is not the intended recipient, you are hereby
>>>>>> notified that any
>>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>>> forwarding of
>>>>>> >> this communication is strictly prohibited. If you have received
>>>>>> this
>>>>>> >> communication in error, please contact the sender immediately and
>>>>>> delete it
>>>>>> >> from your system. Thank You.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
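Jing's rolling-restart outline above (bootstrap the new standby, rolling-restart the DataNodes, then restart the old active NameNode and its ZKFC) can be sketched as a dry-run plan. The hostnames are placeholders, and the CDH-style init-script names are assumptions carried over from commands used elsewhere in this thread; the function only prints the commands in order, so nothing is executed:

```shell
#!/bin/sh
# Print, in order, the commands for swapping in a new standby NameNode (SBN).
# Nothing is executed; run each line by hand once the hosts are real.
swap_standby_plan() {
  new_sbn=$1; ann=$2; shift 2        # remaining args: DataNode hostnames
  # 1) Bootstrap the replacement SBN from the active NN, then start it.
  echo "ssh $new_sbn hdfs namenode -bootstrapStandby"
  echo "ssh $new_sbn sudo service hadoop-hdfs-namenode start"
  # 2) Rolling restart of the DataNodes so they pick up the new NN list
  #    (BPOfferService cannot refresh a changed NN set in place).
  for dn in "$@"; do
    echo "ssh $dn sudo service hadoop-hdfs-datanode restart"
  done
  # 3) Finally restart the old active NN and its ZKFC with the updated
  #    config; the new standby should then take over as active.
  echo "ssh $ann sudo service hadoop-hdfs-zkfc stop"
  echo "ssh $ann sudo service hadoop-hdfs-namenode restart"
}

swap_standby_plan new-sbn.example ann.example dn1.example dn2.example
```

This is only a sketch of the ordering Jing describes; pushing the updated hdfs-site.xml to each host before its restart is assumed to happen out of band.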

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
You should first replace the namenode; then, when that is completely
finished, move on to replacing any journal nodes. That part is easy:

1) bootstrap new JN (rsync from an existing)
2) Start new JN
3) push hdfs-site.xml to both namenodes
4) restart standby namenode
5) verify logs and admin ui show new JN
6) restart active namenode
7) verify both namenodes (failover should have happened and old standby
should be writing to the new JN)

You can remove an existing JN at the same time if you want; just be careful
to preserve the majority of the quorum during the whole operation (i.e. only
replace one at a time).
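That majority rule can be made concrete: with N JournalNodes configured, writes need a strict majority (N/2 + 1, integer division), so you can only stop a node if enough others stay alive. A tiny check, where the live/total counts are numbers you supply yourself (how you probe JN liveness is left up to you):

```shell
#!/bin/sh
# Succeeds (exit 0) only when a strict majority of the configured JournalNode
# quorum would still be alive after stopping one more JN.
can_stop_one_jn() {
  up=$1; total=$2
  # quorum size is total/2 + 1, so the survivors (up - 1) must exceed total/2
  [ $(( up - 1 )) -gt $(( total / 2 )) ]
}

can_stop_one_jn 3 3 && echo "3 of 3 up: safe to stop one"
can_stop_one_jn 2 3 || echo "2 of 3 up: do NOT stop another"
```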

Also, I think it is best to do hdfs dfsadmin -rollEdits after each
journalnode you replace. IIRC there is a JIRA open about rolling restarts of
journal nodes not being safe unless you roll edits, so that would apply to
replacing them too.
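Taken together with the rollEdits advice, the replacement sequence can be sketched as a dry-run plan. The hostnames, the journal directory path, and the CDH-style service names are placeholders/assumptions; steps 5 and 7 (checking logs and the admin UI) are manual and omitted, and the function only prints the commands:

```shell
#!/bin/sh
# Print, in order, the commands for replacing a single JournalNode (JN).
# Nothing is executed; run each line by hand once the hosts/paths are real.
jn_replace_plan() {
  old_jn=$1; new_jn=$2; standby_nn=$3; active_nn=$4
  echo "rsync -a $old_jn:/data/jn/ $new_jn:/data/jn/"               # 1) bootstrap new JN
  echo "ssh $new_jn sudo service hadoop-hdfs-journalnode start"     # 2) start new JN
  for nn in $standby_nn $active_nn; do
    echo "scp hdfs-site.xml $nn:/etc/hadoop/conf/"                  # 3) push config to both NNs
  done
  echo "ssh $standby_nn sudo service hadoop-hdfs-namenode restart"  # 4) restart standby first
  echo "ssh $active_nn sudo service hadoop-hdfs-namenode restart"   # 6) then the active (fails over)
  echo "hdfs dfsadmin -rollEdits"                                   # roll edits after the swap
}

jn_replace_plan old-jn.example new-jn.example sbn.example ann.example
```

Restarting the standby before the active matters: it keeps one NameNode serving at all times while each one picks up the new JN list.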

On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu> wrote:

> I will run through the procedure again tomorrow. It was late in the day
> before I had a chance to test the procedure.
>
> If I recall correctly, I had an issue formatting the new standby before
> bootstrapping. I think either at that point, or during the ZooKeeper
> format command, I was queried to format the journal on the 3 hosts in the
> quorum. I was unable to proceed without an exception unless I chose that
> option.
>
> Are there any concerns adding another journal node to the new standby?
> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bbeaudreault@hubspot.com> wrote:
>
>> This shouldn't have affected the journalnodes at all -- they are mostly
>> unaware of the zkfc and active/standby state.  Did you do something else
>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>> or something else)
>>
>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>
>> The WARN is fine.  It's true that you could get in a weird state if you
>> had multiple namenodes up.  But with just 1 namenode up, you should be
>> safe. What you are trying to avoid is a split brain or standby/standby
>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>> a running cluster.  I should have mentioned you need to use the -force
>> argument to get around that.
>>
>>
>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>
>>> However, continuing with the process, my QJM eventually errored out and
>>> my Active NameNode went down.
>>>
>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>> 10.120.5.247:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>> 10.120.5.203:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>> 10.120.5.25:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>> stream=QuorumOutputStream starting at txid 9634))
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>> at
>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>  at
>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>> QuorumOutputStream starting at txid 9634
>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>
>>>> I tried a third time and it just worked?
>>>>
>>>> sudo hdfs zkfc -formatZK
>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>> GMT
>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>> =rhel1.local
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>> Corporation
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protob
uf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-
asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-
1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanuti
ls-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>> sessionTimeout=5000 watcher=null
>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>> attempt to authenticate using SASL (unknown error)
>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>> session
>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>  ===============================================
>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>> Are you sure you want to clear all failover information from
>>>> ZooKeeper?
>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>> failover controllers are stopped!
>>>> ===============================================
>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>> Y
>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>> /hadoop-ha/golden-apple from ZK...
>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>> /hadoop-ha/golden-apple from ZK.
>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>> /hadoop-ha/golden-apple in ZK.
>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posix4e@gmail.com> wrote:
>>>>
>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>
>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>>> discord@uw.edu> wrote:
>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>> earlier today.
>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>> NameNode in a
>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>> Seattle, feel
>>>>> > free to give me a shout out.
>>>>> >
>>>>> > ---------- Forwarded message ----------
>>>>> > From: Colin Kincaid Williams <discord@uw.edu>
>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM /
>>>>> HA
>>>>> > configuration
>>>>> > To: user@hadoop.apache.org
>>>>> >
>>>>> >
>>>>> > Hi Jing,
>>>>> >
>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>> jira.
>>>>> >
>>>>> > Best,
>>>>> >
>>>>> > Colin Williams
>>>>> >
>>>>> >
>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <jing@hortonworks.com> wrote:
>>>>> >>
>>>>> >> Hi Colin,
>>>>> >>
>>>>> >>     I guess currently we may have to restart almost all the
>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>> >>
>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>> since in
>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>> request to
>>>>> >> ANN periodically (thus if a NN failover happens later, the original
>>>>> ANN
>>>>> >> needs to send this RPC to the correct NN).
>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment for
>>>>> NN.
>>>>> >> Look at the code in BPOfferService:
>>>>> >>
>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>> >> IOException {
>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>> >>     }
>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>> >>
>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>> later
>>>>> >> date.
>>>>> >>       throw new IOException(
>>>>> >>           "HA does not currently support adding a new standby to a
>>>>> running
>>>>> >> DN. " +
>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>> list of
>>>>> >> NNs.");
>>>>> >>     }
>>>>> >>   }
>>>>> >>
>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>> will do
>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>> SBN but I
>>>>> >> have not tried before.
>>>>> >>
>>>>> >>     Thus in general we may still have to restart all the services
>>>>> (except
>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>> restart
>>>>> >> process I guess:
>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>> SBN.
>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>> restart
>>>>> >> of all the DN to update their configurations
>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>> their
>>>>> >> configuration. The new SBN should become active.
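
A minimal shell sketch of Jing's three-phase outline, untested exactly as the outline itself warns. Host names (old-sbn, new-sbn, ann, dn1..dn3) are hypothetical placeholders, and the service names are an assumption based on the CDH init scripts used elsewhere in this thread. With DRY_RUN=1 (the default) each step is only printed, so the plan can be reviewed before anything runs:

```shell
#!/usr/bin/env sh
# Sketch of the three-phase rolling swap outlined above.
# Hosts (old-sbn, new-sbn, ann, dn1..dn3) are hypothetical placeholders;
# service names follow the CDH packaging seen in this thread.
# DRY_RUN=1 (the default) prints each step instead of executing it.
DRY_RUN=${DRY_RUN:-1}
STEPS=0
run() {
  STEPS=$((STEPS+1))
  if [ "$DRY_RUN" = "1" ]; then echo "PLAN: $*"; else "$@"; fi
}

# Phase 1: shut down the old SBN, then bootstrap and start the new SBN
run ssh old-sbn service hadoop-hdfs-namenode stop
run ssh new-sbn hdfs namenode -bootstrapStandby
run ssh new-sbn service hadoop-hdfs-namenode start

# Phase 2: rolling restart of the datanodes with the updated config
for dn in dn1 dn2 dn3; do
  run scp hdfs-site.xml "$dn":/etc/hadoop/conf/
  run ssh "$dn" service hadoop-hdfs-datanode restart
done

# Phase 3: stop the ANN and its ZKFC and update their configuration;
# the new SBN should then become active
run ssh ann service hadoop-hdfs-zkfc stop
run ssh ann service hadoop-hdfs-namenode stop
run scp hdfs-site.xml ann:/etc/hadoop/conf/
```

Run once with DRY_RUN=1 to inspect the plan, then DRY_RUN=0 to execute.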
>>>>> >>
>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>> this
>>>>> >> works or not. And I think we should also document the correct steps
>>>>> in
>>>>> >> Apache. Could you please file an Apache jira?
>>>>> >>
>>>>> >> Thanks,
>>>>> >> -Jing
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>>> >> discord@uw.edu> wrote:
>>>>> >>>
>>>>> >>> Hello,
>>>>> >>>
>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>> configuration. I
>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>> >>>
>>>>> >>> Use the Bootstrap standby command to prep the replacement standby. Or
>>>>> >>> rsync if the command fails.
>>>>> >>>
>>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>>>> to the
>>>>> >>> new standby
>>>>> >>>
>>>>> >>> Update the xml configuration on all nodes to reflect the replacement
>>>>> >>> standby.
>>>>> >>>
>>>>> >>> Start the replacement standby
>>>>> >>>
>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>> NameNode
>>>>> >>> configuration.
>>>>> >>>
>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>> going about
>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>> >>>
>>>>> >>>
>>>>> >>> Regards,
>>>>> >>>
>>>>> >>> Colin Williams
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >> CONFIDENTIALITY NOTICE
>>>>> >> NOTICE: This message is intended for the use of the individual or
>>>>> entity
>>>>> >> to which it is addressed and may contain information that is
>>>>> confidential,
>>>>> >> privileged and exempt from disclosure under applicable law. If the
>>>>> reader of
>>>>> >> this message is not the intended recipient, you are hereby notified
>>>>> that any
>>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>>> forwarding of
>>>>> >> this communication is strictly prohibited. If you have received this
>>>>> >> communication in error, please contact the sender immediately and
>>>>> delete it
>>>>> >> from your system. Thank You.
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
You should first replace the NameNode, then when that is completely
finished move on to replacing any JournalNodes. That part is easy:

1) bootstrap new JN (rsync from an existing)
2) Start new JN
3) push hdfs-site.xml to both namenodes
4) restart standby namenode
5) verify logs and admin ui show new JN
6) restart active namenode
7) verify both namenodes (failover should have happened and old standby
should be writing to the new JN)

You can remove an existing JN at the same time if you want; just be careful
to preserve a majority of the quorum during the whole operation (i.e. only
replace one at a time).

Also, I think it is best to run hdfs dfsadmin -rollEdits after each replaced
JournalNode. IIRC there is a JIRA open about rolling restarts of journal
nodes not being safe unless you roll edits, so that would apply to replacing
them too.
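
A minimal shell sketch of the replacement sequence above. The host names (jn-old, jn-new, nn1, nn2) and the /data/jn edits directory are hypothetical placeholders, not values from this thread; substitute your own. With DRY_RUN=1 (the default) each step is only printed:

```shell
#!/usr/bin/env sh
# Sketch of the JournalNode replacement steps above.
# jn-old, jn-new, nn1, nn2 and /data/jn are hypothetical placeholders.
# DRY_RUN=1 (the default) prints each step instead of executing it.
DRY_RUN=${DRY_RUN:-1}
STEPS=0
run() {
  STEPS=$((STEPS+1))
  if [ "$DRY_RUN" = "1" ]; then echo "PLAN: $*"; else "$@"; fi
}

# 1) bootstrap the new JN by copying the edits dir from an existing JN
run rsync -a jn-old:/data/jn/ jn-new:/data/jn/
# 2) start the new JN
run ssh jn-new service hadoop-hdfs-journalnode start
# 3) push the updated hdfs-site.xml (new qjournal:// URI) to both namenodes
run scp hdfs-site.xml nn1:/etc/hadoop/conf/
run scp hdfs-site.xml nn2:/etc/hadoop/conf/
# 4) restart the standby namenode (5: then verify logs and admin UI by hand)
run ssh nn2 service hadoop-hdfs-namenode restart
# 6) restart the active namenode (7: a failover is expected; verify both)
run ssh nn1 service hadoop-hdfs-namenode restart
# roll edits so every JN starts a fresh log segment
run hdfs dfsadmin -rollEdits
```

Review the plan with DRY_RUN=1 first, and replace only one JN at a time so a quorum majority survives throughout.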

On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu> wrote:

> I will run through the procedure again tomorrow. It was late in the day
> before I had a chance to test the procedure.
>
> If I recall correctly, I had an issue formatting the new standby before
> bootstrapping. I think either at that point, or during the ZooKeeper
> format command, I was queried to format the journal to the 3 hosts in the
> quorum. I was unable to proceed without exception unless choosing this
> option.
>
> Are there any concerns adding another journal node to the new standby?
> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bbeaudreault@hubspot.com> wrote:
>
>> This shouldn't have affected the journalnodes at all -- they are mostly
>> unaware of the zkfc and active/standby state.  Did you do something else
>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>> or something else)
>>
>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>
>> The WARN is fine.  It's true that you could get in a weird state if you
>> had multiple namenodes up.  But with just 1 namenode up, you should be
>> safe. What you are trying to avoid is a split brain or standby/standby
>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>> a running cluster.  I should have mentioned you need to use the -force
>> argument to get around that.
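
The -formatZK step discussed above, with the zkfc stopped first, might be sketched as follows. The service names are an assumption based on the CDH init scripts shown earlier in this thread; -force answers the safety prompt non-interactively:

```shell
#!/usr/bin/env sh
# Sketch: re-initialize the ZKFC znode with only one NameNode alive.
# Service names follow the CDH init scripts seen in this thread.
# DRY_RUN=1 (the default) prints each step instead of executing it.
DRY_RUN=${DRY_RUN:-1}
STEPS=0
run() {
  STEPS=$((STEPS+1))
  if [ "$DRY_RUN" = "1" ]; then echo "PLAN: $*"; else "$@"; fi
}

# stop the failover controller so it cannot race the format
run service hadoop-hdfs-zkfc stop
# -force skips the interactive Y/N prompt seen in the logs above
run hdfs zkfc -formatZK -force
# the ZKFC recreates its lock znode when it comes back up
run service hadoop-hdfs-zkfc start
```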
>>
>>
>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>
>>> However, continuing with the process, my QJM eventually errored out and my
>>> Active NameNode went down.
>>>
>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>> 10.120.5.247:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>> 10.120.5.203:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>> 10.120.5.25:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>> stream=QuorumOutputStream starting at txid 9634))
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>> at
>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>  at
>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>> QuorumOutputStream starting at txid 9634
>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>
>>>> I tried a third time and it just worked?
>>>>
>>>> sudo hdfs zkfc -formatZK
>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>> GMT
>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>> =rhel1.local
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>> Corporation
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protob
uf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-
asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-
1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanuti
ls-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>> sessionTimeout=5000 watcher=null
>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>> attempt to authenticate using SASL (unknown error)
>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>> session
>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>  ===============================================
>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>> Are you sure you want to clear all failover information from
>>>> ZooKeeper?
>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>> failover controllers are stopped!
>>>> ===============================================
>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>> Y
>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>> /hadoop-ha/golden-apple from ZK...
>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>> /hadoop-ha/golden-apple from ZK.
>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>> /hadoop-ha/golden-apple in ZK.
>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posix4e@gmail.com> wrote:
>>>>
>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>
>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>> earlier today.
>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>> NameNode in a
>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>> Seattle, feel
>>>>> > free to give me a shout out.
>>>>> >
>>>>> > ---------- Forwarded message ----------
>>>>> > From: Colin Kincaid Williams <discord@uw.edu>
>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM /
>>>>> HA
>>>>> > configuration
>>>>> > To: user@hadoop.apache.org
>>>>> >
>>>>> >
>>>>> > Hi Jing,
>>>>> >
>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>> jira.
>>>>> >
>>>>> > Best,
>>>>> >
>>>>> > Colin Williams
>>>>> >
>>>>> >
>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <jing@hortonworks.com> wrote:
>>>>> >>
>>>>> >> Hi Colin,
>>>>> >>
>>>>> >>     I guess currently we may have to restart almost all the
>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>> >>
>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>> since in
>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>> request to
>>>>> >> ANN periodically (thus if a NN failover happens later, the original
>>>>> ANN
>>>>> >> needs to send this RPC to the correct NN).
>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment for
>>>>> NN.
>>>>> >> Look at the code in BPOfferService:
>>>>> >>
>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>> >> IOException {
>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>> >>     }
>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>> >>
>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>> later
>>>>> >> date.
>>>>> >>       throw new IOException(
>>>>> >>           "HA does not currently support adding a new standby to a
>>>>> running
>>>>> >> DN. " +
>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>> list of
>>>>> >> NNs.");
>>>>> >>     }
>>>>> >>   }
>>>>> >>
>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>> will do
>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>> SBN but I
>>>>> >> have not tried before.
>>>>> >>
>>>>> >>     Thus in general we may still have to restart all the services
>>>>> (except
>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>> restart
>>>>> >> process I guess:
>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>> SBN.
>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>> restart
>>>>> >> of all the DN to update their configurations
>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>> their
>>>>> >> configuration. The new SBN should become active.
>>>>> >>
>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>> this
>>>>> >> works or not. And I think we should also document the correct steps
>>>>> in
>>>>> >> Apache. Could you please file an Apache jira?
>>>>> >>
>>>>> >> Thanks,
>>>>> >> -Jing
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>> >>>
>>>>> >>> Hello,
>>>>> >>>
>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>> configuration. I
>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>> >>>
>>>>> >>> Use the Bootstrap standby command to prep the replacment standby.
>>>>> Or
>>>>> >>> rsync if the command fails.
>>>>> >>>
>>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>>>> to the
>>>>> >>> new standby
>>>>> >>>
>>>>> >>> Update the xml configuration on all nodes to reflect the replacment
>>>>> >>> standby.
>>>>> >>>
>>>>> >>> Start the replacment standby
>>>>> >>>
>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>> NameNode
>>>>> >>> configuration.
>>>>> >>>
>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>> going about
>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>> >>>
>>>>> >>>
>>>>> >>> Regards,
>>>>> >>>
>>>>> >>> Colin Williams
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
You should first replace the NameNode; then, when that is completely
finished, move on to replacing any JournalNodes. That part is easy:

1) bootstrap new JN (rsync from an existing)
2) Start new JN
3) push hdfs-site.xml to both namenodes
4) restart standby namenode
5) verify logs and admin ui show new JN
6) restart active namenode
7) verify both namenodes (failover should have happened and old standby
should be writing to the new JN)

You can remove an existing JN at the same time if you want; just be careful
to preserve a majority of the quorum during the whole operation (i.e., only
replace one at a time).

Also, I think it is best to run hdfs dfsadmin -rollEdits after each replaced
JournalNode. IIRC there is a JIRA open about rolling restarts of JournalNodes
not being safe unless you roll edits, so that would apply to replacing them
too.
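A minimal sketch of that roll, run once after each JournalNode swap. Run it as the HDFS superuser, and verify your release actually ships the -rollEdits subcommand (it was added in the Hadoop 2.x line).

```shell
# Force the active NameNode to finalize the current edit-log segment and
# start a new one, so fresh segments begin on the updated JN quorum.
# Assumes the 'hdfs' superuser account of a typical packaged install.
sudo -u hdfs hdfs dfsadmin -rollEdits
```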

On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu> wrote:

> I will run through the procedure again tomorrow. It was late in the day
> before I had a chance to test the procedure.
>
> If I recall correctly I had an issue formatting the new standby before
> bootstrapping. I think either at that point, or during the ZooKeeper
> format command, I was queried to format the journal on the 3 hosts in the
> quorum. I was unable to proceed without an exception unless choosing this
> option.
>
> Are there any concerns about adding another JournalNode to the new standby?
> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bbeaudreault@hubspot.com> wrote:
>
>> This shouldn't have affected the journalnodes at all -- they are mostly
>> unaware of the zkfc and active/standby state.  Did you do something else
>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>> or something else)
>>
>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>
>> The WARN is fine.  It's true that you could get in a weird state if you
>> had multiple namenodes up.  But with just 1 namenode up, you should be
>> safe. What you are trying to avoid is a split brain or standby/standby
>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>> a running cluster.  I should have mentioned you need to use the -force
>> argument to get around that.
>>
>>
>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>
>>> However continuing with the process my QJM eventually error'd out and my
>>> Active NameNode went down.
>>>
>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>> 10.120.5.247:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>> 10.120.5.203:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>> 10.120.5.25:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>> stream=QuorumOutputStream starting at txid 9634))
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>> at
>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>  at
>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>> QuorumOutputStream starting at txid 9634
>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>
>>>> I tried a third time and it just worked?
>>>>
>>>> sudo hdfs zkfc -formatZK
>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>> GMT
>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>> =rhel1.local
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>> Corporation
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protob
uf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-
asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-
1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanuti
ls-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>> sessionTimeout=5000 watcher=null
>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>> attempt to authenticate using SASL (unknown error)
>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>> session
>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>  ===============================================
>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>> Are you sure you want to clear all failover information from
>>>> ZooKeeper?
>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>> failover controllers are stopped!
>>>> ===============================================
>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>> Y
>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>> /hadoop-ha/golden-apple from ZK...
>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>> /hadoop-ha/golden-apple from ZK.
>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>> /hadoop-ha/golden-apple in ZK.
>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posix4e@gmail.com
>>>> <javascript:_e(%7B%7D,'cvml','posix4e@gmail.com');>> wrote:
>>>>
>>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>>
>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>> earlier today.
>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>> NameNode in a
>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>> Seattle, feel
>>>>> > free to give me a shout out.
>>>>> >
>>>>> > ---------- Forwarded message ----------
>>>>> > From: Colin Kincaid Williams <discord@uw.edu>
>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM /
>>>>> HA
>>>>> > configuration
>>>>> > To: user@hadoop.apache.org
>>>>> >
>>>>> >
>>>>> > Hi Jing,
>>>>> >
>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>> jira.
>>>>> >
>>>>> > Best,
>>>>> >
>>>>> > Colin Williams
>>>>> >
>>>>> >
>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <jing@hortonworks.com> wrote:
>>>>> >>
>>>>> >> Hi Colin,
>>>>> >>
>>>>> >>     I guess currently we may have to restart almost all the
>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>> >>
>>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN
>>>>> since in
>>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>>> request to
>>>>> >> ANN periodically (thus if a NN failover happens later, the original
>>>>> ANN
>>>>> >> needs to send this RPC to the correct NN).
>>>>> >> 2. Looks like the DataNode currently cannot do real refreshment for
>>>>> NN.
>>>>> >> Look at the code in BPOfferService:
>>>>> >>
>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>>> >> IOException {
>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>> >>     }
>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>> >>
>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>> >>       // Keep things simple for now -- we can implement this at a
>>>>> later
>>>>> >> date.
>>>>> >>       throw new IOException(
>>>>> >>           "HA does not currently support adding a new standby to a
>>>>> running
>>>>> >> DN. " +
>>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>>> list of
>>>>> >> NNs.");
>>>>> >>     }
>>>>> >>   }
>>>>> >>
>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>>> will do
>>>>> >> gracefully fencing by sending RPC to the other NN.
>>>>> >> 4. Looks like we do not need to restart JournalNodes for the new
>>>>> SBN but I
>>>>> >> have not tried before.
>>>>> >>
>>>>> >>     Thus in general we may still have to restart all the services
>>>>> (except
>>>>> >> JNs) and update their configurations. But this may be a rolling
>>>>> restart
>>>>> >> process I guess:
>>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>>> SBN.
>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>>> restart
>>>>> >> of all the DN to update their configurations
>>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>>> their
>>>>> >> configuration. The new SBN should become active.
>>>>> >>
>>>>> >>     I have not tried the upper steps, thus please let me know if
>>>>> this
>>>>> >> works or not. And I think we should also document the correct steps
>>>>> in
>>>>> >> Apache. Could you please file an Apache jira?
>>>>> >>
>>>>> >> Thanks,
>>>>> >> -Jing
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>> >>>
>>>>> >>> Hello,
>>>>> >>>
>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>>> configuration. I
>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>> >>>
>>>>> >>> Use the Bootstrap standby command to prep the replacment standby.
>>>>> Or
>>>>> >>> rsync if the command fails.
>>>>> >>>
>>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>>>> to the
>>>>> >>> new standby
>>>>> >>>
>>>>> >>> Update the xml configuration on all nodes to reflect the replacment
>>>>> >>> standby.
>>>>> >>>
>>>>> >>> Start the replacment standby
>>>>> >>>
>>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>>> NameNode
>>>>> >>> configuration.
>>>>> >>>
>>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>>> going about
>>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>>> >>>
>>>>> >>>
>>>>> >>> Regards,
>>>>> >>>
>>>>> >>> Colin Williams
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
You should replace the namenode first; then, when that is completely
finished, move on to replacing any journal nodes. That part is easy:

1) bootstrap new JN (rsync from an existing)
2) Start new JN
3) push hdfs-site.xml to both namenodes
4) restart standby namenode
5) verify logs and admin ui show new JN
6) restart active namenode
7) verify both namenodes (failover should have happened and old standby
should be writing to the new JN)

You can remove an existing JN at the same time if you want; just be careful
to preserve a majority of the quorum during the whole operation (i.e., only
replace one at a time).

Also, I think it is best to run hdfs dfsadmin -rollEdits after each replaced
journalnode. IIRC there is a JIRA open about rolling restarts of journal
nodes not being safe unless you roll edits, so that would apply to replacing
them too.
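
Sketched out as commands, the steps above might look roughly like the
following. Everything here is a hedged illustration, not a tested recipe:
the hostnames (jn-old, jn-new, nn-active, nn-standby) and the journal edits
directory (/data/jn) are placeholders you would substitute for your cluster,
and the service names assume the CDH4-style init scripts used elsewhere in
this thread.

```shell
# Sketch of replacing one JournalNode; all hostnames and paths are
# placeholders for your own cluster layout.

# 1) Bootstrap the new JN by copying the edits dir from an existing JN.
ssh jn-new 'mkdir -p /data/jn'
rsync -a jn-old:/data/jn/ jn-new:/data/jn/

# 2) Start the new JournalNode.
ssh jn-new 'service hadoop-hdfs-journalnode start'

# 3) Push the updated hdfs-site.xml (new qjournal:// URI) to both NameNodes.
for nn in nn-active nn-standby; do
  scp hdfs-site.xml "$nn:/etc/hadoop/conf/hdfs-site.xml"
done

# 4-6) Restart the standby first, then the active (a failover is expected);
# check the logs and admin UI for the new JN after each restart.
ssh nn-standby 'service hadoop-hdfs-namenode restart'
ssh nn-active  'service hadoop-hdfs-namenode restart'

# 7) Roll edits so every JN starts a fresh log segment.
hdfs dfsadmin -rollEdits
```

Only one JN is touched here; repeat the whole sequence per node so a quorum
majority stays up throughout.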

On Friday, August 1, 2014, Colin Kincaid Williams <di...@uw.edu> wrote:

> I will run through the procedure again tomorrow. It was late in the day
> before I had a chance to test the procedure.
>
> If I recall correctly, I had an issue formatting the new standby before
> bootstrapping. I think either at that point, or during the ZooKeeper
> format command, I was asked whether to format the journal on the 3 hosts
> in the quorum. I was unable to proceed without an exception unless I chose
> that option.
>
> Are there any concerns with adding another journal node on the new standby?
> On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bbeaudreault@hubspot.com> wrote:
>
>> This shouldn't have affected the journalnodes at all -- they are mostly
>> unaware of the zkfc and active/standby state.  Did you do something else
>> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
>> or something else)
>>
>> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>>
>> The WARN is fine.  It's true that you could get in a weird state if you
>> had multiple namenodes up.  But with just 1 namenode up, you should be
>> safe. What you are trying to avoid is a split brain or standby/standby
>> state, but that is impossible with just 1 namenode alive.  Similarly, the
>> ERROR is a sanity check to make sure you don't screw yourself by formatting
>> a running cluster.  I should have mentioned you need to use the -force
>> argument to get around that.
>>
>>
>> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>
>>> However continuing with the process my QJM eventually error'd out and my
>>> Active NameNode went down.
>>>
>>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>>> 10.120.5.247:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>>> 10.120.5.203:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>>> 10.120.5.25:8485] client.QuorumJournalManager
>>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>>> failed to write txns 9635-9635. Will try to write to this JN again after
>>> the next log roll.
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch
>>> 5 is not the current writer epoch  0
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>>  at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>>> stream=QuorumOutputStream starting at txid 9634))
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>>
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>>> at
>>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>>  at
>>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>>> at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>>  at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>>> at
>>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>>  at
>>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>>> at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>>> QuorumOutputStream starting at txid 9634
>>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
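For context on the "IPC's epoch 5 is not the current writer epoch 0" messages above: a JournalNode only accepts journal() calls from the writer that most recently established its epoch on that JN; a writer epoch of 0 means no writer is currently established, so every write is rejected until the next log roll re-establishes one. A deliberately simplified sketch of that fencing check (illustrative only, not the real Journal.java logic):

```python
class JournalSketch:
    """Very simplified model of QJM epoch fencing (not the real Journal.java).

    A NameNode must establish its epoch on the JN before its journal()
    calls are accepted; a JN with writer epoch 0 rejects everything,
    matching the "IPC's epoch 5 is not the current writer epoch 0" log.
    """

    def __init__(self):
        self.writer_epoch = 0  # no writer established yet

    def start_log_segment(self, epoch):
        # Establishes the caller as the current writer (e.g. at a log roll).
        self.writer_epoch = epoch

    def journal(self, epoch, txns):
        # Reject writes from any epoch other than the established writer's.
        if epoch != self.writer_epoch:
            raise IOError("IPC's epoch %d is not the current writer epoch %d"
                          % (epoch, self.writer_epoch))
        return len(txns)

jn = JournalSketch()
try:
    jn.journal(5, ["tx9635"])  # writer never established: rejected
except IOError:
    pass
jn.start_log_segment(5)        # after the next log roll...
jn.journal(5, ["tx9635"])      # ...writes are accepted again
```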
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>
>>>> I tried a third time and it just worked?
>>>>
>>>> sudo hdfs zkfc -formatZK
>>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>>> GMT
>>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>>> =rhel1.local
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>>> Corporation
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protob
uf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-
asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-
1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanuti
ls-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:os.version=2.6.32-358.el6.x86_64
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>>> (Environment.java:logEnv(100)) - Client
>>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>>> sessionTimeout=5000 watcher=null
>>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>>> attempt to authenticate using SASL (unknown error)
>>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>>> session
>>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>>  ===============================================
>>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>>> Are you sure you want to clear all failover information from
>>>> ZooKeeper?
>>>> WARNING: Before proceeding, ensure that all HDFS services and
>>>> failover controllers are stopped!
>>>> ===============================================
>>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>>> Y
>>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>>> /hadoop-ha/golden-apple from ZK...
>>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>>> /hadoop-ha/golden-apple from ZK.
>>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>>> /hadoop-ha/golden-apple in ZK.
>>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>>
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posix4e@gmail.com> wrote:
>>>>
>>>>> Cheers. That's rough. We don't have that problem here at WANdisco.
>>>>>
>>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help
>>>>> earlier today.
>>>>> > Just thought I'd forward this info regarding swapping out the
>>>>> NameNode in a
>>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>>> Seattle, feel
>>>>> > free to give me a shout out.
>>>>> >
>>>>> > ---------- Forwarded message ----------
>>>>> > From: Colin Kincaid Williams <discord@uw.edu>
>>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM /
>>>>> HA
>>>>> > configuration
>>>>> > To: user@hadoop.apache.org
>>>>> >
>>>>> >
>>>>> > Hi Jing,
>>>>> >
>>>>> > Thanks for the response. I will try this out, and file an Apache
>>>>> jira.
>>>>> >
>>>>> > Best,
>>>>> >
>>>>> > Colin Williams
>>>>> >
>>>>> >
>>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <jing@hortonworks.com> wrote:
>>>>> >>
>>>>> >> Hi Colin,
>>>>> >>
>>>>> >>     I guess currently we may have to restart almost all the
>>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>>> >>
>>>>> >> 1. The current active NameNode (ANN) needs to know about the new SBN,
>>>>> >> since in the current implementation the SBN periodically sends rollEditLog
>>>>> >> RPC requests to the ANN (thus if a NN failover happens later, the original
>>>>> >> ANN needs to send this RPC to the correct NN).
>>>>> >> 2. It looks like the DataNode currently cannot really refresh its list of
>>>>> >> NNs. See the code in BPOfferService:
>>>>> >>
>>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>>> >>     for (BPServiceActor actor : bpServices) {
>>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>>> >>     }
>>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>>> >>
>>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>>>> >>       throw new IOException(
>>>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>>> >>     }
>>>>> >>   }
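The guard in that quoted Java can be paraphrased in a few lines of Python (a sketch only; the set symmetric difference is what Guava's Sets.symmetricDifference computes): any difference at all between the old and new NameNode address sets makes the DataNode bail out, which is why a rolling DN restart is needed to introduce a new standby.

```python
def refresh_nn_list(old_addrs, new_addrs):
    """Sketch of BPOfferService.refreshNNList's guard (not the real code).

    old_addrs: NN addresses the DataNode is currently talking to.
    new_addrs: NN addresses from the refreshed configuration.
    """
    # Symmetric difference: addresses in one set but not the other.
    if set(old_addrs) ^ set(new_addrs):
        raise IOError(
            "HA does not currently support adding a new standby to a "
            "running DN. Please do a rolling restart of DNs to "
            "reconfigure the list of NNs.")

# Identical sets: the refresh is a no-op, no exception.
refresh_nn_list({"nn1:8020", "nn2:8020"}, {"nn2:8020", "nn1:8020"})

# Swapping in a new standby changes the set, so the DN refuses.
try:
    refresh_nn_list({"nn1:8020", "nn2:8020"}, {"nn1:8020", "nn3:8020"})
except IOError:
    pass
```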
>>>>> >>
>>>>> >> 3. If you're using automatic failover, you also need to update the
>>>>> >> configuration of the ZKFC on the current ANN machine, since the ZKFC
>>>>> >> does graceful fencing by sending an RPC to the other NN.
>>>>> >> 4. It looks like we do not need to restart the JournalNodes for the new
>>>>> >> SBN, but I have not tried this before.
>>>>> >>
>>>>> >>     Thus in general we may still have to restart all the services (except
>>>>> >> the JNs) and update their configurations. But this can probably be done as
>>>>> >> a rolling restart:
>>>>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>>>>> >> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling
>>>>> >> restart of all the DNs to update their configurations.
>>>>> >> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update
>>>>> >> their configuration. The new SBN should become active.
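Jing's three steps can be sketched as an ordered plan (the hostnames and command strings below are illustrative assumptions, not exact CDH service commands): the old standby goes down first, the DataNodes are restarted one at a time, and the active NameNode is touched last so the cluster always has one NN serving.

```python
def plan_standby_swap(ann, old_sbn, new_sbn, datanodes):
    """Ordered rolling-swap plan per the steps above (a sketch only)."""
    steps = [
        (old_sbn, "stop namenode"),                    # step 1
        (new_sbn, "hdfs namenode -bootstrapStandby"),
        (new_sbn, "start namenode"),
    ]
    for dn in datanodes:                               # step 2: rolling DN restart
        steps.append((dn, "update hdfs-site.xml; restart datanode"))
    # step 3: the ANN (and its ZKFC) is reconfigured last
    steps.append((ann, "stop zkfc; stop namenode; update config; restart"))
    return steps

plan = plan_standby_swap("nn-a", "nn-b-old", "nn-b-new", ["dn1", "dn2"])
assert plan[0][0] == "nn-b-old"   # old standby goes down first
assert plan[-1][0] == "nn-a"      # active NN is touched last
```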
>>>>> >>
>>>>> >>     I have not tried the steps above, thus please let me know if this
>>>>> >> works or not. And I think we should also document the correct steps in
>>>>> >> Apache. Could you please file an Apache jira?
>>>>> >>
>>>>> >> Thanks,
>>>>> >> -Jing
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Hello,
>>>>> >>>
>>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>>>>> >>> believe the steps to achieve this would be something similar to:
>>>>> >>>
>>>>> >>> Use the bootstrapStandby command to prep the replacement standby, or
>>>>> >>> rsync if the command fails.
>>>>> >>>
>>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal to
>>>>> >>> the new standby.
>>>>> >>>
>>>>> >>> Update the xml configuration on all nodes to reflect the replacement
>>>>> >>> standby.
>>>>> >>>
>>>>> >>> Start the replacement standby.
>>>>> >>>
>>>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>>> >>> configuration.
>>>>> >>>
>>>>> >>> I am not sure how to deal with the journal switch, or if I am going
>>>>> >>> about this the right way. Can anybody give me some suggestions here?
>>>>> >>>
>>>>> >>>
>>>>> >>> Regards,
>>>>> >>>
>>>>> >>> Colin Williams
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I will run through the procedure again tomorrow. It was late in the day
before I had a chance to test the procedure.

If I recall correctly, I had an issue formatting the new standby before
bootstrapping. I think either at that point, or during the ZooKeeper
format command, I was prompted to format the journal on the 3 hosts in the
quorum. I was unable to proceed without an exception unless I chose that
option.

Are there any concerns adding another journal node to the new standby?
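
If you do add one, every node carrying the shared-edits URI would need the new host appended and a restart to pick it up. A hypothetical hdfs-site.xml sketch (rhel7.local is a made-up hostname; golden-apple is the nameservice id seen in the logs in this thread; note 4 JNs raise the write quorum to 3 of 4, so odd counts are usually preferred):

```xml
<!-- Hypothetical: shared edits URI after adding a 4th journal node on
     the new standby host (rhel7.local is invented for illustration). -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://rhel1.local:8485;rhel2.local:8485;rhel6.local:8485;rhel7.local:8485/golden-apple</value>
</property>
```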
On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
wrote:

> This shouldn't have affected the journalnodes at all -- they are mostly
> unaware of the zkfc and active/standby state.  Did you do something else
> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
> or something else)
>
> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>
> The WARN is fine.  It's true that you could get in a weird state if you
> had multiple namenodes up.  But with just 1 namenode up, you should be
> safe. What you are trying to avoid is a split brain or standby/standby
> state, but that is impossible with just 1 namenode alive.  Similarly, the
> ERROR is a sanity check to make sure you don't screw yourself by formatting
> a running cluster.  I should have mentioned you need to use the -force
> argument to get around that.
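
Concretely, the non-interactive form of that command would presumably be the following (shown as a dry run; the real command needs the ZooKeeper quorum reachable and only one NameNode alive):

```shell
# -force bypasses the "parent znode already exists" sanity check, so it
# should only be run with a single NameNode (and no ZKFC) up.
cmd='hdfs zkfc -formatZK -force'
printf 'would run: %s\n' "$cmd"   # printed, not executed: needs live ZK
```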
>
>
> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> However, continuing with the process, my QJM eventually errored out and my
>> active NameNode went down.
>>
>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>> 10.120.5.247:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>> 10.120.5.203:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>> 10.120.5.25:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>> stream=QuorumOutputStream starting at txid 9634))
>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>> at
>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>> at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>> at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>> at
>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>  at
>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>> QuorumOutputStream starting at txid 9634
>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>
>>
>>
>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> I tried a third time and it just worked?
>>>
>>> sudo hdfs zkfc -formatZK
>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>> GMT
>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>> Corporation
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobu
f-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-a
sl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1
.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutil
s-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:os.version=2.6.32-358.el6.x86_64
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>> sessionTimeout=5000 watcher=null
>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>> attempt to authenticate using SASL (unknown error)
>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>> session
>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>  ===============================================
>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>> Are you sure you want to clear all failover information from
>>> ZooKeeper?
>>> WARNING: Before proceeding, ensure that all HDFS services and
>>> failover controllers are stopped!
>>> ===============================================
>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>> Y
>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>> /hadoop-ha/golden-apple from ZK...
>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>> /hadoop-ha/golden-apple from ZK.
>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>> /hadoop-ha/golden-apple in ZK.
>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>>>
>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>
>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>> discord@uw.edu> wrote:
>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>>>> today.
>>>> > Just thought I'd forward this info regarding swapping out the
>>>> NameNode in a
>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>> Seattle, feel
>>>> > free to give me a shout out.
>>>> >
>>>> > ---------- Forwarded message ----------
>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM /
>>>> HA
>>>> > configuration
>>>> > To: user@hadoop.apache.org
>>>> >
>>>> >
>>>> > Hi Jing,
>>>> >
>>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>>> >
>>>> > Best,
>>>> >
>>>> > Colin Williams
>>>> >
>>>> >
>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>> wrote:
>>>> >>
>>>> >> Hi Colin,
>>>> >>
>>>> >>     I guess currently we may have to restart almost all the
>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>> >>
>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN since
>>>> in
>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>> request to
>>>> >> ANN periodically (thus if a NN failover happens later, the original
>>>> ANN
>>>> >> needs to send this RPC to the correct NN).
>>>> >> 2. Looks like the DataNode currently cannot do real refreshment for
>>>> NN.
>>>> >> Look at the code in BPOfferService:
>>>> >>
>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>> >> IOException {
>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>> >>     for (BPServiceActor actor : bpServices) {
>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>> >>     }
>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>> >>
>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>> >>       // Keep things simple for now -- we can implement this at a
>>>> later
>>>> >> date.
>>>> >>       throw new IOException(
>>>> >>           "HA does not currently support adding a new standby to a
>>>> running
>>>> >> DN. " +
>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>> list of
>>>> >> NNs.");
>>>> >>     }
>>>> >>   }
>>>> >>
>>>> >> 3. If you're using automatic failover, you also need to update the
>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>> will do
>>>> >> gracefully fencing by sending RPC to the other NN.
>>>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>>> but I
>>>> >> have not tried before.
>>>> >>
>>>> >>     Thus in general we may still have to restart all the services
>>>> (except
>>>> >> JNs) and update their configurations. But this may be a rolling
>>>> restart
>>>> >> process I guess:
>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>> SBN.
>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>> restart
>>>> >> of all the DN to update their configurations
>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>> their
>>>> >> configuration. The new SBN should become active.
>>>> >>
>>>> >>     I have not tried the upper steps, thus please let me know if this
>>>> >> works or not. And I think we should also document the correct steps
>>>> in
>>>> >> Apache. Could you please file an Apache jira?
>>>> >>
>>>> >> Thanks,
>>>> >> -Jing
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>> discord@uw.edu>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hello,
>>>> >>>
>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>> configuration. I
>>>> >>> believe the steps to achieve this would be something similar to:
>>>> >>>
>>>> >>> Use the Bootstrap standby command to prep the replacement standby. Or
>>>> >>> rsync if the command fails.
>>>> >>>
>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>>> to the
>>>> >>> new standby
>>>> >>>
>>>> >>> Update the xml configuration on all nodes to reflect the replacement
>>>> >>> standby.
>>>> >>>
>>>> >>> Start the replacement standby
>>>> >>>
>>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>> >>> configuration.
>>>> >>>
>>>> >>> I am not sure how to deal with the Journal switch, or if I am going
>>>> about
>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>> >>>
>>>> >>>
>>>> >>> Regards,
>>>> >>>
>>>> >>> Colin Williams
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I will run through the procedure again tomorrow. It was late in the day
before I had a chance to test the procedure.

If I recall correctly, I had an issue formatting the new standby before
bootstrapping. I think either at that point, or during the ZooKeeper
format command, I was prompted to format the journal on the 3 hosts in the
quorum. I was unable to proceed without an exception unless I chose that
option.

Are there any concerns adding another journal node to the new standby?
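For reference, the overall swap procedure discussed in this thread can be sketched as a sequence of admin commands. This is a minimal sketch, not a verified runbook: the service names match the CDH4-style init scripts used elsewhere in the thread, and the exact scripts and ordering vary by distribution.

```shell
# 1. On the replacement standby host: install the updated hdfs-site.xml,
#    then bootstrap its metadata from the current active NameNode
#    (fall back to rsync of the name directory if this fails).
sudo -u hdfs hdfs namenode -bootstrapStandby

# 2. Update hdfs-site.xml on every node to list the new standby, then
#    rolling-restart the DataNodes one at a time -- per the
#    BPOfferService code quoted above, a DataNode cannot pick up a new
#    NameNode in its list without a restart.
sudo service hadoop-hdfs-datanode restart   # one DataNode at a time

# 3. Start the replacement standby NameNode and its ZKFC.
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-zkfc start

# 4. Finally restart the old active NameNode and its ZKFC with the
#    updated configuration; the new standby should then become active.
sudo service hadoop-hdfs-namenode restart
sudo service hadoop-hdfs-zkfc restart
```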
On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
wrote:

> This shouldn't have affected the journalnodes at all -- they are mostly
> unaware of the zkfc and active/standby state.  Did you do something else
> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
> or something else)
>
> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>
> The WARN is fine.  It's true that you could get in a weird state if you
> had multiple namenodes up.  But with just 1 namenode up, you should be
> safe. What you are trying to avoid is a split brain or standby/standby
> state, but that is impossible with just 1 namenode alive.  Similarly, the
> ERROR is a sanity check to make sure you don't screw yourself by formatting
> a running cluster.  I should have mentioned you need to use the -force
> argument to get around that.
>
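Bryan's point about the -force argument can be sketched as follows (service names again assume CDH4-style init scripts; adjust for your setup):

```shell
# Stop every ZKFC first so no failover controller races the format,
# then re-format the failover znode non-interactively. -force answers
# the "parent znode already exists" prompt that blocked the earlier
# attempt. Run on a NameNode host as the hdfs user.
sudo service hadoop-hdfs-zkfc stop       # repeat on the other NN host
sudo -u hdfs hdfs zkfc -formatZK -force
sudo service hadoop-hdfs-zkfc start      # repeat on the other NN host
```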
>
> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> However continuing with the process my QJM eventually error'd out and my
>> Active NameNode went down.
>>
>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>> 10.120.5.247:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>> 10.120.5.203:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>> 10.120.5.25:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>> stream=QuorumOutputStream starting at txid 9634))
>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>> at
>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>> at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>> at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>> at
>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>  at
>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>> QuorumOutputStream starting at txid 9634
>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>
>>
>>
>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> I tried a third time and it just worked?
>>>
>>> sudo hdfs zkfc -formatZK
>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>> GMT
>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>> =rhel1.local
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>> Corporation
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobu
f-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-a
sl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1
.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutil
s-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:os.version=2.6.32-358.el6.x86_64
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>> sessionTimeout=5000 watcher=null
>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>> attempt to authenticate using SASL (unknown error)
>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>> session
>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>  ===============================================
>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>> Are you sure you want to clear all failover information from
>>> ZooKeeper?
>>> WARNING: Before proceeding, ensure that all HDFS services and
>>> failover controllers are stopped!
>>> ===============================================
>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>> Y
>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>> /hadoop-ha/golden-apple from ZK...
>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>> /hadoop-ha/golden-apple from ZK.
>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>> /hadoop-ha/golden-apple in ZK.
>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>>>
>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>
>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>> discord@uw.edu> wrote:
>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>>>> today.
>>>> > Just thought I'd forward this info regarding swapping out the
>>>> NameNode in a
>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>> Seattle, feel
>>>> > free to give me a shout out.
>>>> >
>>>> > ---------- Forwarded message ----------
>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM /
>>>> HA
>>>> > configuration
>>>> > To: user@hadoop.apache.org
>>>> >
>>>> >
>>>> > Hi Jing,
>>>> >
>>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>>> >
>>>> > Best,
>>>> >
>>>> > Colin Williams
>>>> >
>>>> >
>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>> wrote:
>>>> >>
>>>> >> Hi Colin,
>>>> >>
>>>> >>     I guess currently we may have to restart almost all the
>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>> >>
>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN since
>>>> in
>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>> request to
>>>> >> ANN periodically (thus if a NN failover happens later, the original
>>>> ANN
>>>> >> needs to send this RPC to the correct NN).
>>>> >> 2. Looks like the DataNode currently cannot do a real refresh of its
>>>> >> NameNode list. Look at the code in BPOfferService:
>>>> >>
>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>> >> IOException {
>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>> >>     for (BPServiceActor actor : bpServices) {
>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>> >>     }
>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>> >>
>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>> >>       // Keep things simple for now -- we can implement this at a
>>>> later
>>>> >> date.
>>>> >>       throw new IOException(
>>>> >>           "HA does not currently support adding a new standby to a
>>>> running
>>>> >> DN. " +
>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>> list of
>>>> >> NNs.");
>>>> >>     }
>>>> >>   }
>>>> >>
>>>> >> 3. If you're using automatic failover, you also need to update the
>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>> will do
>>>> >> gracefully fencing by sending RPC to the other NN.
>>>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>>> but I
>>>> >> have not tried before.
>>>> >>
>>>> >>     Thus in general we may still have to restart all the services
>>>> (except
>>>> >> JNs) and update their configurations. But this may be a rolling
>>>> restart
>>>> >> process I guess:
>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>> SBN.
>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>> restart
>>>> >> of all the DN to update their configurations
>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>> their
>>>> >> configuration. The new SBN should become active.
>>>> >>
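The three rolling-restart phases above can be sketched as a dry-run script. Everything here is an assumption drawn from the thread: the host names (old-sbn, new-sbn, old-ann, dn1..dn3) and the CDH-style init-script names are placeholders, and the function only prints the plan rather than touching any cluster.

```shell
# Dry-run planner for the rolling standby swap; prints commands, executes nothing.
plan_swap() {
  # Phase 1: replace the standby (stop old SBN, bootstrap and start new SBN)
  echo "old-sbn:  service hadoop-hdfs-namenode stop"
  echo "new-sbn:  hdfs namenode -bootstrapStandby"
  echo "new-sbn:  service hadoop-hdfs-namenode start"
  # Phase 2: rolling restart of every DataNode so each picks up the new NN pair
  for dn in dn1 dn2 dn3; do
    echo "$dn:  service hadoop-hdfs-datanode restart"
  done
  # Phase 3: stop the old active NN and its ZKFC last; the new standby takes over
  echo "old-ann:  service hadoop-hdfs-zkfc stop"
  echo "old-ann:  service hadoop-hdfs-namenode stop"
}
plan_swap
```

Reviewing the emitted plan host by host before running anything is the point of the dry run; the ordering (standby first, DataNodes second, old active last) mirrors the steps above.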
>>>> >>     I have not tried the above steps, so please let me know whether
>>>> >> they work. And I think we should also document the correct steps in
>>>> >> Apache. Could you please file an Apache jira?
>>>> >>
>>>> >> Thanks,
>>>> >> -Jing
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>> discord@uw.edu>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hello,
>>>> >>>
>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>> configuration. I
>>>> >>> believe the steps to achieve this would be something similar to:
>>>> >>>
>>>> >>> Use the Bootstrap standby command to prep the replacement standby. Or
>>>> >>> rsync if the command fails.
>>>> >>>
>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>>> to the
>>>> >>> new standby
>>>> >>>
>>>> >>> Update the xml configuration on all nodes to reflect the replacement
>>>> >>> standby.
>>>> >>>
>>>> >>> Start the replacement standby
>>>> >>>
>>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>> >>> configuration.
>>>> >>>
>>>> >>> I am not sure how to deal with the Journal switch, or if I am going
>>>> about
>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>> >>>
>>>> >>>
>>>> >>> Regards,
>>>> >>>
>>>> >>> Colin Williams
>>>> >>>
>>>> >>
>>>> >>
>>>> >> CONFIDENTIALITY NOTICE
>>>> >> NOTICE: This message is intended for the use of the individual or
>>>> entity
>>>> >> to which it is addressed and may contain information that is
>>>> confidential,
>>>> >> privileged and exempt from disclosure under applicable law. If the
>>>> reader of
>>>> >> this message is not the intended recipient, you are hereby notified
>>>> that any
>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>> forwarding of
>>>> >> this communication is strictly prohibited. If you have received this
>>>> >> communication in error, please contact the sender immediately and
>>>> delete it
>>>> >> from your system. Thank You.
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I will run through the procedure again tomorrow. It was late in the day
before I had a chance to test the procedure.

If I recall correctly, I had an issue formatting the new standby before
bootstrapping. I think either at that point, or during the ZooKeeper format
command, I was prompted to format the journal on the 3 hosts in the quorum.
I was unable to proceed without an exception unless I chose that option.

Are there any concerns with adding another journal node on the new standby?
On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
wrote:

> This shouldn't have affected the journalnodes at all -- they are mostly
> unaware of the zkfc and active/standby state.  Did you do something else
> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
> or something else)
>
> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>
> The WARN is fine.  It's true that you could get in a weird state if you
> had multiple namenodes up.  But with just 1 namenode up, you should be
> safe. What you are trying to avoid is a split brain or standby/standby
> state, but that is impossible with just 1 namenode alive.  Similarly, the
> ERROR is a sanity check to make sure you don't screw yourself by formatting
> a running cluster.  I should have mentioned you need to use the -force
> argument to get around that.
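The -force path above can be sketched as a dry run as well. The init-script names are assumptions from a CDH-style packaging; the function only prints the sequence so it can be checked before being pasted on the NameNode hosts.

```shell
# Prints the non-interactive formatZK sequence; nothing here touches the cluster.
format_zk_plan() {
  echo 'service hadoop-hdfs-zkfc stop      # on every NameNode host first'
  echo 'hdfs zkfc -formatZK -force         # -force skips the Y/N prompt'
  echo 'service hadoop-hdfs-zkfc start     # then bring the ZKFCs back'
}
format_zk_plan
```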
>
>
> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> However continuing with the process my QJM eventually error'd out and my
>> Active NameNode went down.
>>
>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>> 10.120.5.247:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>> 10.120.5.203:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>> 10.120.5.25:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>> stream=QuorumOutputStream starting at txid 9634))
>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>> at
>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>> at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>> at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>> at
>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>  at
>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>> QuorumOutputStream starting at txid 9634
>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>
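The repeated "IPC's epoch 5 is not the current writer epoch  0" lines above suggest the JournalNodes lost their promised-epoch state, which would be consistent with the journals having been re-formatted during the earlier prompt. A small helper for inspecting the epoch files each JournalNode persists; the default directory below is an assumption, so pass your actual dfs.journalnode.edits.dir instead.

```shell
# Print the fencing epochs a JournalNode keeps per namespace directory.
# Usage: show_epochs [journalnode-edits-dir]
show_epochs() {
  jdir="${1:-/var/lib/hadoop-hdfs/journal}"   # assumed default; adjust to your install
  for f in "$jdir"/*/current/last-promised-epoch \
           "$jdir"/*/current/last-writer-epoch; do
    if [ -f "$f" ]; then
      printf '%s: %s\n' "$f" "$(cat "$f")"
    fi
  done
}
show_epochs "$@"
```

If the writer epoch on the JNs reads 0 while the NameNode believes it holds a higher epoch, every journal() call is rejected and the NN aborts once it loses quorum, which matches the FATAL above.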
>>
>>
>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> I tried a third time and it just worked?
>>>
>>> sudo hdfs zkfc -formatZK
>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>> GMT
>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>> =rhel1.local
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>> Corporation
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobu
f-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-a
sl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1
.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutil
s-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:os.version=2.6.32-358.el6.x86_64
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>> sessionTimeout=5000 watcher=null
>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>> attempt to authenticate using SASL (unknown error)
>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>> session
>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>  ===============================================
>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>> Are you sure you want to clear all failover information from
>>> ZooKeeper?
>>> WARNING: Before proceeding, ensure that all HDFS services and
>>> failover controllers are stopped!
>>> ===============================================
>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>> Y
>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>> /hadoop-ha/golden-apple from ZK...
>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>> /hadoop-ha/golden-apple from ZK.
>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>> /hadoop-ha/golden-apple in ZK.
>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>>>
>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>
>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>> discord@uw.edu> wrote:
>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>>>> today.
>>>> > Just thought I'd forward this info regarding swapping out the
>>>> NameNode in a
>>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>>> Seattle, feel
>>>> > free to give me a shout out.
>>>> >
>>>> > ---------- Forwarded message ----------
>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM /
>>>> HA
>>>> > configuration
>>>> > To: user@hadoop.apache.org
>>>> >
>>>> >
>>>> > Hi Jing,
>>>> >
>>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>>> >
>>>> > Best,
>>>> >
>>>> > Colin Williams
>>>> >
>>>> >
>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>> wrote:
>>>> >>
>>>> >> Hi Colin,
>>>> >>
>>>> >>     I guess currently we may have to restart almost all the
>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>> >>
>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN since
>>>> in
>>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>>> request to
>>>> >> ANN periodically (thus if a NN failover happens later, the original
>>>> ANN
>>>> >> needs to send this RPC to the correct NN).
>>>> >> 2. Looks like the DataNode currently cannot do real refreshment for
>>>> NN.
>>>> >> Look at the code in BPOfferService:
>>>> >>
>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>>> >> IOException {
>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>> >>     for (BPServiceActor actor : bpServices) {
>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>> >>     }
>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>> >>
>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>> >>       // Keep things simple for now -- we can implement this at a
>>>> later
>>>> >> date.
>>>> >>       throw new IOException(
>>>> >>           "HA does not currently support adding a new standby to a
>>>> running
>>>> >> DN. " +
>>>> >>           "Please do a rolling restart of DNs to reconfigure the
>>>> list of
>>>> >> NNs.");
>>>> >>     }
>>>> >>   }
>>>> >>
>>>> >> 3. If you're using automatic failover, you also need to update the
>>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC
>>>> will do
>>>> >> gracefully fencing by sending RPC to the other NN.
>>>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>>> but I
>>>> >> have not tried before.
>>>> >>
>>>> >>     Thus in general we may still have to restart all the services
>>>> (except
>>>> >> JNs) and update their configurations. But this may be a rolling
>>>> restart
>>>> >> process I guess:
>>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new
>>>> SBN.
>>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>>> restart
>>>> >> of all the DN to update their configurations
>>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>>> their
>>>> >> configuration. The new SBN should become active.
>>>> >>
>>>> >>     I have not tried the steps above, so please let me know if this
>>>> >> works or not. And I think we should also document the correct steps
>>>> in
>>>> >> Apache. Could you please file an Apache jira?
>>>> >>
>>>> >> Thanks,
>>>> >> -Jing
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>> discord@uw.edu>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hello,
>>>> >>>
>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>> configuration. I
>>>> >>> believe the steps to achieve this would be something similar to:
>>>> >>>
>>>> >>> Use the Bootstrap standby command to prep the replacement standby. Or
>>>> >>> rsync if the command fails.
>>>> >>>
>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>>> to the
>>>> >>> new standby
>>>> >>>
>>>> >>> Update the xml configuration on all nodes to reflect the replacement
>>>> >>> standby.
>>>> >>>
>>>> >>> Start the replacement standby
>>>> >>>
>>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>>> >>> configuration.
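[Editor's note] For the "update the xml configuration" step, the properties that name the standby NameNode live in hdfs-site.xml. This is a hypothetical fragment: the nameservice `golden-apple` is taken from the ZK znode paths later in this thread, while `rhel5.local` as the replacement host and the `nn1`/`nn2` IDs are assumptions for illustration.

```xml
<!-- Hypothetical hdfs-site.xml fragment: when swapping the standby host,
     only the entries for its NameNode ID (here nn2) need to change,
     on every node in the cluster. -->
<property>
  <name>dfs.ha.namenodes.golden-apple</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.golden-apple.nn2</name>
  <value>rhel5.local:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.golden-apple.nn2</name>
  <value>rhel5.local:50070</value>
</property>
```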
>>>> >>>
>>>> >>> I am not sure how to deal with the Journal switch, or if I am going
>>>> about
>>>> >>> this the right way. Can anybody give me some suggestions here?
>>>> >>>
>>>> >>>
>>>> >>> Regards,
>>>> >>>
>>>> >>> Colin Williams
>>>> >>>
>>>> >>
>>>> >>
>>>> >> CONFIDENTIALITY NOTICE
>>>> >> NOTICE: This message is intended for the use of the individual or
>>>> entity
>>>> >> to which it is addressed and may contain information that is
>>>> confidential,
>>>> >> privileged and exempt from disclosure under applicable law. If the
>>>> reader of
>>>> >> this message is not the intended recipient, you are hereby notified
>>>> that any
>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>> forwarding of
>>>> >> this communication is strictly prohibited. If you have received this
>>>> >> communication in error, please contact the sender immediately and
>>>> delete it
>>>> >> from your system. Thank You.
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I will run through the procedure again tomorrow. It was late in the day
before I had a chance to test the procedure.

If I recall correctly, I had an issue formatting the new standby before
bootstrapping. I think either at that point, or during the ZooKeeper
format command, I was prompted to format the journal on the 3 hosts in
the quorum. I was unable to proceed without an exception unless I chose
that option.

Are there any concerns adding another journal node to the new standby?
On Jul 31, 2014 9:44 PM, "Bryan Beaudreault" <bb...@hubspot.com>
wrote:

> This shouldn't have affected the journalnodes at all -- they are mostly
> unaware of the zkfc and active/standby state.  Did you do something else
> that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
> or something else)
>
> For your previous 2 emails, reporting errors/warns when doing -formatZK:
>
> The WARN is fine.  It's true that you could get in a weird state if you
> had multiple namenodes up.  But with just 1 namenode up, you should be
> safe. What you are trying to avoid is a split brain or standby/standby
> state, but that is impossible with just 1 namenode alive.  Similarly, the
> ERROR is a sanity check to make sure you don't screw yourself by formatting
> a running cluster.  I should have mentioned you need to use the -force
> argument to get around that.
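[Editor's note] As far as I can tell from the stock DFSZKFailoverController usage string, `-formatZK` accepts both `-force` (clear an existing znode without the Y/N prompt) and `-nonInteractive` (abort instead of prompting), so the re-format Bryan describes can be scripted. Shown here as an echoed dry run so nothing touches a live ZooKeeper:

```shell
# Echoed dry run only; none of these commands are executed here.
out="service hadoop-hdfs-zkfc stop
hdfs zkfc -formatZK -force
service hadoop-hdfs-zkfc start"
echo "$out"
```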
>
>
> On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> However, continuing with the process, my QJM eventually errored out and
>> my Active NameNode went down.
>>
>> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
>> 10.120.5.247:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
>> 10.120.5.203:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
>> 10.120.5.25:8485] client.QuorumJournalManager
>> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
>> failed to write txns 9635-9635. Will try to write to this JN again after
>> the next log roll.
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
>> is not the current writer epoch  0
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>>  at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
>> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
>> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
>> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
>> stream=QuorumOutputStream starting at txid 9634))
>> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
>> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>> at
>> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>>  at
>> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>> at
>> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>>  at
>> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>> at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
>> at
>> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>>  at
>> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>> at
>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>> at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
>> at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>>  at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
>> at
>> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>>  at
>> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
>> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
>> QuorumOutputStream starting at txid 9634
>> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020]
>> util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
>> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
>> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>>
>>
>>
>> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>>
>>> I tried a third time and it just worked?
>>>
>>> sudo hdfs zkfc -formatZK
>>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>>> GMT
>>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:host.name
>>> =rhel1.local
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>>> Corporation
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobu
f-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-a
sl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1
.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutil
s-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:java.library.path=//usr/lib/hadoop/lib/native
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:os.version=2.6.32-358.el6.x86_64
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>>> (Environment.java:logEnv(100)) - Client
>>> environment:user.dir=/etc/hbase/conf.golden_apple
>>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>>> sessionTimeout=5000 watcher=null
>>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>>> attempt to authenticate using SASL (unknown error)
>>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>>> connection established to rhel1.local/10.120.5.203:2181, initiating
>>> session
>>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>>> establishment complete on server rhel1.local/10.120.5.203:2181,
>>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>>  ===============================================
>>> The configured parent znode /hadoop-ha/golden-apple already exists.
>>> Are you sure you want to clear all failover information from
>>> ZooKeeper?
>>> WARNING: Before proceeding, ensure that all HDFS services and
>>> failover controllers are stopped!
>>> ===============================================
>>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>>> Y
>>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>>> /hadoop-ha/golden-apple from ZK...
>>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>>> /hadoop-ha/golden-apple from ZK.
>>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>>> /hadoop-ha/golden-apple in ZK.
>>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>>> (ClientCnxn.java:run(511)) - EventThread shut down
>>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>>>
>>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>>
>>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <
>>>> discord@uw.edu> wrote:
>>>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>>>> > today. Just thought I'd forward this info regarding swapping out the
>>>> > NameNode in a QJM / HA configuration. See you around on #hbase. If you
>>>> > visit Seattle, feel free to give me a shout out.
>>>> >
>>>> > ---------- Forwarded message ----------
>>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM /
>>>> > HA configuration
>>>> > To: user@hadoop.apache.org
>>>> >
>>>> >
>>>> > Hi Jing,
>>>> >
>>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>>> >
>>>> > Best,
>>>> >
>>>> > Colin Williams
>>>> >
>>>> >
>>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>>> wrote:
>>>> >>
>>>> >> Hi Colin,
>>>> >>
>>>> >>     I guess currently we may have to restart almost all the
>>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>>> >>
>>>> >> 1. The current active NameNode (ANN) needs to know the new SBN, since
>>>> >> in the current implementation the SBN tries to send the rollEditLog
>>>> >> RPC request to the ANN periodically (thus if a NN failover happens
>>>> >> later, the original ANN needs to send this RPC to the correct NN).
>>>> >> 2. Looks like the DataNode currently cannot actually refresh its list
>>>> >> of NNs. Look at the code in BPOfferService:
>>>> >>
>>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>>> >>     for (BPServiceActor actor : bpServices) {
>>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>>> >>     }
>>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>>> >>
>>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>>> >>       throw new IOException(
>>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>>> >>     }
>>>> >>   }
>>>> >>
>>>> >> 3. If you're using automatic failover, you also need to update the
>>>> >> configuration of the ZKFC on the current ANN machine, since the ZKFC
>>>> >> will do graceful fencing by sending an RPC to the other NN.
>>>> >> 4. Looks like we do not need to restart the JournalNodes for the new
>>>> >> SBN, but I have not tried this before.
>>>> >>
>>>> >>     Thus in general we may still have to restart all the services
>>>> >> (except JNs) and update their configurations. But this may be a
>>>> >> rolling restart process I guess:
>>>> >> 1. Shut down the old SBN, then bootstrap and start the new SBN.
>>>> >> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling
>>>> >> restart of all the DNs to update their configurations.
>>>> >> 3. After restarting all the DNs, stop the ANN and the ZKFC, and update
>>>> >> their configuration. The new SBN should become active.
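The three rolling-restart steps above can be sketched as a command sequence. This is an editor's dry-run sketch: the host names, the DataNode list, and the CDH4-era init-script service names are all assumptions, so verify each command against your own deployment before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch of the three-step rolling swap. run() only echoes the
# command it is given; replace its body with `ssh "$@"` to execute for real.
run() { echo "+ $*"; }

OLD_SB=oldstandby   # standby NameNode being retired (placeholder)
NEW_SB=newstandby   # replacement standby NameNode (placeholder)
ACTIVE=activenn     # current active NameNode (placeholder)
DATANODES="dn1 dn2 dn3"

# Step 1: shut down the old standby, then bootstrap and start the new one.
run "$OLD_SB" service hadoop-hdfs-namenode stop
run "$NEW_SB" hdfs namenode -bootstrapStandby   # rsync the name dir instead if this fails
run "$NEW_SB" service hadoop-hdfs-namenode start

# Step 2: rolling restart of the DataNodes with the updated hdfs-site.xml in place.
for dn in $DATANODES; do
  run "$dn" service hadoop-hdfs-datanode restart
done

# Step 3: stop the active NN and its ZKFC, update their configs, restart;
# the new standby should then take over as active.
run "$ACTIVE" service hadoop-hdfs-zkfc stop
run "$ACTIVE" service hadoop-hdfs-namenode stop
```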
>>>> >>
>>>> >>     I have not tried the above steps, so please let me know whether
>>>> >> this works. And I think we should also document the correct steps in
>>>> >> Apache. Could you please file an Apache jira?
>>>> >>
>>>> >> Thanks,
>>>> >> -Jing
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>>> discord@uw.edu>
>>>> >> wrote:
>>>> >>>
>>>> >>> Hello,
>>>> >>>
>>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>>> >>> configuration. I believe the steps to achieve this would be
>>>> >>> something similar to:
>>>> >>>
>>>> >>> Use the Bootstrap standby command to prep the replacement standby,
>>>> >>> or rsync if the command fails.
>>>> >>>
>>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>>> >>> to the new standby.
>>>> >>>
>>>> >>> Update the xml configuration on all nodes to reflect the
>>>> >>> replacement standby.
>>>> >>>
>>>> >>> Start the replacement standby.
>>>> >>>
>>>> >>> Use some hadoop command to refresh the datanodes to the new
>>>> >>> NameNode configuration.
>>>> >>>
>>>> >>> I am not sure how to deal with the Journal switch, or if I am
>>>> >>> going about this the right way. Can anybody give me some
>>>> >>> suggestions here?
>>>> >>>
>>>> >>>
>>>> >>> Regards,
>>>> >>>
>>>> >>> Colin Williams
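The bootstrap step in the question above maps to a single command on the replacement node, with rsync as the fallback. Editor's sketch: the active host name and the name-directory path are placeholders, not values from this thread, so substitute your own before running.

```shell
# Editor's sketch; commands are built as strings and echoed as a dry run.
# Run on the replacement standby node.
CMD="hdfs namenode -bootstrapStandby"
echo "$CMD"

# Fallback if bootstrapStandby fails: copy the metadata directory from the
# active NameNode (host and path below are placeholders).
FALLBACK="rsync -a activenn:/data/dfs/nn/ /data/dfs/nn/"
echo "$FALLBACK"
```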
>>>> >>>
>>>> >>
>>>> >>
>>>> >> CONFIDENTIALITY NOTICE
>>>> >> NOTICE: This message is intended for the use of the individual or
>>>> entity
>>>> >> to which it is addressed and may contain information that is
>>>> confidential,
>>>> >> privileged and exempt from disclosure under applicable law. If the
>>>> reader of
>>>> >> this message is not the intended recipient, you are hereby notified
>>>> that any
>>>> >> printing, copying, dissemination, distribution, disclosure or
>>>> forwarding of
>>>> >> this communication is strictly prohibited. If you have received this
>>>> >> communication in error, please contact the sender immediately and
>>>> delete it
>>>> >> from your system. Thank You.
>>>> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
This shouldn't have affected the journalnodes at all -- they are mostly
unaware of the ZKFC and active/standby state.  Did you do something else
that may have impacted the journalnodes (e.g. shut down one or more of
them)?

Regarding your previous two emails reporting errors/warnings when doing -formatZK:

The WARN is fine.  It's true that you could get in a weird state if you had
multiple namenodes up.  But with just 1 namenode up, you should be safe.
What you are trying to avoid is a split brain or standby/standby state, but
that is impossible with just 1 namenode alive.  Similarly, the ERROR is a
sanity check to make sure you don't screw yourself by formatting a running
cluster.  I should have mentioned you need to use the -force argument to
get around that.
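As a concrete form of the suggestion above, the non-interactive re-format looks like the sketch below. The -force flag is the one named in this thread for the CDH4-era DFSZKFailoverController; double-check `hdfs zkfc -help` on your version, and only run the real command while every NameNode and ZKFC is stopped.

```shell
# Editor's dry-run sketch: the command is built as a string and echoed,
# not executed. Swap the echo for the command itself on the NameNode host.
CMD="hdfs zkfc -formatZK -force"
echo "$CMD"
```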


On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> However, continuing with the process, my QJM eventually errored out and my
> Active NameNode went down.
>
> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
> 10.120.5.247:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
> 10.120.5.203:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
> 10.120.5.25:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
> stream=QuorumOutputStream starting at txid 9634))
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at
> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>  at
> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
> at
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>  at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>  at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>  at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>  at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
> at
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>  at
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
> QuorumOutputStream starting at txid 9634
> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil
> (ExitUtil.java:terminate(87)) - Exiting with status 1
> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>
>
>
> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> I tried a third time and it just worked?
>>
>> sudo hdfs zkfc -formatZK
>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>> GMT
>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>> Corporation
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf
-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-as
l-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.
6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils
-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.library.path=//usr/lib/hadoop/lib/native
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:os.version=2.6.32-358.el6.x86_64
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:user.dir=/etc/hbase/conf.golden_apple
>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>> sessionTimeout=5000 watcher=null
>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>> attempt to authenticate using SASL (unknown error)
>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>> connection established to rhel1.local/10.120.5.203:2181, initiating
>> session
>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>> establishment complete on server rhel1.local/10.120.5.203:2181,
>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>  ===============================================
>> The configured parent znode /hadoop-ha/golden-apple already exists.
>> Are you sure you want to clear all failover information from
>> ZooKeeper?
>> WARNING: Before proceeding, ensure that all HDFS services and
>> failover controllers are stopped!
>> ===============================================
>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>> Y
>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>> /hadoop-ha/golden-apple from ZK...
>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>> /hadoop-ha/golden-apple from ZK.
>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>> /hadoop-ha/golden-apple in ZK.
>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>> (ClientCnxn.java:run(511)) - EventThread shut down
>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>
>>
>>
>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>>
>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>
>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>>> > today. Just thought I'd forward this info regarding swapping out the
>>> > NameNode in a QJM / HA configuration. See you around on #hbase. If you
>>> > visit Seattle, feel free to give me a shout out.
>>> >
>>> > ---------- Forwarded message ----------
>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA
>>> > configuration
>>> > To: user@hadoop.apache.org
>>> >
>>> >
>>> > Hi Jing,
>>> >
>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>> >
>>> > Best,
>>> >
>>> > Colin Williams
>>> >
>>> >
>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>> wrote:
>>> >>
>>> >> Hi Colin,
>>> >>
>>> >>     I guess currently we may have to restart almost all the
>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>> >>
>>> >> 1. The current active NameNode (ANN) needs to know the new SBN since in
>>> >> the current implementation the SBN tries to send rollEditLog RPC request
>>> >> to ANN periodically (thus if a NN failover happens later, the original
>>> >> ANN needs to send this RPC to the correct NN).
>>> >> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>>> >> Look at the code in BPOfferService:
>>> >>
>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>>> >> IOException {
>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>> >>     for (BPServiceActor actor : bpServices) {
>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>> >>     }
>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>> >>
>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>> >>       throw new IOException(
>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>> >>     }
>>> >>   }
>>> >>
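In plain terms, the guard in that quoted snippet rejects any change at all to the DataNode's configured NameNode set. A rough plain-Java rendering of the same set logic (not the Hadoop API; Guava's Sets.symmetricDifference is replaced with standard collections, and the addresses are placeholder strings):

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Sketch of the check in BPOfferService.refreshNNList: any difference
// between the old and new NameNode address sets is rejected outright.
public class RefreshCheck {
    static void refreshNNList(Set<String> oldAddrs, Set<String> newAddrs)
            throws IOException {
        Set<String> diff = new HashSet<>(oldAddrs);
        diff.addAll(newAddrs);                  // union of both sets
        Set<String> common = new HashSet<>(oldAddrs);
        common.retainAll(newAddrs);             // intersection
        diff.removeAll(common);                 // symmetric difference
        if (!diff.isEmpty()) {
            throw new IOException(
                "HA does not currently support adding a new standby to a running DN. "
                + "Please do a rolling restart of DNs to reconfigure the list of NNs.");
        }
    }

    public static void main(String[] args) throws IOException {
        Set<String> current = new HashSet<>(Set.of("nn1:8020", "nn2:8020"));
        // Same membership, different order: accepted silently.
        refreshNNList(current, new HashSet<>(Set.of("nn2:8020", "nn1:8020")));
        try {
            // A swapped-in standby changes the set: rejected.
            refreshNNList(current, new HashSet<>(Set.of("nn1:8020", "nn3:8020")));
        } catch (IOException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

This is why a rolling restart of the DataNodes is the only supported way to point them at a replacement standby.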
>>> >> 3. If you're using automatic failover, you also need to update the
>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC will
>>> >> do graceful fencing by sending RPC to the other NN.
>>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN,
>>> >> but I have not tried it before.
>>> >>
>>> >>     Thus in general we may still have to restart all the services
>>> >> (except JNs) and update their configurations. But this may be a rolling
>>> >> restart process, I guess:
>>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>>> >> 2. Keep the ANN and its corresponding ZKFC running; do a rolling
>>> >> restart of all the DNs to update their configurations.
>>> >> 3. After restarting all the DNs, stop the ANN and the ZKFC, and update
>>> >> their configuration. The new SBN should become active.
>>> >>
>>> >>     I have not tried the steps above, so please let me know whether
>>> >> this works. And I think we should also document the correct steps in
>>> >> Apache. Could you please file an Apache jira?
>>> >>
>>> >> Thanks,
>>> >> -Jing
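The "update their configurations" step above boils down to repointing the standby's entry in hdfs-site.xml on every node. A minimal fragment, using the nameservice and active-NN hostname that appear in the logs in this thread (golden-apple, rhel1.local); the logical IDs nn1/nn2 and the replacement host rhel9.local are assumptions for illustration, not values from the poster's cluster:

```xml
<!-- hdfs-site.xml: nn1/nn2 and rhel9.local are hypothetical placeholders -->
<property>
  <name>dfs.ha.namenodes.golden-apple</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.golden-apple.nn1</name>
  <value>rhel1.local:8020</value>
</property>
<property>
  <!-- nn2 previously pointed at the old standby -->
  <name>dfs.namenode.rpc-address.golden-apple.nn2</name>
  <value>rhel9.local:8020</value>
</property>
```

Per the refreshNNList check quoted earlier, DataNodes only pick this change up on restart, which is why the rolling-restart sequence is needed.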
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>> discord@uw.edu>
>>> >> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
>>> >>> I believe the steps to achieve this would be something similar to:
>>> >>>
>>> >>> Use the Bootstrap standby command to prep the replacement standby, or
>>> >>> rsync if the command fails.
>>> >>>
>>> >>> Somehow update the datanodes, so they push the heartbeat / journal to
>>> >>> the new standby.
>>> >>>
>>> >>> Update the xml configuration on all nodes to reflect the replacement
>>> >>> standby.
>>> >>>
>>> >>> Start the replacement standby.
>>> >>>
>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> >>> configuration.
>>> >>>
>>> >>> I am not sure how to deal with the Journal switch, or if I am going
>>> >>> about this the right way. Can anybody give me some suggestions here?
>>> >>>
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>> Colin Williams
>>> >>>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>>
>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
This shouldn't have affected the journalnodes at all -- they are mostly
unaware of the zkfc and active/standby state.  Did you do something else
that may have impacted the journalnodes (e.g. shut down one or more of
them)?

For your previous two emails, reporting errors/warnings when doing -formatZK:

The WARN is fine.  It's true that you could get into a weird state if you
had multiple namenodes up.  But with just one namenode up, you should be
safe.  What you are trying to avoid is a split-brain or standby/standby
state, but that is impossible with just one namenode alive.  Similarly, the
ERROR is a sanity check to make sure you don't screw yourself by formatting
a running cluster.  I should have mentioned you need to use the -force
argument to get around that.


On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> However, continuing with the process, my QJM eventually errored out and my
> Active NameNode went down.
>
> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
> 10.120.5.247:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
> 10.120.5.203:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
> 10.120.5.25:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
> stream=QuorumOutputStream starting at txid 9634))
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at
> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>  at
> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
> at
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>  at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>  at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>  at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>  at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
> at
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>  at
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
> QuorumOutputStream starting at txid 9634
> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil
> (ExitUtil.java:terminate(87)) - Exiting with status 1
> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>
>
>
> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> I tried a third time and it just worked?
>>
>> sudo hdfs zkfc -formatZK
>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>> GMT
>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>> Corporation
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf
-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-as
l-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.
6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils
-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.library.path=//usr/lib/hadoop/lib/native
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:os.version=2.6.32-358.el6.x86_64
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:user.dir=/etc/hbase/conf.golden_apple
>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>> sessionTimeout=5000 watcher=null
>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>> attempt to authenticate using SASL (unknown error)
>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>> connection established to rhel1.local/10.120.5.203:2181, initiating
>> session
>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>> establishment complete on server rhel1.local/10.120.5.203:2181,
>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>  ===============================================
>> The configured parent znode /hadoop-ha/golden-apple already exists.
>> Are you sure you want to clear all failover information from
>> ZooKeeper?
>> WARNING: Before proceeding, ensure that all HDFS services and
>> failover controllers are stopped!
>> ===============================================
>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>> Y
>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>> /hadoop-ha/golden-apple from ZK...
>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>> /hadoop-ha/golden-apple from ZK.
>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>> /hadoop-ha/golden-apple in ZK.
>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>> (ClientCnxn.java:run(511)) - EventThread shut down
>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>
>>
>>
>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>>
>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>
>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>>> today.
>>> > Just thought I'd forward this info regarding swapping out the NameNode
>>> in a
>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>> Seattle, feel
>>> > free to give me a shout out.
>>> >
>>> > ---------- Forwarded message ----------
>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA
>>> > configuration
>>> > To: user@hadoop.apache.org
>>> >
>>> >
>>> > Hi Jing,
>>> >
>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>> >
>>> > Best,
>>> >
>>> > Colin Williams
>>> >
>>> >
>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>> wrote:
>>> >>
>>> >> Hi Colin,
>>> >>
>>> >>     I guess currently we may have to restart almost all the
>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>> >>
>>> >> 1. The current active NameNode (ANN) needs to know the new SBN since
>>> in
>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>> request to
>>> >> ANN periodically (thus if a NN failover happens later, the original
>>> ANN
>>> >> needs to send this RPC to the correct NN).
>>> >> 2. Looks like the DataNode currently cannot do real refreshment for
>>> NN.
>>> >> Look at the code in BPOfferService:
>>> >>
>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>> >>     for (BPServiceActor actor : bpServices) {
>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>> >>     }
>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>> >>
>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>> >>       throw new IOException(
>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>> >>     }
>>> >>   }
>>> >>
>>> >> 3. If you're using automatic failover, you also need to update the
>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC will
>>> do
>>> >> gracefully fencing by sending RPC to the other NN.
>>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>> but I
>>> >> have not tried before.
>>> >>
>>> >>     Thus in general we may still have to restart all the services
>>> (except
>>> >> JNs) and update their configurations. But this may be a rolling
>>> restart
>>> >> process I guess:
>>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>> restart
>>> >> of all the DN to update their configurations
>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>> their
>>> >> configuration. The new SBN should become active.
>>> >>
>>> >>     I have not tried the steps above, so please let me know if this
>>> >> works or not. And I think we should also document the correct steps in
>>> >> Apache. Could you please file an Apache jira?
>>> >>
>>> >> Thanks,
>>> >> -Jing
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>> discord@uw.edu>
>>> >> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>> configuration. I
>>> >>> believe the steps to achieve this would be something similar to:
>>> >>>
>>> >>> Use the Bootstrap standby command to prep the replacement standby. Or
>>> >>> rsync if the command fails.
>>> >>>
>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>> to the
>>> >>> new standby
>>> >>>
>>> >>> Update the XML configuration on all nodes to reflect the replacement
>>> >>> standby.
>>> >>>
>>> >>> Start the replacement standby
>>> >>>
>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> >>> configuration.
>>> >>>
>>> >>> I am not sure how to deal with the Journal switch, or if I am going
>>> about
>>> >>> this the right way. Can anybody give me some suggestions here?
>>> >>>
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>> Colin Williams
>>> >>>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>>
>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
This shouldn't have affected the journalnodes at all -- they are mostly
unaware of the zkfc and active/standby state.  Did you do something else
that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
or something else)

For your previous 2 emails, reporting errors/warns when doing -formatZK:

The WARN is fine.  It's true that you could get in a weird state if you had
multiple namenodes up.  But with just 1 namenode up, you should be safe.
What you are trying to avoid is a split brain or standby/standby state, but
that is impossible with just 1 namenode alive.  Similarly, the ERROR is a
sanity check to make sure you don't screw yourself by formatting a running
cluster.  I should have mentioned you need to use the -force argument to
get around that.
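
For reference, the sequence described above (stop the ZKFCs, re-format the ZK znode with -force, restart) can be sketched as a dry-run script. The service name `hadoop-hdfs-zkfc` assumes a CDH4-style packaging; the script only prints the plan rather than touching a live cluster, so adapt the commands to your install:

```shell
# Dry-run sketch of the ZKFC re-format sequence (CDH4-style service names
# assumed). The commands are printed, not executed, so nothing is touched.
steps=(
  "sudo service hadoop-hdfs-zkfc stop"        # stop ZKFC on BOTH NameNode hosts first
  "sudo -u hdfs hdfs zkfc -formatZK -force"   # -force skips the 'already exists' prompt
  "sudo service hadoop-hdfs-zkfc start"       # restart ZKFC on both hosts afterwards
)
for step in "${steps[@]}"; do
  echo "$step"
done
```

Per the reasoning above, only the failover controllers need to be down for -formatZK; a single live NameNode cannot produce a split-brain or standby/standby state.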


On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> However continuing with the process my QJM eventually error'd out and my
> Active NameNode went down.
>
> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
> 10.120.5.247:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
> 10.120.5.203:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
> 10.120.5.25:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
> stream=QuorumOutputStream starting at txid 9634))
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at
> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>  at
> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
> at
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>  at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>  at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>  at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>  at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
> at
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>  at
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
> QuorumOutputStream starting at txid 9634
> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil
> (ExitUtil.java:terminate(87)) - Exiting with status 1
> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>
>
>
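The repeated "IPC's epoch 5 is not the current writer epoch  0" lines above come from the JournalNode's epoch fencing check. A simplified, hypothetical sketch of that logic (modeled loosely on the Journal.checkWriteRequest / Journal.journal frames in the stack traces; not the actual Hadoop code):

```java
import java.io.IOException;

// Simplified model of QJM epoch fencing: a NameNode first wins an epoch
// via newEpoch(), but the JournalNode only records it as the *writer*
// epoch once that NameNode starts a log segment. journal() calls made
// before any segment is started (writer epoch still 0) are rejected,
// which matches the errors in the log above.
class JournalSketch {
    private long lastPromisedEpoch = 0;
    private long currentWriterEpoch = 0;

    void newEpoch(long epoch) throws IOException {
        if (epoch <= lastPromisedEpoch) {
            throw new IOException("Proposed epoch " + epoch
                + " <= last promised epoch " + lastPromisedEpoch);
        }
        lastPromisedEpoch = epoch;
    }

    void startLogSegment(long epoch) throws IOException {
        checkRequest(epoch);
        currentWriterEpoch = epoch; // caller becomes the recognized writer
    }

    void journal(long epoch, long firstTxId) throws IOException {
        checkRequest(epoch);
        if (epoch != currentWriterEpoch) {
            throw new IOException("IPC's epoch " + epoch
                + " is not the current writer epoch  " + currentWriterEpoch);
        }
    }

    private void checkRequest(long epoch) throws IOException {
        if (epoch < lastPromisedEpoch) {
            throw new IOException("IPC's epoch " + epoch
                + " is less than the last promised epoch " + lastPromisedEpoch);
        }
    }
}
```

Under this toy model, a NameNode holding epoch 5 whose JournalNodes never saw a startLogSegment for that epoch gets fenced on every journal() call until the next log roll, which is what the "Will try to write to this JN again after the next log roll" warnings describe.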
> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> I tried a third time and it just worked?
>>
>> sudo hdfs zkfc -formatZK
>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>> GMT
>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>> Corporation
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf
-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-as
l-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.
6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils
-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.library.path=//usr/lib/hadoop/lib/native
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:os.version=2.6.32-358.el6.x86_64
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:user.dir=/etc/hbase/conf.golden_apple
>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>> sessionTimeout=5000 watcher=null
>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>> attempt to authenticate using SASL (unknown error)
>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>> connection established to rhel1.local/10.120.5.203:2181, initiating
>> session
>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>> establishment complete on server rhel1.local/10.120.5.203:2181,
>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>  ===============================================
>> The configured parent znode /hadoop-ha/golden-apple already exists.
>> Are you sure you want to clear all failover information from
>> ZooKeeper?
>> WARNING: Before proceeding, ensure that all HDFS services and
>> failover controllers are stopped!
>> ===============================================
>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>> Y
>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>> /hadoop-ha/golden-apple from ZK...
>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>> /hadoop-ha/golden-apple from ZK.
>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>> /hadoop-ha/golden-apple in ZK.
>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>> (ClientCnxn.java:run(511)) - EventThread shut down
>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>
>>
>>
>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>>
>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>
>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>>> today.
>>> > Just thought I'd forward this info regarding swapping out the NameNode
>>> in a
>>> > QJM / HA configuration. See you around on #hbase. If you visit
>>> Seattle, feel
>>> > free to give me a shout out.
>>> >
>>> > ---------- Forwarded message ----------
>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA
>>> > configuration
>>> > To: user@hadoop.apache.org
>>> >
>>> >
>>> > Hi Jing,
>>> >
>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>> >
>>> > Best,
>>> >
>>> > Colin Williams
>>> >
>>> >
>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>> wrote:
>>> >>
>>> >> Hi Colin,
>>> >>
>>> >>     I guess currently we may have to restart almost all the
>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>> >>
>>> >> 1. The current active NameNode (ANN) needs to know the new SBN since
>>> in
>>> >> the current implementation the SBN tries to send rollEditLog RPC
>>> request to
>>> >> ANN periodically (thus if a NN failover happens later, the original
>>> ANN
>>> >> needs to send this RPC to the correct NN).
>>> >> 2. Looks like the DataNode currently cannot do real refreshment for
>>> NN.
>>> >> Look at the code in BPOfferService:
>>> >>
>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>> >>     for (BPServiceActor actor : bpServices) {
>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>> >>     }
>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>> >>
>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>> >>       throw new IOException(
>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>> >>     }
>>> >>   }
>>> >>
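The guard in that method can be exercised in isolation. A self-contained sketch using plain JDK sets instead of Guava (a hypothetical simplification, not the real BPOfferService):

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Mirrors the effect of BPOfferService.refreshNNList: any difference
// between the old and new NameNode address sets is rejected, which is
// why swapping a standby requires a rolling restart of the DataNodes.
class RefreshNNListSketch {
    static void refreshNNList(Set<String> oldAddrs, Set<String> newAddrs)
            throws IOException {
        // Symmetric difference computed with plain JDK collections.
        Set<String> diff = new HashSet<>(oldAddrs);
        diff.removeAll(newAddrs);
        Set<String> other = new HashSet<>(newAddrs);
        other.removeAll(oldAddrs);
        diff.addAll(other);
        if (!diff.isEmpty()) {
            throw new IOException(
                "HA does not currently support adding a new standby to a "
                + "running DN. Please do a rolling restart of DNs to "
                + "reconfigure the list of NNs.");
        }
    }
}
```

An unchanged set of NameNode addresses passes; replacing even one address throws, so the DataNode has to be restarted with the new configuration rather than refreshed in place.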
>>> >> 3. If you're using automatic failover, you also need to update the
>>> >> configuration of the ZKFC on the current ANN machine, since ZKFC will
>>> do
>>> >> gracefully fencing by sending RPC to the other NN.
>>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>>> but I
>>> >> have not tried before.
>>> >>
>>> >>     Thus in general we may still have to restart all the services
>>> (except
>>> >> JNs) and update their configurations. But this may be a rolling
>>> restart
>>> >> process I guess:
>>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>>> restart
>>> >> of all the DN to update their configurations
>>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update
>>> their
>>> >> configuration. The new SBN should become active.
>>> >>
>>> >>     I have not tried the upper steps, thus please let me know if this
>>> >> works or not. And I think we should also document the correct steps in
>>> >> Apache. Could you please file an Apache jira?
>>> >>
>>> >> Thanks,
>>> >> -Jing
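As a concrete illustration of the configuration updates in Jing's steps, these are the hdfs-site.xml entries that name the standby and would need to change on every node. The "golden-apple" nameservice comes from the logs in this thread; the nn1/nn2 IDs and hostnames below are placeholders, not taken from Colin's cluster:

```xml
<!-- Hypothetical fragment: replace nn2's addresses with the replacement
     standby's host, then roll the file out before restarting each daemon. -->
<property>
  <name>dfs.ha.namenodes.golden-apple</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.golden-apple.nn2</name>
  <value>new-standby.local:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.golden-apple.nn2</name>
  <value>new-standby.local:50070</value>
</property>
```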
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>> discord@uw.edu>
>>> >> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>>> configuration. I
>>> >>> believe the steps to achieve this would be something similar to:
>>> >>>
>>> >>> Use the Bootstrap standby command to prep the replacement standby. Or
>>> >>> rsync if the command fails.
>>> >>>
>>> >>> Somehow update the datanodes, so they push the heartbeat / journal
>>> to the
>>> >>> new standby
>>> >>>
>>> >>> Update the xml configuration on all nodes to reflect the replacement
>>> >>> standby.
>>> >>>
>>> >>> Start the replacement standby
>>> >>>
>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> >>> configuration.
>>> >>>
>>> >>> I am not sure how to deal with the Journal switch, or if I am going
>>> about
>>> >>> this the right way. Can anybody give me some suggestions here?
>>> >>>
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>> Colin Williams
>>> >>>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>>
>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Bryan Beaudreault <bb...@hubspot.com>.
This shouldn't have affected the journalnodes at all -- they are mostly
unaware of the zkfc and active/standby state.  Did you do something else
that may have impacted the journalnodes? (i.e. shut down 1 or more of them,
or something else)

For your previous two emails, reporting the errors and warnings when doing -formatZK:

The WARN is fine.  It's true that you could get in a weird state if you had
multiple namenodes up.  But with just 1 namenode up, you should be safe.
What you are trying to avoid is a split brain or standby/standby state, but
that is impossible with just 1 namenode alive.  Similarly, the ERROR is a
sanity check to make sure you don't screw yourself by formatting a running
cluster.  I should have mentioned you need to use the -force argument to
get around that.
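A sketch of the non-interactive invocation Bryan is describing. The block only prints the command so it can be reviewed first; run it yourself on the NameNode host, and only while both ZKFCs are stopped. -force bypasses the "parent znode already exists" sanity check and -nonInteractive suppresses the Y/N prompt (both are real ZKFailoverController flags in this Hadoop line):

```shell
#!/bin/sh
# Hypothetical sketch: echoed rather than executed here.
# Stop the hadoop-hdfs-zkfc daemons on both NameNode hosts first.
cmd="hdfs zkfc -formatZK -force -nonInteractive"
echo "$cmd"
```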


On Fri, Aug 1, 2014 at 12:35 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> However, continuing with the process, my QJM eventually errored out and my
> Active NameNode went down.
>
> 2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
> 10.120.5.247:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
> 10.120.5.203:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
> 10.120.5.25:8485] client.QuorumJournalManager
> (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485
> failed to write txns 9635-9635. Will try to write to this JN again after
> the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
> is not the current writer epoch  0
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
> at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1224)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>  at com.sun.proxy.$Proxy9.journal(Unknown Source)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
>  at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
> at
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
> namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
> Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
> 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
> stream=QuorumOutputStream starting at txid 9634))
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
> exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
>  at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
>  at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
> at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
>  at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> at
> org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
>  at
> org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
> at
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
>  at
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>  at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
>  at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
>  at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
>  at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
> at
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
>  at
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
> 2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
> client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
> QuorumOutputStream starting at txid 9634
> 2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil
> (ExitUtil.java:terminate(87)) - Exiting with status 1
> 2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
> (StringUtils.java:run(615)) - SHUTDOWN_MSG:
>
>
>
> On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> I tried a third time and it just worked?
>>
>> sudo hdfs zkfc -formatZK
>> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
>> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
>> for NameNode NameNode at rhel1.local/10.120.5.203:8020
>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
>> GMT
>> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
>> Corporation
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.home=/usr/java/jdk1.7.0_60/jre
>> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf
-java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-as
l-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.
6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils
-1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
>> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:java.library.path=//usr/lib/hadoop/lib/native
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
>> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:os.version=2.6.32-358.el6.x86_64
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:user.name=root
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client environment:user.home=/root
>> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
>> (Environment.java:logEnv(100)) - Client
>> environment:user.dir=/etc/hbase/conf.golden_apple
>> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
>> (ZooKeeper.java:<init>(433)) - Initiating client connection,
>> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
>> sessionTimeout=5000 watcher=null
>> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
>> socket connection to server rhel1.local/10.120.5.203:2181. Will not
>> attempt to authenticate using SASL (unknown error)
>> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
>> connection established to rhel1.local/10.120.5.203:2181, initiating
>> session
>> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
>> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
>> establishment complete on server rhel1.local/10.120.5.203:2181,
>> sessionid = 0x1478902fddc000a, negotiated timeout = 5000
>>  ===============================================
>> The configured parent znode /hadoop-ha/golden-apple already exists.
>> Are you sure you want to clear all failover information from
>> ZooKeeper?
>> WARNING: Before proceeding, ensure that all HDFS services and
>> failover controllers are stopped!
>> ===============================================
>> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
>> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
>> Y
>> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
>> /hadoop-ha/golden-apple from ZK...
>> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
>> /hadoop-ha/golden-apple from ZK.
>> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
>> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
>> /hadoop-ha/golden-apple in ZK.
>> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
>> (ClientCnxn.java:run(511)) - EventThread shut down
>> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
>> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>>
>>
>>
>> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>>
>>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>>
>>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
>>> wrote:
>>> > Hi, this is drocsid / discord from #hbase. Thanks for the help earlier
>>> > today. Just thought I'd forward this info regarding swapping out the
>>> > NameNode in a QJM / HA configuration. See you around on #hbase. If you
>>> > visit Seattle, feel free to give me a shout out.
>>> >
>>> > ---------- Forwarded message ----------
>>> > From: Colin Kincaid Williams <di...@uw.edu>
>>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM / HA
>>> > configuration
>>> > To: user@hadoop.apache.org
>>> >
>>> >
>>> > Hi Jing,
>>> >
>>> > Thanks for the response. I will try this out, and file an Apache jira.
>>> >
>>> > Best,
>>> >
>>> > Colin Williams
>>> >
>>> >
>>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>>> wrote:
>>> >>
>>> >> Hi Colin,
>>> >>
>>> >>     I guess currently we may have to restart almost all the
>>> >> daemons/services in order to swap out a standby NameNode (SBN):
>>> >>
>>> >> 1. The current active NameNode (ANN) needs to know the new SBN, since in
>>> >> the current implementation the SBN tries to send a rollEditLog RPC request
>>> >> to the ANN periodically (thus if an NN failover happens later, the
>>> >> original ANN needs to send this RPC to the correct NN).
>>> >> 2. Looks like the DataNode currently cannot do a real refresh of its NN
>>> >> list. Look at the code in BPOfferService:
>>> >>
>>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>>> >>     for (BPServiceActor actor : bpServices) {
>>> >>       oldAddrs.add(actor.getNNSocketAddress());
>>> >>     }
>>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>>> >>
>>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>>> >>       // Keep things simple for now -- we can implement this at a later date.
>>> >>       throw new IOException(
>>> >>           "HA does not currently support adding a new standby to a running DN. " +
>>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>>> >>     }
>>> >>   }
>>> >>
>>> >> 3. If you're using automatic failover, you also need to update the
>>> >> configuration of the ZKFC on the current ANN machine, since the ZKFC will
>>> >> do graceful fencing by sending an RPC to the other NN.
>>> >> 4. Looks like we do not need to restart the JournalNodes for the new SBN,
>>> >> but I have not tried it before.
>>> >>
>>> >>     Thus in general we may still have to restart all the services (except
>>> >> JNs) and update their configurations. But this may be a rolling restart
>>> >> process I guess:
>>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>>> >> 2. Keep the ANN and its corresponding ZKFC running; do a rolling restart
>>> >> of all the DNs to update their configurations.
>>> >> 3. After restarting all the DNs, stop the ANN and the ZKFC, and update
>>> >> their configuration. The new SBN should become active.
>>> >>
>>> >>     I have not tried the above steps, so please let me know whether this
>>> >> works or not. I think we should also document the correct steps in
>>> >> Apache. Could you please file an Apache jira?
>>> >>
>>> >> Thanks,
>>> >> -Jing
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>>> discord@uw.edu>
>>> >> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
>>> >>> I believe the steps to achieve this would be something similar to:
>>> >>>
>>> >>> Use the bootstrapStandby command to prep the replacement standby, or
>>> >>> rsync if the command fails.
>>> >>>
>>> >>> Somehow update the datanodes, so they push the heartbeat / journal to
>>> >>> the new standby.
>>> >>>
>>> >>> Update the XML configuration on all nodes to reflect the replacement
>>> >>> standby.
>>> >>>
>>> >>> Start the replacement standby.
>>> >>>
>>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>>> >>> configuration.
>>> >>>
>>> >>> I am not sure how to deal with the journal switch, or if I am going
>>> >>> about this the right way. Can anybody give me some suggestions here?
>>> >>>
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>> Colin Williams
>>> >>>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>>
>>
>>
>
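[Editor's sketch] Jing's three rolling-restart steps quoted above can be sketched with the CDH4-style init scripts that appear elsewhere in this thread (`service hadoop-hdfs-namenode`, `hadoop-hdfs-datanode`, `hadoop-hdfs-zkfc`). The hostnames (old-standby, new-standby, old-active, dn1...) are placeholders, and the sequence is an untested outline of the procedure, not a verified recipe:

```shell
# Untested sketch of the rolling swap described above; hostnames are placeholders.

# 1. Shut down the old SBN, then bootstrap and start the new one.
ssh old-standby 'service hadoop-hdfs-namenode stop'
ssh new-standby 'hdfs namenode -bootstrapStandby'   # rsync the name dir instead if this fails
ssh new-standby 'service hadoop-hdfs-namenode start'

# 2. Leave the ANN and its ZKFC running; rolling-restart the DataNodes
#    one at a time so each picks up the new NN list from hdfs-site.xml.
for dn in dn1 dn2 dn3; do
  ssh "$dn" 'service hadoop-hdfs-datanode restart'
done

# 3. Stop the old ANN and its ZKFC, update their hdfs-site.xml, then
#    restart them; the new SBN should take over as active.
ssh old-active 'service hadoop-hdfs-zkfc stop; service hadoop-hdfs-namenode stop'
# ...push the updated hdfs-site.xml to old-active here...
ssh old-active 'service hadoop-hdfs-namenode start; service hadoop-hdfs-zkfc start'
```

Step 2 is the part the BPOfferService code quoted above forces: the DN refuses a live NN-list change, so each DataNode must be restarted to learn the new standby.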

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
However, continuing with the process, my QJM eventually errored out and my
Active NameNode went down.

2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
10.120.5.247:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
10.120.5.203:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
10.120.5.25:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
stream=QuorumOutputStream starting at txid 9634))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
exceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
at
org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
at
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
at
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
QuorumOutputStream starting at txid 9634
2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil
(ExitUtil.java:terminate(87)) - Exiting with status 1
2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
(StringUtils.java:run(615)) - SHUTDOWN_MSG:
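A note on the recurring failure above: "IPC's epoch 5 is not the current
writer epoch  0" means the NameNode is writing with a promised epoch of 5
while every JournalNode reports a writer epoch of 0, i.e. no writer has been
accepted since the journal storage was (re)initialized, so each journal()
call is rejected and the flush to the required journal eventually aborts the
NameNode. As a quick sanity check (a sketch only; the sed patterns and the
handling of the sample line are my own, not from the thread), the two
disagreeing epochs can be pulled straight out of the log line:

```shell
# Parse the NameNode-side promised epoch and the JournalNode-side writer
# epoch out of a QJM error line. The sample line is copied from the log
# above; in practice you would grep it from the NameNode log instead.
line="org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch  0"
ipc_epoch=$(printf '%s\n' "$line" | sed -n "s/.*IPC's epoch \([0-9][0-9]*\).*/\1/p")
writer_epoch=$(printf '%s\n' "$line" | sed -n "s/.*writer epoch *\([0-9][0-9]*\).*/\1/p")
echo "NN promised epoch: $ipc_epoch, JN writer epoch: $writer_epoch"
```

If all three JNs report writer epoch 0 like this, the journal state was most
likely reset underneath the running NameNode; restarting the NameNode so it
re-negotiates a fresh epoch with the quorum (or recovering the JN edits
directories) is the usual way forward, assuming the edits themselves are
intact.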



On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> I tried a third time and it just worked?
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
> for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
> GMT
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
> Corporation
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-
java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl
-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6
.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-
1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(433)) - Initiating client connection,
> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
> sessionTimeout=5000 watcher=null
> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
> socket connection to server rhel1.local/10.120.5.203:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
> connection established to rhel1.local/10.120.5.203:2181, initiating
> session
> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
> establishment complete on server rhel1.local/10.120.5.203:2181, sessionid
> = 0x1478902fddc000a, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
> /hadoop-ha/golden-apple from ZK...
> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
> /hadoop-ha/golden-apple from ZK.
> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
> /hadoop-ha/golden-apple in ZK.
> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
> (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>
>
>
> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>
>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>
>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>> today.
>> > Just thought I'd forward this info regarding swapping out the NameNode
>> in a
>> > QJM / HA configuration. See you around on #hbase. If you visit Seattle,
>> feel
>> > free to give me a shout out.
>> >
>> > ---------- Forwarded message ----------
>> > From: Colin Kincaid Williams <di...@uw.edu>
>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM / HA
>> > configuration
>> > To: user@hadoop.apache.org
>> >
>> >
>> > Hi Jing,
>> >
>> > Thanks for the response. I will try this out, and file an Apache jira.
>> >
>> > Best,
>> >
>> > Colin Williams
>> >
>> >
>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>> wrote:
>> >>
>> >> Hi Colin,
>> >>
>> >>     I guess currently we may have to restart almost all the
>> >> daemons/services in order to swap out a standby NameNode (SBN):
>> >>
>> >> 1. The current active NameNode (ANN) needs to know the new SBN since in
>> >> the current implementation the SBN tries to send rollEditLog RPC
>> request to
>> >> ANN periodically (thus if a NN failover happens later, the original ANN
>> >> needs to send this RPC to the correct NN).
>> >> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>> >> Look at the code in BPOfferService:
>> >>
>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>> >> IOException {
>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>> >>     for (BPServiceActor actor : bpServices) {
>> >>       oldAddrs.add(actor.getNNSocketAddress());
>> >>     }
>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>> >>
>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>> >>       // Keep things simple for now -- we can implement this at a later
>> >> date.
>> >>       throw new IOException(
>> >>           "HA does not currently support adding a new standby to a
>> running
>> >> DN. " +
>> >>           "Please do a rolling restart of DNs to reconfigure the list
>> of
>> >> NNs.");
>> >>     }
>> >>   }
>> >>
>> >> 3. If you're using automatic failover, you also need to update the
>> >> configuration of the ZKFC on the current ANN machine, since ZKFC will
>> do
>> >> gracefully fencing by sending RPC to the other NN.
>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>> but I
>> >> have not tried before.
>> >>
>> >>     Thus in general we may still have to restart all the services
>> (except
>> >> JNs) and update their configurations. But this may be a rolling restart
>> >> process I guess:
>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>> restart
>> >> of all the DN to update their configurations
>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>> >> configuration. The new SBN should become active.
>> >>
>> >>     I have not tried the upper steps, thus please let me know if this
>> >> works or not. And I think we should also document the correct steps in
>> >> Apache. Could you please file an Apache jira?
>> >>
>> >> Thanks,
>> >> -Jing
>> >>
>> >>
>> >>
>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>> discord@uw.edu>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>> configuration. I
>> >>> believe the steps to achieve this would be something similar to:
>> >>>
>> >>> Use the bootstrapStandby command to prep the replacement standby, or
>> >>> rsync if the command fails.
>> >>>
>> >>> Somehow update the datanodes, so they push the heartbeat / journal to
>> >>> the new standby.
>> >>>
>> >>> Update the XML configuration on all nodes to reflect the replacement
>> >>> standby.
>> >>>
>> >>> Start the replacement standby.
>> >>>
>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>> >>> configuration.
>> >>>
>> >>> I am not sure how to deal with the Journal switch, or if I am going
>> >>> about this the right way. Can anybody give me some suggestions here?
>> >>>
>> >>>
>> >>> Regards,
>> >>>
>> >>> Colin Williams
>> >>>
>> >>
>> >>
>> >> CONFIDENTIALITY NOTICE
>> >> NOTICE: This message is intended for the use of the individual or
>> entity
>> >> to which it is addressed and may contain information that is
>> confidential,
>> >> privileged and exempt from disclosure under applicable law. If the
>> reader of
>> >> this message is not the intended recipient, you are hereby notified
>> that any
>> >> printing, copying, dissemination, distribution, disclosure or
>> forwarding of
>> >> this communication is strictly prohibited. If you have received this
>> >> communication in error, please contact the sender immediately and
>> delete it
>> >> from your system. Thank You.
>> >
>> >
>> >
>>
>
>

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
However, continuing with the process, my QJM eventually errored out and my
active NameNode went down.

2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
10.120.5.247:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
10.120.5.203:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
10.120.5.25:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
stream=QuorumOutputStream starting at txid 9634))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
exceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
at
org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
at
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
at
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
QuorumOutputStream starting at txid 9634
2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil
(ExitUtil.java:terminate(87)) - Exiting with status 1
2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
(StringUtils.java:run(615)) - SHUTDOWN_MSG:
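For context, the repeated "IPC's epoch 5 is not the current writer epoch  0" rejection comes from QJM's fencing check: each JournalNode persists the epoch of the writer that last started a log segment and rejects journal() calls whose epoch does not match it. A writer epoch of 0 on all three JournalNodes suggests their on-disk state was reset (for example by a reformat) while the active NameNode still held epoch 5. A minimal sketch of that comparison, simplified from the stack trace above (class and field names here are illustrative, not the actual Hadoop source):

```java
import java.io.IOException;

// Illustrative sketch of the JournalNode fencing check that produced the
// "IPC's epoch X is not the current writer epoch Y" errors in the log above.
class JournalSketch {
    private final long lastPromisedEpoch; // highest epoch this JN promised to any writer
    private final long lastWriterEpoch;   // epoch of the writer that last started a segment

    JournalSketch(long lastPromisedEpoch, long lastWriterEpoch) {
        this.lastPromisedEpoch = lastPromisedEpoch;
        this.lastWriterEpoch = lastWriterEpoch;
    }

    // A journal() write is rejected unless the caller's epoch is current:
    // at least the last promised epoch, and equal to the epoch of the
    // writer that began the active segment.
    void checkWriteRequest(long requestEpoch) throws IOException {
        if (requestEpoch < lastPromisedEpoch) {
            throw new IOException("epoch " + requestEpoch
                + " is less than the last promised epoch " + lastPromisedEpoch);
        }
        if (requestEpoch != lastWriterEpoch) {
            throw new IOException("IPC's epoch " + requestEpoch
                + " is not the current writer epoch  " + lastWriterEpoch);
        }
    }
}
```

With the writer epoch reset to 0, even the legitimate writer at epoch 5 is fenced out on every JournalNode, which is consistent with the quorum failure above.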



On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> I tried a third time and it just worked?
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
> for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
> GMT
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
> Corporation
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-
java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl
-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6
.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-
1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(433)) - Initiating client connection,
> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
> sessionTimeout=5000 watcher=null
> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
> socket connection to server rhel1.local/10.120.5.203:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
> connection established to rhel1.local/10.120.5.203:2181, initiating
> session
> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
> establishment complete on server rhel1.local/10.120.5.203:2181, sessionid
> = 0x1478902fddc000a, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
> /hadoop-ha/golden-apple from ZK...
> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
> /hadoop-ha/golden-apple from ZK.
> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
> /hadoop-ha/golden-apple in ZK.
> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
> (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>
>
>
> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>
>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>
>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>> today.
>> > Just thought I'd forward this info regarding swapping out the NameNode
>> in a
>> > QJM / HA configuration. See you around on #hbase. If you visit Seattle,
>> feel
>> > free to give me a shout out.
>> >
>> > ---------- Forwarded message ----------
>> > From: Colin Kincaid Williams <di...@uw.edu>
>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM / HA
>> > configuration
>> > To: user@hadoop.apache.org
>> >
>> >
>> > Hi Jing,
>> >
>> > Thanks for the response. I will try this out, and file an Apache jira.
>> >
>> > Best,
>> >
>> > Colin Williams
>> >
>> >
>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>> wrote:
>> >>
>> >> Hi Colin,
>> >>
>> >>     I guess currently we may have to restart almost all the
>> >> daemons/services in order to swap out a standby NameNode (SBN):
>> >>
>> >> 1. The current active NameNode (ANN) needs to know the new SBN since in
>> >> the current implementation the SBN tries to send rollEditLog RPC
>> request to
>> >> ANN periodically (thus if a NN failover happens later, the original ANN
>> >> needs to send this RPC to the correct NN).
>> >> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>> >> Look at the code in BPOfferService:
>> >>
>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>> >> IOException {
>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>> >>     for (BPServiceActor actor : bpServices) {
>> >>       oldAddrs.add(actor.getNNSocketAddress());
>> >>     }
>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>> >>
>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>> >>       // Keep things simple for now -- we can implement this at a later
>> >> date.
>> >>       throw new IOException(
>> >>           "HA does not currently support adding a new standby to a
>> running
>> >> DN. " +
>> >>           "Please do a rolling restart of DNs to reconfigure the list
>> of
>> >> NNs.");
>> >>     }
>> >>   }
>> >>
>> >> 3. If you're using automatic failover, you also need to update the
>> >> configuration of the ZKFC on the current ANN machine, since ZKFC will
>> do
>> >> gracefully fencing by sending RPC to the other NN.
>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>> but I
>> >> have not tried before.
>> >>
>> >>     Thus in general we may still have to restart all the services
>> (except
>> >> JNs) and update their configurations. But this may be a rolling restart
>> >> process I guess:
>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>> restart
>> >> of all the DN to update their configurations
>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>> >> configuration. The new SBN should become active.
>> >>
>> >>     I have not tried the upper steps, thus please let me know if this
>> >> works or not. And I think we should also document the correct steps in
>> >> Apache. Could you please file an Apache jira?
>> >>
>> >> Thanks,
>> >> -Jing
>> >>
>> >>
>> >>
>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>> discord@uw.edu>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>> configuration. I
>> >>> believe the steps to achieve this would be something similar to:
>> >>>
>> >>> Use the Bootstrap standby command to prep the replacement standby. Or
>> >>> rsync if the command fails.
>> >>>
>> >>> Somehow update the datanodes, so they push the heartbeat / journal to
>> the
>> >>> new standby
>> >>>
>> >>> Update the xml configuration on all nodes to reflect the replacement
>> >>> standby.
>> >>>
>> >>> Start the replacement standby
>> >>>
>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>> >>> configuration.
>> >>>
>> >>> I am not sure how to deal with the Journal switch, or if I am going
>> about
>> >>> this the right way. Can anybody give me some suggestions here?
>> >>>
>> >>>
>> >>> Regards,
>> >>>
>> >>> Colin Williams
>> >>>
>> >>
>> >>
>> >
>> >
>> >
>>
>
>
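The rolling-restart sequence Jing outlined above can be sketched as an ordered command plan. Everything below is a hypothetical illustration, not a tested procedure: the replacement hostname, the DataNode list, and the CDH-style `service` commands are assumptions, and the function only builds the plan rather than executing anything.

```python
# Sketch of the standby-NameNode swap sequence described in this thread.
# Hostnames and service names are hypothetical; the function returns the
# ordered commands so the plan can be inspected before running it
# (e.g. via subprocess or ssh).

def swap_standby_plan(new_sbn, ann, datanodes):
    """Return the ordered shell commands for the rolling swap."""
    plan = []
    # 1. Bootstrap and start the replacement standby.
    plan.append(f"ssh {new_sbn} hdfs namenode -bootstrapStandby")
    plan.append(f"ssh {new_sbn} service hadoop-hdfs-namenode start")
    # 2. Rolling restart of the DataNodes so they pick up the new NN list
    #    (a running DN cannot refresh to a changed NameNode set).
    for dn in datanodes:
        plan.append(f"ssh {dn} service hadoop-hdfs-datanode restart")
    # 3. Restart the ANN's ZKFC and the ANN itself last, with updated configs.
    plan.append(f"ssh {ann} service hadoop-hdfs-zkfc restart")
    plan.append(f"ssh {ann} service hadoop-hdfs-namenode restart")
    return plan

for cmd in swap_standby_plan("rhel7.local", "rhel1.local",
                             ["rhel3.local", "rhel4.local"]):
    print(cmd)
```

The ordering matters: the ANN and its ZKFC go last so an automatic failover lands on the already-running replacement standby.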

Re: Juggling or swaping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
However, continuing with the process, my QJM eventually errored out and my
Active NameNode went down.

2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
10.120.5.247:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
10.120.5.203:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
10.120.5.25:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
stream=QuorumOutputStream starting at txid 9634))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
exceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
at
org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
at
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
at
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
QuorumOutputStream starting at txid 9634
2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil
(ExitUtil.java:terminate(87)) - Exiting with status 1
2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
(StringUtils.java:run(615)) - SHUTDOWN_MSG:
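For what it's worth, the two checks visible in the log above can be sketched in a few lines. This is a toy model, not Hadoop code: each JournalNode only accepts writes carrying the epoch it last registered for a writer (the "IPC's epoch 5 is not the current writer epoch  0" rejection), and the client needs acks from a majority of JNs (the "quorum size 2/3" failure).

```python
# Toy model (not Hadoop code) of the two QJM invariants in the log:
# Journal.checkWriteRequest rejects writers whose epoch does not match,
# and the client needs acks from a majority of JournalNodes.

def majority(num_jns):
    # Quorum size: 2 of 3, 3 of 5, ...
    return num_jns // 2 + 1

class JournalNode:
    def __init__(self):
        self.writer_epoch = 0  # matches "current writer epoch  0"

    def start_log_segment(self, epoch):
        # A writer establishes its epoch before journaling transactions.
        self.writer_epoch = epoch

    def journal(self, epoch, txns):
        if epoch != self.writer_epoch:
            raise IOError("IPC's epoch %d is not the current writer epoch  %d"
                          % (epoch, self.writer_epoch))
        return True

def _try_journal(jn, epoch, txns):
    try:
        return jn.journal(epoch, txns)
    except IOError:
        return False

def quorum_write(jns, epoch, txns):
    # Count successful acks; fail, as FSEditLog does, when below quorum.
    acks = sum(1 for jn in jns if _try_journal(jn, epoch, txns))
    if acks < majority(len(jns)):
        raise IOError("Got too many exceptions to achieve quorum size %d/%d"
                      % (majority(len(jns)), len(jns)))
    return acks
```

A writer epoch of 0 on every JN, as in the log, means no JN recognizes the NameNode's epoch 5, so all three writes fail and the required journal aborts.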



On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> I tried a third time and it just worked?
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
> for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
> GMT
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
> Corporation
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-
java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl
-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6
.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-
1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(433)) - Initiating client connection,
> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
> sessionTimeout=5000 watcher=null
> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
> socket connection to server rhel1.local/10.120.5.203:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
> connection established to rhel1.local/10.120.5.203:2181, initiating
> session
> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
> establishment complete on server rhel1.local/10.120.5.203:2181, sessionid
> = 0x1478902fddc000a, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
> /hadoop-ha/golden-apple from ZK...
> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
> /hadoop-ha/golden-apple from ZK.
> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
> /hadoop-ha/golden-apple in ZK.
> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
> (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
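> As a side note, the "Recursively deleting ... from ZK" / "Successfully
> created ... in ZK" pair above is the whole effect of -formatZK on the
> parent znode. A toy in-memory sketch (not ZooKeeper client code; the
> child znode name in the example is assumed for illustration):

```python
# Toy in-memory model of what "hdfs zkfc -formatZK" does to the
# failover-controller state: recursively delete the parent znode,
# then recreate it empty.

def format_parent_znode(znodes, parent="/hadoop-ha/golden-apple"):
    doomed = [p for p in znodes if p == parent or p.startswith(parent + "/")]
    for path in doomed:          # "Recursively deleting ... from ZK..."
        del znodes[path]
    znodes[parent] = b""         # "Successfully created ... in ZK."
    return znodes
```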
>
>
>
> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>
>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>
>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>> today.
>> > Just thought I'd forward this info regarding swapping out the NameNode
>> in a
>> > QJM / HA configuration. See you around on #hbase. If you visit Seattle,
>> feel
>> > free to give me a shout out.
>> >
>> > ---------- Forwarded message ----------
>> > From: Colin Kincaid Williams <di...@uw.edu>
>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM / HA
>> > configuration
>> > To: user@hadoop.apache.org
>> >
>> >
>> > Hi Jing,
>> >
>> > Thanks for the response. I will try this out, and file an Apache jira.
>> >
>> > Best,
>> >
>> > Colin Williams
>> >
>> >
>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>> wrote:
>> >>
>> >> Hi Colin,
>> >>
>> >>     I guess currently we may have to restart almost all the
>> >> daemons/services in order to swap out a standby NameNode (SBN):
>> >>
>> >> 1. The current active NameNode (ANN) needs to know the new SBN since in
>> >> the current implementation the SBN tries to send rollEditLog RPC
>> request to
>> >> ANN periodically (thus if a NN failover happens later, the original ANN
>> >> needs to send this RPC to the correct NN).
>> >> 2. Looks like the DataNode currently cannot do real refreshment for NN.
>> >> Look at the code in BPOfferService:
>> >>
>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>> >>     for (BPServiceActor actor : bpServices) {
>> >>       oldAddrs.add(actor.getNNSocketAddress());
>> >>     }
>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>> >>
>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>> >>       // Keep things simple for now -- we can implement this at a later date.
>> >>       throw new IOException(
>> >>           "HA does not currently support adding a new standby to a running DN. " +
>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>> >>     }
>> >>   }
>> >>
>> >> 3. If you're using automatic failover, you also need to update the
>> >> configuration of the ZKFC on the current ANN machine, since ZKFC will
>> do
>> >> gracefully fencing by sending RPC to the other NN.
>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>> but I
>> >> have not tried before.
>> >>
>> >>     Thus in general we may still have to restart all the services
>> (except
>> >> JNs) and update their configurations. But this may be a rolling restart
>> >> process I guess:
>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>> restart
>> >> of all the DN to update their configurations
>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>> >> configuration. The new SBN should become active.
>> >>
>> >>     I have not tried the upper steps, thus please let me know if this
>> >> works or not. And I think we should also document the correct steps in
>> >> Apache. Could you please file an Apache jira?
>> >>
>> >> Thanks,
>> >> -Jing
>> >>
>> >>
>> >>
>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>> discord@uw.edu>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>> configuration. I
>> >>> believe the steps to achieve this would be something similar to:
>> >>>
>> >>> Use the bootstrapStandby command to prep the replacement standby, or
>> >>> rsync if the command fails.
>> >>>
>> >>> Somehow update the datanodes, so they send heartbeats / block reports
>> >>> to the new standby.
>> >>>
>> >>> Update the xml configuration on all nodes to reflect the replacement
>> >>> standby.
>> >>>
>> >>> Start the replacement standby.
>> >>>
>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>> >>> configuration.
>> >>>
>> >>> I am not sure how to deal with the Journal switch, or if I am going
>> about
>> >>> this the right way. Can anybody give me some suggestions here?
>> >>>
>> >>>
>> >>> Regards,
>> >>>
>> >>> Colin Williams
>> >>>
>> >>
>> >>
>> >> CONFIDENTIALITY NOTICE
>> >> NOTICE: This message is intended for the use of the individual or
>> entity
>> >> to which it is addressed and may contain information that is
>> confidential,
>> >> privileged and exempt from disclosure under applicable law. If the
>> reader of
>> >> this message is not the intended recipient, you are hereby notified
>> that any
>> >> printing, copying, dissemination, distribution, disclosure or
>> forwarding of
>> >> this communication is strictly prohibited. If you have received this
>> >> communication in error, please contact the sender immediately and
>> delete it
>> >> from your system. Thank You.
>> >
>> >
>> >
>>
>
>
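
The rolling swap that Jing outlines above might be sketched as the following
shell sequence. This is untested; the hostnames, the DataNode list, and the
CDH4-style init-script names are illustrative assumptions, not commands taken
from the thread:

```shell
# 1. Shut down the old standby, then bootstrap and start the replacement
#    (run on the replacement host, with its hdfs-site.xml already updated):
sudo service hadoop-hdfs-namenode stop         # on the old standby host
sudo -u hdfs hdfs namenode -bootstrapStandby   # on the replacement host
sudo service hadoop-hdfs-namenode start

# 2. With the ANN and its ZKFC still running, rolling-restart each DataNode
#    after pushing out the updated configuration:
for dn in dn1 dn2 dn3; do                      # hypothetical DN hostnames
  ssh "$dn" 'sudo service hadoop-hdfs-datanode restart'
done

# 3. Finally stop the ANN and its ZKFC, update their configs, and restart;
#    the new standby should then take over as active:
sudo service hadoop-hdfs-zkfc stop             # on the ANN host
sudo service hadoop-hdfs-namenode stop
# ...edit hdfs-site.xml to reference the new standby...
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-zkfc start
```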

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
However, continuing with the process, my QJM eventually errored out and my
active NameNode went down.

2014-07-31 20:59:33,944 WARN  [Logger channel to rhel6.local/
10.120.5.247:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,954 WARN  [Logger channel to rhel1.local/
10.120.5.203:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,975 WARN  [Logger channel to rhel2.local/
10.120.5.25:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed
to write txns 9635-9635. Will try to write to this JN again after the next
log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5
is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1224)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy9.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020]
namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) -
Error: flush failed for required journal (JournalAndStream(mgr=QJM to [
10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485],
stream=QuorumOutputStream starting at txid 9634))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
exceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch  0
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
at
org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
at
org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734)
at
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129)
at
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
2014-07-31 20:59:33,976 WARN  [IPC Server handler 5 on 8020]
client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting
QuorumOutputStream starting at txid 9634
2014-07-31 20:59:33,978 INFO  [IPC Server handler 5 on 8020] util.ExitUtil
(ExitUtil.java:terminate(87)) - Exiting with status 1
2014-07-31 20:59:33,982 INFO  [Thread-0] namenode.NameNode
(StringUtils.java:run(615)) - SHUTDOWN_MSG:
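
The repeated "IPC's epoch 5 is not the current writer epoch  0" lines suggest
the JournalNodes lost the writer epoch they had promised to (it reads as 0,
as after a reformat or metadata wipe) while the still-running active NameNode
held epoch 5. A simplified, self-contained sketch of the fencing check named
in the stack trace (Journal.checkWriteRequest); the class and field names here
are illustrative, not the actual Hadoop source:

```java
import java.io.IOException;

// Simplified sketch of QJM epoch fencing. A JournalNode persists the highest
// writer epoch it has promised to honor; if that state is reset to 0, every
// write from the still-running writer (epoch 5 here) is rejected, and the
// NameNode aborts once a quorum of JNs refuses the flush.
public class JournalEpochSketch {
    private long lastWriterEpoch = 0;  // state after the JN metadata was reset

    public void checkWriteRequest(long ipcEpoch) throws IOException {
        if (ipcEpoch != lastWriterEpoch) {
            throw new IOException("IPC's epoch " + ipcEpoch
                + " is not the current writer epoch  " + lastWriterEpoch);
        }
    }

    public static void main(String[] args) {
        JournalEpochSketch jn = new JournalEpochSketch();
        try {
            jn.checkWriteRequest(5);  // the ANN still believes it holds epoch 5
        } catch (IOException e) {
            // prints: IPC's epoch 5 is not the current writer epoch  0
            System.out.println(e.getMessage());
        }
    }
}
```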



On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> I tried a third time and it just worked?
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
> (DFSZKFailoverController.java:<init>(140)) - Failover controller configured
> for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
> GMT
> 2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
> Corporation
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-
java-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl
-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6
.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-
1.7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
> 2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:java.library.path=//usr/lib/hadoop/lib/native
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
> (Environment.java:logEnv(100)) - Client
> environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:<init>(433)) - Initiating client connection,
> connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
> sessionTimeout=5000 watcher=null
> 2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
> socket connection to server rhel1.local/10.120.5.203:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
> connection established to rhel1.local/10.120.5.203:2181, initiating
> session
> 2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
> zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
> establishment complete on server rhel1.local/10.120.5.203:2181, sessionid
> = 0x1478902fddc000a, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
> 18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
> /hadoop-ha/golden-apple from ZK...
> 2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
> /hadoop-ha/golden-apple from ZK.
> 2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
> (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
> /hadoop-ha/golden-apple in ZK.
> 2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
> (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
> (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
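>
> For reference, a common ordering for reformatting the HA znode, matching the
> warning in the prompt above, is roughly the following. Untested sketch;
> CDH4-style service names are assumed:

```shell
# Stop the failover controller on each NameNode host first, so neither ZKFC
# is watching or recreating the znode while it is being cleared:
sudo service hadoop-hdfs-zkfc stop        # on each NameNode host

# Reformat the HA state under /hadoop-ha/<nameservice> in ZooKeeper.
# This only touches ZK state, so the NameNodes themselves can stay up:
sudo -u hdfs hdfs zkfc -formatZK

# Then restart the ZKFCs so an active NameNode is re-elected:
sudo service hadoop-hdfs-zkfc start       # on each NameNode host
```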
>
>
>
> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:
>
>> Cheers. That's rough. We don't have that problem here at WanDISCO.
>>
>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
>> wrote:
>> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
>> today.
>> > Just thought I'd forward this info regarding swapping out the NameNode
>> in a
>> > QJM / HA configuration. See you around on #hbase. If you visit Seattle,
>> feel
>> > free to give me a shout out.
>> >
>> > ---------- Forwarded message ----------
>> > From: Colin Kincaid Williams <di...@uw.edu>
>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA
>> > configuration
>> > To: user@hadoop.apache.org
>> >
>> >
>> > Hi Jing,
>> >
>> > Thanks for the response. I will try this out, and file an Apache jira.
>> >
>> > Best,
>> >
>> > Colin Williams
>> >
>> >
>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
>> wrote:
>> >>
>> >> Hi Colin,
>> >>
>> >>     I guess currently we may have to restart almost all the
>> >> daemons/services in order to swap out a standby NameNode (SBN):
>> >>
>> >> 1. The current active NameNode (ANN) needs to know the new SBN since in
>> >> the current implementation the SBN tries to send rollEditLog RPC
>> request to
>> >> ANN periodically (thus if a NN failover happens later, the original ANN
>> >> needs to send this RPC to the correct NN).
>> >> 2. Looks like the DataNode currently cannot do a real refresh of its NN
>> >> list. Look at the code in BPOfferService:
>> >>
>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
>> >> IOException {
>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>> >>     for (BPServiceActor actor : bpServices) {
>> >>       oldAddrs.add(actor.getNNSocketAddress());
>> >>     }
>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>> >>
>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>> >>       // Keep things simple for now -- we can implement this at a later
>> >> date.
>> >>       throw new IOException(
>> >>           "HA does not currently support adding a new standby to a
>> running
>> >> DN. " +
>> >>           "Please do a rolling restart of DNs to reconfigure the list
>> of
>> >> NNs.");
>> >>     }
>> >>   }
>> >>
>> >> 3. If you're using automatic failover, you also need to update the
>> >> configuration of the ZKFC on the current ANN machine, since ZKFC will
>> do
>> >> graceful fencing by sending RPC to the other NN.
>> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
>> but I
>> >> have not tried before.
>> >>
>> >>     Thus in general we may still have to restart all the services
>> (except
>> >> JNs) and update their configurations. But this may be a rolling restart
>> >> process I guess:
>> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
>> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling
>> restart
>> >> of all the DN to update their configurations
>> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
>> >> configuration. The new SBN should become active.
>> >>
>> >>     I have not tried the upper steps, thus please let me know if this
>> >> works or not. And I think we should also document the correct steps in
>> >> Apache. Could you please file an Apache jira?
>> >>
>> >> Thanks,
>> >> -Jing
>> >>
>> >>
>> >>
>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <
>> discord@uw.edu>
>> >> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I'm trying to swap out a standby NameNode in a QJM / HA
>> configuration. I
>> >>> believe the steps to achieve this would be something similar to:
>> >>>
>> >>> Use the Bootstrap standby command to prep the replacement standby. Or
>> >>> rsync if the command fails.
>> >>>
>> >>> Somehow update the datanodes, so they push the heartbeat / journal to
>> the
>> >>> new standby
>> >>>
>> >>> Update the xml configuration on all nodes to reflect the replacement
>> >>> standby.
>> >>>
>> >>> Start the replacement standby
>> >>>
>> >>> Use some hadoop command to refresh the datanodes to the new NameNode
>> >>> configuration.
>> >>>
>> >>> I am not sure how to deal with the Journal switch, or if I am going
>> about
>> >>> this the right way. Can anybody give me some suggestions here?
>> >>>
>> >>>
>> >>> Regards,
>> >>>
>> >>> Colin Williams
>> >>>
>> >>
>> >>
>> >> CONFIDENTIALITY NOTICE
>> >> NOTICE: This message is intended for the use of the individual or
>> entity
>> >> to which it is addressed and may contain information that is
>> confidential,
>> >> privileged and exempt from disclosure under applicable law. If the
>> reader of
>> >> this message is not the intended recipient, you are hereby notified
>> that any
>> >> printing, copying, dissemination, distribution, disclosure or
>> forwarding of
>> >> this communication is strictly prohibited. If you have received this
>> >> communication in error, please contact the sender immediately and
>> delete it
>> >> from your system. Thank You.
>> >
>> >
>> >
>>
>
>
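
The BPOfferService guard quoted above boils down to a set comparison: if any
NameNode address was added or removed from the configured list, the DataNode
throws instead of refreshing, which is why the outline above restarts every
DataNode. A minimal Python sketch of that check (names are illustrative, not
the Hadoop API):

```python
def refresh_nn_list(old_addrs, new_addrs):
    # Mirror of the Sets.symmetricDifference guard in BPOfferService:
    # any added or removed NameNode address aborts the live refresh.
    if set(old_addrs) ^ set(new_addrs):
        raise IOError("HA does not currently support adding a new standby "
                      "to a running DN; do a rolling restart of DNs.")

# The same pair of NNs, in any order, passes the check.
refresh_nn_list(["nn1:8020", "nn2:8020"], ["nn2:8020", "nn1:8020"])

# Swapping in a replacement standby trips the guard.
try:
    refresh_nn_list(["nn1:8020", "nn2:8020"], ["nn1:8020", "nn3:8020"])
    swapped_ok = True
except IOError:
    swapped_ok = False
```
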

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
I tried a third time and it just worked?

sudo hdfs zkfc -formatZK
2014-07-31 18:07:51,595 INFO  [main] tools.DFSZKFailoverController
(DFSZKFailoverController.java:<init>(140)) - Failover controller configured
for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13
GMT
2014-07-31 18:07:51,791 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle
Corporation
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.home=/usr/java/jdk1.7.0_60/jre
2014-07-31 18:07:51,792 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-ja
va-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1
.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel1.local/10.120.5.203:2181, initiating session
2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
establishment complete on server rhel1.local/10.120.5.203:2181, sessionid =
0x1478902fddc000a, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
(ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
/hadoop-ha/golden-apple from ZK...
2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
/hadoop-ha/golden-apple from ZK.
2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
/hadoop-ha/golden-apple in ZK.
2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
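
As the prompt in the log above shows, -formatZK asks for confirmation when the
parent znode already exists, which can trip up non-interactive shells (and may
explain the earlier failed attempts). The tool also accepts flags to skip the
prompt; these are taken from the stock Hadoop HA usage string, so verify with
`hdfs zkfc -help` on your version:

```shell
# Overwrite an existing /hadoop-ha znode without the Y/N prompt.
# As the tool itself warns, stop all NameNodes and ZKFCs first.
hdfs zkfc -formatZK -force

# Or fail instead of prompting when the znode already exists:
hdfs zkfc -formatZK -nonInteractive
```

These commands require a live ZooKeeper quorum, so they are shown here only as
a reference, not something runnable outside the cluster.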



On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:

> Cheers. That's rough. We don't have that problem here at WanDISCO.
>
> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
> today.
> > Just thought I'd forward this info regarding swapping out the NameNode
> in a
> > QJM / HA configuration. See you around on #hbase. If you visit Seattle,
> feel
> > free to give me a shout out.
> >
> > ---------- Forwarded message ----------
> > From: Colin Kincaid Williams <di...@uw.edu>
> > Date: Thu, Jul 31, 2014 at 12:35 PM
> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA
> > configuration
> > To: user@hadoop.apache.org
> >
> >
> > Hi Jing,
> >
> > Thanks for the response. I will try this out, and file an Apache jira.
> >
> > Best,
> >
> > Colin Williams
> >
> >
> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
> wrote:
> >>
> >> Hi Colin,
> >>
> >>     I guess currently we may have to restart almost all the
> >> daemons/services in order to swap out a standby NameNode (SBN):
> >>
> >> 1. The current active NameNode (ANN) needs to know the new SBN since in
> >> the current implementation the SBN tries to send rollEditLog RPC
> request to
> >> ANN periodically (thus if a NN failover happens later, the original ANN
> >> needs to send this RPC to the correct NN).
> >> 2. Looks like the DataNode currently cannot really refresh its NN list.
> >> Look at the code in BPOfferService:
> >>
> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
> >> IOException {
> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
> >>     for (BPServiceActor actor : bpServices) {
> >>       oldAddrs.add(actor.getNNSocketAddress());
> >>     }
> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
> >>
> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
> >>       // Keep things simple for now -- we can implement this at a later
> >> date.
> >>       throw new IOException(
> >>           "HA does not currently support adding a new standby to a
> running
> >> DN. " +
> >>           "Please do a rolling restart of DNs to reconfigure the list of
> >> NNs.");
> >>     }
> >>   }
> >>
> >> 3. If you're using automatic failover, you also need to update the
> >> configuration of the ZKFC on the current ANN machine, since ZKFC will do
> >> graceful fencing by sending RPC to the other NN.
> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
> but I
> >> have not tried before.
> >>
> >>     Thus in general we may still have to restart all the services
> (except
> >> JNs) and update their configurations. But this may be a rolling restart
> >> process I guess:
> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling restart
> >> of all the DN to update their configurations
> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
> >> configuration. The new SBN should become active.
> >>
> >>     I have not tried the upper steps, thus please let me know if this
> >> works or not. And I think we should also document the correct steps in
> >> Apache. Could you please file an Apache jira?
> >>
> >> Thanks,
> >> -Jing
> >>
> >>
> >>
> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu
> >
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
> I
> >>> believe the steps to achieve this would be something similar to:
> >>>
> >>> Use the Bootstrap standby command to prep the replacement standby. Or
> >>> rsync if the command fails.
> >>>
> >>> Somehow update the datanodes, so they push the heartbeat / journal to
> the
> >>> new standby
> >>>
> >>> Update the xml configuration on all nodes to reflect the replacement
> >>> standby.
> >>>
> >>> Start the replacement standby
> >>>
> >>> Use some hadoop command to refresh the datanodes to the new NameNode
> >>> configuration.
> >>>
> >>> I am not sure how to deal with the Journal switch, or if I am going
> about
> >>> this the right way. Can anybody give me some suggestions here?
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Colin Williams
> >>>
> >>
> >>
> >> CONFIDENTIALITY NOTICE
> >> NOTICE: This message is intended for the use of the individual or entity
> >> to which it is addressed and may contain information that is
> confidential,
> >> privileged and exempt from disclosure under applicable law. If the
> reader of
> >> this message is not the intended recipient, you are hereby notified
> that any
> >> printing, copying, dissemination, distribution, disclosure or
> forwarding of
> >> this communication is strictly prohibited. If you have received this
> >> communication in error, please contact the sender immediately and
> delete it
> >> from your system. Thank You.
> >
> >
> >
>
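
Putting Jing's outline and the formatZK step together, the whole swap could be
sketched as a runbook. This is only a sketch: the hostnames (old-sbn, new-sbn,
ann, dn1..dn3) are placeholders, the service names assume the CDH4-style
packaging visible in the logs above, and it obviously cannot run outside a
live cluster.

```shell
# 1. Stop the old standby; bootstrap and start the replacement.
ssh old-sbn 'service hadoop-hdfs-namenode stop'
ssh new-sbn 'hdfs namenode -bootstrapStandby'
ssh new-sbn 'service hadoop-hdfs-namenode start'

# 2. Rolling restart of the DataNodes with the updated hdfs-site.xml,
#    since BPOfferService refuses a live refresh of the NN list.
for dn in dn1 dn2 dn3; do
  ssh "$dn" 'service hadoop-hdfs-datanode restart'
done

# 3. Finally restart the current active NN and its ZKFC with the new
#    config; the replacement standby should then be able to take over.
ssh ann 'service hadoop-hdfs-zkfc stop && service hadoop-hdfs-namenode stop'
ssh ann 'service hadoop-hdfs-namenode start && service hadoop-hdfs-zkfc start'
```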

establishment complete on server rhel1.local/10.120.5.203:2181, sessionid =
0x1478902fddc000a, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
(ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
/hadoop-ha/golden-apple from ZK...
2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
/hadoop-ha/golden-apple from ZK.
2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
/hadoop-ha/golden-apple in ZK.
2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
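The stop/format/start sequence from this thread can be scripted. This is a dry-run sketch only: the hostnames and init-script names are assumptions based on the CDH4 setup shown above, and the `run` wrapper merely prints each command (drop the `echo` to execute for real). The `-force` flag on `hdfs zkfc -formatZK` skips the interactive Y/N prompt.

```shell
#!/usr/bin/env bash
# Dry-run sketch: stop the ZKFC on BOTH NameNode hosts before
# reformatting the HA state in ZooKeeper, then start them again.
set -euo pipefail
run() { echo "would run: $*"; }   # print only; remove 'echo' to execute

NN_HOSTS="rhel1.local rhel2.local"   # assumption: the two NameNode hosts

for nn in $NN_HOSTS; do
  run ssh "$nn" sudo service hadoop-hdfs-zkfc stop
done
# -force answers the "Proceed formatting ...?" prompt non-interactively
run sudo -u hdfs hdfs zkfc -formatZK -force
for nn in $NN_HOSTS; do
  run ssh "$nn" sudo service hadoop-hdfs-zkfc start
done
```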



On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:

> Cheers. That's rough. We don't have that problem here at WanDISCO.
>
> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
> today.
> > Just thought I'd forward this info regarding swapping out the NameNode
> in a
> > QJM / HA configuration. See you around on #hbase. If you visit Seattle,
> feel
> > free to give me a shout out.
> >
> > ---------- Forwarded message ----------
> > From: Colin Kincaid Williams <di...@uw.edu>
> > Date: Thu, Jul 31, 2014 at 12:35 PM
> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA
> > configuration
> > To: user@hadoop.apache.org
> >
> >
> > Hi Jing,
> >
> > Thanks for the response. I will try this out, and file an Apache jira.
> >
> > Best,
> >
> > Colin Williams
> >
> >
> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
> wrote:
> >>
> >> Hi Colin,
> >>
> >>     I guess currently we may have to restart almost all the
> >> daemons/services in order to swap out a standby NameNode (SBN):
> >>
> >> 1. The current active NameNode (ANN) needs to know the new SBN since in
> >> the current implementation the SBN tries to send rollEditLog RPC
> request to
> >> ANN periodically (thus if a NN failover happens later, the original ANN
> >> needs to send this RPC to the correct NN).
> >> 2. Looks like the DataNode currently cannot really refresh its NN list.
> >> Look at the code in BPOfferService:
> >>
> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
> >> IOException {
> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
> >>     for (BPServiceActor actor : bpServices) {
> >>       oldAddrs.add(actor.getNNSocketAddress());
> >>     }
> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
> >>
> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
> >>       // Keep things simple for now -- we can implement this at a later
> >> date.
> >>       throw new IOException(
> >>           "HA does not currently support adding a new standby to a
> running
> >> DN. " +
> >>           "Please do a rolling restart of DNs to reconfigure the list of
> >> NNs.");
> >>     }
> >>   }
> >>
> >> 3. If you're using automatic failover, you also need to update the
> >> configuration of the ZKFC on the current ANN machine, since ZKFC will do
> >> graceful fencing by sending an RPC to the other NN.
> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
> but I
> >> have not tried before.
> >>
> >>     Thus in general we may still have to restart all the services
> (except
> >> JNs) and update their configurations. But this may be a rolling restart
> >> process I guess:
> >> 1. Shutdown the old SBN, bootstrap the new SBN, and start the new SBN.
> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling restart
> >> of all the DN to update their configurations
> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
> >> configuration. The new SBN should become active.
> >>
> >>     I have not tried the above steps, thus please let me know if this
> >> works or not. And I think we should also document the correct steps in
> >> Apache. Could you please file an Apache jira?
> >>
> >> Thanks,
> >> -Jing
> >>
> >>
> >>
> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu
> >
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
> I
> >>> believe the steps to achieve this would be something similar to:
> >>>
> >>> Use the bootstrapStandby command to prep the replacement standby, or
> >>> rsync if the command fails.
> >>>
> >>> Somehow update the datanodes, so they push the heartbeat / journal to
> >>> the new standby.
> >>>
> >>> Update the XML configuration on all nodes to reflect the replacement
> >>> standby.
> >>>
> >>> Start the replacement standby.
> >>>
> >>> Use some hadoop command to refresh the datanodes to the new NameNode
> >>> configuration.
> >>>
> >>> I am not sure how to deal with the Journal switch, or if I am going
> about
> >>> this the right way. Can anybody give me some suggestions here?
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Colin Williams
> >>>
> >>
> >>
> >> CONFIDENTIALITY NOTICE
> >> NOTICE: This message is intended for the use of the individual or entity
> >> to which it is addressed and may contain information that is
> confidential,
> >> privileged and exempt from disclosure under applicable law. If the
> reader of
> >> this message is not the intended recipient, you are hereby notified
> that any
> >> printing, copying, dissemination, distribution, disclosure or
> forwarding of
> >> this communication is strictly prohibited. If you have received this
> >> communication in error, please contact the sender immediately and
> delete it
> >> from your system. Thank You.
> >
> >
> >
>
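Jing's rolling-restart sequence quoted above can be outlined as a script. Again a dry-run sketch under stated assumptions: the placeholder hostnames (old-sbn, new-sbn, dn1..dn3), the DataNode list, and the CDH-style init-script names are illustrative, and the `run` wrapper only prints each command rather than executing it.

```shell
#!/usr/bin/env bash
# Dry-run outline of swapping out the standby NameNode (SBN), following
# the three steps suggested in the thread. old-sbn/new-sbn are
# placeholders; rhel1.local stands in for the current active NN (ANN).
set -euo pipefail
run() { echo "would run: $*"; }   # print only; remove 'echo' to execute

# Step 1: shut down the old SBN, bootstrap and start the replacement.
run ssh old-sbn.local sudo service hadoop-hdfs-namenode stop
run ssh new-sbn.local sudo -u hdfs hdfs namenode -bootstrapStandby
run ssh new-sbn.local sudo service hadoop-hdfs-namenode start

# Step 2: rolling restart of every DataNode with the updated
# hdfs-site.xml, since BPOfferService refuses to add a new standby
# to a running DN.
for dn in dn1.local dn2.local dn3.local; do
  run ssh "$dn" sudo service hadoop-hdfs-datanode restart
done

# Step 3: restart the ANN and its ZKFC with the new configuration; the
# replacement SBN should become active during this window.
run ssh rhel1.local sudo service hadoop-hdfs-zkfc stop
run ssh rhel1.local sudo service hadoop-hdfs-namenode restart
run ssh rhel1.local sudo service hadoop-hdfs-zkfc start
```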

.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar
2014-07-31 18:07:51,793 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:java.library.path=//usr/lib/hadoop/lib/native
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2014-07-31 18:07:51,801 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.name=Linux
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:os.version=2.6.32-358.el6.x86_64
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.name=root
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client environment:user.home=/root
2014-07-31 18:07:51,802 INFO  [main] zookeeper.ZooKeeper
(Environment.java:logEnv(100)) - Client
environment:user.dir=/etc/hbase/conf.golden_apple
2014-07-31 18:07:51,813 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:<init>(433)) - Initiating client connection,
connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181
sessionTimeout=5000 watcher=null
2014-07-31 18:07:51,833 INFO  [main-SendThread(rhel1.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening
socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt
to authenticate using SASL (unknown error)
2014-07-31 18:07:51,844 INFO  [main-SendThread(rhel1.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket
connection established to rhel1.local/10.120.5.203:2181, initiating session
2014-07-31 18:07:51,852 INFO  [main-SendThread(rhel1.local:2181)]
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session
establishment complete on server rhel1.local/10.120.5.203:2181, sessionid =
0x1478902fddc000a, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/golden-apple already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31
18:07:51,858 INFO  [main-EventThread] ha.ActiveStandbyElector
(ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
Y
2014-07-31 18:08:00,439 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting
/hadoop-ha/golden-apple from ZK...
2014-07-31 18:08:00,506 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted
/hadoop-ha/golden-apple from ZK.
2014-07-31 18:08:00,541 INFO  [main] ha.ActiveStandbyElector
(ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created
/hadoop-ha/golden-apple in ZK.
2014-07-31 18:08:00,545 INFO  [main-EventThread] zookeeper.ClientCnxn
(ClientCnxn.java:run(511)) - EventThread shut down
2014-07-31 18:08:00,545 INFO  [main] zookeeper.ZooKeeper
(ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
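To answer the question above: the log shows the format succeeding once the local ZKFC was stopped, so the whole cluster does not have to come down — but every failover controller should be stopped before reformatting. A dry-run sketch of that sequence (CDH4-style service names from this thread; the `run` wrapper only prints each command instead of executing it):

```shell
# Dry-run wrapper: prints each command rather than executing it.
# Swap the echo for "$@" to actually run the steps on a live cluster.
run() { echo "+ $*"; }

# 1. Stop the ZKFC daemon on BOTH NameNode hosts (shown once; repeat on the peer).
run sudo service hadoop-hdfs-zkfc stop

# 2. Reformat the HA state in ZooKeeper as the hdfs user.
#    (Running it as root, as in the log above, also works, but running as
#    hdfs keeps znode ownership consistent.)
run sudo -u hdfs hdfs zkfc -formatZK

# 3. Restart the failover controllers; the active NameNode keeps serving throughout.
run sudo service hadoop-hdfs-zkfc start
```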



On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <po...@gmail.com> wrote:

> Cheers. That's rough. We don't have that problem here at WanDISCO.
>
> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
> > Hi this is drocsid / discord from #hbase. Thanks for the help earlier
> today.
> > Just thought I'd forward this info regarding swapping out the NameNode
> in a
> > QJM / HA configuration. See you around on #hbase. If you visit Seattle,
> feel
> > free to give me a shout out.
> >
> > ---------- Forwarded message ----------
> > From: Colin Kincaid Williams <di...@uw.edu>
> > Date: Thu, Jul 31, 2014 at 12:35 PM
> > Subject: Re: Juggling or swaping out the standby NameNode in a QJM / HA
> > configuration
> > To: user@hadoop.apache.org
> >
> >
> > Hi Jing,
> >
> > Thanks for the response. I will try this out, and file an Apache jira.
> >
> > Best,
> >
> > Colin Williams
> >
> >
> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com>
> wrote:
> >>
> >> Hi Colin,
> >>
> >>     I guess currently we may have to restart almost all the
> >> daemons/services in order to swap out a standby NameNode (SBN):
> >>
> >> 1. The current active NameNode (ANN) needs to know the new SBN since in
> >> the current implementation the SBN tries to send rollEditLog RPC
> request to
> >> ANN periodically (thus if a NN failover happens later, the original ANN
> >> needs to send this RPC to the correct NN).
> >> 2. Looks like the DataNode currently cannot do real refreshment for NN.
> >> Look at the code in BPOfferService:
> >>
> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
> >> IOException {
> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
> >>     for (BPServiceActor actor : bpServices) {
> >>       oldAddrs.add(actor.getNNSocketAddress());
> >>     }
> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
> >>
> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
> >>       // Keep things simple for now -- we can implement this at a later
> >> date.
> >>       throw new IOException(
> >>           "HA does not currently support adding a new standby to a
> running
> >> DN. " +
> >>           "Please do a rolling restart of DNs to reconfigure the list of
> >> NNs.");
> >>     }
> >>   }
> >>
> >> 3. If you're using automatic failover, you also need to update the
> >> configuration of the ZKFC on the current ANN machine, since ZKFC will do
> >> graceful fencing by sending RPC to the other NN.
> >> 4. Looks like we do not need to restart JournalNodes for the new SBN
> but I
> >> have not tried before.
> >>
> >>     Thus in general we may still have to restart all the services
> (except
> >> JNs) and update their configurations. But this may be a rolling restart
> >> process I guess:
> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
> >> 2. Keep the ANN and its corresponding ZKFC running, do a rolling restart
> >> of all the DN to update their configurations
> >> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
> >> configuration. The new SBN should become active.
> >>
> >>     I have not tried the above steps, so please let me know if this
> >> works or not. And I think we should also document the correct steps in
> >> Apache. Could you please file an Apache jira?
> >>
> >> Thanks,
> >> -Jing
> >>
> >>
> >>
> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <discord@uw.edu
> >
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration.
> I
> >>> believe the steps to achieve this would be something similar to:
> >>>
> >>> Use the Bootstrap standby command to prep the replacement standby. Or
> >>> rsync if the command fails.
> >>>
> >>> Somehow update the datanodes, so they push the heartbeat / journal to
> the
> >>> new standby
> >>>
> >>> Update the xml configuration on all nodes to reflect the replacement
> >>> standby.
> >>>
> >>> Start the replacment standby
> >>>
> >>> Use some hadoop command to refresh the datanodes to the new NameNode
> >>> configuration.
> >>>
> >>> I am not sure how to deal with the Journal switch, or if I am going
> about
> >>> this the right way. Can anybody give me some suggestions here?
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Colin Williams
> >>>
> >>
> >>
> >> CONFIDENTIALITY NOTICE
> >> NOTICE: This message is intended for the use of the individual or entity
> >> to which it is addressed and may contain information that is
> confidential,
> >> privileged and exempt from disclosure under applicable law. If the
> reader of
> >> this message is not the intended recipient, you are hereby notified
> that any
> >> printing, copying, dissemination, distribution, disclosure or
> forwarding of
> >> this communication is strictly prohibited. If you have received this
> >> communication in error, please contact the sender immediately and
> delete it
> >> from your system. Thank You.
> >
> >
> >
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Colin Kincaid Williams <di...@uw.edu>.
Hi Jing,

Thanks for the response. I will try this out, and file an Apache jira.

Best,

Colin Williams


On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <ji...@hortonworks.com> wrote:

> Hi Colin,
>
>     I guess currently we may have to restart almost all the
> daemons/services in order to swap out a standby NameNode (SBN):
>
> 1. The current active NameNode (ANN) needs to know the new SBN since in
> the current implementation the SBN tries to send rollEditLog RPC request to
> ANN periodically (thus if a NN failover happens later, the original ANN
> needs to send this RPC to the correct NN).
> 2. Looks like the DataNode currently cannot do real refreshment for NN.
> Look at the code in BPOfferService:
>
>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws
> IOException {
>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>     for (BPServiceActor actor : bpServices) {
>       oldAddrs.add(actor.getNNSocketAddress());
>     }
>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>
>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>       // Keep things simple for now -- we can implement this at a later
> date.
>       throw new IOException(
>           "HA does not currently support adding a new standby to a running
> DN. " +
>           "Please do a rolling restart of DNs to reconfigure the list of
> NNs.");
>     }
>   }
>
> 3. If you're using automatic failover, you also need to update the
> configuration of the ZKFC on the current ANN machine, since ZKFC will do
> graceful fencing by sending RPC to the other NN.
> 4. Looks like we do not need to restart JournalNodes for the new SBN but I
> have not tried before.
>
>     Thus in general we may still have to restart all the services (except
> JNs) and update their configurations. But this may be a rolling restart
> process I guess:
> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
> 2. Keep the ANN and its corresponding ZKFC running, do a rolling restart
> of all the DN to update their configurations
> 3. After restarting all the DN, stop ANN and the ZKFC, and update their
> configuration. The new SBN should become active.
>
>     I have not tried the above steps, so please let me know if this
> works or not. And I think we should also document the correct steps in
> Apache. Could you please file an Apache jira?
>
> Thanks,
> -Jing
>
>
>
> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <di...@uw.edu>
> wrote:
>
>> Hello,
>>
>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>> believe the steps to achieve this would be something similar to:
>>
>> Use the Bootstrap standby command to prep the replacement standby. Or
>> rsync if the command fails.
>>
>> Somehow update the datanodes, so they push the heartbeat / journal to the
>> new standby
>>
>> Update the xml configuration on all nodes to reflect the replacement
>> standby.
>>
>> Start the replacment standby
>>
>> Use some hadoop command to refresh the datanodes to the new NameNode
>> configuration.
>>
>> I am not sure how to deal with the Journal switch, or if I am going about
>> this the right way. Can anybody give me some suggestions here?
>>
>>
>> Regards,
>>
>> Colin Williams
>>
>>
>

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

Posted by Jing Zhao <ji...@hortonworks.com>.
Hi Colin,

    I guess currently we may have to restart almost all the
daemons/services in order to swap out a standby NameNode (SBN):

1. The current active NameNode (ANN) needs to know the new SBN since in the
current implementation the SBN tries to send rollEditLog RPC request to ANN
periodically (thus if a NN failover happens later, the original ANN needs
to send this RPC to the correct NN).
2. Looks like the DataNode currently cannot do real refreshment for NN.
Look at the code in BPOfferService:

  void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException
{
    Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
    for (BPServiceActor actor : bpServices) {
      oldAddrs.add(actor.getNNSocketAddress());
    }
    Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);

    if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
      // Keep things simple for now -- we can implement this at a later
date.
      throw new IOException(
          "HA does not currently support adding a new standby to a running
DN. " +
          "Please do a rolling restart of DNs to reconfigure the list of
NNs.");
    }
  }

3. If you're using automatic failover, you also need to update the
configuration of the ZKFC on the current ANN machine, since ZKFC will do
graceful fencing by sending RPC to the other NN.
4. Looks like we do not need to restart JournalNodes for the new SBN but I
have not tried before.

    Thus in general we may still have to restart all the services (except
JNs) and update their configurations. But this may be a rolling restart
process I guess:
1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
2. Keep the ANN and its corresponding ZKFC running, and do a rolling restart
of all the DNs to update their configurations.
3. After restarting all the DNs, stop the ANN and its ZKFC, and update their
configuration. The new SBN should then become active.
    I have not tried the above steps, thus please let me know if this works
or not. And I think we should also document the correct steps in Apache.
Could you please file an Apache jira?

Thanks,
-Jing



On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <di...@uw.edu>
wrote:

> Hello,
>
> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
> believe the steps to achieve this would be something similar to:
>
> Use the Bootstrap standby command to prep the replacment standby. Or rsync
> if the command fails.
>
> Somehow update the datanodes, so they push the heartbeat / journal to the
> new standby
>
> Update the xml configuration on all nodes to reflect the replacment
> standby.
>
> Start the replacment standby
>
> Use some hadoop command to refresh the datanodes to the new NameNode
> configuration.
>
> I am not sure how to deal with the Journal switch, or if I am going about
> this the right way. Can anybody give me some suggestions here?
>
>
> Regards,
>
> Colin Williams
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.
