Posted to hdfs-user@hadoop.apache.org by 清如许 <47...@qq.com> on 2014/09/28 20:56:36 UTC

Failed to active namenode when config HA

Hi,

I'm new to hadoop and meet some problems when config HA.
Below are some important configuration in core-site.xml

  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn3</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns2</name>
    <value>nn2,nn4</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>namenode1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn3</name>
    <value>namenode3:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2.nn2</name>
    <value>namenode2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2.nn4</name>
    <value>namenode4:9000</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hduser/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hduser/mydata/hdfs/journalnode</value>
  </property>

(The two nameservices ns1, ns2 are for configuring federation later. In this step, I only want to launch ns1 on namenode1 and namenode3.)

After configuration, I did the following steps:
firstly, I started the journalnode on datanode2, datanode3 and datanode4
secondly, I formatted namenode1 and started the namenode on it
then I ran 'hdfs namenode -bootstrapStandby' on the other namenode and started the namenode on it
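
(Roughly, these steps correspond to commands like the ones below; this is only a sketch, assuming the stock sbin scripts and the host names above:

  # on datanode2, datanode3, datanode4: start the JournalNodes
  hadoop-daemon.sh start journalnode

  # on namenode1: format the namespace and start the first NameNode
  hdfs namenode -format
  hadoop-daemon.sh start namenode

  # on namenode3: copy the formatted namespace, then start the standby NameNode
  hdfs namenode -bootstrapStandby
  hadoop-daemon.sh start namenode
)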

Everything seems fine except that no namenode is active now, so I tried to activate one by running
hdfs haadmin -transitionToActive nn1 on namenode1
but strangely it says "Illegal argument: Unable to determine the nameservice id."

Could anyone tell me why it cannot determine nn1 from my configuration?
Is there something wrong in my configuration?

Thanks a lot!!!

RE: RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
Can you please send out the ZKFC logs and configurations?




Thanks & Regards



Brahma Reddy Battula




________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 2:20 PM
To: user
Subject: Re: RE: Failed to active namenode when config HA

Thank you very much!
I used ZooKeeper to do automatic failover. Even though HAAdmin still cannot determine the four namenodes, the cluster launched successfully.
I think I should do more research on it. :)

------------------ Original ------------------
From:  "Brahma Reddy Battula";<br...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<us...@hadoop.apache.org>;
Subject:  RE: Failed to active namenode when config HA

You need to start the ZKFC process, which will monitor and manage the state of the NameNode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.
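
As a rough sketch (the property values and the ZooKeeper hosts zk1/zk2/zk3 below are placeholders, please adapt them to your cluster), enabling and starting automatic failover usually looks like this:

  # hdfs-site.xml:  set dfs.ha.automatic-failover.enabled = true
  # core-site.xml:  set ha.zookeeper.quorum = zk1:2181,zk2:2181,zk3:2181

  # run once, on one of the NameNode hosts, to initialise the HA state in ZooKeeper
  hdfs zkfc -formatZK

  # start a ZKFC daemon on every host that runs a NameNode
  hadoop-daemon.sh start zkfc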



Please go through the following link for more details:


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description, as I wrote that mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
<name>dfs.nameservices</name>
and each nameservice will have two namenodes.

If I configure only HA (only one nameservice), everything is OK, and HAAdmin can determine the namenodes nn1, nn3.

But if I configure two nameservices and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can start these namenodes successfully and they are all in standby state at the beginning. But when I want to change one namenode to the active state with the command
hdfs haadmin -transitionToActive nn1
HAAdmin throws an exception, as it cannot determine the four namenodes (nn1, nn2, nn3, nn4) at all.

Have you ever configured HA & Federation, and do you know what may cause this problem?
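
(In case it helps anyone hitting the same error: with more than one nameservice configured, haadmin apparently has to be told which nameservice a namenode id such as nn1 belongs to. A sketch I have not verified, so please double-check the -ns option and the property names against your Hadoop version:

  # pass the nameservice explicitly on the command line
  hdfs haadmin -ns ns1 -transitionToActive nn1

  # or pin the local ids in hdfs-site.xml on the ns1 namenode hosts:
  #   dfs.nameservice.id = ns1
  #   dfs.ha.namenode.id = nn1   (nn3 on namenode3)
)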

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>
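
For completeness, clients would normally also point fs.defaultFS at the logical nameservice in core-site.xml so that the failover proxy provider above is actually used; a sketch matching the names in this example:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfs-cluster</value>
  </property>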

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
>
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
>
> Everything seems fine unless no namenode is active now, then i tried to active one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
>
> Thanks a lot!!!
>
>

Re: RE: Failed to active namenode when config HA

Posted by 清如许 <47...@qq.com>.
Thank you very much!
I used ZooKeeper to do automatic failover. Even though HAAdmin still cannot determine the four namenodes, the cluster launched successfully.
I think I should do more research on it. :)
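
(To check the outcome, something like the following should report which namenode ended up active; with two nameservices it may additionally need -ns <nameservice>:

  hdfs haadmin -getServiceState nn1
  hdfs haadmin -getServiceState nn3
)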


------------------ Original ------------------
From:  "Brahma Reddy Battula";<br...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<us...@hadoop.apache.org>; 

Subject:  RE:  Failed to active namenode when config HA



 You need to start the ZKFC process, which will monitor and manage the state of the NameNode.
 
 
 
 
  
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.
 
 
 
 Please go through the following link for more details:
 
 
 http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
 
 
 
  
Thanks & Regards
 
 
 
Brahma Reddy Battula
 
 
 
 From: 清如许 [475053586@qq.com]
 Sent: Tuesday, September 30, 2014 8:54 AM
 To: user
 Subject: Re: Failed to active namenode when config HA
 
 
 Hi, Matt
 
 Thank you very much for your response!
 
 There were some mistakes in my description, as I wrote that mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

 There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
 <name>dfs.nameservices</name>
 and each nameservice will have two namenodes.

 If I configure only HA (only one nameservice), everything is OK, and HAAdmin can determine the namenodes nn1, nn3.

 But if I configure two nameservices and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can start these namenodes successfully and they are all in standby state at the beginning. But when I want to change one namenode to the active state with the command
 hdfs haadmin -transitionToActive nn1
 HAAdmin throws an exception, as it cannot determine the four namenodes (nn1, nn2, nn3, nn4) at all.

 Have you ever configured HA & Federation, and do you know what may cause this problem?
 
 Thanks,
 Lucy
  
 
 ------------------ Original ------------------
  From:  "Matt Narrell";<ma...@gmail.com>;
 Send time: Monday, Sep 29, 2014 6:28 AM
 To: "user"<us...@hadoop.apache.org>; 
 Subject:  Re: Failed to active namenode when config HA
 
 
 
 I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.
 
 Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:
 
 <?xml version="1.0"?>
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>3</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:/var/data/hadoop/hdfs/nn</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:/var/data/hadoop/hdfs/dn</value>
   </property>
 
     <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>
     <property>
       <name>dfs.nameservices</name>
       <value>hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.ha.namenodes.hdfs-cluster</name>
       <value>nn1,nn2</value>
     </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
         <value>namenode1:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
         <value>namenode1:50070</value>
       </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
         <value>namenode2:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
         <value>namenode2:50070</value>
       </property>
 
     <property>
       <name>dfs.namenode.shared.edits.dir</name>
       <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
     </property>
 
     <property>
       <name>dfs.ha.fencing.methods</name>
       <value>sshfence</value>
     </property>
     <property>
       <name>dfs.ha.fencing.ssh.private-key-files</name>
       <value>/home/hadoop/.ssh/id_rsa</value>
     </property>
 </configuration>
 
 mn
 
 On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:
 
 > Hi,
 > 
 > I'm new to hadoop and meet some problems when config HA.
 > Below are some important configuration in core-site.xml
 > 
 >   <property>
 >     <name>dfs.nameservices</name>
 >     <value>ns1,ns2</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns1</name>
 >     <value>nn1,nn3</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns2</name>
 >     <value>nn2,nn4</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
 >     <value>namenode1:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
 >     <value>namenode3:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
 >     <value>namenode2:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
 >     <value>namenode4:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.shared.edits.dir</name>
 >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
 >   </property>
 >   <property>
 >     <name>dfs.client.failover.proxy.provider.ns1</name>
 >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.methods</name>
 >     <value>sshfence</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.private-key-files</name>
 >     <value>/home/hduser/.ssh/id_rsa</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
 >     <value>30000</value>
 >   </property>
 >   <property>
 >     <name>dfs.journalnode.edits.dir</name>
 >     <value>/home/hduser/mydata/hdfs/journalnode</value>
 >   </property>
 > 
 > (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
 > 
 > After configuration, I did the following steps
 > firstly,  I start jornalnode on datanode2,datanode3,datanode4
 > secondly I format datanode1 and start namenode on it
 > then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
 > 
 > Everything seems fine unless no namenode is active now, then i tried to active one by running 
 > hdfs haadmin -transitionToActive nn1 on namenode1
 > but strangely it says "Illegal argument: Unable to determine the nameservice id."
 > 
 > Could anyone tell me why it cannot determine nn1 from my configuration?
 > Is there something wrong in my configuraion?
 > 
 > Thanks a lot!!!
 > 
 >

Re: RE: Failed to active namenode when config HA

Posted by 清如许 <47...@qq.com>.
Thank you very much!
I use zookeeper to do automatic failover. Even HAAdmin still can not determine the four namenodes, but the cluster launched successfully.
I think i should do more research on it. :)


------------------ Original ------------------
From:  "Brahma Reddy Battula";<br...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<us...@hadoop.apache.org>; 

Subject:  RE:  Failed to active namenode when config HA



 You need to start the ZKFC process which will monitor and manage  the state of namenode.
 
 
 
 
  
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).
 
Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation  of automatic HDFS failover relies on ZooKeeper for the following things:
  
 Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover  should be triggered.

 Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating  that it should become the next active.
 
The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs  a ZKFC, and that ZKFC is responsible for:
  
 Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node  healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.

 ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's  support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.

 ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the  election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active  state.
 
 
 
 Please go through following link for more details..
 
 
 http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
 
 
 
  
Thanks & Regards
 
 
 
Brahma Reddy Battula
 
 
 
 From: 清如许 [475053586@qq.com]
 Sent: Tuesday, September 30, 2014 8:54 AM
 To: user
 Subject: Re: Failed to active namenode when config HA
 
 
 Hi, Matt
 
 Thank you very much for your response!
 
 There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.
 
 There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
 <name>dfs.nameservices</name>
 and each nameservice will have two namenodes.
 
 If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.
 
 But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active  state, use command
 hdfs haadmin -transitionToActive nn1
 HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.
 
 Do you used to configure HA&Federation and know what may cause these problem?
 
 Thanks,
 Lucy
  
 
 ------------------ Original ------------------
  From:  "Matt Narrell";<ma...@gmail.com>;
 Send time: Monday, Sep 29, 2014 6:28 AM
 To: "user"<us...@hadoop.apache.org>; 
 Subject:  Re: Failed to active namenode when config HA
 
 
 
 I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.
 
 Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:
 
 <?xml version="1.0"?>
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>3</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:/var/data/hadoop/hdfs/nn</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:/var/data/hadoop/hdfs/dn</value>
   </property>
 
     <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>
     <property>
       <name>dfs.nameservices</name>
       <value>hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.ha.namenodes.hdfs-cluster</name>
       <value>nn1,nn2</value>
     </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
         <value>namenode1:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
         <value>namenode1:50070</value>
       </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
         <value>namenode2:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
         <value>namenode2:50070</value>
       </property>
 
     <property>
       <name>dfs.namenode.shared.edits.dir</name>
       <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
     </property>
 
     <property>
       <name>dfs.ha.fencing.methods</name>
       <value>sshfence</value>
     </property>
     <property>
       <name>dfs.ha.fencing.ssh.private-key-files</name>
       <value>/home/hadoop/.ssh/id_rsa</value>
     </property>
 </configuration>
 
 mn
 
 On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:
 
 > Hi,
 > 
 > I'm new to hadoop and meet some problems when config HA.
 > Below are some important configuration in core-site.xml
 > 
 >   <property>
 >     <name>dfs.nameservices</name>
 >     <value>ns1,ns2</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns1</name>
 >     <value>nn1,nn3</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns2</name>
 >     <value>nn2,nn4</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
 >     <value>namenode1:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
 >     <value>namenode3:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
 >     <value>namenode2:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
 >     <value>namenode4:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.shared.edits.dir</name>
 >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
 >   </property>
 >   <property>
 >     <name>dfs.client.failover.proxy.provider.ns1</name>
 >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.methods</name>
 >     <value>sshfence</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.private-key-files</name>
 >     <value>/home/hduser/.ssh/id_rsa</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
 >     <value>30000</value>
 >   </property>
 >   <property>
 >     <name>dfs.journalnode.edits.dir</name>
 >     <value>/home/hduser/mydata/hdfs/journalnode</value>
 >   </property>
 > 
 > (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
 > 
 > After configuration, I did the following steps
 > firstly,  I start jornalnode on datanode2,datanode3,datanode4
 > secondly I format datanode1 and start namenode on it
 > then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
 > 
 > Everything seems fine unless no namenode is active now, then i tried to active one by running 
 > hdfs haadmin -transitionToActive nn1 on namenode1
 > but strangely it says "Illegal argument: Unable to determine the nameservice id."
 > 
 > Could anyone tell me why it cannot determine nn1 from my configuration?
 > Is there something wrong in my configuraion?
 > 
 > Thanks a lot!!!
 > 
 >

Re: RE: Failed to active namenode when config HA

Posted by 清如许 <47...@qq.com>.
Thank you very much!
 I use ZooKeeper to do automatic failover. Even though HAAdmin still cannot determine the four namenodes, the cluster launched successfully.
 I think I should do more research on it. :)
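
 (A follow-up sketch, assuming the -ns option is available in this Hadoop version: the state of each namenode can still be checked per nameservice, e.g.
 hdfs haadmin -ns ns1 -getServiceState nn1
 Also note that once automatic failover is enabled, manual -transitionToActive calls are normally refused unless explicitly forced, because the ZKFC is expected to elect the active namenode.)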


------------------ Original ------------------
From:  "Brahma Reddy Battula";<br...@huawei.com>;
Send time: Tuesday, Sep 30, 2014 12:04 PM
To: "user@hadoop.apache.org"<us...@hadoop.apache.org>; 

Subject:  RE:  Failed to active namenode when config HA



 You need to start the ZKFC process which will monitor and manage  the state of namenode.
 
 
 
 
  
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).
 
Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation  of automatic HDFS failover relies on ZooKeeper for the following things:
  
 Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover  should be triggered.

 Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating  that it should become the next active.
 
The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs  a ZKFC, and that ZKFC is responsible for:
  
 Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node  healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.

 ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's  support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.

 ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the  election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active  state.
 
 
 
 Please go through following link for more details..
 
 
 http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
 
 
 
  
Thanks & Regards
 
 
 
Brahma Reddy Battula
 
 
 
 From: 清如许 [475053586@qq.com]
 Sent: Tuesday, September 30, 2014 8:54 AM
 To: user
 Subject: Re: Failed to active namenode when config HA
 
 
 Hi, Matt
 
 Thank you very much for your response!
 
 There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.
 
 There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
 <name>dfs.nameservices</name>
 and each nameservice will have two namenodes.
 
 If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.
 
 But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active  state, use command
 hdfs haadmin -transitionToActive nn1
 HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.
 
 Do you used to configure HA&Federation and know what may cause these problem?
 
 Thanks,
 Lucy
  
 
 ------------------ Original ------------------
  From:  "Matt Narrell";<ma...@gmail.com>;
 Send time: Monday, Sep 29, 2014 6:28 AM
 To: "user"<us...@hadoop.apache.org>; 
 Subject:  Re: Failed to active namenode when config HA
 
 
 
 I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.
 
 Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:
 
 <?xml version="1.0"?>
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>3</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:/var/data/hadoop/hdfs/nn</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:/var/data/hadoop/hdfs/dn</value>
   </property>
 
     <property>
       <name>dfs.ha.automatic-failover.enabled</name>
       <value>true</value>
     </property>
     <property>
       <name>dfs.nameservices</name>
       <value>hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.ha.namenodes.hdfs-cluster</name>
       <value>nn1,nn2</value>
     </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
         <value>namenode1:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
         <value>namenode1:50070</value>
       </property>
       <property>
         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
         <value>namenode2:8020</value>
       </property>
       <property>
         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
         <value>namenode2:50070</value>
       </property>
 
     <property>
       <name>dfs.namenode.shared.edits.dir</name>
       <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
     </property>
 
     <property>
       <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
     </property>
 
     <property>
       <name>dfs.ha.fencing.methods</name>
       <value>sshfence</value>
     </property>
     <property>
       <name>dfs.ha.fencing.ssh.private-key-files</name>
       <value>/home/hadoop/.ssh/id_rsa</value>
     </property>
 </configuration>
 
 mn
 
 On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:
 
 > Hi,
 > 
 > I'm new to hadoop and meet some problems when config HA.
 > Below are some important configuration in core-site.xml
 > 
 >   <property>
 >     <name>dfs.nameservices</name>
 >     <value>ns1,ns2</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns1</name>
 >     <value>nn1,nn3</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.namenodes.ns2</name>
 >     <value>nn2,nn4</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
 >     <value>namenode1:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
 >     <value>namenode3:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
 >     <value>namenode2:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
 >     <value>namenode4:9000</value>
 >   </property>
 >   <property>
 >     <name>dfs.namenode.shared.edits.dir</name>
 >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
 >   </property>
 >   <property>
 >     <name>dfs.client.failover.proxy.provider.ns1</name>
 >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.methods</name>
 >     <value>sshfence</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.private-key-files</name>
 >     <value>/home/hduser/.ssh/id_rsa</value>
 >   </property>
 >   <property>
 >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
 >     <value>30000</value>
 >   </property>
 >   <property>
 >     <name>dfs.journalnode.edits.dir</name>
 >     <value>/home/hduser/mydata/hdfs/journalnode</value>
 >   </property>
 > 
 > (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
 > 
 > After configuration, I did the following steps
 > firstly,  I start jornalnode on datanode2,datanode3,datanode4
 > secondly I format datanode1 and start namenode on it
 > then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
 > 
 > Everything seems fine unless no namenode is active now, then i tried to active one by running 
 > hdfs haadmin -transitionToActive nn1 on namenode1
 > but strangely it says "Illegal argument: Unable to determine the nameservice id."
 > 
 > Could anyone tell me why it cannot determine nn1 from my configuration?
 > Is there something wrong in my configuraion?
 > 
 > Thanks a lot!!!
 > 
 >

RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
You need to start the ZKFC process, which will monitor and manage the state of the namenode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.



Please go through following link for more details..


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
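
 As a rough sketch of what that usually involves (the ZooKeeper hosts zk1, zk2, zk3 below are placeholders, not taken from this thread): automatic failover is switched on in hdfs-site.xml and the ZooKeeper quorum is given in core-site.xml,

   <property>
     <name>dfs.ha.automatic-failover.enabled</name>
     <value>true</value>
   </property>
   <property>
     <name>ha.zookeeper.quorum</name>
     <value>zk1:2181,zk2:2181,zk3:2181</value>
   </property>

 then the HA state in ZooKeeper is initialized once from one of the NameNode hosts and a ZKFC daemon is started next to every NameNode:

 hdfs zkfc -formatZK
 hadoop-daemon.sh start zkfc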




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.

There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
<name>dfs.nameservices</name>
and each nameservice will have two namenodes.

If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.

But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active state, use command
hdfs haadmin -transitionToActive nn1
HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.

Do you used to configure HA&Federation and know what may cause these problem?

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
>
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
>
> Everything seems fine unless no namenode is active now, then i tried to active one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
>
> Thanks a lot!!!
>
>

Re: Failed to active namenode when config HA

Posted by Matt Narrell <ma...@gmail.com>.
Lucy, 

I’m sorry, I’m only doing HDFS HA, not federated HDFS.

mn

On Sep 29, 2014, at 9:24 PM, 清如许 <47...@qq.com> wrote:

> Hi, Matt
> 
> Thank you very much for your response!
> 
> There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.
> 
> There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
> <name>dfs.nameservices</name>
> and each nameservice will have two namenodes.
> 
> If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.
> 
> But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active state, use command
> hdfs haadmin -transitionToActive nn1
> HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.
> 
> Do you used to configure HA&Federation and know what may cause these problem?
> 
> Thanks,
> Lucy
> 
> ------------------ Original ------------------
> From:  "Matt Narrell";<ma...@gmail.com>;
> Send time: Monday, Sep 29, 2014 6:28 AM
> To: "user"<us...@hadoop.apache.org>;
> Subject:  Re: Failed to active namenode when config HA
> 
> I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.
> 
> Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:
> 
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>   </property>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:/var/data/hadoop/hdfs/nn</value>
>   </property>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:/var/data/hadoop/hdfs/dn</value>
>   </property>
> 
>     <property>
>       <name>dfs.ha.automatic-failover.enabled</name>
>       <value>true</value>
>     </property>
>     <property>
>       <name>dfs.nameservices</name>
>       <value>hdfs-cluster</value>
>     </property>
> 
>     <property>
>       <name>dfs.ha.namenodes.hdfs-cluster</name>
>       <value>nn1,nn2</value>
>     </property>
>       <property>
>         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
>         <value>namenode1:8020</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
>         <value>namenode1:50070</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
>         <value>namenode2:8020</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
>         <value>namenode2:50070</value>
>       </property>
> 
>     <property>
>       <name>dfs.namenode.shared.edits.dir</name>
>       <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
>     </property>
> 
>     <property>
>       <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
>       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>     </property>
> 
>     <property>
>       <name>dfs.ha.fencing.methods</name>
>       <value>sshfence</value>
>     </property>
>     <property>
>       <name>dfs.ha.fencing.ssh.private-key-files</name>
>       <value>/home/hadoop/.ssh/id_rsa</value>
>     </property>
> </configuration>
> 
> mn
> 
> On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:
> 
> > Hi,
> > 
> > I'm new to hadoop and meet some problems when config HA.
> > Below are some important configuration in core-site.xml
> > 
> >   <property>
> >     <name>dfs.nameservices</name>
> >     <value>ns1,ns2</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.namenodes.ns1</name>
> >     <value>nn1,nn3</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.namenodes.ns2</name>
> >     <value>nn2,nn4</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
> >     <value>namenode1:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
> >     <value>namenode3:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
> >     <value>namenode2:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
> >     <value>namenode4:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.shared.edits.dir</name>
> >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
> >   </property>
> >   <property>
> >     <name>dfs.client.failover.proxy.provider.ns1</name>
> >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.methods</name>
> >     <value>sshfence</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.ssh.private-key-files</name>
> >     <value>/home/hduser/.ssh/id_rsa</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
> >     <value>30000</value>
> >   </property>
> >   <property>
> >     <name>dfs.journalnode.edits.dir</name>
> >     <value>/home/hduser/mydata/hdfs/journalnode</value>
> >   </property>
> > 
> > (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
> > 
> > After configuration, I did the following steps
> > firstly,  I start jornalnode on datanode2,datanode3,datanode4
> > secondly I format datanode1 and start namenode on it
> > then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
> > 
> > Everything seems fine unless no namenode is active now, then i tried to active one by running 
> > hdfs haadmin -transitionToActive nn1 on namenode1
> > but strangely it says "Illegal argument: Unable to determine the nameservice id."
> > 
> > Could anyone tell me why it cannot determine nn1 from my configuration?
> > Is there something wrong in my configuraion?
> > 
> > Thanks a lot!!!
> > 
> > 


RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
You need to start the ZKFC process which will monitor and manage  the state of namenode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.



Please go through following link for more details..


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.

There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
<name>dfs.nameservices</name>
and each nameservice will have two namenodes.

If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.

But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active state, use command
hdfs haadmin -transitionToActive nn1
HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.

Do you used to configure HA&Federation and know what may cause these problem?

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
>
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
>
> Everything seems fine unless no namenode is active now, then i tried to active one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
>
> Thanks a lot!!!
>
>

Re: Failed to active namenode when config HA

Posted by Matt Narrell <ma...@gmail.com>.
Lucy, 

I’m sorry, I’m only doing HDFS HA, not federated HDFS.

mn

On Sep 29, 2014, at 9:24 PM, 清如许 <47...@qq.com> wrote:

> Hi, Matt
> 
> Thank you very much for your response!
> 
> There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.
> 
> There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
> <name>dfs.nameservices</name>
> and each nameservice will have two namenodes.
> 
> If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.
> 
> But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active state, use command
> hdfs haadmin -transitionToActive nn1
> HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.
> 
> Do you used to configure HA&Federation and know what may cause these problem?
> 
> Thanks,
> Lucy
> 
> ------------------ Original ------------------
> From:  "Matt Narrell";<ma...@gmail.com>;
> Send time: Monday, Sep 29, 2014 6:28 AM
> To: "user"<us...@hadoop.apache.org>;
> Subject:  Re: Failed to active namenode when config HA
> 
> I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.
> 
> Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:
> 
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>   </property>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:/var/data/hadoop/hdfs/nn</value>
>   </property>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:/var/data/hadoop/hdfs/dn</value>
>   </property>
> 
>     <property>
>       <name>dfs.ha.automatic-failover.enabled</name>
>       <value>true</value>
>     </property>
>     <property>
>       <name>dfs.nameservices</name>
>       <value>hdfs-cluster</value>
>     </property>
> 
>     <property>
>       <name>dfs.ha.namenodes.hdfs-cluster</name>
>       <value>nn1,nn2</value>
>     </property>
>       <property>
>         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
>         <value>namenode1:8020</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
>         <value>namenode1:50070</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
>         <value>namenode2:8020</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
>         <value>namenode2:50070</value>
>       </property>
> 
>     <property>
>       <name>dfs.namenode.shared.edits.dir</name>
>       <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
>     </property>
> 
>     <property>
>       <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
>       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>     </property>
> 
>     <property>
>       <name>dfs.ha.fencing.methods</name>
>       <value>sshfence</value>
>     </property>
>     <property>
>       <name>dfs.ha.fencing.ssh.private-key-files</name>
>       <value>/home/hadoop/.ssh/id_rsa</value>
>     </property>
> </configuration>
> 
> mn
> 
> On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:
> 
> > Hi,
> > 
> > I'm new to hadoop and meet some problems when config HA.
> > Below are some important configuration in core-site.xml
> > 
> >   <property>
> >     <name>dfs.nameservices</name>
> >     <value>ns1,ns2</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.namenodes.ns1</name>
> >     <value>nn1,nn3</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.namenodes.ns2</name>
> >     <value>nn2,nn4</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
> >     <value>namenode1:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
> >     <value>namenode3:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
> >     <value>namenode2:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
> >     <value>namenode4:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.shared.edits.dir</name>
> >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
> >   </property>
> >   <property>
> >     <name>dfs.client.failover.proxy.provider.ns1</name>
> >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.methods</name>
> >     <value>sshfence</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.ssh.private-key-files</name>
> >     <value>/home/hduser/.ssh/id_rsa</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
> >     <value>30000</value>
> >   </property>
> >   <property>
> >     <name>dfs.journalnode.edits.dir</name>
> >     <value>/home/hduser/mydata/hdfs/journalnode</value>
> >   </property>
> > 
> > (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
> > 
> > After configuration, I did the following steps
> > firstly,  I start jornalnode on datanode2,datanode3,datanode4
> > secondly I format datanode1 and start namenode on it
> > then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
> > 
> > Everything seems fine unless no namenode is active now, then i tried to active one by running 
> > hdfs haadmin -transitionToActive nn1 on namenode1
> > but strangely it says "Illegal argument: Unable to determine the nameservice id."
> > 
> > Could anyone tell me why it cannot determine nn1 from my configuration?
> > Is there something wrong in my configuraion?
> > 
> > Thanks a lot!!!
> > 
> > 


RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
You need to start the ZKFC process which will monitor and manage  the state of namenode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.



Please go through following link for more details..


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.

There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
<name>dfs.nameservices</name>
and each nameservice will have two namenodes.

If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.

But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active state, use command
hdfs haadmin -transitionToActive nn1
HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.

Do you used to configure HA&Federation and know what may cause these problem?

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
>
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
>
> Everything seems fine unless no namenode is active now, then i tried to active one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
>
> Thanks a lot!!!
>
>
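
A condensed sketch of the bring-up sequence described in the quoted post above (host names and the nn1 id come from that configuration; the commands are the standard Hadoop 2.x ones, so adjust scripts and paths to your installation):

# on datanode2, datanode3 and datanode4 (the JournalNode hosts)
hadoop-daemon.sh start journalnode

# on namenode1: format and start the first NameNode of ns1
hdfs namenode -format
hadoop-daemon.sh start namenode

# on namenode3: copy the formatted metadata over and start the standby
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode

# then try to promote nn1 (the step that fails with
# "Illegal argument: Unable to determine the nameservice id")
hdfs haadmin -transitionToActive nn1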

RE: Failed to active namenode when config HA

Posted by Brahma Reddy Battula <br...@huawei.com>.
You need to start the ZKFC process, which will monitor and manage the state of the NameNode.





Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. The implementation of automatic HDFS failover relies on ZooKeeper for the following things:

  *   Failure detection - each of the NameNode machines in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the ZooKeeper session will expire, notifying the other NameNode that a failover should be triggered.
  *   Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

  *   Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
  *   ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
  *   ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.
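
For reference, a minimal sketch of what enabling automatic failover usually involves, assuming a three-node ZooKeeper ensemble (the zk1/zk2/zk3 host names below are placeholders, not taken from this thread):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>

Once that is in place, initialize the HA state in ZooKeeper and start a ZKFC on each NameNode host:

# run once, from one of the NameNode hosts
hdfs zkfc -formatZK

# run on every NameNode host (start-dfs.sh also starts ZKFCs automatically
# when automatic failover is enabled)
hadoop-daemon.sh start zkfc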



Please go through the following link for more details:


http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html




Thanks & Regards



Brahma Reddy Battula



________________________________
From: 清如许 [475053586@qq.com]
Sent: Tuesday, September 30, 2014 8:54 AM
To: user
Subject: Re: Failed to active namenode when config HA

Hi, Matt

Thank you very much for your response!

There were some mistakes in my description, as I wrote that mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
<name>dfs.nameservices</name>
and each nameservice has two namenodes.

If I configure only HA (only one nameservice), everything is OK, and HAAdmin can determine the namenodes nn1, nn3.

But if I configure two nameservices and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can start these namenodes successfully and they are all in standby state at the beginning. But when I try to transition one namenode to the active state with the command
hdfs haadmin -transitionToActive nn1
HAAdmin throws an exception, as it cannot determine the four namenodes (nn1,nn2,nn3,nn4) at all.

Have you ever configured HA & Federation together, and do you know what may cause this problem?

Thanks,
Lucy

------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>;
Subject:  Re: Failed to active namenode when config HA

I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
>
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
>
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
>
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
>
> Everything seems fine unless no namenode is active now, then i tried to active one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
>
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
>
> Thanks a lot!!!
>
>

Re: Failed to active namenode when config HA

Posted by Matt Narrell <ma...@gmail.com>.
Lucy, 

I’m sorry, I’m only doing HDFS HA, not federated HDFS.

mn

On Sep 29, 2014, at 9:24 PM, 清如许 <47...@qq.com> wrote:

> Hi, Matt
> 
> Thank you very much for your response!
> 
> There were some mistakes in my description as i wrote this mail in a hurry. I put those properties is in hdfs-site.xml not core-site.xml.
> 
> There are four name nodes because i also using HDFS federation, so there are two nameservices in porperty
> <name>dfs.nameservices</name>
> and each nameservice will have two namenodes.
> 
> If i configure only HA (only one nameservice), everything is ok, and HAAdmin can determine the namenodes nn1, nn3.
> 
> But if i configure two nameservice and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservices2. I can start these namenodes successfully and the namenodes are all in standby state at th beginning. But if i want to change one namenode to active state, use command
> hdfs haadmin -transitionToActive nn1
> HAAdmin throw exception as it cannot determine the four namenodes(nn1,nn2,nn3,nn4) at all.
> 
> Do you used to configure HA&Federation and know what may cause these problem?
> 
> Thanks,
> Lucy
> 
> ------------------ Original ------------------
> From:  "Matt Narrell";<ma...@gmail.com>;
> Send time: Monday, Sep 29, 2014 6:28 AM
> To: "user"<us...@hadoop.apache.org>;
> Subject:  Re: Failed to active namenode when config HA
> 
> I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.
> 
> Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:
> 
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>3</value>
>   </property>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:/var/data/hadoop/hdfs/nn</value>
>   </property>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:/var/data/hadoop/hdfs/dn</value>
>   </property>
> 
>     <property>
>       <name>dfs.ha.automatic-failover.enabled</name>
>       <value>true</value>
>     </property>
>     <property>
>       <name>dfs.nameservices</name>
>       <value>hdfs-cluster</value>
>     </property>
> 
>     <property>
>       <name>dfs.ha.namenodes.hdfs-cluster</name>
>       <value>nn1,nn2</value>
>     </property>
>       <property>
>         <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
>         <value>namenode1:8020</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
>         <value>namenode1:50070</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
>         <value>namenode2:8020</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
>         <value>namenode2:50070</value>
>       </property>
> 
>     <property>
>       <name>dfs.namenode.shared.edits.dir</name>
>       <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
>     </property>
> 
>     <property>
>       <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
>       <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>     </property>
> 
>     <property>
>       <name>dfs.ha.fencing.methods</name>
>       <value>sshfence</value>
>     </property>
>     <property>
>       <name>dfs.ha.fencing.ssh.private-key-files</name>
>       <value>/home/hadoop/.ssh/id_rsa</value>
>     </property>
> </configuration>
> 
> mn
> 
> On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:
> 
> > Hi,
> > 
> > I'm new to hadoop and meet some problems when config HA.
> > Below are some important configuration in core-site.xml
> > 
> >   <property>
> >     <name>dfs.nameservices</name>
> >     <value>ns1,ns2</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.namenodes.ns1</name>
> >     <value>nn1,nn3</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.namenodes.ns2</name>
> >     <value>nn2,nn4</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns1.nn1</name>
> >     <value>namenode1:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns1.nn3</name>
> >     <value>namenode3:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns2.nn2</name>
> >     <value>namenode2:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.rpc-address.ns2.nn4</name>
> >     <value>namenode4:9000</value>
> >   </property>
> >   <property>
> >     <name>dfs.namenode.shared.edits.dir</name>
> >     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
> >   </property>
> >   <property>
> >     <name>dfs.client.failover.proxy.provider.ns1</name>
> >     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.methods</name>
> >     <value>sshfence</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.ssh.private-key-files</name>
> >     <value>/home/hduser/.ssh/id_rsa</value>
> >   </property>
> >   <property>
> >     <name>dfs.ha.fencing.ssh.connect-timeout</name>
> >     <value>30000</value>
> >   </property>
> >   <property>
> >     <name>dfs.journalnode.edits.dir</name>
> >     <value>/home/hduser/mydata/hdfs/journalnode</value>
> >   </property>
> > 
> > (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
> > 
> > After configuration, I did the following steps
> > firstly,  I start jornalnode on datanode2,datanode3,datanode4
> > secondly I format datanode1 and start namenode on it
> > then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
> > 
> > Everything seems fine unless no namenode is active now, then i tried to active one by running 
> > hdfs haadmin -transitionToActive nn1 on namenode1
> > but strangely it says "Illegal argument: Unable to determine the nameservice id."
> > 
> > Could anyone tell me why it cannot determine nn1 from my configuration?
> > Is there something wrong in my configuraion?
> > 
> > Thanks a lot!!!
> > 
> > 


Re: Failed to active namenode when config HA

Posted by 清如许 <47...@qq.com>.
Hi, Matt

Thank you very much for your response!

There were some mistakes in my description, as I wrote that mail in a hurry. I put those properties in hdfs-site.xml, not core-site.xml.

There are four namenodes because I am also using HDFS federation, so there are two nameservices in the property
<name>dfs.nameservices</name>
and each nameservice has two namenodes.

If I configure only HA (only one nameservice), everything is OK, and HAAdmin can determine the namenodes nn1, nn3.

But if I configure two nameservices and set namenodes nn1,nn3 for nameservice1 and nn2,nn4 for nameservice2, I can start these namenodes successfully and they are all in standby state at the beginning. But when I try to transition one namenode to the active state with the command
hdfs haadmin -transitionToActive nn1
HAAdmin throws an exception, as it cannot determine the four namenodes (nn1,nn2,nn3,nn4) at all.

Have you ever configured HA & Federation together, and do you know what may cause this problem?

Thanks,
Lucy


------------------ Original ------------------
From:  "Matt Narrell";<ma...@gmail.com>;
Send time: Monday, Sep 29, 2014 6:28 AM
To: "user"<us...@hadoop.apache.org>; 

Subject:  Re: Failed to active namenode when config HA



I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
> 
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
> 
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
> 
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
> 
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
> 
> Everything seems fine unless no namenode is active now, then i tried to active one by running 
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
> 
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
> 
> Thanks a lot!!!
> 
>

Re: Failed to active namenode when config HA

Posted by Matt Narrell <ma...@gmail.com>.
I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
> 
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
> 
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
> 
> (two nameservice ns1,ns2 is for configuring federation later. In this step, I only want launch ns1 on namenode1,namenode3)
> 
> After configuration, I did the following steps
> firstly,  I start jornalnode on datanode2,datanode3,datanode4
> secondly I format datanode1 and start namenode on it
> then i run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
> 
> Everything seems fine unless no namenode is active now, then i tried to active one by running 
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
> 
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuraion?
> 
> Thanks a lot!!!
> 
> 


Re: Failed to active namenode when config HA

Posted by Matt Narrell <ma...@gmail.com>.
I’m pretty sure HDFS HA is relegated to two name nodes (not four), designated active and standby.  Secondly, I believe these properties should be in hdfs-site.xml NOT core-site.xml.

Furthermore, I think your HDFS nameservices are misconfigured.  Consider the following:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/var/data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/var/data/hadoop/hdfs/dn</value>
  </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.ha.namenodes.hdfs-cluster</name>
      <value>nn1,nn2</value>
    </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>namenode1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>namenode1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>namenode2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>namenode2:50070</value>
      </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/hdfs-cluster</value>
    </property>

    <property>
      <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

mn

On Sep 28, 2014, at 12:56 PM, 清如许 <47...@qq.com> wrote:

> Hi,
> 
> I'm new to hadoop and meet some problems when config HA.
> Below are some important configuration in core-site.xml
> 
>   <property>
>     <name>dfs.nameservices</name>
>     <value>ns1,ns2</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns1</name>
>     <value>nn1,nn3</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.ns2</name>
>     <value>nn2,nn4</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn1</name>
>     <value>namenode1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns1.nn3</name>
>     <value>namenode3:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn2</name>
>     <value>namenode2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.ns2.nn4</name>
>     <value>namenode4:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://datanode2:8485;datanode3:8485;datanode4:8485/ns1</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.ns1</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/hduser/.ssh/id_rsa</value>
>   </property>
>   <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>     <value>30000</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/home/hduser/mydata/hdfs/journalnode</value>
>   </property>
> 
> (the two nameservices ns1,ns2 are for configuring federation later. In this step, I only want to launch ns1 on namenode1,namenode3)
> 
> After configuration, I did the following steps
> firstly, I start journalnode on datanode2,datanode3,datanode4
> secondly I format namenode1 and start namenode on it
> then I run 'hdfs namenode -bootstrapStandby' on the other namenode and start namenode on it
> 
> Everything seems fine except that no namenode is active now, so I tried to activate one by running
> hdfs haadmin -transitionToActive nn1 on namenode1
> but strangely it says "Illegal argument: Unable to determine the nameservice id."
> 
> Could anyone tell me why it cannot determine nn1 from my configuration?
> Is there something wrong in my configuration?
> 
> Thanks a lot!!!
> 
> 
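
For reference, the bring-up sequence quoted above usually maps to commands along these lines on a QJM-based HA pair (host names taken from the quoted configuration; a rough sketch, and the exact scripts vary a bit between Hadoop versions):

  # 1. on each JournalNode host (datanode2, datanode3, datanode4)
  hadoop-daemon.sh start journalnode

  # 2. on the first NameNode host (namenode1): format and start
  hdfs namenode -format
  hadoop-daemon.sh start namenode

  # 3. on the second NameNode host (namenode3): copy the metadata and start
  hdfs namenode -bootstrapStandby
  hadoop-daemon.sh start namenode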

