Posted to dev@ambari.apache.org by Alejandro Fernandez <af...@hortonworks.com> on 2015/02/13 16:42:34 UTC

Review Request 31002: RU - NodeManager failed to restart in Kerberized clusters

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31002/
-----------------------------------------------------------

Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Robert Levas.


Bugs: AMBARI-9627
    https://issues.apache.org/jira/browse/AMBARI-9627


Repository: ambari


Description
-------

NodeManager failed to restart in a Kerberized cluster while performing a Rolling Upgrade (RU).

I deployed a 3-node cluster with all services from HDFS through ZK, then enabled NameNode HA, and kerberized the cluster.

When I performed an RU from the 2.2.0.0 GA bits to 2.2.1.0-2260, I first had to comment out an error in ZK server, and when I got to the Slaves group, NodeManager failed. See the attached log.

```
Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/nm.service.keytab nm/_HOST@EXAMPLE.COM;' returned 1. kinit: Keytab contains no suitable keys for nm/_HOST@EXAMPLE.COM while getting initial credentials
```

```
[root@c6404 ~]# klist -kt /etc/security/keytabs/nm.service.keytab
Keytab name: FILE:/etc/security/keytabs/nm.service.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
```

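The keytab only holds keys for the host-qualified principal, while kinit was called with the literal _HOST placeholder. This means that params.py is probably failing to replace _HOST with the actual hostname.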
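
For illustration, here is a minimal sketch of the kind of substitution that appears to be missing. It is not the actual diff: the function name resolve_principal and its parameters are hypothetical, and lower-casing the hostname is an assumption made so the result matches the keytab entries shown above.

```python
import socket


def resolve_principal(principal, hostname=None):
    """Return the principal with the _HOST placeholder replaced by a hostname.

    Hypothetical helper for illustration only; the real params.py may do this
    differently. If no hostname is given, the local FQDN is used.
    """
    if hostname is None:
        hostname = socket.getfqdn()
    # Assumption: keytab principals use the lower-cased hostname.
    return principal.replace("_HOST", hostname.lower())


# Example with the values from the failing kinit call above.
nm_principal = resolve_principal("nm/_HOST@EXAMPLE.COM", "c6404.ambari.apache.org")
print(nm_principal)
```

With the substitution applied, the principal passed to kinit would have the same form as the entries listed by klist.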


Diffs
-----

  ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params.py 53beb96 

Diff: https://reviews.apache.org/r/31002/diff/


Testing
-------

Deployed a 3-node cluster with HDFS, YARN, ..., ZK, then added NameNode HA, and kerberized the cluster. After performing an RU, I verified that the fix for NodeManager worked.

Unit tests are in progress.


Thanks,

Alejandro Fernandez


Re: Review Request 31002: RU - NodeManager failed to restart in Kerberized clusters

Posted by Robert Levas <rl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31002/#review72395
-----------------------------------------------------------

Ship it!

- Robert Levas


On Feb. 13, 2015, 12:19 p.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31002/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2015, 12:19 p.m.)
> 
> 
> Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Robert Levas.
> 
> 
> Bugs: AMBARI-9627
>     https://issues.apache.org/jira/browse/AMBARI-9627
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> NodeManager failed to restart in a Kerberized cluster while performing a Rolling Upgrade (RU).
> 
> I deployed a 3-node cluster with all services from HDFS through ZK, then enabled NameNode HA, and kerberized the cluster.
> 
> When I performed an RU from the 2.2.0.0 GA bits to 2.2.1.0-2260, I first had to comment out an error in ZK server, and when I got to the Slaves group, NodeManager failed. See the attached log.
> 
> ```
> Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/nm.service.keytab nm/_HOST@EXAMPLE.COM;' returned 1. kinit: Keytab contains no suitable keys for nm/_HOST@EXAMPLE.COM while getting initial credentials
> ```
> 
> ```
> [root@c6404 ~]# klist -kt /etc/security/keytabs/nm.service.keytab
> Keytab name: FILE:/etc/security/keytabs/nm.service.keytab
> KVNO Timestamp         Principal
> ---- ----------------- --------------------------------------------------------
>    1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
>    1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
>    1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
>    1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
>    1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
> ```
> 
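> The keytab only holds keys for the host-qualified principal, while kinit was called with the literal _HOST placeholder. This means that params.py is probably failing to replace _HOST with the actual hostname.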
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params.py 53beb96 
> 
> Diff: https://reviews.apache.org/r/31002/diff/
> 
> 
> Testing
> -------
> 
> Deployed a 3-node cluster with HDFS, YARN, ..., ZK, then added NameNode HA, and kerberized the cluster. After performing an RU, I verified that the fix for NodeManager worked.
> 
> Unit tests passed in ABO.
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 01:13 h
> [INFO] Finished at: 2015-02-13T17:11:56+00:00
> [INFO] Final Memory: 44M/475M
> [INFO] ------------------------------------------------------------------------
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>


Re: Review Request 31002: RU - NodeManager failed to restart in Kerberized clusters

Posted by Alejandro Fernandez <af...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31002/
-----------------------------------------------------------

(Updated Feb. 13, 2015, 5:19 p.m.)


Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Robert Levas.


Bugs: AMBARI-9627
    https://issues.apache.org/jira/browse/AMBARI-9627


Repository: ambari


Description
-------

NodeManager failed to restart in a Kerberized cluster while performing a Rolling Upgrade (RU).

I deployed a 3-node cluster with all services from HDFS through ZK, then enabled NameNode HA, and kerberized the cluster.

When I performed an RU from the 2.2.0.0 GA bits to 2.2.1.0-2260, I first had to comment out an error in ZK server, and when I got to the Slaves group, NodeManager failed. See the attached log.

```
Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/nm.service.keytab nm/_HOST@EXAMPLE.COM;' returned 1. kinit: Keytab contains no suitable keys for nm/_HOST@EXAMPLE.COM while getting initial credentials
```

```
[root@c6404 ~]# klist -kt /etc/security/keytabs/nm.service.keytab
Keytab name: FILE:/etc/security/keytabs/nm.service.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
```

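The keytab only holds keys for the host-qualified principal, while kinit was called with the literal _HOST placeholder. This means that params.py is probably failing to replace _HOST with the actual hostname.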
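
The mismatch can also be checked mechanically. The sketch below parses `klist -kt` output like the listing above and reports whether a given principal is present. The helper name keytab_principals is hypothetical, and the parsing assumes the table layout shown above (a dashed separator row followed by one entry per line, with the principal as the last field).

```python
def keytab_principals(klist_output):
    """Collect the distinct principals listed in `klist -kt` output.

    Hypothetical helper for illustration; assumes entries follow a row of
    dashes and that the principal is the last whitespace-separated field.
    """
    principals = set()
    in_table = False
    for line in klist_output.splitlines():
        if line.startswith("----"):
            in_table = True  # entries begin after the separator row
            continue
        if in_table and line.strip():
            principals.add(line.split()[-1])
    return principals


# Sample input copied from the klist listing above (shortened to two entries).
sample = """Keytab name: FILE:/etc/security/keytabs/nm.service.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
   1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
"""

found = keytab_principals(sample)
print("nm/_HOST@EXAMPLE.COM" in found)
print("nm/c6404.ambari.apache.org@EXAMPLE.COM" in found)
```

The unresolved placeholder principal is absent from the keytab, while the host-qualified one is present, which is consistent with the kinit failure.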


Diffs
-----

  ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params.py 53beb96 

Diff: https://reviews.apache.org/r/31002/diff/


Testing (updated)
-------

Deployed a 3-node cluster with HDFS, YARN, ..., ZK, then added NameNode HA, and kerberized the cluster. After performing an RU, I verified that the fix for NodeManager worked.

Unit tests passed in ABO.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:13 h
[INFO] Finished at: 2015-02-13T17:11:56+00:00
[INFO] Final Memory: 44M/475M
[INFO] ------------------------------------------------------------------------


Thanks,

Alejandro Fernandez