Posted to dev@ambari.apache.org by Alejandro Fernandez <af...@hortonworks.com> on 2015/02/13 16:42:34 UTC
Review Request 31002: RU - NodeManager failed to restart in Kerberized
clusters
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31002/
-----------------------------------------------------------
Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Robert Levas.
Bugs: AMBARI-9627
https://issues.apache.org/jira/browse/AMBARI-9627
Repository: ambari
Description
-------
NodeManager failed to restart in a Kerberized cluster while performing a Rolling Upgrade (RU).
I deployed a 3-node cluster with all services from HDFS through ZK, then enabled NameNode HA, and kerberized the cluster.
When I performed an RU from 2.2.0.0 GA bits to 2.2.1.0-2260, I first had to comment out an error in ZK server, and when I got to the Slaves group, NodeManager failed. See attached log.
```
Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/nm.service.keytab nm/_HOST@EXAMPLE.COM;' returned 1. kinit: Keytab contains no suitable keys for nm/_HOST@EXAMPLE.COM while getting initial credentials
```
```
[root@c6404 ~]# klist -kt /etc/security/keytabs/nm.service.keytab
Keytab name: FILE:/etc/security/keytabs/nm.service.keytab
KVNO Timestamp Principal
---- ----------------- --------------------------------------------------------
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
```
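The keytab holds keys for nm/c6404.ambari.apache.org@EXAMPLE.COM, but kinit was called with the literal nm/_HOST@EXAMPLE.COM. This suggests that params.py is not replacing the _HOST placeholder with the actual hostname.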
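To illustrate the kind of substitution that appears to be missing, here is a minimal sketch, not the actual patch. The helper names `resolve_principal` and `build_kinit_command` are hypothetical, and lowercasing the hostname is an assumption based on the lowercase host component shown in the klist output above.

```python
import socket

# Placeholder used in the principal name from the failing kinit command.
HOST_PLACEHOLDER = "_HOST"


def resolve_principal(principal, hostname=None):
    """Return the principal with _HOST replaced by a concrete hostname.

    hostname defaults to this machine's FQDN. Lowercasing is an assumption
    of this sketch, chosen to match the keytab entries listed by klist.
    """
    if hostname is None:
        hostname = socket.getfqdn()
    return principal.replace(HOST_PLACEHOLDER, hostname.lower())


def build_kinit_command(keytab, principal, hostname=None):
    """Build a kinit command line that uses the resolved principal."""
    resolved = resolve_principal(principal, hostname)
    return "/usr/bin/kinit -kt {0} {1};".format(keytab, resolved)


if __name__ == "__main__":
    print(build_kinit_command(
        "/etc/security/keytabs/nm.service.keytab",
        "nm/_HOST@EXAMPLE.COM",
        "c6404.ambari.apache.org",
    ))
```

With the hostname from the log, the resolved principal matches the entries in the keytab, so kinit would be asked for a key that actually exists.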
Diffs
-----
ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params.py 53beb96
Diff: https://reviews.apache.org/r/31002/diff/
Testing
-------
Deployed a 3-node cluster with HDFS, YARN, ..., ZK, then added NameNode HA, and kerberized the cluster. After performing an RU, I verified that the fix for NodeManager worked.
Unit tests are in progress.
Thanks,
Alejandro Fernandez
Re: Review Request 31002: RU - NodeManager failed to restart in
Kerberized clusters
Posted by Robert Levas <rl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31002/#review72395
-----------------------------------------------------------
Ship it!
- Robert Levas
Re: Review Request 31002: RU - NodeManager failed to restart in
Kerberized clusters
Posted by Alejandro Fernandez <af...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31002/
-----------------------------------------------------------
(Updated Feb. 13, 2015, 5:19 p.m.)
Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, and Robert Levas.
Bugs: AMBARI-9627
https://issues.apache.org/jira/browse/AMBARI-9627
Repository: ambari
Description
-------
NodeManager failed to restart in a Kerberized cluster while performing a Rolling Upgrade (RU).
I deployed a 3-node cluster with all services from HDFS through ZK, then enabled NameNode HA, and kerberized the cluster.
When I performed an RU from 2.2.0.0 GA bits to 2.2.1.0-2260, I first had to comment out an error in ZK server, and when I got to the Slaves group, NodeManager failed. See attached log.
```
Fail: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/nm.service.keytab nm/_HOST@EXAMPLE.COM;' returned 1. kinit: Keytab contains no suitable keys for nm/_HOST@EXAMPLE.COM while getting initial credentials
```
```
[root@c6404 ~]# klist -kt /etc/security/keytabs/nm.service.keytab
Keytab name: FILE:/etc/security/keytabs/nm.service.keytab
KVNO Timestamp Principal
---- ----------------- --------------------------------------------------------
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
1 02/12/15 23:23:29 nm/c6404.ambari.apache.org@EXAMPLE.COM
```
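The keytab holds keys for nm/c6404.ambari.apache.org@EXAMPLE.COM, but kinit was called with the literal nm/_HOST@EXAMPLE.COM. This suggests that params.py is not replacing the _HOST placeholder with the actual hostname.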
Diffs
-----
ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params.py 53beb96
Diff: https://reviews.apache.org/r/31002/diff/
Testing (updated)
-------
Deployed a 3-node cluster with HDFS, YARN, ..., ZK, then added NameNode HA, and kerberized the cluster. After performing an RU, I verified that the fix for NodeManager worked.
Unit tests passed in ABO.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:13 h
[INFO] Finished at: 2015-02-13T17:11:56+00:00
[INFO] Final Memory: 44M/475M
[INFO] ------------------------------------------------------------------------
Thanks,
Alejandro Fernandez