Posted to reviews@ambari.apache.org by Robert Levas <rl...@hortonworks.com> on 2017/03/08 04:08:14 UTC
Review Request 57410: When SPNEGO authentication is enabled for
Hadoop in a cluster with NN HA, PXF Process alert fails
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57410/
-----------------------------------------------------------
Review request for Ambari, Attila Magyar, bhuvnesh chaudhary, Balázs Bence Sári, Eugene Chekanskiy, jun aoki, Laszlo Puskas, and Sebastian Toader.
Bugs: AMBARI-20349
https://issues.apache.org/jira/browse/AMBARI-20349
Repository: ambari
Description
-------
When SPNEGO authentication is enabled for Hadoop in a cluster where NN HA is enabled, the PXF Process alert fails with the following errors in the ambari-agent.log file:
```
ERROR 2017-03-07 18:03:58,417 jmx.py:44 - Getting jmx metrics from NN failed. URL: http://c6401.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 41, in get_value_from_jmx
    data_dict = json.loads(data)
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 353, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
INFO 2017-03-07 18:04:02,769 logger.py:71 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://c6402.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmphTXg76 2>/tmp/tmp5bm2nM''] {'quiet': False}
INFO 2017-03-07 18:04:02,797 logger.py:71 - call returned (0, '')
ERROR 2017-03-07 18:04:02,798 jmx.py:44 - Getting jmx metrics from NN failed. URL: http://c6402.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 41, in get_value_from_jmx
    data_dict = json.loads(data)
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 353, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
```
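The traceback shows `json.loads` rejecting whatever `curl` wrote for the request. The log does not show that body; presumably it is an authentication error page or an empty response rather than JSON. A minimal sketch of that failure mode, using the standard `json` module instead of `ambari_simplejson`. `parse_jmx_response` is a hypothetical helper, not part of Ambari, and the sample payloads are illustrative only:

```python
import json


def parse_jmx_response(body):
    """Return the decoded JMX payload, or None when the body is not JSON.

    Hypothetical helper for illustration; it is not Ambari's
    get_value_from_jmx, which logs the error as seen above.
    """
    try:
        return json.loads(body)
    except ValueError:
        # On Python 3 json.JSONDecodeError subclasses ValueError; on
        # Python 2 a plain ValueError is raised for undecodable input.
        return None


# Neither an empty body nor an HTML error page decodes as JSON.
print(parse_jmx_response(""))
print(parse_jmx_response("<html>401 Authentication required</html>"))
# A well-formed JSON body decodes normally.
print(parse_jmx_response('{"beans": []}'))
```

Returning `None` here is only a way to make the condition visible; the point is that a non-JSON body, not the JMX query itself, is what raises the `ValueError`.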
# Cause
During the test for the _PXF Process_ alert, the active NN is found using a JMX call. This call requires SPNEGO authentication, since SPNEGO authentication is turned on for the Hadoop web interfaces. However, no valid Kerberos ticket is found in the configured user's Kerberos ticket cache. In this case the configured user is the HDFS user, which technically is not necessary.
This occurs in `common-services/PXF/3.0.0/package/alerts/api_status.py:137`:
```
if CLUSTER_ENV_SECURITY in configurations and configurations[CLUSTER_ENV_SECURITY].lower() == "true":
  if 'dfs.nameservices' in configurations[HDFS_SITE]:
    namenode_address = get_active_namenode(ConfigDictionary(configurations[HDFS_SITE]), configurations[CLUSTER_ENV_SECURITY], configurations[HADOOP_ENV_HDFS_USER])[1]
  else:
    namenode_address = configurations[HDFS_SITE]['dfs.namenode.http-address']

  token = _get_delegation_token(namenode_address,
                                configurations[HADOOP_ENV_HDFS_USER],
                                configurations[HADOOP_ENV_HDFS_USER_KEYTAB],
                                configurations[HADOOP_ENV_HDFS_PRINCIPAL_NAME],
                                None)
  commonPXFHeaders.update({"X-GP-TOKEN": token})
```
Specifically, the failure occurs inside this call:
```
namenode_address = get_active_namenode(ConfigDictionary(configurations[HDFS_SITE]), configurations[CLUSTER_ENV_SECURITY], configurations[HADOOP_ENV_HDFS_USER])[1]
```
# Solution
Ensure the configured user's Kerberos ticket cache contains a valid ticket before querying for the active NN. Possibly change the acting user to the one that runs the PXF component.
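
As a rough sketch of the step described above (not the patch itself), the ticket could be obtained from the keytab before the active NN lookup. The helper names `resolve_principal` and `build_kinit_command`, the keytab path, and the principal below are all hypothetical; only the `_HOST` substitution and the use of kinit come from this review:

```python
def resolve_principal(principal, host_name):
    """Substitute the _HOST placeholder with the local host name."""
    if principal is None:
        return None
    return principal.replace("_HOST", host_name)


def build_kinit_command(kinit_path, keytab, principal):
    """Return the argument list for a keytab-based kinit.

    `kinit -kt <keytab> <principal>` requests a ticket using the key
    stored in the keytab, so no password prompt is involved.
    """
    return [kinit_path, "-kt", keytab, principal]


# Hypothetical values, for illustration only.
principal = resolve_principal("hdfs/_HOST@EXAMPLE.COM", "c6401.ambari.apache.org")
command = build_kinit_command("/usr/bin/kinit",
                              "/etc/security/keytabs/hdfs.keytab",
                              principal)
print(" ".join(command))
```

The command would need to run as the same user whose default ticket cache the subsequent `curl --negotiate` call reads, which in the log above is `hdfs`.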
Diffs
-----
ambari-server/src/main/resources/common-services/PXF/3.0.0/package/alerts/api_status.py d0ed0a4
Diff: https://reviews.apache.org/r/57410/diff/1/
Testing
-------
Manually tested in cluster - Ambari 2.5 with HDP 2.5 and HDB 2.1.2
Thanks,
Robert Levas
Re: Review Request 57410: When SPNEGO authentication is enabled for
Hadoop in a cluster with NN HA, PXF Process alert fails
Posted by Robert Levas <rl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57410/#review168242
-----------------------------------------------------------
ambari-server/src/main/resources/common-services/PXF/3.0.0/package/alerts/api_status.py
Line 33 (original), 35 (patched)
<https://reviews.apache.org/r/57410/#comment240443>
The `pxf` user is hard-coded in `common-services/PXF/3.0.0/package/scripts/pxf_constants.py:32`.
```
pxf_user = "pxf"
```
- Robert Levas
Re: Review Request 57410: When SPNEGO authentication is enabled for
Hadoop in a cluster with NN HA, PXF Process alert fails
Posted by Robert Levas <rl...@hortonworks.com>.
> On March 8, 2017, 3:01 a.m., Sebastian Toader wrote:
> > ambari-server/src/main/resources/common-services/PXF/3.0.0/package/alerts/api_status.py
> > Lines 152 (patched)
> > <https://reviews.apache.org/r/57410/diff/1/?file=1658685#file1658685line152>
> >
> > Isn't kinit needed for the NN non-HA case as well?
> >
> > e.g. move this call right after
> > ```
> > if resolved_principal is not None:
> >   resolved_principal = resolved_principal.replace('_HOST', host_name)
> > ```
The _added_ kinit is only needed before the `get_active_namenode` call. The `_get_delegation_token` call uses `curl_krb_request`, which performs a kinit itself.
Unfortunately, though both calls eventually use `curl` to execute the request, each uses a different ticket cache.
- `curl_krb_request` places the obtained ticket in an alternate ticket cache (which is preferred)
- `get_active_namenode` eventually calls `get_value_from_jmx` which executes `curl` assuming the default (user interactive) ticket cache is valid.
So two kinit calls will be needed until a fix is made much deeper in the code.
This is even more unfortunate since the alert test is triggered every minute.
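
To illustrate the difference between the two caches, here is a sketch under stated assumptions. It assumes the alternate cache is selected through the standard `KRB5CCNAME` environment variable (this thread does not say how `curl_krb_request` does it), and it uses the `klist -s` exit status to skip an unnecessary kinit. Both helper names and the cache path are hypothetical:

```python
def curl_environment(base_env, ccache_path=None):
    """Return a copy of base_env, pointing Kerberos at ccache_path if given.

    With no ccache_path the default ticket cache of the executing user
    is used, which is the situation described for get_value_from_jmx.
    """
    env = dict(base_env)
    if ccache_path is not None:
        env["KRB5CCNAME"] = ccache_path
    return env


def needs_kinit(klist_exit_code):
    """`klist -s` exits 0 only when the cache can be read and is not expired."""
    return klist_exit_code != 0


# Alternate cache (hypothetical path) versus the default cache.
print(curl_environment({"PATH": "/usr/bin"}, "/tmp/alert_cc_example"))
print(curl_environment({"PATH": "/usr/bin"}))
```

Checking the cache before each kinit would not remove the duplication between the two code paths, but it could limit how often a once-a-minute alert actually contacts the KDC.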
- Robert
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57410/#review168252
-----------------------------------------------------------
Re: Review Request 57410: When SPNEGO authentication is enabled for
Hadoop in a cluster with NN HA, PXF Process alert fails
Posted by Sebastian Toader <st...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57410/#review168252
-----------------------------------------------------------
ambari-server/src/main/resources/common-services/PXF/3.0.0/package/alerts/api_status.py
Lines 152 (patched)
<https://reviews.apache.org/r/57410/#comment240466>
Isn't kinit needed for the NN non-HA case as well?
e.g. move this call right after
```
if resolved_principal is not None:
  resolved_principal = resolved_principal.replace('_HOST', host_name)
```
- Sebastian Toader
Re: Review Request 57410: When SPNEGO authentication is enabled for
Hadoop in a cluster with NN HA, PXF Process alert fails
Posted by Eugene Chekanskiy <ec...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57410/#review168458
-----------------------------------------------------------
Ship it!
- Eugene Chekanskiy
Re: Review Request 57410: When SPNEGO authentication is enabled for
Hadoop in a cluster with NN HA, PXF Process alert fails
Posted by Sebastian Toader <st...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57410/#review168287
-----------------------------------------------------------
Ship it!
- Sebastian Toader