Posted to user@ambari.apache.org by Greg Hill <gr...@RACKSPACE.COM> on 2015/03/02 20:07:44 UTC

decommission multiple nodes issue

I have some code for decommissioning datanodes prior to removal.  It seems to work fine with a single node, but with multiple nodes it fails.  When passing multiple hosts, I am putting the names in a comma-separated string, as seems to be the custom with other Ambari API commands.  I attempted to send it as a JSON array, but the server complained about that.  Let me know if that is the wrong format.  The decommission request completes successfully, it just never writes the excludes file so no nodes are decommissioned.

This fails for multiple nodes:

"RequestInfo": {
                "command": "DECOMMISSION",
                "context": "Decommission DataNode"),
                "parameters": {"slave_type": "DATANODE", "excluded_hosts": "slave-1.local,slave-2.local"},
                "operation_level": {
"level": "CLUSTER",
"cluster_name": cluster_name
},
            },
            "Requests/resource_filters": [{
                "service_name": "HDFS",
                "component_name": "NAMENODE",
            }],

But this works for a single node:

"RequestInfo": {
                "command": "DECOMMISSION",
                "context": "Decommission DataNode"),
                "parameters": {"slave_type": "DATANODE", "excluded_hosts": "slave-1.local"},
                "operation_level": {
"level": "HOST_COMPONENT",
"cluster_name": cluster_name,
"host_name": "slave-1.local",
"service_name": "HDFS"
},
            },
            "Requests/resource_filters": [{
                "service_name": "HDFS",
                "component_name": "NAMENODE",
            }],

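For context, here is roughly how a request like this gets submitted (a minimal Python sketch; the server URL, credentials, and cluster name are placeholders, and the /api/v1/clusters/{name}/requests endpoint is the documented one for issuing commands like DECOMMISSION):

import json
import requests  # third-party HTTP library, assumed available

AMBARI = "http://ambari.example.com:8080"  # placeholder Ambari server
CLUSTER = "c1"                             # placeholder cluster name
AUTH = ("admin", "admin")                  # placeholder credentials

body = {
    "RequestInfo": {
        "command": "DECOMMISSION",
        "context": "Decommission DataNode",
        "parameters": {
            "slave_type": "DATANODE",
            "excluded_hosts": "slave-1.local,slave-2.local",
        },
        "operation_level": {
            "level": "CLUSTER",
            "cluster_name": CLUSTER,
        },
    },
    "Requests/resource_filters": [
        {"service_name": "HDFS", "component_name": "NAMENODE"},
    ],
}

resp = requests.post(
    "%s/api/v1/clusters/%s/requests" % (AMBARI, CLUSTER),
    auth=AUTH,
    headers={"X-Requested-By": "ambari"},  # required for write calls to the Ambari API
    data=json.dumps(body),                 # body is sent as a plain JSON string
)
resp.raise_for_status()
print(resp.text)  # response carries the id/href of the background request
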
Looking on the actual node, it's clear from the command output that the exclude file isn't being written:

(multiple hosts, notice there is no 'Writing File' line)
File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
Execute[''] {'user': 'hdfs'}
ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path': ['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}

(single host, it writes the exclude file)
File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
Writing File['/etc/hadoop/conf/dfs.exclude'] because contents don't match
Execute[''] {'user': 'hdfs'}
ExecuteHadoop['dfsadmin -refreshNodes'] {'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'conf_dir': '/etc/hadoop/conf', 'kinit_override': True, 'user': 'hdfs'}
Execute['hadoop --config /etc/hadoop/conf dfsadmin -refreshNodes'] {'logoutput': False, 'path': ['/usr/hdp/current/hadoop-client/bin'], 'tries': 1, 'user': 'hdfs', 'try_sleep': 0}

The only notable difference in the command.json is the commandParams/excluded_hosts param, so it's not like the request is passing the information along incorrectly.  I'm going to play around with the format I use to pass it in and take some wild guesses, like maybe it expects double-encoded JSON since I've seen that elsewhere, but if someone knows the answer offhand and can help out, that would be appreciated.  If it turns out to be a bug in Ambari, I'll open a JIRA and rewrite our code to issue the decommission call independently for each host.
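
If I do end up going per-host, the fallback would look something like this (a sketch reusing the single-node form that works; nothing here is our actual client code, and the server/credentials are placeholders):

import json
import requests  # third-party HTTP library, assumed available

AMBARI = "http://ambari.example.com:8080"  # placeholder Ambari server
CLUSTER = "c1"                             # placeholder cluster name
AUTH = ("admin", "admin")                  # placeholder credentials
HOSTS = ["slave-1.local", "slave-2.local"]

for host in HOSTS:
    body = {
        "RequestInfo": {
            "command": "DECOMMISSION",
            "context": "Decommission DataNode",
            "parameters": {"slave_type": "DATANODE", "excluded_hosts": host},
            "operation_level": {
                "level": "HOST_COMPONENT",
                "cluster_name": CLUSTER,
                "host_name": host,
                "service_name": "HDFS",
            },
        },
        "Requests/resource_filters": [
            {"service_name": "HDFS", "component_name": "NAMENODE"}
        ],
    }
    resp = requests.post(
        "%s/api/v1/clusters/%s/requests" % (AMBARI, CLUSTER),
        auth=AUTH,
        headers={"X-Requested-By": "ambari"},
        data=json.dumps(body),
    )
    resp.raise_for_status()  # one decommission request per host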

Greg

Re: COMMERCIAL:Re: decommission multiple nodes issue

Posted by Yusaku Sako <yu...@hortonworks.com>.
Greg,

You should be able to make the call as documented in the Wiki.
I've tested on Ambari 1.6.1 and Ambari 2.0.0 and the calls work fine for both DataNodes and NodeManagers, regardless of host maintenance mode.
Please let us know if you see issues.

Thanks,
Yusaku


Re: COMMERCIAL:Re: decommission multiple nodes issue

Posted by Greg Hill <gr...@RACKSPACE.COM>.
IIRC switching it to HOST_COMPONENT made it so I couldn't pass in multiple hosts (that was what I was doing originally, and Ambari just rejected the request outright, unless my memory is tricking me).  Maybe I just needed slightly different syntax for that case?

Also, decommissioning NODEMANAGER using CLUSTER and a list of hosts did not exhibit the same behavior.  It seemed to decommission them properly, even when in maintenance mode.
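
For comparison, the NODEMANAGER call follows the same pattern against the YARN master (a sketch; the YARN/RESOURCEMANAGER names are the analogue of HDFS/NAMENODE, not copied from my actual request):

# Sketch of the YARN analogue of the request body above (not the exact body sent).
cluster_name = "c1"  # placeholder

nodemanager_decommission = {
    "RequestInfo": {
        "command": "DECOMMISSION",
        "context": "Decommission NodeManager",
        "parameters": {
            "slave_type": "NODEMANAGER",
            "excluded_hosts": "slave-1.local,slave-2.local",
        },
        "operation_level": {
            "level": "CLUSTER",
            "cluster_name": cluster_name,
        },
    },
    "Requests/resource_filters": [
        {"service_name": "YARN", "component_name": "RESOURCEMANAGER"}
    ],
}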

Greg


Re: decommission multiple nodes issue

Posted by Yusaku Sako <yu...@hortonworks.com>.
Sorry, the Wiki title changed:
https://cwiki.apache.org/confluence/display/AMBARI/API+to+decommission+DataNodes+and+NodeManagers


Re: decommission multiple nodes issue

Posted by Yusaku Sako <yu...@hortonworks.com>.
BTW, I've started a new Wiki on decommissioning DataNodes: https://cwiki.apache.org/confluence/display/AMBARI/API+to+decommission+DataNodes

Yusaku


Re: decommission multiple nodes issue

Posted by Yusaku Sako <yu...@hortonworks.com>.
Hi Greg,

This is actually by design.
If you want to decommission all DataNodes regardless of their host maintenance mode, you need to change "RequestInfo/level" from "CLUSTER" to "HOST_COMPONENT".
When you set the "level" to "CLUSTER", bulk operations (in this case decommission) are skipped for any matching target resources whose hosts are in maintenance mode.
If you set it to "HOST_COMPONENT", host-level maintenance mode is ignored.
This is a really mysterious, undocumented part of Ambari, unfortunately.
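
Concretely, the only change for the multi-host call is the operation_level block; the rest of the request body stays the same (a sketch):

cluster_name = "c1"  # placeholder

# Skipped for hosts that are in maintenance mode:
operation_level_cluster = {
    "level": "CLUSTER",
    "cluster_name": cluster_name,
}

# Ignores host-level maintenance mode:
operation_level_host_component = {
    "level": "HOST_COMPONENT",
    "cluster_name": cluster_name,
}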

Yusaku


Re: decommission multiple nodes issue

Posted by Greg Hill <gr...@RACKSPACE.COM>.
I have verified that if maintenance mode is set on a host, that host is ignored by the decommission process, but only when you try to decommission multiple hosts at the same time.  I'll open a bug.
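
For anyone reproducing this, maintenance mode was turned on ahead of time with a call along these lines (a sketch; the server, credentials, and hosts are placeholders, and maintenance_state is the field the Hosts resource exposes):

import json
import requests  # third-party HTTP library, assumed available

AMBARI = "http://ambari.example.com:8080"  # placeholder Ambari server
CLUSTER = "c1"                             # placeholder cluster name

for host in ["slave-1.local", "slave-2.local"]:
    resp = requests.put(
        "%s/api/v1/clusters/%s/hosts/%s" % (AMBARI, CLUSTER, host),
        auth=("admin", "admin"),
        headers={"X-Requested-By": "ambari"},
        data=json.dumps({
            "RequestInfo": {"context": "Turn on maintenance mode"},
            "Hosts": {"maintenance_state": "ON"},
        }),
    )
    resp.raise_for_status()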

Greg


Re: decommission multiple nodes issue

Posted by Sean Roberts <sr...@hortonworks.com>.
Greg - Same here on submitting JSON. Although they are JSON documents you have to submit them as plain form. This is true across all of Ambari. I opened a bug for it a month back.


--
Hortonworks - We do Hadoop

Sean Roberts
Partner Solutions Engineer - EMEA
@seano


Re: decommission multiple nodes issue

Posted by Greg Hill <gr...@RACKSPACE.COM>.
That causes a server error.  I've yet to see any part of the API that accepts JSON arrays like that as input; it's almost always, if not always, a comma-separated string like I posted.  Many methods even return double-encoded JSON values (e.g. "key": "[\"value1\",\"value2\"]").  It's kind of annoying and inconsistent, honestly, and not documented anywhere.  You just have to have your client code choke on it and then go add another data[key] = json.loads(data[key]) in the client to account for it.
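
Something along these lines is what I mean by the extra json.loads step (a sketch, not our actual client code; the helper name is made up):

import json

def undouble(data, keys):
    """Decode values that come back as double-encoded JSON strings.

    'keys' is whichever response fields a given call is known to
    double-encode; they vary by endpoint and are not documented.
    """
    for key in keys:
        value = data.get(key)
        if isinstance(value, str):
            try:
                data[key] = json.loads(value)
            except ValueError:
                pass  # leave plain strings untouched
    return data

# Example: a field that came back as the string "[\"value1\",\"value2\"]"
response = {"items": "[\"value1\",\"value2\"]"}
print(undouble(response, ["items"]))  # {'items': ['value1', 'value2']}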

I am starting to think it’s because I set the nodes into maintenance mode first, as doing the decommission command manually from the client works fine when the nodes aren’t in maintenance mode.  I’ll keep digging, I guess, but it is weird that the exact same command worked this time (the commandArgs are identical to the one that did nothing).

Greg


Re: decommission multiple nodes issue

Posted by Sean Roberts <sr...@hortonworks.com>.
Racker Greg - I’m not familiar with the decommissioning API, but if it’s consistent with the rest of Ambari, you’ll need to change from this:

"excluded_hosts": “slave-1.local,slave-2.local"

To this:

"excluded_hosts" : [ "slave-1.local","slave-2.local" ]


--
Hortonworks - We do Hadoop

Sean Roberts
Partner Solutions Engineer - EMEA
@seano
