
Posted to reviews@ambari.apache.org by Dmitro Lisnichenko <dl...@hortonworks.com> on 2017/04/05 12:27:34 UTC

Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/
-----------------------------------------------------------

Review request for Ambari, Jonathan Hurley and Nate Cole.


Bugs: AMBARI-20682
    https://issues.apache.org/jira/browse/AMBARI-20682


Repository: ambari


Description
-------

During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.

Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:

{code}
2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
{code}

Even though ~ 6 seconds have passed, the daemon is still running as it drains. Therefore, our attempt to start it is a no-op.

Instead, we should also monitor the PID and wait for the process to exit.
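
To make the idea concrete, here is a minimal sketch of PID monitoring. It is not the code from this patch: the function names, the PID-file argument, and the polling interval and timeout defaults are all assumptions for illustration, and it assumes a POSIX host where signal 0 can be used to probe a process.

```python
import errno
import os
import time


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if the file is missing or invalid."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None


def is_process_alive(pid):
    """Probe a PID with signal 0, which checks for existence without signalling."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        # ESRCH: no such process. EPERM: it exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def wait_for_pid_exit(pid_file, poll_interval=1.0, timeout=30.0):
    """Poll until the process named in pid_file is gone.

    Returns True if the process exited (or no usable PID file exists),
    False if it was still alive when the (hypothetical) timeout elapsed.
    """
    deadline = time.time() + timeout
    while True:
        pid = read_pid(pid_file)
        if pid is None or not is_process_alive(pid):
            return True
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
```

A restart would then only issue the start once `wait_for_pid_exit(...)` returns True, rather than relying on the de-registration alone.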


Diffs
-----

  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237d 
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7 


Diff: https://reviews.apache.org/r/58208/diff/1/


Testing
-------

mvn clean test


Thanks,

Dmitro Lisnichenko

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.


> On April 5, 2017, 4:10 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
> > Lines 51-59 (original), 51-59 (patched)
> > <https://reviews.apache.org/r/58208/diff/1/?file=1685218#file1685218line51>
> >
> >     Let's say that this fails to stop gracefully on the de-register. The code which calls this is looking for a boolean to be returned:
> >     
> >     ```
> >           stopped = datanode_upgrade.pre_rolling_upgrade_shutdown(hdfs_binary)
> >           if not stopped:
> >             datanode(action="stop")
> >     ```
> >     
> >     Should we try/catch `_check_datanode_shutdown` and return False if it fails?

This try/catch would mask a failure that is covered by two of our tests.
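
For readers following the exchange, the wrapper under discussion might look roughly like the sketch below. Everything here is hypothetical (the exception type, `guarded_shutdown`, and the injected callables stand in for the real Ambari functions, whose bodies are not shown in this thread); it only illustrates the trade-off: returning False lets the caller fall back to a regular stop, but the original exception no longer reaches the caller or a test that expects it.

```python
class ShutdownCheckFailed(Exception):
    """Hypothetical error: the DataNode still answered after the shutdown request."""


def guarded_shutdown(check_shutdown):
    """Return True if check_shutdown() succeeds, False if it raises.

    Swallowing the exception is exactly what hides the failure from callers.
    """
    try:
        check_shutdown()
    except ShutdownCheckFailed:
        return False
    return True


def stop_for_upgrade(check_shutdown, hard_stop):
    """Mirror the caller quoted above: fall back to a normal stop if not stopped."""
    stopped = guarded_shutdown(check_shutdown)
    if not stopped:
        hard_stop()
    return stopped
```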


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171118
-----------------------------------------------------------



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.


> On April 5, 2017, 4:10 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
> > Lines 51-59 (original), 51-59 (patched)
> > <https://reviews.apache.org/r/58208/diff/1/?file=1685218#file1685218line51>
> >
> >     Let's say that this fails to stop gracefully on the de-register. The code which calls this is looking for a boolean to be returned:
> >     
> >     ```
> >           stopped = datanode_upgrade.pre_rolling_upgrade_shutdown(hdfs_binary)
> >           if not stopped:
> >             datanode(action="stop")
> >     ```
> >     
> >     Should we try/catch `_check_datanode_shutdown` and return False if it fails?
> 
> Dmitro Lisnichenko wrote:
>     this try-catch whould shadow a failure that is covered by 2 our tests

* would


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171118
-----------------------------------------------------------



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Jonathan Hurley <jh...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171118
-----------------------------------------------------------




ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
Lines 51-59 (original), 51-59 (patched)
<https://reviews.apache.org/r/58208/#comment244005>

    Let's say that this fails to stop gracefully on the de-register. The code which calls this is looking for a boolean to be returned:
    
    ```
          stopped = datanode_upgrade.pre_rolling_upgrade_shutdown(hdfs_binary)
          if not stopped:
            datanode(action="stop")
    ```
    
    Should we try/catch `_check_datanode_shutdown` and return False if it fails?


- Jonathan Hurley



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Alejandro Fernandez <af...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171593
-----------------------------------------------------------




ambari-common/src/main/python/resource_management/libraries/script/script.py
Lines 323 (patched)
<https://reviews.apache.org/r/58208/#comment244535>

    What does "afix" mean?



ambari-common/src/main/python/resource_management/libraries/script/script.py
Line 328 (original), 347 (patched)
<https://reviews.apache.org/r/58208/#comment244534>

    Add some doc.



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py
Line 24 (original), 24 (patched)
<https://reviews.apache.org/r/58208/#comment244533>

    Can we remove this import *?


- Alejandro Fernandez


On April 11, 2017, 3:22 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58208/
> -----------------------------------------------------------
> 
> (Updated April 11, 2017, 3:22 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-20682
>     https://issues.apache.org/jira/browse/AMBARI-20682
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.
> 
> Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:
> 
> {code}
> 2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
> 2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
> 2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
> {code}
> 
> Even though ~ 6 seconds have passed, the daemon is still running as it drains. Therefore, our attempt to start it is a no-op.
> 
> Instead, we should also monitor the PID and wait for the process to exit.
> 
> -----------------
> The STOP command now waits until the component has actually died. The motivation: we don't want to execute START on a component that is still running (e.g. during an upgrade/RESTART).
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/libraries/script/script.py 9a5da04278 
>   ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py 4716343fb2 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py 151e26cace 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237dd1f 
>   ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/mysql_service.py 11bbdd8e6b 
>   ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/postgresql_service.py cc7b4cc14e 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7932 
> 
> 
> Diff: https://reviews.apache.org/r/58208/diff/2/
> 
> 
> Testing
> -------
> 
> mvn clean test 
> and test on live cluster
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Jonathan Hurley <jh...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171876
-----------------------------------------------------------


Ship it!





- Jonathan Hurley


On April 12, 2017, 8:57 a.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58208/
> -----------------------------------------------------------
> 
> (Updated April 12, 2017, 8:57 a.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-20682
>     https://issues.apache.org/jira/browse/AMBARI-20682
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.
> 
> Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:
> 
> {code}
> 2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
> 2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
> 2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
> {code}
> 
> Even though ~ 6 seconds have passed, the daemon is still running as it drains. Therefore, our attempt to start it is a no-op.
> 
> Instead, we should also monitor the PID and wait for the process to exit.
> 
> -----------------
> The STOP command now waits until the component has actually died. The motivation: we don't want to execute START on a component that is still running (e.g. during an upgrade/RESTART).
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/libraries/script/script.py 9a5da04 
>   ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py 4716343 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py 151e26c 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237d 
>   ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/mysql_service.py 11bbdd8 
>   ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/postgresql_service.py cc7b4cc 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7 
> 
> 
> Diff: https://reviews.apache.org/r/58208/diff/3/
> 
> 
> Testing
> -------
> 
> mvn clean test 
> and test on live cluster
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Nate Cole <nc...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171874
-----------------------------------------------------------


Ship it!





- Nate Cole



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/
-----------------------------------------------------------

(Updated April 12, 2017, 3:57 p.m.)


Review request for Ambari, Jonathan Hurley and Nate Cole.


Changes
-------

Fixed review comments


Bugs: AMBARI-20682
    https://issues.apache.org/jira/browse/AMBARI-20682


Repository: ambari


Description
-------

During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.

Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:

{code}
2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
{code}

Even though ~ 6 seconds have passed, the daemon is still running as it drains. Therefore, our attempt to start it is a no-op.

Instead, we should also monitor the PID and wait for the process to exit.

-----------------
The STOP command now waits until the component has actually died. The motivation: we don't want to execute START on a component that is still running (e.g. during an upgrade/RESTART).


Diffs (updated)
-----

  ambari-common/src/main/python/resource_management/libraries/script/script.py 9a5da04 
  ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py 4716343 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py 151e26c 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237d 
  ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/mysql_service.py 11bbdd8 
  ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/postgresql_service.py cc7b4cc 
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7 


Diff: https://reviews.apache.org/r/58208/diff/3/

Changes: https://reviews.apache.org/r/58208/diff/2-3/


Testing
-------

mvn clean test 
and test on live cluster


Thanks,

Dmitro Lisnichenko

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.


> On April 11, 2017, 9:20 p.m., Alejandro Fernandez wrote:
> > ambari-common/src/main/python/resource_management/libraries/script/script.py
> > Lines 352 (patched)
> > <https://reviews.apache.org/r/58208/diff/2/?file=1688457#file1688457line355>
> >
> >     How does this know it's operating on a stop command, or a restart?

It's called from the execute() method above.


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171595
-----------------------------------------------------------



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.


> On April 11, 2017, 9:20 p.m., Alejandro Fernandez wrote:
> > ambari-common/src/main/python/resource_management/libraries/script/script.py
> > Lines 355 (patched)
> > <https://reviews.apache.org/r/58208/diff/2/?file=1688457#file1688457line358>
> >
> >     Should we have a hard limit (say, abort after more than 5 minutes) so we can avoid an infinite loop?

We still have the STOP command's timeout limit. I'm not sure that adding yet another hardcoded timeout would suit all cases. It would also require failing the entire STOP command instead of marking it as timed out.
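
The distinction between "failed" and "timed out" can be illustrated with a small sketch in which the caller, not the script itself, enforces the limit. This is not how Ambari implements its command timeout; the function name and the three status strings are assumptions chosen for the example, and it relies only on the standard library's `subprocess.run`.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run a command and classify the outcome as COMPLETED, FAILED, or TIMEDOUT.

    The status strings are hypothetical labels, not Ambari's own.
    """
    try:
        result = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        return "TIMEDOUT"
    return "COMPLETED" if result.returncode == 0 else "FAILED"


if __name__ == "__main__":
    # A script that waits without its own deadline is reported as timed out
    # by the caller, which is different from the script exiting with an error.
    waiter = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(waiter, 0.5))
```

With an outer limit like this, a wait loop inside the script does not need its own hardcoded deadline to avoid running forever.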


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171595
-----------------------------------------------------------



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Alejandro Fernandez <af...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171595
-----------------------------------------------------------




ambari-common/src/main/python/resource_management/libraries/script/script.py
Lines 352 (patched)
<https://reviews.apache.org/r/58208/#comment244537>

    How does this know whether it's operating on a stop command or a restart?



ambari-common/src/main/python/resource_management/libraries/script/script.py
Lines 355 (patched)
<https://reviews.apache.org/r/58208/#comment244538>

    Should we have a hard limit? If the wait takes more than, say, 5 minutes, abort so that we avoid an infinite loop.
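
The hard limit suggested above could look roughly like the sketch below. It is not the code from the patch: `wait_until_stopped`, `ComponentStillRunning`, and the `is_running` callable are hypothetical names introduced only for illustration, and the 300-second default simply mirrors the "say 5 mins" figure from the comment.

```python
import time


class ComponentStillRunning(Exception):
    """Hypothetical error raised when the wait exceeds its deadline."""


def wait_until_stopped(is_running, timeout_seconds=300, poll_interval=0.1,
                       clock=time.time, sleep=time.sleep):
    """Poll is_running() until it returns False or the deadline passes.

    Returns the number of polls performed. time.time is used so the sketch
    also runs on Python 2; time.monotonic is preferable on Python 3.
    """
    deadline = clock() + timeout_seconds
    polls = 0
    while True:
        polls += 1
        if not is_running():
            return polls
        if clock() >= deadline:
            # Give up instead of looping forever on a stuck daemon.
            raise ComponentStillRunning(
                "component still running after %s seconds" % timeout_seconds)
        sleep(poll_interval)
```

Raising an error rather than returning silently lets the calling command fail visibly, which is one way to keep a later START from running against a live process.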


- Alejandro Fernandez



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Alejandro Fernandez <af...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171594
-----------------------------------------------------------




ambari-common/src/main/python/resource_management/libraries/script/script.py
Lines 323 (patched)
<https://reviews.apache.org/r/58208/#comment244536>

    Never mind, I see you meant prefix or suffix.


- Alejandro Fernandez



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.


> On April 11, 2017, 10:18 p.m., Jonathan Hurley wrote:
> > ambari-common/src/main/python/resource_management/libraries/script/script.py
> > Lines 357 (patched)
> > <https://reviews.apache.org/r/58208/diff/2/?file=1688457#file1688457line360>
> >
> >     Is counter < 1000 easier to read?

This condition is true once every 100 iterations (it tests the remainder after division by 100).
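
To illustrate the difference between the two conditions discussed here, a small sketch follows. The helper name `should_log` and the loop are assumptions made for illustration; only the "remainder after division by 100" behaviour comes from the reply above.

```python
def should_log(counter, every=100):
    """True on iterations 0, 100, 200, ... so the log is not flooded."""
    return counter % every == 0


# Over 350 iterations the modulo test fires only four times,
# whereas `counter < 1000` would be true on every one of them.
logged = [i for i in range(350) if should_log(i)]
always = [i for i in range(350) if i < 1000]
```

So the modulo check throttles a progress message, while `counter < 1000` would act as an upper bound on the loop, which is a different purpose.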


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171616
-----------------------------------------------------------



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.


> On April 11, 2017, 10:18 p.m., Jonathan Hurley wrote:
> > ambari-common/src/main/python/resource_management/libraries/script/script.py
> > Lines 352 (patched)
> > <https://reviews.apache.org/r/58208/diff/2/?file=1688457#file1688457line355>
> >
> >     Can we call this status_method just to make it clearer what's being executed?

done


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171616
-----------------------------------------------------------


On April 12, 2017, 3:57 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58208/
> -----------------------------------------------------------
> 
> (Updated April 12, 2017, 3:57 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-20682
>     https://issues.apache.org/jira/browse/AMBARI-20682
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.
> 
> Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:
> 
> {code}
> 2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
> 2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
> 2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
> {code}
> 
> Even though ~6 seconds have passed, the daemon is still running while it drains. Therefore, when we attempt to start it, the start is a NOOP.
> 
> Instead, we should also monitor for the PID.
> 
> -----------------
> The STOP command now waits until the component has actually terminated. The motivation: we don't want to execute START against a component that is still running (e.g. during an upgrade/RESTART).
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/libraries/script/script.py 9a5da04 
>   ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py 4716343 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py 151e26c 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237d 
>   ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/mysql_service.py 11bbdd8 
>   ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/postgresql_service.py cc7b4cc 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7 
> 
> 
> Diff: https://reviews.apache.org/r/58208/diff/3/
> 
> 
> Testing
> -------
> 
> mvn clean test 
> and test on live cluster
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.


> On April 11, 2017, 10:18 p.m., Jonathan Hurley wrote:
> > ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py
> > Lines 34-41 (original), 34-45 (patched)
> > <https://reviews.apache.org/r/58208/diff/2/?file=1688458#file1688458line34>
> >
> >     When executing the status command, won't this run it twice if mysql is running?

Looks like you are right.


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171616
-----------------------------------------------------------



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Jonathan Hurley <jh...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171616
-----------------------------------------------------------


Fix it, then Ship it!





ambari-common/src/main/python/resource_management/libraries/script/script.py
Lines 323 (patched)
<https://reviews.apache.org/r/58208/#comment244571>

    Can we call this something a little clearer, like "execute_prefix_function"?



ambari-common/src/main/python/resource_management/libraries/script/script.py
Lines 352 (patched)
<https://reviews.apache.org/r/58208/#comment244569>

    Can we call this status_method just to make it clearer what's being executed?



ambari-common/src/main/python/resource_management/libraries/script/script.py
Lines 357 (patched)
<https://reviews.apache.org/r/58208/#comment244570>

    Is counter < 1000 easier to read?



ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py
Lines 34-41 (original), 34-45 (patched)
<https://reviews.apache.org/r/58208/#comment244574>

    When executing the status command, won't this run it twice if mysql is running?


- Jonathan Hurley



Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/
-----------------------------------------------------------

(Updated April 11, 2017, 6:22 p.m.)


Review request for Ambari, Jonathan Hurley and Nate Cole.


Bugs: AMBARI-20682
    https://issues.apache.org/jira/browse/AMBARI-20682


Repository: ambari


Description (updated)
-------

During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.

Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:

{code}
2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
{code}

Even though ~6 seconds have passed, the daemon is still running while it drains. Therefore, when we attempt to start it, the start is a NOOP.

Instead, we should also monitor for the PID.

-----------------
The STOP command now waits until the component has actually terminated. The motivation: we don't want to execute START against a component that is still running (e.g. during an upgrade/RESTART).
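
As a rough illustration of what "monitor for the PID" can mean on a POSIX host, here is a minimal sketch. The helpers `pid_is_alive` and `read_pid` are hypothetical and are not taken from the patch or from Ambari's libraries; the sketch assumes the daemon writes its PID to a file.

```python
import errno
import os


def pid_is_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except OSError as err:
        if err.errno == errno.ESRCH:
            return False  # no such process
        if err.errno == errno.EPERM:
            return True   # process exists but belongs to another user
        raise
    return True


def read_pid(pid_file):
    """Read an integer PID from a file; None if it is missing or malformed."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None
```

A stop routine could keep polling `pid_is_alive(read_pid(...))` after the de-registration check succeeds, so that a draining daemon is not mistaken for a stopped one.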


Diffs
-----

  ambari-common/src/main/python/resource_management/libraries/script/script.py 9a5da04278 
  ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py 4716343fb2 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py 151e26cace 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237dd1f 
  ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/mysql_service.py 11bbdd8e6b 
  ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/postgresql_service.py cc7b4cc14e 
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7932 


Diff: https://reviews.apache.org/r/58208/diff/2/


Testing
-------

mvn clean test 
and test on live cluster


Thanks,

Dmitro Lisnichenko

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/
-----------------------------------------------------------

(Updated April 11, 2017, 6:21 p.m.)


Review request for Ambari, Jonathan Hurley and Nate Cole.


Bugs: AMBARI-20682
    https://issues.apache.org/jira/browse/AMBARI-20682


Repository: ambari


Description
-------

During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.

Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:

{code}
2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
{code}

Even though ~6 seconds have passed, the daemon is still running while it drains. Therefore, when we attempt to start it, the start is a NOOP.

Instead, we should also monitor for the PID.


Diffs
-----

  ambari-common/src/main/python/resource_management/libraries/script/script.py 9a5da04278 
  ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py 4716343fb2 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py 151e26cace 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237dd1f 
  ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/mysql_service.py 11bbdd8e6b 
  ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/postgresql_service.py cc7b4cc14e 
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7932 


Diff: https://reviews.apache.org/r/58208/diff/2/


Testing (updated)
-------

mvn clean test 
and test on live cluster


Thanks,

Dmitro Lisnichenko

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/
-----------------------------------------------------------

(Updated April 11, 2017, 6:21 p.m.)


Review request for Ambari, Jonathan Hurley and Nate Cole.


Changes
-------

More general patch
The STOP command now waits until the component has actually terminated. The motivation: we don't want to execute START against a component that is still running (e.g. during an upgrade/RESTART).


Bugs: AMBARI-20682
    https://issues.apache.org/jira/browse/AMBARI-20682


Repository: ambari


Description
-------

During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.

Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:

{code}
2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
{code}

Even though ~6 seconds have passed, the daemon is still running while it drains. Therefore, when we attempt to start it, the start is a NOOP.

Instead, we should also monitor for the PID.


Diffs (updated)
-----

  ambari-common/src/main/python/resource_management/libraries/script/script.py 9a5da04278 
  ambari-funtest/src/test/resources/stacks/HDP/2.0.7/services/HIVE/package/scripts/mysql_service.py 4716343fb2 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py 151e26cace 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237dd1f 
  ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/mysql_service.py 11bbdd8e6b 
  ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/HIVE/package/scripts/postgresql_service.py cc7b4cc14e 
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7932 


Diff: https://reviews.apache.org/r/58208/diff/2/

Changes: https://reviews.apache.org/r/58208/diff/1-2/


Testing
-------

mvn clean test


Thanks,

Dmitro Lisnichenko

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Nate Cole <nc...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171128
-----------------------------------------------------------


Ship it!





- Nate Cole


On April 5, 2017, 8:27 a.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58208/
> -----------------------------------------------------------
> 
> (Updated April 5, 2017, 8:27 a.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-20682
>     https://issues.apache.org/jira/browse/AMBARI-20682
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.
> 
> Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:
> 
> {code}
> 2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
> 2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
> 2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
> {code}
> 
> Even though ~6 seconds have passed, the daemon is still running as it drains. Therefore, when we attempt to start it, the start is a no-op.
> 
> Instead, we should also monitor for the PID.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237d 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7 
> 
> 
> Diff: https://reviews.apache.org/r/58208/diff/1/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>

Re: Review Request 58208: Wait For DataNodes To Shutdown During a Rolling Upgrade

Posted by Alejandro Fernandez <af...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58208/#review171147
-----------------------------------------------------------




ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
Line 113 (original), 114 (patched)
<https://reviews.apache.org/r/58208/#comment244035>

    Should we catch a specific exception here, one that indicates the DataNode was deregistered? Catching every exception seems too wide a net.



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py
Line 117 (original), 126 (patched)
<https://reviews.apache.org/r/58208/#comment244036>

    Technically, this function could also be called outside of an upgrade, so the log message shouldn't mention "upgrade".

    Does it make sense to also call this function during any restart command?
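
For illustration only, here is a minimal sketch of how the two comments above could be addressed together: wait for the daemon's PID to disappear, treat only the specific "no such process" error as proof that it is gone, and keep the message free of any mention of an upgrade. This is not the code in the patch. The names `ProcessStillRunning`, `is_process_running` and `wait_for_process_exit`, as well as the retry defaults, are assumptions made up for this example, and it assumes a POSIX host where `os.kill(pid, 0)` only probes for the process.

```python
import errno
import os
import time


class ProcessStillRunning(Exception):
    """Hypothetical exception: the process outlived every check."""


def is_process_running(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except OSError as e:
        if e.errno == errno.ESRCH:
            return False  # no such process: the daemon is really gone
        if e.errno == errno.EPERM:
            return True  # process exists but belongs to another user
        raise  # anything else is unexpected, so do not swallow it
    return True


def wait_for_process_exit(pid, tries=30, interval=1.0, sleep=time.sleep):
    """Poll until the PID disappears; raise ProcessStillRunning otherwise.

    The message is deliberately generic so the helper can be reused for a
    plain restart as well as for an upgrade.
    """
    for _ in range(tries):
        if not is_process_running(pid):
            return
        sleep(interval)
    raise ProcessStillRunning(
        "Process %d is still running after %d checks" % (pid, tries))
```

A caller would read the PID from the daemon's PID file (wherever the stack keeps it) and call `wait_for_process_exit(pid)` after the shutdown request and before the start command. The `sleep` parameter exists only so that tests can skip the real delay.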


- Alejandro Fernandez


On April 5, 2017, 12:27 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58208/
> -----------------------------------------------------------
> 
> (Updated April 5, 2017, 12:27 p.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-20682
>     https://issues.apache.org/jira/browse/AMBARI-20682
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> During a rolling upgrade (especially on a large, heavily used cluster), the DataNodes do not shut down immediately. However, they do de-register from the NameNode, which tricks Ambari into thinking that they are down.
> 
> Since the rolling upgrade uses a {{RESTART}} command, we attempt to start the DataNode back up before the daemon has shut down:
> 
> {code}
> 2017-03-14 05:00:25,602 - call['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -shutdownDatanode 0.0.0.0:8010 upgrade'] {'user': 'hdfs'}
> 2017-03-14 05:00:28,438 - call returned (0, 'Submitted a shutdown request to datanode 0.0.0.0:8010')
> 2017-03-14 05:00:28,438 - Execute['/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs dfsadmin -fs hdfs://c1ha -D ipc.client.connect.max.retries=5 -D ipc.client.connect.retry.interval=1000 -getDatanodeInfo 0.0.0.0:8010'] {'tries': 1, 'user': 'hdfs'}
> 2017-03-14 05:00:35,976 - DataNode has successfully shutdown for upgrade.
> {code}
> 
> Even though ~6 seconds have passed, the daemon is still running as it drains. Therefore, when we attempt to start it, the start is a no-op.
> 
> Instead, we should also monitor for the PID.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py b55237d 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 1c3c5b7 
> 
> 
> Diff: https://reviews.apache.org/r/58208/diff/1/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>