Posted to dev@ambari.apache.org by Dmitro Lisnichenko <dl...@hortonworks.com> on 2015/04/29 09:33:40 UTC

Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/
-----------------------------------------------------------

Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.


Bugs: AMBARI-10818
    https://issues.apache.org/jira/browse/AMBARI-10818


Repository: ambari


Description
-------

After installing the package, I noticed that there is a "re-install" button, so I clicked on it.


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostEntity.java 9f3f70c 

Diff: https://reviews.apache.org/r/33663/diff/


Testing
-------

Manual verification - retrying distribute bits multiple times
Bug is intermittent, so not 100% sure


Thanks,

Dmitro Lisnichenko


Re: Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

> On April 29, 2015, 11:25 a.m., Nate Cole wrote:
> > -1.  There is no indication that this fixes the problem described, and this looks like some random code that was changed for change's sake.  You're right, it is intermittent (I've seen it), but it appears as though the DB states weren't updated properly, implying the event didn't fire or the result didn't come back correctly.

The event fires, but the transition is not performed. It looks like the host event entity state INSTALLED is not persisted, so the cluster version is not transitioned. Please see the discussion in JIRA.


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/#review81950
-----------------------------------------------------------


On April 29, 2015, 7:33 a.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33663/
> -----------------------------------------------------------
> 
> (Updated April 29, 2015, 7:33 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.
> 
> 
> Bugs: AMBARI-10818
>     https://issues.apache.org/jira/browse/AMBARI-10818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> After installing the package, I noticed that there is a "re-install" button, so I clicked on it.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostEntity.java 9f3f70c 
> 
> Diff: https://reviews.apache.org/r/33663/diff/
> 
> 
> Testing
> -------
> 
> Manual verification - retrying distribute bits multiple times
> Bug is intermittent, so not 100% sure
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/#review81950
-----------------------------------------------------------


-1.  There is no indication that this fixes the problem described, and this looks like some random code that was changed for change's sake.  You're right, it is intermittent (I've seen it), but it appears as though the DB states weren't updated properly, implying the event didn't fire or the result didn't come back correctly.

- Nate Cole


On April 29, 2015, 3:33 a.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33663/
> -----------------------------------------------------------
> 
> (Updated April 29, 2015, 3:33 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.
> 
> 
> Bugs: AMBARI-10818
>     https://issues.apache.org/jira/browse/AMBARI-10818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> After installing the package, I noticed that there is a "re-install" button, so I clicked on it.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostEntity.java 9f3f70c 
> 
> Diff: https://reviews.apache.org/r/33663/diff/
> 
> 
> Testing
> -------
> 
> Manual verification - retrying distribute bits multiple times
> Bug is intermittent, so not 100% sure
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

> On May 6, 2015, 8:32 p.m., Jonathan Hurley wrote:
> > Very interesting that the problem is caused because this is executed in separate threads from the normal request workflow which causes the EntityManager to contain stale cache.
> > 
> > Could we actually extrapolate this into an annotation so that we can simply decorate methods that need to be inside units of work? Something like @UnitOfWork and then doc why it's needed ... as Ambari becomes more and more asynchronous this will probably be useful.

I created a separate JIRA for that, since the current issue is urgent.


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/#review82726
-----------------------------------------------------------


On May 6, 2015, 6:09 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33663/
> -----------------------------------------------------------
> 
> (Updated May 6, 2015, 6:09 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.
> 
> 
> Bugs: AMBARI-10818
>     https://issues.apache.org/jira/browse/AMBARI-10818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> The problem here is that DistributeRepositoriesActionListener is executed in a separate thread, so we have to use UnitOfWork just as org.apache.ambari.server.actionmanager.ActionScheduler#doWork does; otherwise the EntityManager cache is not refreshed when the DB is updated. That is, the RepositoryVersion state in the DB is INSTALLING, and the API shows INSTALLING, but the RepositoryVersion state seen in DistributeRepositoriesActionListener is still INSTALLED, so the cluster state transition is not performed.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/DistributeRepositoriesActionListener.java 5600ef1 
>   ambari-server/src/test/java/org/apache/ambari/server/agent/TestHeartbeatHandler.java 39192c4 
> 
> Diff: https://reviews.apache.org/r/33663/diff/
> 
> 
> Testing
> -------
> 
> Same tests are failing on trunk
> 
> Tests in error: 
>   test220Cardinality(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
>   test220AutoDeploy(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
>   test220Dependencies(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
>   testCommonOozieServiceDescriptor(org.apache.ambari.server.stack.KerberosDescriptorTest): /media/plextor/review_ambari/ambari-server/target/classes/common-services/OOZIE/5.0.0.2.3/kerberos.json is not a readable file
> 
> Tests run: 2951, Failures: 0, Errors: 4, Skipped: 17
> 
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Ambari Views ...................................... SUCCESS [2.900s]
> [INFO] Ambari Metrics Common ............................. SUCCESS [1.570s]
> [INFO] Ambari Server ..................................... FAILURE [43:59.418s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 44:04.603s
> [INFO] Finished at: Wed May 06 20:25:17 EEST 2015
> [INFO] Final Memory: 32M/268M
> [INFO] ----------------------------------------------------------------------
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/#review82726
-----------------------------------------------------------

Ship it!


Very interesting that the problem is caused because this is executed in separate threads from the normal request workflow which causes the EntityManager to contain stale cache.

Could we actually extrapolate this into an annotation so that we can simply decorate methods that need to be inside units of work? Something like @UnitOfWork and then doc why it's needed ... as Ambari becomes more and more asynchronous this will probably be useful.
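
A minimal sketch of how such an annotation could be wired up, under stated assumptions: the annotation is named `InUnitOfWork` here (not `@UnitOfWork`) only to avoid clashing with the hypothetical `UnitOfWork` interface in the same file, `RepoListener` and `wrap` are illustrative names that do not come from Ambari, and a JDK dynamic proxy is used purely to keep the example self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class UnitOfWorkAnnotationSketch {

  /** Hypothetical marker for methods that must run inside a unit of work. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface InUnitOfWork {}

  /** Hypothetical stand-in with a begin()/end() shape; not Ambari's or Guice's type. */
  interface UnitOfWork {
    void begin();
    void end();
  }

  /** Illustrative listener interface; only one method is marked. */
  interface RepoListener {
    @InUnitOfWork
    void onActionFinished();

    void describe();
  }

  /** Wraps target so that annotated interface methods are bracketed by begin()/end(). */
  static <T> T wrap(Class<T> iface, T target, UnitOfWork uow) {
    InvocationHandler handler = (proxy, method, args) -> {
      boolean scoped = method.isAnnotationPresent(InUnitOfWork.class);
      if (scoped) {
        uow.begin();
      }
      try {
        return method.invoke(target, args);
      } catch (InvocationTargetException e) {
        throw e.getCause(); // rethrow the target method's own exception
      } finally {
        if (scoped) {
          uow.end(); // always close the unit of work, even on failure
        }
      }
    };
    return iface.cast(
        Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
  }

  public static void main(String[] args) {
    List<String> calls = new ArrayList<>();
    UnitOfWork uow = new UnitOfWork() {
      @Override public void begin() { calls.add("begin"); }
      @Override public void end() { calls.add("end"); }
    };
    RepoListener raw = new RepoListener() {
      @Override public void onActionFinished() { calls.add("onActionFinished"); }
      @Override public void describe() { calls.add("describe"); }
    };
    RepoListener listener = wrap(RepoListener.class, raw, uow);
    listener.onActionFinished();
    listener.describe();
    System.out.println(calls); // [begin, onActionFinished, end, describe]
  }
}
```

Only the annotated method is bracketed; `describe()` runs without touching the unit of work, which is the opt-in behaviour the suggestion describes.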

- Jonathan Hurley


On May 6, 2015, 2:09 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33663/
> -----------------------------------------------------------
> 
> (Updated May 6, 2015, 2:09 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.
> 
> 
> Bugs: AMBARI-10818
>     https://issues.apache.org/jira/browse/AMBARI-10818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> The problem here is that DistributeRepositoriesActionListener is executed in a separate thread, so we have to use UnitOfWork just as org.apache.ambari.server.actionmanager.ActionScheduler#doWork does; otherwise the EntityManager cache is not refreshed when the DB is updated. That is, the RepositoryVersion state in the DB is INSTALLING, and the API shows INSTALLING, but the RepositoryVersion state seen in DistributeRepositoriesActionListener is still INSTALLED, so the cluster state transition is not performed.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/DistributeRepositoriesActionListener.java 5600ef1 
>   ambari-server/src/test/java/org/apache/ambari/server/agent/TestHeartbeatHandler.java 39192c4 
> 
> Diff: https://reviews.apache.org/r/33663/diff/
> 
> 
> Testing
> -------
> 
> Same tests are failing on trunk
> 
> Tests in error: 
>   test220Cardinality(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
>   test220AutoDeploy(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
>   test220Dependencies(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
>   testCommonOozieServiceDescriptor(org.apache.ambari.server.stack.KerberosDescriptorTest): /media/plextor/review_ambari/ambari-server/target/classes/common-services/OOZIE/5.0.0.2.3/kerberos.json is not a readable file
> 
> Tests run: 2951, Failures: 0, Errors: 4, Skipped: 17
> 
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Ambari Views ...................................... SUCCESS [2.900s]
> [INFO] Ambari Metrics Common ............................. SUCCESS [1.570s]
> [INFO] Ambari Server ..................................... FAILURE [43:59.418s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 44:04.603s
> [INFO] Finished at: Wed May 06 20:25:17 EEST 2015
> [INFO] Final Memory: 32M/268M
> [INFO] ----------------------------------------------------------------------
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/
-----------------------------------------------------------

(Updated May 8, 2015, 12:08 p.m.)


Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.


Changes
-------

The patch adjusts annotations as Myroslav proposed.
A live test on a cluster shows that it does not solve the issue, so it is posted only for discussion purposes.


Bugs: AMBARI-10818
    https://issues.apache.org/jira/browse/AMBARI-10818


Repository: ambari


Description
-------

The problem here is that DistributeRepositoriesActionListener is executed in a separate thread, so we have to use UnitOfWork just as org.apache.ambari.server.actionmanager.ActionScheduler#doWork does; otherwise the EntityManager cache is not refreshed when the DB is updated. That is, the RepositoryVersion state in the DB is INSTALLING, and the API shows INSTALLING, but the RepositoryVersion state seen in DistributeRepositoriesActionListener is still INSTALLED, so the cluster state transition is not performed.


Diffs (updated)
-----

  ambari-server/src/main/resources/stacks/HDP/2.2/upgrades/upgrade-2.2.xml 0cf8ff2 
  ambari-server/src/main/resources/stacks/HDP/2.2/upgrades/upgrade-2.3.xml abd95aa 
  ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.3.xml 65549dd 

Diff: https://reviews.apache.org/r/33663/diff/


Testing
-------

Same tests are failing on trunk

Tests in error: 
  test220Cardinality(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
  test220AutoDeploy(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
  test220Dependencies(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
  testCommonOozieServiceDescriptor(org.apache.ambari.server.stack.KerberosDescriptorTest): /media/plextor/review_ambari/ambari-server/target/classes/common-services/OOZIE/5.0.0.2.3/kerberos.json is not a readable file

Tests run: 2951, Failures: 0, Errors: 4, Skipped: 17

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Ambari Views ...................................... SUCCESS [2.900s]
[INFO] Ambari Metrics Common ............................. SUCCESS [1.570s]
[INFO] Ambari Server ..................................... FAILURE [43:59.418s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 44:04.603s
[INFO] Finished at: Wed May 06 20:25:17 EEST 2015
[INFO] Final Memory: 32M/268M
[INFO] ----------------------------------------------------------------------


Thanks,

Dmitro Lisnichenko


Re: Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/
-----------------------------------------------------------

(Updated May 6, 2015, 6:09 p.m.)


Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.


Changes
-------

Attached a patch that has been proven to work.


Bugs: AMBARI-10818
    https://issues.apache.org/jira/browse/AMBARI-10818


Repository: ambari


Description (updated)
-------

The problem here is that DistributeRepositoriesActionListener is executed in a separate thread, so we have to use UnitOfWork just as org.apache.ambari.server.actionmanager.ActionScheduler#doWork does; otherwise the EntityManager cache is not refreshed when the DB is updated. That is, the RepositoryVersion state in the DB is INSTALLING, and the API shows INSTALLING, but the RepositoryVersion state seen in DistributeRepositoriesActionListener is still INSTALLED, so the cluster state transition is not performed.
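
The pattern described above can be sketched roughly as follows. This is an illustration only, not the attached patch: it assumes the UnitOfWork in question exposes `begin()` and `end()` (as the Guice Persist interface of that name does), and it substitutes a hypothetical recording stand-in so the example runs without any persistence library. `ListenerSketch` and `onActionFinished` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

public class UnitOfWorkSketch {

  /** Hypothetical stand-in with a begin()/end() shape; an assumption, not Ambari's type. */
  interface UnitOfWork {
    void begin();
    void end();
  }

  /** Records calls so the ordering can be inspected; a real one would manage an EntityManager. */
  static class RecordingUnitOfWork implements UnitOfWork {
    final List<String> calls = new ArrayList<>();

    @Override public void begin() { calls.add("begin"); }
    @Override public void end() { calls.add("end"); }
  }

  /** Illustrative listener whose handler runs on a thread outside the request workflow. */
  static class ListenerSketch {
    private final UnitOfWork unitOfWork;

    ListenerSketch(UnitOfWork unitOfWork) {
      this.unitOfWork = unitOfWork;
    }

    void onActionFinished(Runnable stateTransition) {
      unitOfWork.begin(); // open a fresh unit of work for this thread
      try {
        stateTransition.run();
      } finally {
        unitOfWork.end(); // always close it, even if the transition throws
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    RecordingUnitOfWork uow = new RecordingUnitOfWork();
    ListenerSketch listener = new ListenerSketch(uow);

    // Run the handler on a separate thread, as an event listener would be.
    Thread worker =
        new Thread(() -> listener.onActionFinished(() -> uow.calls.add("transition")));
    worker.start();
    worker.join();

    System.out.println(uow.calls); // [begin, transition, end]
  }
}
```

The `try`/`finally` matters here: without it, an exception in the transition would leave the unit of work open on the worker thread.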


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/DistributeRepositoriesActionListener.java 5600ef1 
  ambari-server/src/test/java/org/apache/ambari/server/agent/TestHeartbeatHandler.java 39192c4 

Diff: https://reviews.apache.org/r/33663/diff/


Testing (updated)
-------

Same tests are failing on trunk

Tests in error: 
  test220Cardinality(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
  test220AutoDeploy(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
  test220Dependencies(org.apache.ambari.server.api.services.KerberosServiceMetaInfoTest): Guice provision errors:(..)
  testCommonOozieServiceDescriptor(org.apache.ambari.server.stack.KerberosDescriptorTest): /media/plextor/review_ambari/ambari-server/target/classes/common-services/OOZIE/5.0.0.2.3/kerberos.json is not a readable file

Tests run: 2951, Failures: 0, Errors: 4, Skipped: 17

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Ambari Views ...................................... SUCCESS [2.900s]
[INFO] Ambari Metrics Common ............................. SUCCESS [1.570s]
[INFO] Ambari Server ..................................... FAILURE [43:59.418s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 44:04.603s
[INFO] Finished at: Wed May 06 20:25:17 EEST 2015
[INFO] Final Memory: 32M/268M
[INFO] ----------------------------------------------------------------------


Thanks,

Dmitro Lisnichenko


Re: Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/#review82129
-----------------------------------------------------------

Ship it!


Ship It!

- Nate Cole


On April 29, 2015, 3:33 a.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33663/
> -----------------------------------------------------------
> 
> (Updated April 29, 2015, 3:33 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.
> 
> 
> Bugs: AMBARI-10818
>     https://issues.apache.org/jira/browse/AMBARI-10818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> After installing the package, I noticed that there is a "re-install" button, so I clicked on it.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostEntity.java 9f3f70c 
> 
> Diff: https://reviews.apache.org/r/33663/diff/
> 
> 
> Testing
> -------
> 
> Manual verification - retrying distribute bits multiple times
> Bug is intermittent, so not 100% sure
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 33663: Hit re-install when performing an RU - UI seems to have stuck at installing even though the request has completed

Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33663/#review82150
-----------------------------------------------------------


-1. I still think this is a problem. Consider the following existing code:

```
    // For each of the hosts, add a host version
    for (HostEntity host : hostEntities) {
      HostVersionEntity hostVersionEntity = new HostVersionEntity(host, helper.getOrCreateRepositoryVersion(HDP_22_STACK, "2.2.0.1-996"), RepositoryVersionState.INSTALLED);
      hostVersionDAO.create(hostVersionEntity);
    }
```

The entities are already being persisted. By adding CascadeType.PERSIST, exceptions will be thrown if the entity already exists. If CascadeType.PERSIST is going to be used, then we need to ensure the entity is not created anywhere else.

- Jonathan Hurley


On April 29, 2015, 3:33 a.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33663/
> -----------------------------------------------------------
> 
> (Updated April 29, 2015, 3:33 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, and Nate Cole.
> 
> 
> Bugs: AMBARI-10818
>     https://issues.apache.org/jira/browse/AMBARI-10818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> After installing the package, I noticed that there is a "re-install" button, so I clicked on it.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostEntity.java 9f3f70c 
> 
> Diff: https://reviews.apache.org/r/33663/diff/
> 
> 
> Testing
> -------
> 
> Manual verification - retrying distribute bits multiple times
> Bug is intermittent, so not 100% sure
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>