Posted to dev@ambari.apache.org by "Jayush Luniya (JIRA)" <ji...@apache.org> on 2015/03/25 04:44:53 UTC

[jira] [Updated] (AMBARI-10197) Apache builds for trunk are getting aborted

     [ https://issues.apache.org/jira/browse/AMBARI-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayush Luniya updated AMBARI-10197:
-----------------------------------
    Assignee: Jonathan Hurley

> Apache builds for trunk are getting aborted
> -------------------------------------------
>
>                 Key: AMBARI-10197
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10197
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent
>    Affects Versions: 2.1.0
>            Reporter: Jayush Luniya
>            Assignee: Jonathan Hurley
>             Fix For: 2.1.0
>
>
> On 3/24/15, 7:50 PM, "Jonathan Hurley" <jh...@hortonworks.com> wrote:
> Ah, I see that. Looks like TestController.TestController is a common theme here then. I tried running the tests on CentOS 6 instead of OSX and it looks like mine hung on test_certSigningFailed the first time and test_heartbeat_no_host_check_cmd_in_queue the second time.
> Let’s open up a Jira for this so it can be tracked and resolved.
> On Mar 24, 2015, at 7:20 PM, Jayush Luniya <jl...@hortonworks.com> wrote:
> Hi Jonathan,
> Yes, as I mentioned, the unit tests hang, though not on every run. The BOA is
> aborted after 2 hours.
> However, the hang always occurs during the Ambari Agent tests. If you look at
> the logs further up, you will see that the actual abort happened during the
> TestController UTs (i.e. Python was terminated), but the build was not yet
> entirely terminated, so it continued building the Ambari client and
> Python client until it was completely aborted.
> test_addToStatusQueue (TestController.TestController) ... ok
> test_certSigningFailed (TestController.TestController) ... ok
> test_heartbeatWithServer (TestController.TestController) ... ok
> test_registerAndHeartbeat (TestController.TestController) ... ok
> test_registerAndHeartbeatWithException (TestController.TestController) ... ok
> test_registerAndHeartbeat_check_registration_listener (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
> Build was aborted
> /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 31955 Terminated            $PYTHON "$@"
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Ambari Client 2.0.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ ambari-client ---
> [INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client (includes = [**/*.pyc], excludes = [])
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-version) @ ambari-client ---
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-release) @ ambari-client ---
> [INFO]
> [INFO] --- apache-rat-plugin:0.11:check (default) @ ambari-client ---
> [INFO] 53 implicit excludes (use -debug for more details).
> [INFO] No excludes explicitly specified.
> [INFO] 2 resources included (use -debug for more details)
> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 approved: 2 licence.
> [INFO]
> [INFO] --- maven-assembly-plugin:2.2-beta-5:single (build-tarball) @ ambari-client ---
> [INFO] Reading assembly descriptor: assemblies/client.xml
> [INFO]
> [INFO] --- maven-assembly-plugin:2.2-beta-5:single (make-assembly) @ ambari-client ---
> [INFO] Reading assembly descriptor: assemblies/client.xml
> [INFO]
> [INFO] --- maven-install-plugin:2.4:install (default-install) @ ambari-client ---
> [INFO] Installing /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/pom.xml to /home/jenkins/.m2/repository/org/apache/ambari/ambari-client/2.0.0-SNAPSHOT/ambari-client-2.0.0-SNAPSHOT.pom
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] Building Ambari Python Client 2.0.0-SNAPSHOT
> [INFO] ------------------------------------------------------------------------
> [INFO]
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ python-client ---
> [INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/python-client (includes = [**/*.pyc], excludes = [])
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-version) @ python-client ---
> [INFO]
> [INFO] --- build-helper-maven-plugin:1.8:regex-property (parse-package-release) @ python-client ---
> [INFO]
> [INFO] --- exec-maven-plugin:1.2:exec (python-test) @ python-client ---
> Updating AMBARI-10163
> Recording test results
> Warning: you have no plugins providing access control for builds, so
> falling back to legacy behavior of permitting any downstream builds to be
> triggered
> Finished: ABORTED
> Thanks
> Jayush
> On 3/24/15, 1:25 PM, "Jonathan Hurley" <jh...@hortonworks.com> wrote:
> I think that we're looking in the wrong places. Consider:
> https://builds.apache.org/job/Ambari-trunk-Commit/2101
> and
> https://builds.apache.org/job/Ambari-trunk-Commit/2100
> 2101 successfully built in about an hour. 2100 did not; it aborted after
> 2 hours. It aborted during the Groovy unit tests. Ambari unit test time
> variances should not swing the total job time by an hour.
> Perhaps something else is going on here. Maybe there's a network issue
> and Git or one of the Maven build steps is taking too long.
> The pattern seems to be that the builds are not stuck, since they are
> aborted at different stages from one job to the next (Groovy, agent tests, etc.).
> On Mar 24, 2015, at 4:07 PM, Jonathan Hurley
> <jh...@hortonworks.com>> wrote:
> No, that change should have no effect on the tests. There were aborted
> runs before that change, and there were failed runs after it. It seems
> like in some cases, the tests just take too long.
> On Mar 24, 2015, at 3:55 PM, Jayush Luniya
> <jl...@hortonworks.com>> wrote:
> This is the change that went in with build#2072.
> Jonathan, any chance the issue below could have been caused by it?
> Sumit, what was the commit version of your change to re-enable
> TestController tests and when was it committed?
> 1. AMBARI-10126 <https://issues.apache.org/jira/browse/AMBARI-10126> -
> Alert Scheduler Is Double Scheduling Jobs (jonathanhurley) (details
> <https://builds.apache.org/job/Ambari-trunk-Commit/2072/changes#detail0>)
> Commit 68468feeeeb35ca9edd4899ea8b1abafb7c2742a
> <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=68468feeeeb35ca9edd4899ea8b1abafb7c2742a>
> by jhurley <https://builds.apache.org/user/jhurley/>
> AMBARI-10126 <https://issues.apache.org/jira/browse/AMBARI-10126> - Alert Scheduler Is
> Double Scheduling Jobs (jonathanhurley)
> ambari-agent/src/main/python/ambari_agent/Controller.py
> <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=blob&f=ambari-agent/src/main/python/ambari_agent/Controller.py&h=bb85337bfdf2404a6aabf78eb361c112f77c977e&hb=68468feeeeb35ca9edd4899ea8b1abafb7c2742a> (diff)
> <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=blobdiff&f=ambari-agent/src/main/python/ambari_agent/Controller.py&fp=ambari-agent/src/main/python/ambari_agent/Controller.py&h=eeca4c294399e04dae8d893f078d6e6125f3df47&hp=bb85337bfdf2404a6aabf78eb361c112f77c977e&hb=68468feeeeb35ca9edd4899ea8b1abafb7c2742a&hpb=32e1215639f3cdfea68e2955f316576f1ded85fe>
> Thanks
> Jayush
> On 3/24/15, 12:49 PM, "Sumit Mohanty"
> <sm...@hortonworks.com>> wrote:
> The TestController tests are the ones I recently re-enabled to run on Mac, so
> you may see these failures locally as well if your dev box is a Mac.
> ________________________________________
> From: Jayush Luniya
> <jl...@hortonworks.com>>
> Sent: Tuesday, March 24, 2015 12:24 PM
> To: Alejandro Fernandez; dev@ambari.apache.org
> Subject: Re: Server unit tests take too long (30+ minutes)
> Agreed, we should take a look at reducing our test times.
> Also, I looked at the latest builds on trunk, and it looks like the agent
> tests are hanging as well, leading to builds being aborted. The culprit seems
> to be the TestController tests. This is not a consistent failure, but it has
> happened very frequently since build#2072:
> https://builds.apache.org/job/Ambari-trunk-Commit/
> test_repeatRegistration (TestController.TestController) ... ok
> test_restartAgent (TestController.TestController) ... ok
> test_run (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
> Build was aborted
> /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 20024 Terminated          $PYTHON "$@"
> Thanks
> Jayush
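For console logs like the one above, the test that was still running when the job was killed is the one whose line has no result after the `...` marker. A minimal Python sketch of that check follows. `unfinished_tests`, `RESULT_LINE` and `FINISHED` are hypothetical names, and the set of result words is an assumption based on unittest's verbose output rather than anything taken from the Ambari build. It expects the raw console text, without mail quote prefixes.

```python
import re

# Matches unittest verbose lines such as
#   test_run (TestController.TestController) ... ok
# and captures the first word that follows the "..." marker.
# \s also matches newlines, so lines wrapped by a mail client still match.
RESULT_LINE = re.compile(r"\b(test\w*)\s+\(([\w.]+)\)\s+\.\.\.\s*(\S*)")

# First words of a finished result (assumed from unittest's verbose output).
FINISHED = {"ok", "FAIL", "ERROR", "skipped", "expected", "unexpected"}


def unfinished_tests(console_text):
    """Return (test, class) pairs whose line has no recognised result word."""
    return [
        (name, cls)
        for name, cls, status in RESULT_LINE.findall(console_text)
        if status not in FINISHED
    ]


sample = """test_repeatRegistration (TestController.TestController) ... ok
test_restartAgent (TestController.TestController) ... ok
test_run (TestController.TestController) ... Build timed out (after 120
minutes). Marking the build as aborted.
"""

print(unfinished_tests(sample))
# prints [('test_run', 'TestController.TestController')]
```

Running this over several aborted builds would show whether the hang lands on the same test each time or moves around within TestController.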
> From: Alejandro Fernandez
> <af...@hortonworks.com>>
> Date: Tuesday, March 24, 2015 at 12:18 PM
> To: "dev@ambari.apache.org" <de...@ambari.apache.org>
> Cc: Jayush Luniya
> <jl...@hortonworks.com>>
> Subject: Re: Server unit tests take too long (30+ minutes)
> +1 to that.
> grep -B1 ".*sec$" ~/test_times.txt | sed 's/^.*Time elapsed: \(.*\)$/\1/'
> Here's another run with all tests that took over 30 seconds. Total time in
> these 28 test classes was about 28 minutes.
> The biggest culprit was AmbariManagementControllerTest at 5:17 (317.327 sec).
> Running org.apache.ambari.server.agent.TestHeartbeatHandler
> 89.435 sec
> Running org.apache.ambari.server.upgrade.UpgradeTest
> 76.566 sec
> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
> 55.582 sec
> Running org.apache.ambari.server.security.authorization.TestUsers
> 43.228 sec
> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
> 57.922 sec
> Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
> 56.585 sec
> Running org.apache.ambari.server.controller.internal.RepositoryVersionResourceProviderTest
> 60.788 sec
> Running org.apache.ambari.server.controller.internal.UpgradeResourceProviderTest
> 40.329 sec
> Running org.apache.ambari.server.controller.internal.HostStackVersionResourceProviderTest
> 34.812 sec
> Running org.apache.ambari.server.controller.internal.StageResourceProviderTest
> 37.434 sec
> Running org.apache.ambari.server.controller.AmbariServerTest
> 37.638 sec
> Running org.apache.ambari.server.controller.AmbariManagementControllerTest
> 317.327 sec
> Running org.apache.ambari.server.actionmanager.TestActionDBAccessorImpl
> 53.404 sec
> Running org.apache.ambari.server.scheduler.ExecutionScheduleManagerTest
> 34.245 sec
> Running org.apache.ambari.server.notifications.dispatchers.SNMPDispatcherTest
> 34.732 sec
> Running org.apache.ambari.server.state.UpgradeHelperTest
> 35.616 sec
> Running org.apache.ambari.server.state.alerts.AlertEventPublisherTest
> 62.627 sec
> Running org.apache.ambari.server.state.alerts.AlertDefinitionHashTest
> 42.206 sec
> Running org.apache.ambari.server.state.alerts.AlertStateChangedEventTest
> 41.462 sec
> Running org.apache.ambari.server.state.stack.UpgradePackTest
> 72.379 sec
> Running org.apache.ambari.server.state.ConfigHelperTest
> 72.849 sec
> Running org.apache.ambari.server.state.svccomphost.ServiceComponentHostTest
> 50.383 sec
> Running org.apache.ambari.server.state.cluster.ClusterTest
> 69.889 sec
> Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
> 80.271 sec
> Running org.apache.ambari.server.state.ServiceTest
> 45.443 sec
> Running org.apache.ambari.server.orm.dao.AlertsDAOTest
> 57.077 sec
> Running org.apache.ambari.server.orm.dao.AlertDefinitionDAOTest
> 33.872 sec
> Running org.apache.ambari.server.metadata.RoleCommandOrderTest
> 31.794 sec
> Thanks,
> Alejandro
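The grep and sed pipeline in the message above can be approximated in Python when a sorted list with an adjustable threshold is wanted. This is only a sketch: `slow_test_classes` and its `threshold_seconds` parameter are hypothetical, and it assumes unwrapped Maven Surefire output in which each `Running <class>` line is followed by a summary line containing `Time elapsed: <seconds>`.

```python
import re

# A "Running <class>" line starts the report for one test class.
RUNNING = re.compile(r"^Running\s+(\S+)")
# The summary line for that class carries the elapsed time in seconds.
ELAPSED = re.compile(r"Time elapsed:\s*([\d.]+)")


def slow_test_classes(lines, threshold_seconds=30.0):
    """Return (seconds, class_name) pairs above the threshold, slowest first."""
    slow = []
    current = None
    for line in lines:
        started = RUNNING.match(line.strip())
        if started:
            current = started.group(1)
            continue
        elapsed = ELAPSED.search(line)
        if elapsed and current is not None:
            seconds = float(elapsed.group(1))
            if seconds > threshold_seconds:
                slow.append((seconds, current))
            current = None  # wait for the next "Running" line
    return sorted(slow, reverse=True)
```

Passing an open file of saved test output, for example the `~/test_times.txt` used above, yields the slow classes in descending order of elapsed time.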
> On 3/24/15, 11:54 AM, "Jonathan Hurley"
> <jh...@hortonworks.com>> wrote:
> Many of these, such as the deadlock tests and alert tests, are just going
> to take a long time due to the nature of what they're doing. In general,
> if b.a.o is timing out, we need to either increase the timeout for the
> job or change our pom.xml to allow for forked execution of the tests.
> In my local environment, 3 concurrent forks can run through the test
> suite in about 20 minutes. The problem is that both LDAP tests below
> always fail in a forked environment. I'd say if we want to get the build
> times down, we should look into making the 2 LDAP tests work with forked
> test runners in the pom.xml
> On Mar 24, 2015, at 2:33 PM, Sumit Mohanty
> <sm...@hortonworks.com>> wrote:
> Hi,
> These are some of the unit tests that take too long (more than 30 seconds
> on my machine). Several others fall in the 10 to 30 second range
> and could also use some optimization.
> Jayush tells me that the Apache builds may be getting aborted as the
> build + UT run takes more than an hour.
> I will look into some of it when I get a chance. If any of them
> pique your curiosity, take a look.
> Running org.apache.ambari.server.agent.TestHeartbeatHandler
> Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec
> Running org.apache.ambari.server.state.cluster.ClusterTest
> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.576 sec
> Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.252 sec
> Running org.apache.ambari.server.upgrade.UpgradeTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.433 sec
> Running org.apache.ambari.server.orm.dao.AlertDispatchDAOTest
> Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.681 sec
> Running org.apache.ambari.server.orm.dao.AlertsDAOTest
> Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 44.474 sec
> Running org.apache.ambari.server.security.authorization.TestUsers
> Tests run: 26, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 36.421 sec
> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.46 sec
> Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.706 sec
> Running org.apache.ambari.server.state.ConfigHelperTest
> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.863 sec
> Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.247 sec
> ...
> thanks
> -Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)