Posted to issues@aurora.apache.org by "brian wickman (JIRA)" <ji...@apache.org> on 2015/03/19 01:56:38 UTC

[jira] [Commented] (AURORA-1054) src.test.python.apache.aurora.executor.thermos_task_runner appears to be flaky

    [ https://issues.apache.org/jira/browse/AURORA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368255#comment-14368255 ] 

brian wickman commented on AURORA-1054:
---------------------------------------

I *think* I understand what's going on now.  This is almost certainly an artifact of the open source refactor (fd241cdd).  I'm still trying to understand the intricacies of the race, but the gist is that we fork a runner, it forks a task that swallows SIGTERM, and we send SIGUSR1 to the runner, which tears its children down via the SIGTERM -> SIGKILL escalation.  The escalation grace period is 60s, which happens to match the default timeout in stop(), so a TaskError could be raised some of the time depending on whether the executor or the runner wins.  I will continue to dig deeper to understand precisely what behavior we're trying to test, and then target that specifically.

> src.test.python.apache.aurora.executor.thermos_task_runner appears to be flaky
> ------------------------------------------------------------------------------
>
>                 Key: AURORA-1054
>                 URL: https://issues.apache.org/jira/browse/AURORA-1054
>             Project: Aurora
>          Issue Type: Story
>          Components: Executor
>            Reporter: Bill Farner
>            Assignee: brian wickman
>
> I've seen this test fail on a few reviews recently, but it succeeds on a retry.
> {noformat}
>                      ==================== FAILURES ====================
>                       TestThermosTaskRunnerIntegration.test_integration_stop 
>                      
>                      self = <test_thermos_task_runner.TestThermosTaskRunnerIntegration object at 0x7ff3e1dc6fd0>
>                      
>                          def test_integration_stop(self):
>                            with self.yield_sleepy(ThermosTaskRunner, sleep=1000, exit_code=0) as task_runner:
>                              task_runner.start()
>                              task_runner.forked.wait()
>                          
>                              assert task_runner.status is None
>                          
>                              task_runner.stop()
>                          
>                      >       assert task_runner.status is not None
>                      E       assert None is not None
>                      E        +  where None = <apache.aurora.executor.thermos_task_runner.ThermosTaskRunner object at 0x7ff3e3333490>.status
>                      
>                      src/test/python/apache/aurora/executor/test_thermos_task_runner.py:175: AssertionError
>                      -------------- Captured stderr call --------------
>                      Writing log files to disk in /tmp/user/2396/tmpqHF5bz
>                      ERROR] Could not quitquitquit runner: Cannot take control of a task in terminal state.
>                       generated xml file: /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/src.test.python.apache.aurora.executor.thermos_task_runner.xml 
>                      ====== 1 failed, 7 passed in 81.16 seconds =======
>                      src.test.python.apache.aurora.admin.admin                                       .....   SUCCESS
>                      src.test.python.apache.aurora.admin.host_maintenance                            .....   SUCCESS
>                      src.test.python.apache.aurora.admin.maintenance                                 .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.api                                    .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.instance_watcher                       .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.job_monitor                            .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.mux                                    .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.quota_check                            .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.restarter                              .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.scheduler_client                       .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.sla                                    .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.task_util                              .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.updater                                .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.updater_util                           .....   SUCCESS
>                      src.test.python.apache.aurora.client.base                                       .....   SUCCESS
>                      src.test.python.apache.aurora.client.binding_helper                             .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.api                                    .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.client                                 .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.command_hooks                          .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.config                                 .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.cron                                   .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.inspect                                .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.job                                    .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.plugins                                .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.quota                                  .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.sla                                    .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.supdate                                .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.task                                   .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.update                                 .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.version                                .....   SUCCESS
>                      src.test.python.apache.aurora.client.config                                     .....   SUCCESS
>                      src.test.python.apache.aurora.client.hooks.hooked_api                           .....   SUCCESS
>                      src.test.python.apache.aurora.client.hooks.non_hooked_api                       .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_aurora_job_key                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_cluster                               .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_cluster_option                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_clusters                              .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_http_signaler                         .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_pex_version                           .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_shellify                              .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_transport                             .....   SUCCESS
>                      src.test.python.apache.aurora.config.test_base                                  .....   SUCCESS
>                      src.test.python.apache.aurora.config.test_constraint_parsing                    .....   SUCCESS
>                      src.test.python.apache.aurora.config.test_loader                                .....   SUCCESS
>                      src.test.python.apache.aurora.config.test_thrift                                .....   SUCCESS
>                      src.test.python.apache.aurora.executor.common.task_info                         .....   SUCCESS
>                      src.test.python.apache.aurora.executor.executor_base                            .....   SUCCESS
>                      src.test.python.apache.aurora.executor.executor_detector                        .....   SUCCESS
>                      src.test.python.apache.aurora.executor.executor_vars                            .....   SUCCESS
>                      src.test.python.apache.aurora.executor.status_manager                           .....   SUCCESS
>                      src.test.python.apache.aurora.executor.thermos_task_runner                      .....   FAILURE
>                      src.test.python.apache.thermos.common.test_pathspec                             .....   SUCCESS
>                      src.test.python.apache.thermos.core.test_runner_integration                     .....   SUCCESS
>                      src.test.python.apache.thermos.monitoring.test_disk                             .....   SUCCESS
>                      
> FAILURE
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)