Posted to dev@ambari.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2016/03/01 00:08:18 UTC

[jira] [Commented] (AMBARI-15173) Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem

    [ https://issues.apache.org/jira/browse/AMBARI-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172821#comment-15172821 ] 

Hudson commented on AMBARI-15173:
---------------------------------

FAILURE: Integrated in Ambari-branch-2.2 #433 (See [https://builds.apache.org/job/Ambari-branch-2.2/433/])
AMBARI-15173 - Express Upgrade Stuck At Manual Prompt Due To HRC Status (jhurley: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=8f05ad82450197b4fba4e4b55cb7f96a44d5d21d])
* ambari-server/src/test/java/org/apache/ambari/annotations/TransactionalLockTest.java
* ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java
* ambari-server/src/test/java/org/apache/ambari/annotations/TransactionalLockInterceptorTest.java
* ambari-server/src/test/java/org/apache/ambari/annotations/LockAreaTest.java
* ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
* ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLocks.java
* ambari-server/src/test/java/org/apache/ambari/server/controller/internal/AlertResourceProviderTest.java
* ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java


> Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem
> ----------------------------------------------------------------------------------
>
>                 Key: AMBARI-15173
>                 URL: https://issues.apache.org/jira/browse/AMBARI-15173
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.2.2
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>             Fix For: 2.2.2
>
>
> During an upgrade, the status of a request or stage can fail to match that of its tasks. For example, a task can be {{HOLDING}} while the request is still {{IN_PROGRESS}}.
> I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 introduced, among other things, a cache for the {{HostRoleCommandStatusSummaryDTO}}, which is an aggregation of the number of tasks a stage has in each state (PENDING, HOLDING, etc).
> This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to calculate a stage's and request's status based on the tasks. 
> The problem is that {{ServerActionExecutor}} is moving a task's state to {{HOLDING}} (reflected in the database correctly) but the cache invalidation happens inside the uncommitted transaction. Any read between the invalidation and the commit still sees the old row, so stale data is re-cached. As a result, when we go to calculate the request and stage status, we get {{IN_PROGRESS}} instead of {{HOLDING}}.
> {code}
> {
>   "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
>   "Stage": {
>     "cluster_name": "cl1",
>     "context": "Stop YARN Queues",
>     "display_status": "IN_PROGRESS",
>     "end_time": -1,
>     "progress_percent": 35,
>     "request_id": 61,
>     "skippable": true,
>     "stage_id": 1,
>     "start_time": 1456227329191,
>     "status": "IN_PROGRESS"
>   },
>   "tasks": [
>     {
>       "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
>       "Tasks": {
>         "attempt_cnt": 1,
>         "cluster_name": "cl1",
>         "command": "EXECUTE",
>         "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this step since the clients will retry on their own.",
>         "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
>         "end_time": -1,
>         "error_log": "errors-754.txt",
>         "exit_code": 0,
>         "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
>         "id": 754,
>         "output_log": "output-754.txt",
>         "request_id": 61,
>         "role": "AMBARI_SERVER_ACTION",
>         "stage_id": 1,
>         "start_time": 1456227329191,
>         "status": "HOLDING",
>         "stderr": "",
>         "stdout": "",
>         "structured_out": {}
>       }
>     }
>   ]
> }
> {code}
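
To make the aggregation described above concrete, here is a minimal Java sketch of counting tasks per state and deriving a stage status from those counts. The names TaskStatus and StageStatusSketch and the precedence rules are assumptions made for illustration only; this is not the actual logic of HostRoleCommandStatusSummaryDTO or CalculatedState.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced set of task states (the real set is larger).
enum TaskStatus { PENDING, IN_PROGRESS, HOLDING, COMPLETED }

class StageStatusSketch {

    // Count how many tasks are in each state (the "summary").
    static Map<TaskStatus, Integer> summarize(List<TaskStatus> tasks) {
        Map<TaskStatus, Integer> counts = new EnumMap<>(TaskStatus.class);
        for (TaskStatus status : tasks) {
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }

    // Derive a stage status from the summary.
    // Assumed precedence: any HOLDING task wins, then all-COMPLETED,
    // then all-PENDING, otherwise IN_PROGRESS.
    static TaskStatus stageStatus(Map<TaskStatus, Integer> counts, int totalTasks) {
        if (counts.getOrDefault(TaskStatus.HOLDING, 0) > 0) {
            return TaskStatus.HOLDING;
        }
        if (counts.getOrDefault(TaskStatus.COMPLETED, 0) == totalTasks) {
            return TaskStatus.COMPLETED;
        }
        if (counts.getOrDefault(TaskStatus.PENDING, 0) == totalTasks) {
            return TaskStatus.PENDING;
        }
        return TaskStatus.IN_PROGRESS;
    }
}
```

Under these assumed rules, a stage whose only task is HOLDING (as in the JSON above) would be reported as HOLDING when the summary is fresh. A summary cached before the task moved to HOLDING would still yield IN_PROGRESS, which matches the mismatch in the report.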

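The ordering problem in the description can be reproduced in a deterministic, single-threaded Java sketch. FakeTaskTable, StatusCache and StaleCacheDemo are hypothetical stand-ins, not Ambari classes, and the interleaving of a second reader is simulated by an explicit call. The "after commit" variant shows only one possible ordering that avoids the stale entry; the changed files listed above suggest the actual patch involves transactional locks, and its details are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical table: readers only ever see committed rows.
class FakeTaskTable {
    private final Map<Long, String> committed = new HashMap<>();
    private final Map<Long, String> pending = new HashMap<>();

    void seed(long stageId, String status) { committed.put(stageId, status); }
    void writeInTransaction(long stageId, String status) { pending.put(stageId, status); }
    void commit() { committed.putAll(pending); pending.clear(); }
    String readCommitted(long stageId) { return committed.get(stageId); }
}

// Hypothetical read-through cache keyed by stage id.
class StatusCache {
    private final Map<Long, String> cache = new HashMap<>();
    private final FakeTaskTable table;

    StatusCache(FakeTaskTable table) { this.table = table; }

    // On a miss, load the committed value and keep it.
    String get(long stageId) { return cache.computeIfAbsent(stageId, table::readCommitted); }
    void invalidate(long stageId) { cache.remove(stageId); }
}

class StaleCacheDemo {

    // Invalidation inside the uncommitted transaction: a read that lands
    // before the commit re-caches the old value.
    static String invalidateBeforeCommit() {
        FakeTaskTable table = new FakeTaskTable();
        table.seed(1L, "IN_PROGRESS");
        StatusCache cache = new StatusCache(table);

        table.writeInTransaction(1L, "HOLDING"); // not yet visible to readers
        cache.invalidate(1L);                    // too early
        cache.get(1L);                           // simulated concurrent read re-caches IN_PROGRESS
        table.commit();                          // database now says HOLDING
        return cache.get(1L);                    // cache still holds the stale value
    }

    // Invalidation after the commit: the next read loads the new value.
    static String invalidateAfterCommit() {
        FakeTaskTable table = new FakeTaskTable();
        table.seed(1L, "IN_PROGRESS");
        StatusCache cache = new StatusCache(table);

        table.writeInTransaction(1L, "HOLDING");
        cache.get(1L);                           // simulated concurrent read caches IN_PROGRESS
        table.commit();
        cache.invalidate(1L);                    // drops the stale entry
        return cache.get(1L);
    }
}
```

In this sketch, StaleCacheDemo.invalidateBeforeCommit() returns "IN_PROGRESS" even though the table holds HOLDING, while StaleCacheDemo.invalidateAfterCommit() returns "HOLDING".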


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)