Posted to issues@trafodion.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/08/15 17:01:00 UTC

[jira] [Work logged] (TRAFODION-3318) Change process management of DTM to improve HA behavior

     [ https://issues.apache.org/jira/browse/TRAFODION-3318?focusedWorklogId=295565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295565 ]

ASF GitHub Bot logged work on TRAFODION-3318:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Aug/19 17:00
            Start Date: 15/Aug/19 17:00
    Worklog Time Spent: 10m 
      Work Description: zcorrea commented on pull request #1854: [TRAFODION-3318] Changed process management rules for DTM process:
URL: https://github.com/apache/trafodion/pull/1854
 
 
            - DTM is now a 'monitor primitive persistent' process
              o When persistent process retries are exhausted within the
                persist time window, the node is brought down
              o Improves HA behavior in that DTM process death no longer
                kills all processes in its node
            - Removed TmSync logic and DTM dependency on it
            - Removed SoftNodeDown/Up logic and protocol triggered by DTM process
              abnormal termination
            - This change required enabling monitor AGENT mode in Python installations
            - Moved obsolete test files to monitor/test-legacy directory
            - Removed obsolete code files in monitor/linux and monitor/test directories
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 295565)
            Time Spent: 10m
    Remaining Estimate: 119h 50m  (was: 120h)

> Change process management of DTM to improve HA behavior
> -------------------------------------------------------
>
>                 Key: TRAFODION-3318
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3318
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: dtm, foundation
>    Affects Versions: 2.4
>            Reporter: Gonzalo E Correa
>            Priority: Major
>             Fix For: 2.4
>
>   Original Estimate: 120h
>          Time Spent: 10m
>  Remaining Estimate: 119h 50m
>
> The current process management model for process type DTM enforces a soft node down behavior, which kills all processes in a node where a DTM process terminates abnormally. The DTM process is then recreated by the monitor along with all persistent processes hosted in that node.
> To reduce the fault zone impact, this change removes the soft node down/up functionality so that the DTM process is recreated without killing all other processes in the node. One existing rule is still enforced: if the persistent DTM process cannot be restarted within the configured number of retries in the specified time window, the node is brought down.
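
The retry rule described above can be illustrated with a minimal sketch. This is not the Trafodion monitor's code: the class name, method name, and return values are hypothetical, and the sliding-window interpretation (count restarts in the last N seconds, bring the node down once the limit is reached) is an assumption about how "configured retries in the specified time window" behaves.

```python
from collections import deque


class PersistRetryPolicy:
    """Hypothetical sliding-window restart limiter (illustration only)."""

    def __init__(self, max_retries, window_seconds):
        self.max_retries = max_retries        # assumed: restarts allowed per window
        self.window_seconds = window_seconds  # assumed: length of the persist window
        self._restarts = deque()              # timestamps of recent restarts

    def on_process_death(self, now):
        """Decide what to do when the persistent process dies at time `now`."""
        # Forget restarts that have aged out of the time window.
        while self._restarts and now - self._restarts[0] >= self.window_seconds:
            self._restarts.popleft()
        # Retries exhausted inside the window: escalate to a node down.
        if len(self._restarts) >= self.max_retries:
            return "node_down"
        # Otherwise restart only this process; other processes are untouched.
        self._restarts.append(now)
        return "restart"
```

With `PersistRetryPolicy(max_retries=2, window_seconds=60)`, two deaths in quick succession each lead to a restart of the process alone, and a third death inside the same 60 seconds escalates to a node down. Deaths spaced further apart than the window never escalate.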



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)