Posted to issues@trafodion.apache.org by "Hans Zeller (JIRA)" <ji...@apache.org> on 2018/07/24 18:42:00 UTC

[jira] [Commented] (TRAFODION-3164) Phase out existing mxosrvrs on-demand

    [ https://issues.apache.org/jira/browse/TRAFODION-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554663#comment-16554663 ] 

Hans Zeller commented on TRAFODION-3164:
----------------------------------------

I'm not an expert in this area, but my suggestion is the following:
 * Define the concept of an "epoch" in the DCS master. The epoch is a long integer that initially has a value of 1.
 * The epoch number is stored in zookeeper, in the persistent node /trafodion/dcs/master. This node had no data until now.
 * The mxosrvr reads the epoch number from zookeeper and watches this node.
 * When the epoch changes and the watcher fires in the mxosrvr, it notes this fact and sets the variable "shutdownThisThing" to 1. This causes the mxosrvr to exit immediately if it is available, or as soon as it becomes available (similar to other zookeeper-related events).
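
To make the intended behavior concrete, here is a minimal in-memory sketch of the steps above. It is not the actual mxosrvr code and does not talk to a real zookeeper: FakeZNode and MxoSrvr are hypothetical names invented for illustration, and the only zookeeper behavior it assumes is that a watch fires at most once and has to be re-armed by reading the node again.

```python
# Illustrative model only. FakeZNode and MxoSrvr are hypothetical names,
# not Trafodion or ZooKeeper APIs.

class FakeZNode:
    """Stand-in for the persistent node /trafodion/dcs/master."""

    def __init__(self, epoch=1):
        self._epoch = epoch
        self._watchers = []

    def get(self, watcher=None):
        # Like a read with a watch: the registered watch fires at most once.
        if watcher is not None:
            self._watchers.append(watcher)
        return self._epoch

    def set(self, epoch):
        self._epoch = epoch
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback()


class MxoSrvr:
    def __init__(self, znode):
        self.znode = znode
        self.state = "AVAILABLE"
        self.shutdown_this_thing = 0
        self.exited = False
        # Read the epoch at startup and watch the node.
        self.epoch = znode.get(self._on_epoch_change)

    def _on_epoch_change(self):
        # Re-read the epoch, which also re-arms the one-shot watch.
        new_epoch = self.znode.get(self._on_epoch_change)
        if new_epoch != self.epoch:
            self.shutdown_this_thing = 1
            self._exit_if_available()

    def connect(self):
        self.state = "CONNECTED"

    def disconnect(self):
        self.state = "AVAILABLE"
        self._exit_if_available()

    def _exit_if_available(self):
        # A connected server keeps running until its client disconnects.
        if self.shutdown_this_thing and self.state == "AVAILABLE":
            self.exited = True


znode = FakeZNode()
idle, busy = MxoSrvr(znode), MxoSrvr(znode)
busy.connect()
znode.set(2)                      # someone bumps the epoch
print(idle.exited, busy.exited)   # True False
busy.disconnect()
print(busy.exited)                # True
```

The idle server exits as soon as the epoch changes, while the connected one only remembers the request and exits after its client disconnects.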

I'll be creating a PR shortly. There are other things we could do, which I won't include in the initial PR for this JIRA:
 * Report the epoch in the DCS master web GUI
 * Report the epoch of individual mxosrvrs in the DCS master web GUI (store it in zookeeper in the ephemeral /trafodion/dcs/servers/registered/* nodes)
 * Create a method to increment the epoch (right now this has to be done via a user program or via zkcli)
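
For the last item, one possible shape of an increment method is a read followed by a conditional write that retries if someone else changed the node in between. The sketch below models this in memory only. VersionedZNode, BadVersion and increment_epoch are hypothetical names, and it assumes (this is not decided anywhere above) that the epoch is stored as decimal text and that a node with no data counts as epoch 1.

```python
# Illustrative model only; none of these names are Trafodion or ZooKeeper APIs.

class BadVersion(Exception):
    """Raised when a conditional write sees an unexpected data version."""


class VersionedZNode:
    """In-memory stand-in for a znode that tracks a data version."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def get(self):
        return self.data, self.version

    def set(self, data, expected_version):
        # Reject the write if the node changed since it was read.
        if expected_version != self.version:
            raise BadVersion()
        self.data = data
        self.version += 1


def increment_epoch(znode, max_retries=5):
    """Increment the epoch stored in znode and return the new value."""
    for _ in range(max_retries):
        data, version = znode.get()
        # Assumption: empty data means the initial epoch of 1.
        current = int(data.decode()) if data else 1
        try:
            znode.set(str(current + 1).encode(), version)
            return current + 1
        except BadVersion:
            continue  # another writer got in first; read again and retry
    raise RuntimeError("could not increment epoch")


node = VersionedZNode()
print(increment_epoch(node))  # 2
print(increment_epoch(node))  # 3
```

The version check keeps two concurrent callers from both writing the same new epoch.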

Before doing these additional things I want to get some feedback first.

> Phase out existing mxosrvrs on-demand
> -------------------------------------
>
>                 Key: TRAFODION-3164
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3164
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: connectivity-mxosrvr
>    Affects Versions: 2.1-incubating
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>            Priority: Major
>             Fix For: 2.3
>
>
> In some cases, it would be very helpful if we could tell the DCS component to restart all of its mxosrvrs at the next opportunity.
> There are several reasons why one would want this:
>  * We may have installed a temporary patch to the executable file and want to phase out any mxosrvrs using the old executable.
>  * We may have changed the system defaults table and want all the mxosrvrs to pick up the new system defaults.
>  * We may want to free up all the resources like ESPs, UDR servers, etc. held by the mxosrvrs.
> By "restart at the next opportunity", I mean that all the servers in the AVAILABLE state should be restarted immediately. The servers that are currently connected should be restarted once they become available after disconnecting.


