Posted to commits@helix.apache.org by "Jiajun Wang (JIRA)" <ji...@apache.org> on 2017/05/31 07:16:04 UTC
[jira] [Commented] (HELIX-659) Support Additional Associate States

    [ https://issues.apache.org/jira/browse/HELIX-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030755#comment-16030755 ] 

Jiajun Wang commented on HELIX-659:
-----------------------------------

h1. Proposal
In this document, we propose introducing an additional layer of state management into Helix.
Consider the Pinot case: what Pinot needs is a transition from "ONLINE:V1" to "ONLINE:V2". Note that the "V1" to "V2" transition runs in parallel with the existing state transition. It is special in the following ways:
# The state is not pre-defined. New version numbers may appear after the state transition model is registered.
# Helix does not understand the internal logic of this additional state, so there is no way for Helix to compute the ideal state automatically. It relies on the application's configuration to update this state.
We take the above two points as assumptions.
As for the expected workflow, again taking Pinot partition versions as an example:
# Pinot needs to register its own logic for version upgrades, which means a new state model (factory name).
# Helix provides an API to configure resources with an additional state ("VERSION").
# When the resource configuration changes, the controller triggers a state transition and sends messages to the participants.
# Participants handle the message by calling the corresponding state transition methods, then update their current state.
# The controller listens for current state changes. On any update, it processes the change and reflects it in the external view.
 
h1. Design
h2. Register Associate States Model / Factory
Note that since associate states may not be pre-defined, defaultTransitionHandler has to be implemented.
State Model Factory:

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelFactory;
import org.apache.helix.participant.statemachine.StateTransitionError;
import org.apache.helix.participant.statemachine.Transition;
import org.apache.log4j.Logger;

public abstract class AssociateStateModelFactory extends StateModelFactory<AssociateStateModel> {
  ...
}

public abstract class AssociateStateModel extends StateModel {
  private static final Logger logger = Logger.getLogger(AssociateStateModel.class);
  static final String DEFAULT_INITIAL_STATE = "UNKNOWN";
  protected String _currentState = DEFAULT_INITIAL_STATE;

  public String getCurrentState() {
    return _currentState;
  }

  // !!!!!!!!!!! Changed part !!!!!!!!!!!! //
  // Proposed: invoked for any from/to pair that has no matching @Transition method,
  // since associate states (e.g. versions) may not be known when the model is registered.
  public void defaultTransitionHandler(Message message, NotificationContext context) {
    logger.error(
        "Default transition handler. The idea is to invoke this if no transition method is found. To be implemented");
  }

  public boolean updateState(String newState) {
    _currentState = newState;
    return true;
  }

  public void rollbackOnError(Message message, NotificationContext context,
      StateTransitionError error) {
    logger.error("Default rollback method invoked on error. Error Code: " + error.getCode());
  }

  public void reset() {
    logger.warn(
        "Default reset method invoked. Either the process no longer owns this resource or the session timed out");
  }

  @Transition(to = "DROPPED", from = "ERROR")
  public void onBecomeDroppedFromError(Message message, NotificationContext context)
      throws Exception {
    logger.info("Default ERROR->DROPPED transition invoked.");
  }
}
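The fallback behaviour the proposal asks for (use an annotated transition method when one matches, otherwise call defaultTransitionHandler) can be illustrated without Helix. This is a hypothetical sketch: SketchTransition, SketchAssociateModel, SketchDispatcher and VersionModel are stand-ins invented here, not Helix classes, and the lookup below is not Helix's actual transition-method resolution. It only shows why a catch-all handler lets versions that did not exist at registration time still be processed.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for Helix's @Transition annotation, so the sketch compiles without Helix.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchTransition {
  String from();
  String to();
}

// Hypothetical base model with a catch-all handler for transitions that have no annotated method.
abstract class SketchAssociateModel {
  String currentState = "UNKNOWN";

  // Fallback: associate states such as versions may not exist when the model is registered.
  void defaultTransitionHandler(String from, String to) {
    // A real handler would load or switch to the requested state here.
  }
}

// Hypothetical dispatcher: prefer an exact @SketchTransition match, else fall back.
final class SketchDispatcher {
  private SketchDispatcher() {}

  /** Runs the from->to transition and returns the name of the method that handled it. */
  static String dispatch(SketchAssociateModel model, String from, String to) {
    try {
      for (Method m : model.getClass().getDeclaredMethods()) {
        SketchTransition t = m.getAnnotation(SketchTransition.class);
        if (t != null && t.from().equalsIgnoreCase(from) && t.to().equalsIgnoreCase(to)) {
          m.setAccessible(true);
          m.invoke(model); // assumes annotated methods take no arguments in this sketch
          model.currentState = to;
          return m.getName();
        }
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("transition " + from + "->" + to + " failed", e);
    }
    model.defaultTransitionHandler(from, to);
    model.currentState = to;
    return "defaultTransitionHandler";
  }
}

// Example VERSION model: one explicit transition; every other version change uses the fallback.
class VersionModel extends SketchAssociateModel {
  int explicitCalls = 0;

  @SketchTransition(from = "UNKNOWN", to = "1.0.0")
  void onBecome100FromUnknown() {
    explicitCalls++;
  }
}
```

With this model, UNKNOWN to 1.0.0 runs the annotated method, while a later 1.0.0 to 1.0.1 change (a version nobody wrote a method for) lands in defaultTransitionHandler.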

h2. Resource Configuration
Resource config with associate state VERSION:

{
  "id":"Test_Resource"
  ,"simpleFields":{
  }
  ,"listFields":{
    "ASSOCIATE_STATE_MODEL_DEF_REFS": [
        "VERSION"
    ],
    "ASSOCIATE_STATE_MODEL_FACTORY_NAMES": [
        "DEFAULT"
    ],
    "ASSOCIATE_STATES": [
        "1.0.1"
    ]
  }
  ,"mapFields":{
  }
}
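The three list fields above only make sense together. A minimal sketch of how a reader of this config might pair them, assuming (this is not stated in the proposal) that the lists are aligned by index, so entry i of each list describes the same associate state. AssociateStateConfig is a hypothetical helper, not an existing Helix class.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the proposed resource config list fields.
// Assumption: the three lists are aligned by index.
final class AssociateStateConfig {
  static final String DEF_REFS = "ASSOCIATE_STATE_MODEL_DEF_REFS";
  static final String FACTORY_NAMES = "ASSOCIATE_STATE_MODEL_FACTORY_NAMES";
  static final String STATES = "ASSOCIATE_STATES";

  final String factoryName;
  final String targetState;

  AssociateStateConfig(String factoryName, String targetState) {
    this.factoryName = factoryName;
    this.targetState = targetState;
  }

  /** Pairs the list fields into defRef -> config, keeping list order; rejects misaligned lists. */
  static Map<String, AssociateStateConfig> fromListFields(Map<String, List<String>> listFields) {
    List<String> refs = listFields.getOrDefault(DEF_REFS, Collections.emptyList());
    List<String> factories = listFields.getOrDefault(FACTORY_NAMES, Collections.emptyList());
    List<String> states = listFields.getOrDefault(STATES, Collections.emptyList());
    if (factories.size() != refs.size() || states.size() != refs.size()) {
      throw new IllegalArgumentException("associate state list fields must have equal lengths: "
          + refs.size() + ", " + factories.size() + ", " + states.size());
    }
    Map<String, AssociateStateConfig> result = new LinkedHashMap<>();
    for (int i = 0; i < refs.size(); i++) {
      result.put(refs.get(i), new AssociateStateConfig(factories.get(i), states.get(i)));
    }
    return result;
  }
}
```

For the Test_Resource config above this yields one entry, VERSION mapped to factory DEFAULT and target state 1.0.1. Rejecting misaligned lists up front means a mistake in one field fails when the config is read rather than producing a mismatched transition later.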

h2. Additional APIs to configure associate states

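// Note: HelixAdmin already declares setConfig(HelixConfigScope, Map<String, String>) and
// Map<String, String> getConfig(HelixConfigScope, List<String>). Java cannot overload on generic
// type arguments alone (same erasure) or on return type alone, so these list-valued variants
// would need distinct method names if added to the same interface.

/**
 * Set list-valued configuration values
 * @param scope
 * @param listProperties
 */
void setConfig(HelixConfigScope scope, Map<String, List<String>> listProperties);

/**
 * Get list-valued configuration values
 * @param scope
 * @param keys
 * @return configuration values ordered by the provided keys
 */
Map<String, List<String>> getConfig(HelixConfigScope scope, List<String> keys);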

h2. Associate States in the Participant Current State and External View
Current States:

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF":"MasterSlave"
    ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
    ,"BUCKET_SIZE":"0"
    ,"SESSION_ID":"25b2ce5dfbde0fa"
  }
  ,"listFields":{
    "ASSOCIATE_STATE_MODEL_DEF_REFS": [
        "VERSION"
    ],
    "ASSOCIATE_STATE_MODEL_FACTORY_NAMES": [
        "DEFAULT"
    ]
  }
  ,"mapFields":{
    "example_resource_0":{
      "CURRENT_STATE":"MASTER"
      ,"ASSOCIATE_STATES":"1.0.1" // Multiple associate states are joined with ":"
      ,"INFO":""
    }
  }
}

Associate states in the External View:

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF_REF":"MasterSlave"
  }
  ,"listFields":{
    "ASSOCIATE_STATE_MODEL_DEF_REFS": [
        "VERSION"
    ]
  }
  ,"mapFields":{
    "example_resource_0":{
      // With more than one associate state, the states are joined with ":" and the main state always comes first.
      "lca1-app0004.stg.linkedin.com_11932":"MASTER:1.0.1"
      ,"lca1-app0048.stg.linkedin.com_11932":"SLAVE:1.0.0"
    }
  }
}
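The current state stores associate states in a separate ":"-joined field, while the external view puts the main state first followed by the associate states. A minimal sketch of that encoding, assuming the associate tokens follow the order of ASSOCIATE_STATE_MODEL_DEF_REFS; CombinedState is a hypothetical helper, not part of Helix.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical codec for the ":"-joined state strings in the current state and external view.
// Assumes no main state, associate state, or version string itself contains ':'.
final class CombinedState {
  final String mainState;
  final Map<String, String> associateStates; // def ref -> state, in def-ref order

  private CombinedState(String mainState, Map<String, String> associateStates) {
    this.mainState = mainState;
    this.associateStates = associateStates;
  }

  /** Builds an EV cell such as "MASTER:1.0.1" from CURRENT_STATE and ASSOCIATE_STATES. */
  static String toExternalViewValue(String mainState, String associateStatesField) {
    if (associateStatesField == null || associateStatesField.isEmpty()) {
      return mainState;
    }
    return mainState + ":" + associateStatesField;
  }

  /** Parses an EV cell: the first token is the main state, the rest follow the def-ref order. */
  static CombinedState parse(String value, List<String> defRefs) {
    String[] tokens = value.split(":", -1);
    if (tokens.length != defRefs.size() + 1) {
      throw new IllegalArgumentException("expected " + (defRefs.size() + 1)
          + " states but got " + tokens.length + " in '" + value + "'");
    }
    Map<String, String> associates = new LinkedHashMap<>();
    for (int i = 0; i < defRefs.size(); i++) {
      associates.put(defRefs.get(i), tokens[i + 1]);
    }
    return new CombinedState(tokens[0], Collections.unmodifiableMap(associates));
  }
}
```

Because ":" doubles as the separator, this scheme breaks if an application ever uses a state value containing ":"; rejecting such values when the resource config is written would be one way to guard against that.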

h2. Helix Controller Updates
On resource configuration changes:
* Fill ClusterDataCache with associate states and the related state models / factories from the resource configuration.
* Merge associate states into BestPossibleStateOutput.
* Fill associate states and the related state models / factories into the message before sending it to participants.
On participant state changes:
* In addition to the existing reads, also read the associate states, then fill the EV with the complete state information.
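The comparison implied by merging associate states into the best possible state and filling them into messages could look roughly like this. It is a hypothetical sketch: AssociateStateDiff and PendingTransition are invented names rather than Helix pipeline stages, and an instance that has not reported an associate state is assumed to be in the proposal's UNKNOWN initial state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical controller step for one partition: which associate-state transitions are still needed.
final class AssociateStateDiff {
  private AssociateStateDiff() {}

  // Matches AssociateStateModel.DEFAULT_INITIAL_STATE in the proposal.
  static final String INITIAL = "UNKNOWN";

  static final class PendingTransition {
    final String instance;
    final String defRef;
    final String from;
    final String to;

    PendingTransition(String instance, String defRef, String from, String to) {
      this.instance = instance;
      this.defRef = defRef;
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return instance + " " + defRef + ": " + from + " -> " + to;
    }
  }

  /**
   * currentByInstance: instance -> (def ref -> current associate state).
   * targets: def ref -> associate state configured on the resource.
   */
  static List<PendingTransition> compute(
      Map<String, Map<String, String>> currentByInstance, Map<String, String> targets) {
    List<PendingTransition> pending = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> instance : currentByInstance.entrySet()) {
      for (Map.Entry<String, String> target : targets.entrySet()) {
        String current = instance.getValue().getOrDefault(target.getKey(), INITIAL);
        if (!current.equals(target.getValue())) {
          pending.add(new PendingTransition(instance.getKey(), target.getKey(), current, target.getValue()));
        }
      }
    }
    return pending;
  }
}
```

Applied to the external view example with a configured VERSION of 1.0.1, only the SLAVE replica at 1.0.0 would need a VERSION transition; the MASTER replica is already there.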

h2. Helix Participant Updates
On receiving state transition message:
* Read the main state and associate states, and trigger state transitions in order.
* Do the main state transition first, then the associate state transitions one by one.
* If any state transition fails, set an ERROR state covering all states and stop processing. The user should fix the problem and reset to the initial states.
* If all state transitions succeed, update the current state.
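The ordering and error rule above can be sketched as follows. StepHandler and OrderedTransitionRunner are hypothetical names invented for illustration; "ERROR" mirrors Helix's ERROR state, and using the key "MAIN" for the main state in the usage example is an assumption. Only a fully successful run returns states that would be written back to the current state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical callback performing a single transition; throwing signals failure.
interface StepHandler {
  void apply(String stateName, String from, String to) throws Exception;
}

// Hypothetical participant-side runner for one partition's state transition message.
final class OrderedTransitionRunner {
  static final String ERROR = "ERROR";

  private OrderedTransitionRunner() {}

  /**
   * current and target map state name -> state. The iteration order of target is the execution
   * order, so the main state must be its first entry, followed by the associate states.
   * On the first failure every state is set to ERROR and the remaining transitions are skipped.
   */
  static Map<String, String> run(
      Map<String, String> current, Map<String, String> target, StepHandler handler) {
    Map<String, String> result = new LinkedHashMap<>(current);
    for (Map.Entry<String, String> step : target.entrySet()) {
      String name = step.getKey();
      String from = result.get(name);
      String to = step.getValue();
      if (to.equals(from)) {
        continue; // already in the target state
      }
      try {
        handler.apply(name, from, to);
        result.put(name, to);
      } catch (Exception e) {
        // One ERROR covers the main state and every associate state.
        for (String key : target.keySet()) {
          result.put(key, ERROR);
        }
        result.replaceAll((key, value) -> ERROR);
        return result;
      }
    }
    return result;
  }
}
```

For example, moving {MAIN=OFFLINE, VERSION=1.0.0} to {MAIN=ONLINE, VERSION=1.0.1} calls the handler for MAIN and then VERSION; if the VERSION step throws, both entries come back as ERROR.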

h1. Alternative options
h2. Introducing UPGRADING State for additional state transitions
Add a new internal state, UPGRADING, for partition upgrades.
An upgrade then happens when the partition transitions to or from the UPGRADING state.
Note that the application is free to define whether UPGRADING is a special online state or not.
In the Pinot case, an upgrading partition (even before it is back to ONLINE) might still be an active partition.
The problem with this new state is that it only works for a single additional state.
Once there is more than one additional state to manage, a single UPGRADING state is not enough.
h2. Rely on resetting partition to load new states
Whenever a new version is available, the application updates the versions for the resource and then resets all partitions.
During the transition from OFFLINE to ONLINE, participants read the new version and apply it to the related partitions.
The problem with this method is that a change in the additional state affects the main state: a partition is offline for a while, and during this period even the old version is unavailable.
h2. Application registers message handler to handle upgrading message
In this method, the controller is only responsible for sending upgrade requests to participants, and participants are responsible for reporting their local versions.
Since the controller has no knowledge of how to control the additional state, the application needs to implement all of the logic.
h1. Validation
Add unit tests / integration tests to validate associate states.
Verify the Pinot version use case.

> Support Additional Associate States
> -----------------------------------
>
>                 Key: HELIX-659
>                 URL: https://issues.apache.org/jira/browse/HELIX-659
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>    Affects Versions: 0.6.x
>            Reporter: Jiajun Wang
>
> Currently, Helix only supports managing a single state for all resources/partitions. However, real-world cluster management requirements may be more complicated than that.
> In Pinot, for example, each partition needs to be assigned a version to ensure data consistency.
> When a new version arrives, the system needs to replace the old partition with the new one, one partition at a time, so any reads during this period will see inconsistent data.
> Pinot cannot directly put the version information into the partition state field because that field is already occupied by the main state (offline-online, for instance) used by the Helix controller.
> So the Pinot team relies on workarounds to implement their application logic: creating a new resource with the latest version and swapping it in after the resource is fully loaded. The Helix controller is unaware of the version.
> Another option is for the Pinot team to maintain their own config item or property store item for recording versions.
> Both approaches require the Pinot team to implement version control themselves.
> Another requirement comes from the Ambry team, where a partition can be "ONLINE:READ" or "ONLINE:WRITE".
> In both cases, a single-state mechanism is not sufficient for the applications' requirements.
> It would be very helpful to provide a framework-level feature that supports more than one state per partition.
> Benefits: 
> # The application doesn't need to write additional code for managing additional states.
> # Avoid potential conflicts when multiple state transitions happen concurrently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)