Posted to issues@sentry.apache.org by "Na Li (JIRA)" <ji...@apache.org> on 2018/04/19 16:30:00 UTC
[jira] [Comment Edited] (SENTRY-2203) Leader Lock is not released when Sentry service shuts down

    [ https://issues.apache.org/jira/browse/SENTRY-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444340#comment-16444340 ] 

Na Li edited comment on SENTRY-2203 at 4/19/18 4:29 PM:
--------------------------------------------------------

[~akolb]
1) I found a bug in my test code. That bug is why no leader appeared to be elected in the test, and why I thought I had reproduced the issue where sometimes no leader is elected. After I fixed the test, it passed without my fix. So the root cause of the issue is not in Sentry; I now believe it is a bug in ZooKeeper.

2) The details of the leader election algorithm are described in http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks. According to that description, when a host's session to ZooKeeper goes down, its znode should be removed. But it looks like the znode was not removed under some conditions.

{code}
Leader Election

A simple way of doing leader election with ZooKeeper is to use the SEQUENCE|EPHEMERAL flags when creating znodes that represent "proposals" of clients. The idea is to have a znode, say "/election", such that each znode creates a child znode "/election/n_" with both flags SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper automatically appends a sequence number that is greater that any one previously appended to a child of "/election". The process that created the znode with the smallest appended sequence number is the leader.

That's not all, though. It is important to watch for failures of the leader, so that a new client arises as the new leader in the case the current leader fails. A trivial solution is to have all application processes watching upon the current smallest znode, and checking if they are the new leader when the smallest znode goes away (note that the smallest znode will go away if the leader fails because the node is ephemeral). But this causes a herd effect: upon of failure of the current leader, all other processes receive a notification, and execute getChildren on "/election" to obtain the current list of children of "/election". If the number of clients is large, it causes a spike on the number of operations that ZooKeeper servers have to process. To avoid the herd effect, it is sufficient to watch for the next znode down on the sequence of znodes. If a client receives a notification that the znode it is watching is gone, then it becomes the new leader in the case that there is no smaller znode. Note that this avoids the herd effect by not having all clients watching the same znode.

Here's the pseudo code:

Let ELECTION be a path of choice of the application. To volunteer to be a leader:

    Create znode z with path "ELECTION/n_" with both SEQUENCE and EPHEMERAL flags;

    Let C be the children of "ELECTION", and i be the sequence number of z;

    Watch for changes on "ELECTION/n_j", where j is the smallest sequence number such that j < i and n_j is a znode in C;

Upon receiving a notification of znode deletion:

    Let C be the new set of children of ELECTION;

    If z is the smallest node in C, then execute leader procedure;

    Otherwise, watch for changes on "ELECTION/n_j", where j is the smallest sequence number such that j < i and n_j is a znode in C;

Note that the znode having no preceding znode on the list of children does not imply that the creator of this znode is aware that it is the current leader. Applications may consider creating a separate to znode to acknowledge that the leader has executed the leader procedure. 
{code}
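
To make the recipe above concrete, here is a minimal in-memory sketch in Java. It does not use the ZooKeeper or Curator APIs; ElectionSim and its methods are hypothetical names made up for illustration. It only models the bookkeeping: sequence numbers, removal of ephemeral nodes on session expiry, and which node a client watches. Following the prose description ("the next znode down"), the sketch watches the closest smaller sequence number that is still present.

```java
import java.util.Optional;
import java.util.TreeMap;

/** In-memory stand-in for the children of an election znode (not the ZooKeeper API). */
class ElectionSim {
    // sequence number -> owning session id, kept sorted by sequence number
    private final TreeMap<Integer, String> children = new TreeMap<>();
    private int nextSequence = 0;

    /** Mimics creating "ELECTION/n_" with SEQUENCE|EPHEMERAL; returns the assigned sequence number. */
    int volunteer(String sessionId) {
        int seq = nextSequence++;
        children.put(seq, sessionId);
        return seq;
    }

    /** Mimics session expiry: every ephemeral node owned by the session is removed. */
    void expireSession(String sessionId) {
        children.values().removeIf(owner -> owner.equals(sessionId));
    }

    /** The session that owns the smallest sequence number is the leader, if any node exists. */
    Optional<String> leader() {
        return children.isEmpty()
                ? Optional.empty()
                : Optional.of(children.firstEntry().getValue());
    }

    /** The node a client should watch: the closest smaller sequence number still present. */
    Optional<Integer> nodeToWatch(int mySequence) {
        return Optional.ofNullable(children.lowerKey(mySequence));
    }
}
```

With sessions A, B and C volunteering in that order, A is the leader and C watches B's node. Calling expireSession("A") makes B the leader. If the node of a dead session were left behind instead, leader() would keep naming that session, which is the symptom suspected here.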

3) The Curator code below shows that the znode is created with both the SEQUENCE and EPHEMERAL flags, so ZooKeeper will remove that znode when the session terminates.
{code}
// In org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver, as used by the Sentry server: creates the ephemeral sequential lock node

  public String createsTheLock(CuratorFramework client, String path, byte[] lockNodeBytes) throws Exception {
    String ourPath;
    if (lockNodeBytes != null) {
      ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path, lockNodeBytes);
    } else {
      ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path);
    }

    return ourPath;
  }
{code}


was (Author: linaataustin):
[~akolb]
1) I found a bug in my test code. That bug is why no leader appeared to be elected in the test, and why I thought I had reproduced the issue where sometimes no leader is elected. After I fixed the test, it passed without my fix. So the root cause of the smoke test failure, which is a c6 blocker, is not in Sentry; I now believe it is a bug in ZooKeeper.


> Leader Lock is not released when Sentry service shuts down
> ----------------------------------------------------------
>
>                 Key: SENTRY-2203
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2203
>             Project: Sentry
>          Issue Type: Bug
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Critical
>         Attachments: SENTRY-2203.001.patch
>
>
> In our testing of Sentry HA, we found that after restarting the Sentry service without restarting the ZooKeeper service, it is possible that none of the Sentry servers is elected as leader to sync with HMS.
> What happened was:
> 1) When a leader is elected, the Sentry server host holds the leader lock. The lock is identified by the mutexPath. All Sentry servers in a cluster use the same mutexPath.
> 2) When the Sentry service is shut down, the HAContext is shut down, so the CuratorFrameworkImpl it contains is shut down too, but the leader lock is still held by the Sentry server host.
> 3) When the interruption signal from the shutdown interrupts the leader election thread, releasing the leader lock fails because CuratorFrameworkImpl is no longer in the started state.
> 4) When the Sentry server restarts, acquiring the leader lock fails because the lock was never released. So no active Sentry server is the leader.
> 5) If the leader lock is released before CuratorFrameworkImpl is shut down, this issue does not happen. If ZooKeeper is restarted after the Sentry service restarts, this issue does not happen either.
> To fix this issue,
> the Sentry LeaderStatusMonitor can deactivate the leader when it is closed, which releases the leader lock, so the lock is guaranteed to be released before CuratorFrameworkImpl is shut down.
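
The ordering problem in the issue description can be sketched with plain Java stand-ins. FakeClient, FakeLeaderLock and ShutdownOrder are hypothetical classes written only for this illustration; they are not Sentry or Curator code. The only assumption they encode is the one stated in the description: releasing the lock fails once the client is no longer in the started state.

```java
/** Hypothetical stand-in for the Curator client; it only models the started/stopped state. */
class FakeClient {
    private boolean started = true;

    boolean isStarted() { return started; }

    void close() { started = false; }
}

/** Hypothetical stand-in for the leader lock; acquire and release need a started client. */
class FakeLeaderLock {
    private final FakeClient client;
    private boolean held;

    FakeLeaderLock(FakeClient client) { this.client = client; }

    void acquire() {
        if (!client.isStarted()) throw new IllegalStateException("client not started");
        held = true;
    }

    void release() {
        if (!client.isStarted()) throw new IllegalStateException("client not started");
        held = false;
    }

    boolean isHeld() { return held; }
}

class ShutdownOrder {
    /** Order from the issue description: the client is closed first, so the release fails. */
    static boolean closeClientFirst(FakeClient client, FakeLeaderLock lock) {
        client.close();
        try {
            lock.release();
        } catch (IllegalStateException e) {
            // The release failed, so the lock stays held.
        }
        return lock.isHeld();
    }

    /** Order from the proposed fix: release the lock first, then close the client. */
    static boolean releaseLockFirst(FakeClient client, FakeLeaderLock lock) {
        lock.release();
        client.close();
        return lock.isHeld();
    }
}
```

Both methods return whether the lock is still held after shutdown. In this model, closeClientFirst leaves the lock held and releaseLockFirst does not, which is the difference the proposed change to LeaderStatusMonitor is meant to guarantee.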


