Posted to dev@curator.apache.org by "Viniti (Jira)" <ji...@apache.org> on 2020/06/05 03:55:00 UTC

[jira] [Comment Edited] (CURATOR-573) No leader is getting selected intermittently

    [ https://issues.apache.org/jira/browse/CURATOR-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126352#comment-17126352 ] 

Viniti edited comment on CURATOR-573 at 6/5/20, 3:54 AM:
---------------------------------------------------------

[~randgalt] I am seeing this issue intermittently (most recently 15 days ago) in my staging environment, but I could not reproduce it in my local environment. When LSAdapter and the CuratorFramework client are closed, I also see the logs below, in case they help:

2020-06-04 18:23:29 INFO  CuratorFrameworkImpl:937 - backgroundOperationsLoop exiting
2020-06-04 18:23:29 ERROR LeaderSelector:454 - The leader threw an exception
java.lang.IllegalStateException: instance must be started before calling this method
        at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:424)
        at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347)
        at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124)
        at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2020-06-04 18:23:29 INFO  ClientCnxn:524 - EventThread shut down for session: 0x302f056e0960036
2020-06-04 18:23:29 INFO  ZooKeeper:1422 - Session: 0x302f056e0960036 closed
2020-06-04 18:23:29 INFO  CuratorFrameworkImpl:937 - backgroundOperationsLoop exiting
2020-06-04 18:23:29 INFO  ZooKeeper:1422 - Session: 0x302f056e0960035 closed
2020-06-04 18:23:29 INFO  ClientCnxn:524 - EventThread shut down for session: 0x302f056e0960035
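
The trace above shows InterProcessMutex.release() reaching CuratorFrameworkImpl.delete() right after a "backgroundOperationsLoop exiting" line, which would be consistent with the client being closed while the selector's worker thread was still unwinding. Whether that is related to the missing leader is not established here. As a minimal sketch of the ordering idea only, assuming the shutdown path is able to wait for the leadership callback to finish: the plain-JDK model below uses a hypothetical FakeClient, an ExecutorService in place of the selector's worker, and a CountDownLatch (none of these are Curator API) so that the stand-in client is closed only after the release step has run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class OrderedShutdown {

    // Hypothetical stand-in for the client: it only records whether it is still open.
    static final class FakeClient {
        final AtomicBoolean open = new AtomicBoolean(true);

        void close() {
            open.set(false);
        }
    }

    // Returns true if the release step ran while the stand-in client was still open.
    static boolean shutdownInOrder() throws InterruptedException {
        FakeClient client = new FakeClient();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch released = new CountDownLatch(1);
        AtomicBoolean releasedWhileOpen = new AtomicBoolean(false);

        worker.submit(() -> {
            started.countDown();
            try {
                // Models the leader's work; it ends only when the thread is interrupted.
                Thread.sleep(TimeUnit.MINUTES.toMillis(1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                // Models the lock release, which needs a client that is still open.
                releasedWhileOpen.set(client.open.get());
                released.countDown();
            }
        });

        started.await();
        worker.shutdownNow(); // models closing the selector: interrupts the worker
        boolean done = released.await(5, TimeUnit.SECONDS); // wait for the release step
        client.close(); // only now close the stand-in client
        return done && releasedWhileOpen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("released before client close: " + shutdownInOrder());
    }
}
```

The latch is one way to express "wait until the callback has finished"; in the real application the equivalent signal would have to come from the listener itself, for example from a finally block at the end of takeLeadership().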


was (Author: viniti):
[~randgalt] I am seeing this issue intermittently (most recently 15 days ago) in my staging environment, but I could not reproduce it in my local environment. I also see the logs below, in case they help:

2020-06-04 18:23:29 INFO  CuratorFrameworkImpl:937 - backgroundOperationsLoop exiting
2020-06-04 18:23:29 ERROR LeaderSelector:454 - The leader threw an exception
java.lang.IllegalStateException: instance must be started before calling this method
        at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:424)
        at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347)
        at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124)
        at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2020-06-04 18:23:29 INFO  ClientCnxn:524 - EventThread shut down for session: 0x302f056e0960036
2020-06-04 18:23:29 INFO  ZooKeeper:1422 - Session: 0x302f056e0960036 closed
2020-06-04 18:23:29 INFO  CuratorFrameworkImpl:937 - backgroundOperationsLoop exiting
2020-06-04 18:23:29 INFO  ZooKeeper:1422 - Session: 0x302f056e0960035 closed
2020-06-04 18:23:29 INFO  ClientCnxn:524 - EventThread shut down for session: 0x302f056e0960035

> No leader is getting selected intermittently
> --------------------------------------------
>
>                 Key: CURATOR-573
>                 URL: https://issues.apache.org/jira/browse/CURATOR-573
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Apache, Framework, Recipes
>    Affects Versions: 4.0.1
>            Reporter: Viniti
>            Priority: Critical
>
> I am using the Apache Curator Leader Election recipe (https://curator.apache.org/curator-recipes/leader-election.html) in my application.
> ZooKeeper version: 3.5.7
> Curator version: 4.0.1
> Below is the sequence of steps:
> 1. When my Tomcat server instance starts, I create a single CuratorFramework instance (one per Tomcat server) and start it:
> ```
> CuratorFramework client = CuratorFrameworkFactory.newClient(connectionString, retryPolicy);
> client.start();
> if (!client.blockUntilConnected(10, TimeUnit.MINUTES)) {
>     LOGGER.error("Zookeeper connection could not establish!");
>     throw new RuntimeException("Zookeeper connection could not establish");
> }
> ```
> 2. Create an instance of LSAdapter and start it:
> ```
> LSAdapter adapter = new LSAdapter(client, <some_metadata>);
> adapter.start();
> ```
> Below is my LSAdapter class:
> ```
> public class LSAdapter extends LeaderSelectorListenerAdapter implements Closeable {
>     //<Class instance variables defined>
>
>     public LSAdapter(CuratorFramework client, <some_metadata>) {
>         leaderSelector = new LeaderSelector(client, <path_to_be_used_for_leader_election>, this);
>         leaderSelector.autoRequeue();
>     }
>
>     public void start() throws IOException {
>         leaderSelector.start();
>     }
>
>     @Override
>     public void close() throws IOException {
>         leaderSelector.close();
>     }
>
>     @Override
>     public void takeLeadership(CuratorFramework client) throws Exception {
>         final int waitSeconds = (int) (5 * Math.random()) + 1;
>         LOGGER.info(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
>         LOGGER.debug(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
>         while (true) {
>             try {
>                 Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
>                 //do leader tasks
>             } catch (InterruptedException e) {
>                 LOGGER.error(name + " was interrupted.");
>                 //cleanup
>                 Thread.currentThread().interrupt();
>             } finally {
>             }
>         }
>     }
> }
> ```
> 3. When the server instance shuts down, I close the LSAdapter instance (the one the application is using) and then close the CuratorFramework client that was created:
> ```
> CloseableUtils.closeQuietly(lsAdapter);
> curatorFrameworkClient.close();
> ```
> The issue I am facing is that, at times, no leader gets elected after a server restart. I verified this by tracing the log statements inside takeLeadership(). I have two Tomcat server instances running the above code and connecting to the same ZooKeeper quorum. Most of the time one of the instances becomes the leader, but when this issue occurs, both remain followers. Please suggest what I am doing wrong.
>  
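
For reference, the interrupt handling in the takeLeadership() loop quoted above can be modelled without Curator. The sketch below is plain JDK code, not the reporter's class: InterruptAwareLoop, leadershipLoop, and runAndInterrupt are hypothetical names, and an ExecutorService stands in for the selector's worker thread. It shows a loop that returns once it is interrupted. By contrast, a while (true) loop that catches InterruptedException and only re-sets the interrupt flag keeps running, because the next Thread.sleep() call throws again immediately. Whether this contributes to the reported behaviour is an assumption that would need to be checked against the real application.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptAwareLoop {

    // Stand-in for the body of takeLeadership(): loops until interrupted, then returns.
    static void leadershipLoop(CountDownLatch started, AtomicBoolean cleanedUp) {
        started.countDown();
        try {
            // The loop condition lets the method return once the interrupt flag is set.
            while (!Thread.currentThread().isInterrupted()) {
                Thread.sleep(50);
                // do leader tasks
            }
        } catch (InterruptedException e) {
            // Restore the flag for the caller; control then leaves the loop and the method.
            Thread.currentThread().interrupt();
        } finally {
            // cleanup
            cleanedUp.set(true);
        }
    }

    // Returns true if the loop exited and ran its cleanup after being interrupted.
    static boolean runAndInterrupt() throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean cleanedUp = new AtomicBoolean(false);

        worker.submit(() -> leadershipLoop(started, cleanedUp));
        started.await();
        worker.shutdownNow(); // interrupts the worker thread
        boolean terminated = worker.awaitTermination(5, TimeUnit.SECONDS);
        return terminated && cleanedUp.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("loop returned after interrupt: " + runAndInterrupt());
    }
}
```

Placing the try/catch outside the while loop is the design choice that matters here: a single interrupt ends the loop instead of being swallowed on every iteration.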



--
This message was sent by Atlassian Jira
(v8.3.4#803005)