Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2019/04/10 03:50:00 UTC

[jira] [Commented] (SOLR-13386) Remove race in OverseerTaskQueue#remove that can result in the Overseer causing a Zookeeper call spin spike.

    [ https://issues.apache.org/jira/browse/SOLR-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814018#comment-16814018 ] 

Mark Miller commented on SOLR-13386:
------------------------------------

We just need to catch the NoNodeException that setData can throw when the response node is deleted after exists returns true, and treat it the same as exists returning false (a no-op). I've been reviewing the code for similar cases to fix, but have not spotted any yet.

{noformat}
  /**
   * Remove the event and save the response into the other path.
   */
  public void remove(QueueEvent event) throws KeeperException,
      InterruptedException {
    Timer.Context time = stats.time(dir + "_remove_event");
    try {
      String path = event.getId();
      String responsePath = dir + "/" + RESPONSE_PREFIX
          + path.substring(path.lastIndexOf("-") + 1);
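      if (zookeeper.exists(responsePath, true)) {
        // Race: the node can be deleted after exists() returns true, so setData can throw NoNodeException here.
        zookeeper.setData(responsePath, event.getBytes(), true);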
      } else {
        log.info("Response ZK path: " + responsePath + " doesn't exist."
            + "  Requestor may have disconnected from ZooKeeper");
      }
      try {
        zookeeper.delete(path, -1, true);
      } catch (KeeperException.NoNodeException ignored) {
      }
    } finally {
      time.stop();
    }
  }
{noformat}
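A minimal sketch of the proposed change follows. It makes several assumptions so that it runs without a ZooKeeper ensemble. A hypothetical in-memory FakeZkClient stands in for SolrZkClient, and a local NoNodeException stands in for ZooKeeper's KeeperException.NoNodeException. The queue directory, node names and the "qnr-" response prefix are illustrative only, and the timer is omitted. The afterExists hook simulates the requestor deleting its response node between exists() and setData(). With the catch in place, remove() completes as a no-op instead of throwing back into the Overseer loop.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: FakeZkClient and NoNodeException are hypothetical stand-ins, not Solr/ZooKeeper APIs.
class OverseerRemoveSketch {

  /** Hypothetical stand-in for KeeperException.NoNodeException. */
  static class NoNodeException extends Exception {
    NoNodeException(String path) { super("NoNode for " + path); }
  }

  /** Hypothetical in-memory client; afterExists lets a test act inside the race window. */
  static class FakeZkClient {
    final Map<String, byte[]> nodes = new HashMap<>();
    Runnable afterExists = () -> {};

    boolean exists(String path) {
      boolean present = nodes.containsKey(path);
      afterExists.run(); // simulate another client deleting the node right after the check
      return present;
    }

    void setData(String path, byte[] data) throws NoNodeException {
      if (!nodes.containsKey(path)) throw new NoNodeException(path);
      nodes.put(path, data);
    }

    void delete(String path) throws NoNodeException {
      if (nodes.remove(path) == null) throw new NoNodeException(path);
    }
  }

  static final String RESPONSE_PREFIX = "qnr-"; // illustrative value

  /** Returns true if the response was written, false if the response node was already gone. */
  static boolean remove(FakeZkClient zk, String dir, String eventPath, byte[] response) {
    String responsePath = dir + "/" + RESPONSE_PREFIX
        + eventPath.substring(eventPath.lastIndexOf("-") + 1);
    boolean written = false;
    if (zk.exists(responsePath)) {
      try {
        zk.setData(responsePath, response);
        written = true;
      } catch (NoNodeException e) {
        // Deleted after exists() returned true: treat it like exists() == false (no-op).
      }
    }
    try {
      zk.delete(eventPath);
    } catch (NoNodeException ignored) {
      // Event node already removed; nothing left to do.
    }
    return written;
  }

  public static void main(String[] args) {
    String dir = "/overseer/collection-queue-work";
    FakeZkClient zk = new FakeZkClient();
    zk.nodes.put(dir + "/qn-0000000001", new byte[0]);
    zk.nodes.put(dir + "/qnr-0000000001", new byte[0]);
    // The requestor gives up and deletes its response node inside the race window.
    zk.afterExists = () -> zk.nodes.remove(dir + "/qnr-0000000001");
    boolean written = remove(zk, dir, dir + "/qn-0000000001", "done".getBytes());
    System.out.println("written=" + written + ", remaining=" + zk.nodes.size());
  }
}
```

Running main prints `written=false, remaining=0`. The event node is still deleted and no exception escapes. Catching the exception, rather than adding further checks, is what closes the window: any exists() result can be invalidated before the next ZooKeeper call is made.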

> Remove race in OverseerTaskQueue#remove that can result in the Overseer causing a Zookeeper call spin spike.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13386
>                 URL: https://issues.apache.org/jira/browse/SOLR-13386
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Major
>             Fix For: 7.7.2, 8.1
>
>
> If the setData call hits NoNodeException, it will throw, and the Overseer work queue processor will catch it, loop, and repeat, which causes a heavy spike of ZooKeeper getData / NoNode call traffic, among other effects.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)