Posted to dev@lucene.apache.org by "Scott Blum (JIRA)" <ji...@apache.org> on 2017/05/04 21:57:04 UTC

[jira] [Comment Edited] (SOLR-10524) Explore in-memory partitioning for processing Overseer queue messages

    [ https://issues.apache.org/jira/browse/SOLR-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997500#comment-15997500 ] 

Scott Blum edited comment on SOLR-10524 at 5/4/17 9:56 PM:
-----------------------------------------------------------

Couple of thoughts:

1) In the places where you've changed Collection -> List, I would go one step further and make it a concrete ArrayList, to (a) explicitly convey that the returned list is a mutable copy rather than a view of internal state, and (b) explicitly convey that sortAndAdd() operates efficiently on said lists.
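
For (1), a minimal self-contained sketch of the idea; PartitionSketch and getRunningTasks() are hypothetical stand-ins for the patched classes, and this sortAndAdd() is only illustrative of the real method's shape, not its actual signature:

{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

class PartitionSketch {
  private final Collection<String> internal = new ArrayList<>();

  void record(String id) { internal.add(id); }

  // Declaring ArrayList (not Collection or List) documents two things at the
  // call site: the result is a mutable copy, not a view of internal state,
  // and in-place sort/append operations on it are cheap.
  ArrayList<String> getRunningTasks() {
    return new ArrayList<>(internal);
  }

  // Illustrative only: an in-place sort-then-append helper benefits from
  // knowing it operates on a random-access, growable list.
  static void sortAndAdd(ArrayList<String> list, String id) {
    Collections.sort(list);
    list.add(id);
  }
}
{code}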

2) DQ.remove(id): don't you want to unconditionally call knownChildren.remove(id), even if the ZK delete succeeds?

3) DQ.remove(id): there is no need to loop here; in fact, you'll get stuck in an infinite loop if someone else deletes the node you're targeting.  The reason there's a loop in removeFirst() is that it tries a different id each iteration.

Suggested remove(id) impl:

{code}
  public void remove(String id) throws KeeperException, InterruptedException {
    // Remove the ZK node *first*; ZK will resolve any races with peek()/poll().
    // This is counterintuitive, but peek()/poll() will not return an element if the underlying
    // ZK node has been deleted, so it's okay to update knownChildren afterwards.
    try {
      String path = dir + "/" + id;
      zookeeper.delete(path, -1, true);
    } catch (KeeperException.NoNodeException e) {
      // Another client deleted the node first, this is fine.
    }
    updateLock.lockInterruptibly();
    try {
      knownChildren.remove(id);
    } finally {
      updateLock.unlock();
    }
  }
{code}
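
To illustrate the contrast in (3): looping is only safe when each iteration targets a different id. A minimal, self-contained sketch (RemoveFirstSketch is hypothetical; tryDelete stands in for the ZK delete and returns false when the node is already gone, as NoNodeException would indicate):

{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

class RemoveFirstSketch {
  // Hypothetical stand-in for the ordered knownChildren set.
  private final Deque<String> knownChildren = new ArrayDeque<>();
  // Stand-in for the ZK delete; false means another client deleted it first.
  private final Predicate<String> tryDelete;

  RemoveFirstSketch(Predicate<String> tryDelete) { this.tryDelete = tryDelete; }

  void offer(String id) { knownChildren.addLast(id); }

  // Looping is safe here because each iteration polls a *different* id, so a
  // concurrent deletion just advances us to the next candidate. A loop over a
  // single fixed id, as in remove(id), would never terminate in that case.
  String removeFirst() {
    String id;
    while ((id = knownChildren.pollFirst()) != null) {
      if (tryDelete.test(id)) {
        return id;
      }
      // Someone else deleted this node first; try the next one.
    }
    return null;
  }
}
{code}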




> Explore in-memory partitioning for processing Overseer queue messages
> ---------------------------------------------------------------------
>
>                 Key: SOLR-10524
>                 URL: https://issues.apache.org/jira/browse/SOLR-10524
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>         Attachments: SOLR-10524.patch, SOLR-10524.patch
>
>
> There are several JIRAs (I'll link in a second) about trying to be more efficient about processing overseer messages as the overseer can become a bottleneck, especially with very large numbers of replicas in a cluster. One of the approaches mentioned near the end of SOLR-5872 (15-Mar) was to "read large no:of items say 10000. put them into in memory buckets and feed them into overseer....".
> This JIRA is to break out that part of the discussion as it might be an easy win whereas "eliminating the Overseer queue" would be quite an undertaking.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
