You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by slackhappy <gi...@git.apache.org> on 2017/12/19 18:22:52 UTC

[GitHub] lucene-solr pull request #294: ZkStateReader: cache LazyCollectionRef

GitHub user slackhappy opened a pull request:

    https://github.com/apache/lucene-solr/pull/294

    ZkStateReader: cache LazyCollectionRef

    SOLR-10524 introduced zk state update batching, with
    a default interval of 2 seconds.  That opens
    the door for a simple, time-based cache on the read side
    to address the issue described in SOLR-8327

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/slackhappy/lucene-solr cloud_cache_lazy_collection_ref

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/294.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #294
    
----
commit e7c6a6773f1d01d2ddb0dbce6cdbaeff52e78376
Author: John Gallagher <jg...@slack-corp.com>
Date:   2017-12-19T16:57:25Z

    ZkStateReader: cache LazyCollectionRef
    
    SOLR-10524 introduced zk state update batching, with
    a default interval of 2 seconds.  That opens
    the door for a simple, time-based cache on the read side
    to address the issue described in SOLR-8327

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #294: ZkStateReader: cache LazyCollectionRef

Posted by slackhappy <gi...@git.apache.org>.

Github user slackhappy commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/294#discussion_r157842171
  
    --- Diff: solr/core/src/java/org/apache/solr/cloud/Overseer.java ---
    @@ -68,7 +68,7 @@
       public static final String QUEUE_OPERATION = "operation";
     
       // System properties are used in tests to make them run fast
    -  public static final int STATE_UPDATE_DELAY = Integer.getInteger("solr.OverseerStateUpdateDelay", 2000);  // delay between cloud state updates
    +  public static final int STATE_UPDATE_DELAY = ZkStateReader.STATE_UPDATE_DELAY;
    --- End diff --
    
    Moved so that I could access this setting in ZkStateReader, but left an alias here for locality.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #294: ZkStateReader: cache LazyCollectionRef (SOLR-8327)

Posted by slackhappy <gi...@git.apache.org>.

Github user slackhappy commented on the issue:

    https://github.com/apache/lucene-solr/pull/294
  
    I'll look into the test failures.  I actually didn't mean to create the PR yet 😳 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #294: ZkStateReader: cache LazyCollectionRef (SOLR-8327)

Posted by dragonsinth <gi...@git.apache.org>.

Github user dragonsinth commented on the issue:

    https://github.com/apache/lucene-solr/pull/294
  
    This approach seems fine to me.  Remind me why we use nanoTime vs. normal clock?  I'm sure you're right I just want to refresh my brain.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #294: ZkStateReader: cache LazyCollectionRef (SOLR-8327)

Posted by chatman <gi...@git.apache.org>.

Github user chatman commented on the issue:

    https://github.com/apache/lucene-solr/pull/294
  
    Seems like there are some test failures due to this change:
    
       [junit4] Tests with failures [seed: DE1D5337E38D2C32]:
       [junit4]   - org.apache.solr.cloud.TestPullReplica.testRemoveAllWriterReplicas
       [junit4]   - org.apache.solr.cloud.TestPullReplica.testAddRemovePullReplica
       [junit4]   - org.apache.solr.cloud.CollectionTooManyReplicasTest.testAddTooManyReplicas
       [junit4]   - org.apache.solr.cloud.CollectionsAPIDistributedZkTest.addReplicaTest
       [junit4]   - org.apache.solr.cloud.DeleteShardTest.testDirectoryCleanupAfterDeleteShard
       [junit4]   - org.apache.solr.cloud.TestCloudRecovery.corruptedLogTest
       [junit4]   - org.apache.solr.cloud.TestCloudRecovery.leaderRecoverFromLogOnStartupTest
       [junit4]   - org.apache.solr.cloud.TestUtilizeNode.test
       [junit4]   - org.apache.solr.cloud.TestTlogReplica.testAddRemoveTlogReplica
    
    Haven't looked into them yet, though.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #294: ZkStateReader: cache LazyCollectionRef (SOLR-8327)

Posted by slackhappy <gi...@git.apache.org>.

Github user slackhappy commented on the issue:

    https://github.com/apache/lucene-solr/pull/294
  
    I updated my PR to target SOLR-8327 more specifically, and got the tests to pass.  I think a smarter approach like that used by CloudSolrClient would be great. My understanding of the change in SOLR-10524 is that even the smartest/fastest updates of zookeeper data won't match the real-world state of the cluster in many situations, such as replica state changes, because those will be batched, but certainly a smarter approach would narrow that gap as much as possible, in addition to reducing the amount of state fetching.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr issue #294: ZkStateReader: cache LazyCollectionRef (SOLR-8327)

Posted by elyograg <gi...@git.apache.org>.

Github user elyograg commented on the issue:

    https://github.com/apache/lucene-solr/pull/294
  
    Java programs are migrating to nanoTime instead of currentTimeMillis for elapsed time because many people have found that the latter will go *backwards* on occasion.  It is not monotonic.
    
    Using nanoTime should be far less likely to go backwards.  That undesirable behavior has been observed in the wild, but should be rare.  Supposedly nanoTime is monotonic if the OS properly supports a monotonic clock.  There's a lot of info out there about it:
    
    https://www.google.com/search?q=java+nanotime+monotonic
    
    The fact that nanoTime *might* produce elapsed times with greater accuracy than one millisecond is a bonus.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #294: ZkStateReader: cache LazyCollectionRef

Posted by slackhappy <gi...@git.apache.org>.

Github user slackhappy commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/294#discussion_r157842876
  
    --- Diff: solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java ---
    @@ -632,19 +636,24 @@ private void refreshCollectionList(Watcher watcher) throws KeeperException, Inte
       }
     
       private class LazyCollectionRef extends ClusterState.CollectionRef {
    -
         private final String collName;
    +    private long lastUpdateTime;
    +    private DocCollection cachedDocCollection;
     
         public LazyCollectionRef(String collName) {
           super(null);
           this.collName = collName;
    +      this.lastUpdateTime = -1;
         }
     
         @Override
    -    public DocCollection get() {
    +    public synchronized DocCollection get() {
    --- End diff --
    
    I thought synchronized here would provide a good enough performance increase without the complexity of other approches


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org