You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/01/31 16:10:30 UTC

[GitHub] [solr] dsmiley opened a new pull request #580: SOLR-15964: Transient cores: don't evict open ones

dsmiley opened a new pull request #580:
URL: https://github.com/apache/solr/pull/580


   https://issues.apache.org/jira/browse/SOLR-15964
   
   First commit shows the test that fails, subsequent will fix the bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bruno-roustant commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797345078



##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
 
       SolrCore core;
       while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != null) {
+        assert core.getOpenCount() == 1;
         try {
-          core.close();
+          MDCLoggingContext.setCore(core);

Review comment:
       :+1:




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley merged pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley merged pull request #580:
URL: https://github.com/apache/solr/pull/580


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] ben-manes commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
ben-manes commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797143819



##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,21 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (core.getOpenCount() > 1) {

Review comment:
       The way to handle this is to [pin entries](https://github.com/ben-manes/caffeine/wiki/Faq#pinning-entries). When an entry should be pinned you can update it through the cache (e.g. `cache.asMap.compute`) and have a `Weigher` to evaluate the entry's size as `0`. Then it won't take any capacity and be skipped over during size eviction. Similarly for expiration using an `Expiry` that gives an infinite lifetime.
   
   This approach is preferred because then the cache won't surprisingly violate the contracts, won't perform O(n) scans searching for candidates, and the decisions are more explicit. Caffeine skips over zero-weight entries, but in the future it could move them elsewhere. So far this pin/unpin approach seems to clearer than Ehcache's [EvictionAdvisor](https://github.com/ehcache/ehcache3/blob/master/ehcache-api/src/main/java/org/ehcache/config/EvictionAdvisor.java) (a soft no; will evict an entry [regardless](https://github.com/ehcache/ehcache3/blob/5cfa96b868423f0fbb19bf27ead647823b14a165/ehcache-impl/src/main/java/org/ehcache/impl/internal/store/heap/OnHeapStore.java#L1578-L1583) of the advice) and Coherence's [EvictionApprover](https://github.com/oracle/coherence/blob/d97c16ca8069143a8117356914da1bc341e8c19f/prj/coherence-core/src/main/java/com/tangosol/net/cache/ConfigurableCacheMap.java#L331-L364) is a hard no (which could lead to heap exhaustion), but please let me know if you ru
 n into issues that advocate for reevaluating our approach.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797965685



##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -261,23 +275,16 @@ public void testCachingLimit() throws Exception {
         assertNotNull(o);
       }
 
-
-      // Note decrementing the count when the core is removed from the lazyCores list is appropriate, since the
-      // refcount is 1 when constructed. anyone _else_ who's opened up one has to close it.
-      core1.close();
-      core2.close();
-      core3.close();
-      core4.close();
-      core5.close();
-      core6.close();
-      core7.close();
-      core8.close();
-      core9.close();
     } finally {
       cc.shutdown();
     }
   }
 
+  private void getCoreAndPutBack(CoreContainer cc, String name) {

Review comment:
       I think of it as taking from a bag and putting back.  The details of this are calling "close" to put back but that evokes a sense of truly closing it which is inaccurate.  When this method is called, it's not truly going to be closed.
   I'll comment on the eviction criteria.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bruno-roustant commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797347357



##########
File path: solr/core/src/java/org/apache/solr/core/SolrCores.java
##########
@@ -377,6 +370,12 @@ protected CoreDescriptor getUnloadedCoreDescriptor(String cname) {
     }
   }
 
+  boolean hasPendingCoreOps(String name) {

Review comment:
       Should it be protected?

##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2084,12 +2084,13 @@ public boolean isLoaded(String name) {
     return solrCores.isLoaded(name);
   }
 
-  public boolean isLoadedNotPendingClose(String name) {
-    return solrCores.isLoadedNotPendingClose(name);
+  /** The core is loading, unloading, or reloading. */
+  boolean hasPendingCoreOps(String name) {

Review comment:
       Shouldn't it be public? This seems useful information.

##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -899,4 +890,67 @@ private void check10(SolrCore core) {
         , "//result[@numFound='10']"
     );
   }
+
+  public void testDontEvictUsedCore() throws Exception {
+    // If a core is being used for a long time (say a long indexing batch) but nothing else for it,
+    // and if the transient cache has pressure and thus wants to unload a core, we should not
+    // unload it (yet).
+
+    CoreContainer cc = init();
+    String[] transientCoreNames = new String[]{
+        "collection2",
+        "collection3",
+        "collection6",
+        "collection7",
+        "collection8",
+        "collection9"
+    };
+
+    try (LogListener logs = LogListener.info(TransientSolrCoreCacheDefault.class.getName())
+        .substring("NOT evicting transient core [" + transientCoreNames[0] + "]")) {
+      cc.waitForLoadingCoresToFinish(1000);
+      var solr = new EmbeddedSolrServer(cc, null);
+      final var longReqTimeMs = 2000; // if lower too much, the test will fail on a slow/busy CI
+
+      // First, start a long request on the first transient core
+      var thread = new Thread(() -> {
+        try {
+          // TODO Solr ought to have a query test "latch" mechanism so we don't sleep arbitrarily
+          solr.query(transientCoreNames[0], params("q", "{!func}sleep(" + longReqTimeMs + ",1)"));
+        } catch (SolrServerException | IOException e) {
+          fail(e.toString());
+        }
+      }, "longRequest");
+      thread.start();
+
+      System.out.println("Inducing pressure on cache by querying many cores...");
+      // Now hammer on other transient cores to create transient cache pressure
+      for (int round = 0; round < 5 && logs.getCount() == 0; round++) {
+        // note: we skip over the first; we want the first to remain non-busy
+        for (int i = 1; i < transientCoreNames.length; i++) {
+          solr.query(transientCoreNames[i], params("q", "*:*"));
+        }
+      }
+      // Show that the cache logs that it was asked to evict but did not.
+      // To show the bug behavior, comment this out and also comment out the corresponding logic
+      // that fixes it at the spot this message is logged.
+      assertTrue(logs.getCount() > 0);

Review comment:
       Where would this test fail if we uncommented the logic that fixes the problem?

##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
 
       SolrCore core;
       while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != null) {
+        assert core.getOpenCount() == 1;
         try {
-          core.close();
+          MDCLoggingContext.setCore(core);

Review comment:
       +1

##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName());

Review comment:
       Maybe log the cache size?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] murblanc commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
murblanc commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797700536



##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back

Review comment:
       This `put` and the one below being called from a `removalListener`, they happen **after** the eviction from the cache.
   I assume that if another thread tries to access the core (and therefore tries to create a new one?) it will see the error this PR is fixing, but once the core is put back here, things should be ok.
   Not suggesting any change, just checking my understanding that there might be some race condition transient errors around this.

##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back
+    } else if (core.getOpenCount() > 1) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's still in use", core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back
+    } else {
+      if (log.isInfoEnabled()) {
+        log.info("Closing transient core [{}] evicted from the cache", core.getName());
+      }
+      coreContainer.queueCoreToClose(core);

Review comment:
       What happens if once the core was added here to the `SolrCores.pendingCloses` but before the close actually happens, some thread reopens and reloads the core?

##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -212,43 +225,44 @@ public void testCachingLimit() throws Exception {
       checkCoresNotLoaded(cc, "collection3", "collection4", "collection6", "collection7", "collection8", "collection9");
 
       // By putting these in non-alpha order, we're also checking that we're  not just seeing an artifact.
-      SolrCore core1 = cc.getCore("collection1");
-      SolrCore core3 = cc.getCore("collection3");
-      SolrCore core4 = cc.getCore("collection4");
-      SolrCore core2 = cc.getCore("collection2");
-      SolrCore core5 = cc.getCore("collection5");
+      getCoreAndPutBack(cc, "collection1");
+      getCoreAndPutBack(cc, "collection3");
+      getCoreAndPutBack(cc, "collection4");
+      getCoreAndPutBack(cc, "collection2");
+      getCoreAndPutBack(cc, "collection5");
 
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5");
       checkCoresNotLoaded(cc, "collection6", "collection7", "collection8", "collection9");
 
       // map should be full up, add one more and verify
-      SolrCore core6 = cc.getCore("collection6");
+      getCoreAndPutBack(cc, "collection6");
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5",
           "collection6");
       checkCoresNotLoaded(cc, "collection7", "collection8", "collection9");
 
-      SolrCore core7 = cc.getCore("collection7");
+      getCoreAndPutBack(cc, "collection7");
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5",
           "collection6", "collection7");
       checkCoresNotLoaded(cc, "collection8", "collection9");
 
-      SolrCore core8 = cc.getCore("collection8");
+      getCoreAndPutBack(cc, "collection8");
       checkLoadedCores(cc, "collection1", "collection4", "collection5", "collection8");

Review comment:
       Help me: why are we sure here that collection1, collection4 and collection5 cores are loaded whereas collection2 and collection3 for example might not? @bruno-roustant maybe you know?

##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
 
       SolrCore core;
       while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != null) {
+        assert core.getOpenCount() == 1;

Review comment:
       assert is only enabled in tests usually, right?
   
   What happens if the core gets reopened while we close it here? Is it a case of the open count being higher than one in a legitimate way?

##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {

Review comment:
       After offline discussion with David, this can happen when a core load takes too much time (registering in ZK etc) and we try to unload it before it has completed. Likely deserves a comment here.
   
   This means the cache can grow pretty big, as evictions will not be allowed for a while, or if we prevent additions to the cache when it's over max size, this can cause global slowdown on node startup.
   
   If the cache grows big, we need to make sure there is some process shrinking it back to reasonable size in reasonable time. If we only evict one core when we add one (I don't know if that's the case), the cache might never reduce in size.

##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -261,23 +275,16 @@ public void testCachingLimit() throws Exception {
         assertNotNull(o);
       }
 
-
-      // Note decrementing the count when the core is removed from the lazyCores list is appropriate, since the
-      // refcount is 1 when constructed. anyone _else_ who's opened up one has to close it.
-      core1.close();
-      core2.close();
-      core3.close();
-      core4.close();
-      core5.close();
-      core6.close();
-      core7.close();
-      core8.close();
-      core9.close();
     } finally {
       cc.shutdown();
     }
   }
 
+  private void getCoreAndPutBack(CoreContainer cc, String name) {

Review comment:
       `getAndCloseCore` a better name maybe?
   And comment that core needs to be closed for it to be an eviction candidate.

##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
 
       SolrCore core;
       while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != null) {
+        assert core.getOpenCount() == 1;
         try {
-          core.close();
+          MDCLoggingContext.setCore(core);

Review comment:
       Nice. Wasn't obvious for me at first glance that it was for setting the right logging context for the `close()` call.

##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName());

Review comment:
       Need to update comment line 72 above since the cache max size is not strictly enforced anymore. I don't think the cache can grow extremely big unless all the cores we try to evict happen to be still in use, which is unlikely.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797970632



##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
 
       SolrCore core;
       while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != null) {
+        assert core.getOpenCount() == 1;

Review comment:
       The closer thread only operates on cores that have already been removed from access (e.g. to be opened for a new query).  And a request to open it (transiently?) will block because this queue is considered in `SolrCores.pendingCoreOps`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797956081



##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -899,4 +890,67 @@ private void check10(SolrCore core) {
         , "//result[@numFound='10']"
     );
   }
+
+  public void testDontEvictUsedCore() throws Exception {
+    // If a core is being used for a long time (say a long indexing batch) but nothing else for it,
+    // and if the transient cache has pressure and thus wants to unload a core, we should not
+    // unload it (yet).
+
+    CoreContainer cc = init();
+    String[] transientCoreNames = new String[]{
+        "collection2",
+        "collection3",
+        "collection6",
+        "collection7",
+        "collection8",
+        "collection9"
+    };
+
+    try (LogListener logs = LogListener.info(TransientSolrCoreCacheDefault.class.getName())
+        .substring("NOT evicting transient core [" + transientCoreNames[0] + "]")) {
+      cc.waitForLoadingCoresToFinish(1000);
+      var solr = new EmbeddedSolrServer(cc, null);
+      final var longReqTimeMs = 2000; // if lower too much, the test will fail on a slow/busy CI
+
+      // First, start a long request on the first transient core
+      var thread = new Thread(() -> {
+        try {
+          // TODO Solr ought to have a query test "latch" mechanism so we don't sleep arbitrarily
+          solr.query(transientCoreNames[0], params("q", "{!func}sleep(" + longReqTimeMs + ",1)"));
+        } catch (SolrServerException | IOException e) {
+          fail(e.toString());
+        }
+      }, "longRequest");
+      thread.start();
+
+      System.out.println("Inducing pressure on cache by querying many cores...");
+      // Now hammer on other transient cores to create transient cache pressure
+      for (int round = 0; round < 5 && logs.getCount() == 0; round++) {
+        // note: we skip over the first; we want the first to remain non-busy
+        for (int i = 1; i < transientCoreNames.length; i++) {
+          solr.query(transientCoreNames[i], params("q", "*:*"));
+        }
+      }
+      // Show that the cache logs that it was asked to evict but did not.
+      // To show the bug behavior, comment this out and also comment out the corresponding logic
+      // that fixes it at the spot this message is logged.
+      assertTrue(logs.getCount() > 0);

Review comment:
       ;-) it's complicated.  
   
   In a realistic setting, I saw Solr lose track of the fact that the SolrCore was actually being used, despite the closer thread calling close() on it.  The consequence of that was that the lock file is still on disk (while the core is still open), and thus a future request to the core would try to open a new SolrCore but fail to because of the presence of the lock file.  This failure is "sticky"; a node restart is needed to remedy for the affected core.
   
   In this test setting, the long running request hung indefinitely.  The test fails because it joins for a limited period of time and asserts that the thread is no longer alive.  When the request internally gets to the end and it closes its SolrCore reference, this does a real core close, not merely a decref because the core closer thread already decref'ed it.  For reasons I'm unsure of, I see it hanging on CachingDirectoryFactory.close.  There's a twelve second wait in there "Timeout waiting for all directory ref counts to be released".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797963815



##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back
+    } else if (core.getOpenCount() > 1) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's still in use", core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back
+    } else {
+      if (log.isInfoEnabled()) {
+        log.info("Closing transient core [{}] evicted from the cache", core.getName());
+      }
+      coreContainer.queueCoreToClose(core);

Review comment:
       This happens under a synchronized lock on SolrCores.modifyLock.  Also, pendingCloses is sort of a peer to pendingCoreOps, thus a pending close blocks any opening of this core.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bruno-roustant commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r798426152



##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -212,43 +225,44 @@ public void testCachingLimit() throws Exception {
       checkCoresNotLoaded(cc, "collection3", "collection4", "collection6", "collection7", "collection8", "collection9");
 
       // By putting these in non-alpha order, we're also checking that we're  not just seeing an artifact.
-      SolrCore core1 = cc.getCore("collection1");
-      SolrCore core3 = cc.getCore("collection3");
-      SolrCore core4 = cc.getCore("collection4");
-      SolrCore core2 = cc.getCore("collection2");
-      SolrCore core5 = cc.getCore("collection5");
+      getCoreAndPutBack(cc, "collection1");
+      getCoreAndPutBack(cc, "collection3");
+      getCoreAndPutBack(cc, "collection4");
+      getCoreAndPutBack(cc, "collection2");
+      getCoreAndPutBack(cc, "collection5");
 
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5");
       checkCoresNotLoaded(cc, "collection6", "collection7", "collection8", "collection9");
 
       // map should be full up, add one more and verify
-      SolrCore core6 = cc.getCore("collection6");
+      getCoreAndPutBack(cc, "collection6");
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5",
           "collection6");
       checkCoresNotLoaded(cc, "collection7", "collection8", "collection9");
 
-      SolrCore core7 = cc.getCore("collection7");
+      getCoreAndPutBack(cc, "collection7");
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5",
           "collection6", "collection7");
       checkCoresNotLoaded(cc, "collection8", "collection9");
 
-      SolrCore core8 = cc.getCore("collection8");
+      getCoreAndPutBack(cc, "collection8");
       checkLoadedCores(cc, "collection1", "collection4", "collection5", "collection8");

Review comment:
       Simply because collection1, colllection4 and collection5 are not transient so they are not in the transient core cache with limited size. See the testCores constant above (isTransient is the 2nd param - thanks IntelliJ for displaying the param name).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] murblanc commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
murblanc commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797792950



##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {

Review comment:
       After offline discussion with David, this can happen when a core load takes too much time (registering in ZK etc) and we try to unload it before it has completed. Likely deserves a comment here.
   
   This means the cache can grow pretty big, as evictions will not be allowed for a while, or if we prevent additions to the cache when it's over max size, this can cause global slowdown on node startup.
   
   If the cache grows big, we need to make sure there is some process shrinking it back to reasonable size in reasonable time. If we only evict one core when we add one (I don't know if that's the case), the cache might never reduce in size.
   
   **EDIT: after more discussion with David, this likely cannot happen if the core loading thread pool is smaller than the transient core cache limit.**




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] ben-manes commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
ben-manes commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797197119



##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,21 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (core.getOpenCount() > 1) {

Review comment:
       That's make sense. I think informing the cache that the entry's state changed is desirable for managing resources, but understandably hard as projects evolve without that behavior in mind. If you obtain metrics that indicate a problem then it will be interesting to discuss our options. I think your approach is pretty good as a workaround and sounds safe enough.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] murblanc commented on pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
murblanc commented on pull request #580:
URL: https://github.com/apache/solr/pull/580#issuecomment-1026003814


   I assume that in purpose for now only the test is in the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] bruno-roustant commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797064012



##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,21 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (core.getOpenCount() > 1) {

Review comment:
       @ben-manes is there a way to check this condition (core.getOpenCount() <= 1) during the selection of the entry to evict in Caffeine cache?
   Here we put back the evicted entry if this condition is not satisfied, potentially going beyond the max size defined, whereas there may be other potential entries to evict for which this condition is satisfied.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797605216



##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2084,12 +2084,13 @@ public boolean isLoaded(String name) {
     return solrCores.isLoaded(name);
   }
 
-  public boolean isLoadedNotPendingClose(String name) {
-    return solrCores.isLoadedNotPendingClose(name);
+  /** The core is loading, unloading, or reloading. */
+  boolean hasPendingCoreOps(String name) {

Review comment:
       I'd prefer the reverse -- don't even have it on CoreContainer but weirdly the transient core cache doesn't have access to SolrCores despite it being intertwined with it.  On our massive codebase, I'd prefer we be mindful of the use of "public".

##########
File path: solr/core/src/java/org/apache/solr/core/SolrCores.java
##########
@@ -377,6 +370,12 @@ protected CoreDescriptor getUnloadedCoreDescriptor(String cname) {
     }
   }
 
+  boolean hasPendingCoreOps(String name) {

Review comment:
       SolrCores isn't sub-classed.  I suspect you asked this because the use of protected is pervasive across other methods here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] murblanc commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
murblanc commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r798432198



##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -212,43 +225,44 @@ public void testCachingLimit() throws Exception {
       checkCoresNotLoaded(cc, "collection3", "collection4", "collection6", "collection7", "collection8", "collection9");
 
       // By putting these in non-alpha order, we're also checking that we're  not just seeing an artifact.
-      SolrCore core1 = cc.getCore("collection1");
-      SolrCore core3 = cc.getCore("collection3");
-      SolrCore core4 = cc.getCore("collection4");
-      SolrCore core2 = cc.getCore("collection2");
-      SolrCore core5 = cc.getCore("collection5");
+      getCoreAndPutBack(cc, "collection1");
+      getCoreAndPutBack(cc, "collection3");
+      getCoreAndPutBack(cc, "collection4");
+      getCoreAndPutBack(cc, "collection2");
+      getCoreAndPutBack(cc, "collection5");
 
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5");
       checkCoresNotLoaded(cc, "collection6", "collection7", "collection8", "collection9");
 
       // map should be full up, add one more and verify
-      SolrCore core6 = cc.getCore("collection6");
+      getCoreAndPutBack(cc, "collection6");
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5",
           "collection6");
       checkCoresNotLoaded(cc, "collection7", "collection8", "collection9");
 
-      SolrCore core7 = cc.getCore("collection7");
+      getCoreAndPutBack(cc, "collection7");
       checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5",
           "collection6", "collection7");
       checkCoresNotLoaded(cc, "collection8", "collection9");
 
-      SolrCore core8 = cc.getCore("collection8");
+      getCoreAndPutBack(cc, "collection8");
       checkLoadedCores(cc, "collection1", "collection4", "collection5", "collection8");

Review comment:
       Thanks @bruno-roustant. Giving different names to the collections to reflect their properties would have saved me from asking a stupid question and would make the test easier to read/understand (I know this is ooooold code).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797171492



##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,21 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (core.getOpenCount() > 1) {

Review comment:
       From the pinning docs:
   
   > The weight and expiration are evaluated when the entry is written into the cache.
   
   This suggests that if the weight changes, then it needs to be re-put into the cache?  For SolrCores in this cache, I think it would be awkward/entangled to do this.  Basically whenever anyone requests or returns a core, we call this method.  Not much lines-of-code but it's conceptually strange and readers of the code and our future selves will wonder what's going on.
   
   Any way, I think the approach here (put back on eviction) is expected to be rare.  Basically, the least requested item (probably hasn't been requested in a while) is still being used, which suggests some long-running operation on it is underway.  Also I have some logs on this so we can observe if this is happening more often.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797962588



##########
File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back

Review comment:
       In a race scenario with the removalListener: getCore will call `core = solrCores.waitAddPendingCoreOps(name);` which will return the core that was put back (un-evicted).  It will see it and return it without creating a new one.  I don't think there will be errors.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a change in pull request #580: SOLR-15964: Transient cores: don't evict open ones

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797971704



##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
 
       SolrCore core;
       while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != null) {
+        assert core.getOpenCount() == 1;

Review comment:
       > assert is only enabled in tests usually, right?
   
   Right.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org