You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/02/05 21:42:25 UTC
[GitHub] [solr] dsmiley commented on a change in pull request #592: SOLR-14765: Optimize DocList creation by skipping sort for sort-irrelevant cases

dsmiley commented on a change in pull request #592:
URL: https://github.com/apache/solr/pull/592#discussion_r800100152



##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -854,8 +868,58 @@ private DocSet getAndCacheDocSet(Query query) throws IOException {
     return filterCache.computeIfAbsent(query, q -> getDocSetNC(q, null));
   }
 
-  private static Query matchAllDocsQuery = new MatchAllDocsQuery();
+  private static final Query MATCH_ALL_DOCS_QUERY = new MatchAllDocsQuery();
   private volatile BitDocSet liveDocs;
+  private final FutureTask<BitDocSet> liveDocsFuture = new FutureTask<>(this::computeLiveDocs);
+
+  private BitDocSet computeLiveDocs() {

Review comment:
       So you wrote all this low level code to avoid a simple MatchAllDocsQuery with a DocSetCollector? See org.apache.solr.search.SolrIndexSearcher.getDocSetNC which calls DocSetUtil.  Even if we choose to keep your low level code, I'd propose it go to a special case inside of DocSetUtil when the query is MatchAllDocsQuery.

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -1299,6 +1385,47 @@ public DocList getDocList(Query query, List<Query> filterList, Sort lsort, int o
   public static final int GET_DOCLIST = 0x02; // get the documents actually returned in a response
   public static final int GET_SCORES = 0x01;
 
+  private static boolean isConstantScoreQuery(Query q) {

Review comment:
       I've seen a need for this elsewhere.  Let's make it handle other cases too (e.g. MatchAllDocsQuery, MatchNoDocsQuery, "Filter" (soon to be DocSetQuery)) and be recursive to the cases below (hey why not).  This could move to QueryUtils rather than having the massive SolrIndexSearcher grow

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -865,9 +929,31 @@ public BitDocSet getLiveDocSet() throws IOException {
     // Going through the filter cache will provide thread safety here if we only had getLiveDocs,
     // but the addition of setLiveDocs means we needed to add volatile to "liveDocs".
     BitDocSet docs = liveDocs;
-    if (docs == null) {
-      //note: maybe should instead calc manually by segment, using FixedBitSet.copyOf(segLiveDocs); avoid filter cache?
-      liveDocs = docs = getDocSetBits(matchAllDocsQuery);
+    if (docs != null) {
+      matchAllDocsCacheHitCount.incrementAndGet();
+    } else {
+      if (matchAllDocsCacheComputationTracker.compareAndSet(Long.MIN_VALUE, 0)) {
+        // run the initial/only/final future inline
+        // This thread will block execution here and `liveDocsFuture.get()` (below) should then return immediately
+        liveDocsFuture.run();
+      } else {
+        // another thread has already called `computeLiveDocs.run()`; this thread will block on
+        // `liveDocsFuture.get()` (below)
+        if (matchAllDocsCacheComputationTracker.getAndIncrement() < 0) {

Review comment:
       The Future + AtomicLong thing looks overly complicated; maybe I'm not getting why it needs to be so complicated.  Can we have a simple double-check lock with synchronized to ensure only one creation happens?  We've all seen this pattern many times, I'm sure.  We don't even need the liveDocs to be volatile since it's an immutable object, not unlike a String, except for the lazy populated size but that doesn't matter as it's idempotent.

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -2281,6 +2455,11 @@ public void initializeMetrics(SolrMetricsContext parentContext, String scope) {
     parentContext.gauge(() -> openTime, true, "openedAt", Category.SEARCHER.toString(), scope);
     parentContext.gauge(() -> warmupTime, true, "warmupTime", Category.SEARCHER.toString(), scope);
     parentContext.gauge(() -> registerTime, true, "registeredAt", Category.SEARCHER.toString(), scope);
+    parentContext.gauge(fullSortCount::get, true, "fullSortCount", Category.SEARCHER.toString(), scope);
+    parentContext.gauge(skipSortCount::get, true, "skipSortCount", Category.SEARCHER.toString(), scope);
+    parentContext.gauge(matchAllDocsCacheConsultationCount::get, true, "matchAllDocsCacheConsultationCount", Category.SEARCHER.toString(), scope);
+    parentContext.gauge(matchAllDocsCacheHitCount::get, true, "matchAllDocsCacheHitCount", Category.SEARCHER.toString(), scope);
+    parentContext.gauge(matchAllDocsCacheComputationTracker::get, true, "matchAllDocsCacheComputationTracker", Category.SEARCHER.toString(), scope);

Review comment:
       Is there value in publishing that "Tracker"?  It doesn't even sound like a metric.
   
   If we want these metrics, perhaps we should instead have a one-element Cache, which already publishes metrics and in a standard way across other Caches.  This is just an idea... if it's not easy then nevermind.
   

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -854,8 +868,58 @@ private DocSet getAndCacheDocSet(Query query) throws IOException {
     return filterCache.computeIfAbsent(query, q -> getDocSetNC(q, null));
   }
 
-  private static Query matchAllDocsQuery = new MatchAllDocsQuery();
+  private static final Query MATCH_ALL_DOCS_QUERY = new MatchAllDocsQuery();
   private volatile BitDocSet liveDocs;
+  private final FutureTask<BitDocSet> liveDocsFuture = new FutureTask<>(this::computeLiveDocs);
+
+  private BitDocSet computeLiveDocs() {
+    switch (leafContexts.size()) {
+      case 0:
+        assert numDocs() == 0;
+        return new BitDocSet(DocSet.empty().getFixedBitSet(), 0);

Review comment:
       This looks round-about.  Just return `new FixedBitSet(0);`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org