You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/11/17 21:23:27 UTC

[GitHub] [solr] risdenk opened a new pull request, #1183: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

risdenk opened a new pull request, #1183:
URL: https://github.com/apache/solr/pull/1183

   https://issues.apache.org/jira/browse/SOLR-16555


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on pull request #1183: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
dsmiley commented on PR #1183:
URL: https://github.com/apache/solr/pull/1183#issuecomment-1319255367

   Just to be clear, this is a highly performance critical code path.  Sadly we have no benchmarks of this that I know of but at least there's peer review.  Once the solution looks more work-able, I'll at-mention some more eye-balls.  Maybe you might consider adding to /solr/benchmark.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on pull request #1183: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #1183:
URL: https://github.com/apache/solr/pull/1183#issuecomment-1319276769

   From David on Jira an idea to explore:
   > Create a subclass of BitDocSet called MutableBitDocSet. Its intersect() and andNot() methods can return itself, and be implemented that way. In getProcessedFilter, before those two tight loops that call those two methods, upgrade/replace "answer" with MutableBitDocSet if end (the number of combinations to do) is greater than 1. If it's 1 or 0 then there's no point in the upgrade.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on pull request #1183: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #1183:
URL: https://github.com/apache/solr/pull/1183#issuecomment-1320309454

   This is inefficient as David pointed out and caused timeouts on a bunch of queries during testing. I was hoping to at least capture some profile from it but didn't last long enough to do that. https://github.com/apache/solr/pull/1184 is next iteration using a `MutableBitSet` as David pointed out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk closed pull request #1183: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk closed pull request #1183: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly
URL: https://github.com/apache/solr/pull/1183


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1183: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1183:
URL: https://github.com/apache/solr/pull/1183#discussion_r1025766090


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,42 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // min size must be maxDoc based on BitDocSet conversion at the end
+      FixedBitSet tempAnswer = FixedBitSet.ensureCapacity(answer.getFixedBitSetClone(), maxDoc());
+
+      // do negative queries first to shrink set size
+      for (int i = 0; i < end; i++) {
+        if (neg[i]) {
+          // Clear any documents that are in the negative filter query
+          DocIterator iterator = sets[i].iterator();
+          while (iterator.hasNext()) {
+            int doc = iterator.nextDoc();
+            tempAnswer.clear(doc);
+          }
+        }
+      }
+
+      for (int i = 0; i < end; i++) {
+        if (!neg[i] && i != smallestIndex) {
+          DocIterator iterator = sets[i].iterator();

Review Comment:
   I need to add a comment around why not using `answer.intersection(sets[i])`



##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,42 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // min size must be maxDoc based on BitDocSet conversion at the end
+      FixedBitSet tempAnswer = FixedBitSet.ensureCapacity(answer.getFixedBitSetClone(), maxDoc());
+
+      // do negative queries first to shrink set size
+      for (int i = 0; i < end; i++) {
+        if (neg[i]) {
+          // Clear any documents that are in the negative filter query

Review Comment:
   I need to add a comment around why not using `answer.andNot(sets[i])`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on pull request #1183: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #1183:
URL: https://github.com/apache/solr/pull/1183#issuecomment-1319266439

   > Instead of bulk bitSet operations, you're iterating doc-by-doc! For small sets (SortedIntDocSet), this is fine, but for bigger ones -- no.
   
   @dsmiley yea not saying this is the final version at all. I'm actually curious on testing this with some benchmarking stuff to see how much the memory allocations slowed this down. This approach requiring more CPU, but less memory. I would not disagree there is more performance using the bulk bitset operations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org