You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/11/18 15:59:00 UTC

[GitHub] [solr] risdenk opened a new pull request, #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

risdenk opened a new pull request, #1184:
URL: https://github.com/apache/solr/pull/1184

   https://issues.apache.org/jira/browse/SOLR-16555


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028485895


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {

Review Comment:
   Fixed in 414c8749dcdf2530068014c4ee81a3a518de50bf



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk merged pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk merged PR #1184:
URL: https://github.com/apache/solr/pull/1184


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
dsmiley commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028371762


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {

Review Comment:
   As a TODO, okay.  But not essential.  That method isn't used where we are using MutableBitDocSet.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026840953


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array.
+ *

Review Comment:
   Addressed in bdf47db



##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array.
+ *
+ * @since solr 9.2
+ */
+public class MutableBitDocSet extends BitDocSet {

Review Comment:
   Addressed in bdf47db



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
dsmiley commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026647448


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,22 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {

Review Comment:
   As I said, I think we shouldn't do this if equals to 1 since there will be at most one copy any way (sometimes optimizations lead to none).  Furthermore, consider that "answer" may be a SortedIntDocSet and thus upgrading it to a BitSet may be wasteful.



##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,22 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {

Review Comment:
   Come to think of it, if "answer" is an instance of SortedIntDocSet, let's not do this optimization at all; right?



##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,22 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {

Review Comment:
   Also, iff smallestIdx is >= 0 then "end" is effectively one smaller for the purposes of this check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028467946


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {

Review Comment:
   I can go a step further here and explicitly throw UnsupportedOperationException for methods that we don't use and don't plan to support now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026817305


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,28 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // This optimizes for the case where we have more than 2 filters and instead
+      // of copying the bitsets we make one mutable bitset. We only need to do this
+      // for BitDocSet since it clones the backing bitset for andNot and intersection.
+      if (end > 1 && answer instanceof BitDocSet) {

Review Comment:
   the line 1214 is checking if `end > 0` - which was just an expansion of the previous if statement that was there. Previously `end > 0` was checked before updating `answer` only and then the for loops would skip since i = 0 and end = 0 in that case. I wanted it to be clear - if `end <= 0` we don't need to do this whole block.
   
   The end > 1 here is to make sure we have 2 bitsets to work with?
   
   Unless I'm misunderstanding your comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #1184:
URL: https://github.com/apache/solr/pull/1184#issuecomment-1322170003

   Thanks @dsmiley appreciate the reviews. I added a bunch of others to hopefully make sure we get another set of eyes on this. I'll test this change again today with your commit as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
magibney commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028810262


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1190,38 +1192,52 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
       }
 
       Query posQuery = QueryUtils.getAbs(q);
-      sets[end] = getPositiveDocSet(posQuery);
+      DocSet docSet = getPositiveDocSet(posQuery);
       // Negative query if absolute value different from original
       if (Objects.equals(q, posQuery)) {
-        neg[end] = false;
-        // keep track of the smallest positive set.
-        // This optimization is only worth it if size() is cached, which it would
-        // be if we don't do any set operations.
-        int sz = sets[end].size();
-        if (sz < smallestCount) {
-          smallestCount = sz;
-          smallestIndex = end;
-          answer = sets[end];
+        // keep track of the smallest positive set; use "answer" for this.
+        if (answer == null) {
+          answer = docSet;
+          continue;
         }
+        // note: assume that size() is cached.  It generally comes from the cache, so should be.
+        if (docSet.size() < answer.size()) {
+          // swap answer & docSet so that answer is smallest
+          DocSet tmp = answer;
+          answer = docSet;
+          docSet = tmp;
+        }
+        neg[end] = false;
       } else {
         neg[end] = true;
       }
+      sets[end++] = docSet;
+    } // end of queries
 
-      end++;
-    }
+    if (end > 0) {
+      // Are all of our normal cached filters negative?
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+      // This optimizes for the case where we have more than 2 filters and instead
+      // of copying the bitsets we make one mutable bitset. We should only do this
+      // for BitDocSet since it clones the backing bitset for andNot and intersection.
+      if (end > 1 && answer instanceof BitDocSet) {
+        answer = MutableBitDocSet.fromBitDocSet((BitDocSet) answer);
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // do negative queries first to shrink set size
+      for (int i = 0; i < end; i++) {
+        if (neg[i]) answer = answer.andNot(sets[i]);
+      }
+
+      for (int i = 0; i < end; i++) {
+        if (!neg[i]) answer = answer.intersection(sets[i]);
+      }
 
-    for (int i = 0; i < end; i++) {
-      if (!neg[i] && i != smallestIndex) answer = answer.intersection(sets[i]);
+      // Make sure to keep answer as an immutable DocSet if we made it mutable
+      answer = MutableBitDocSet.unwrapIfMutable(answer);
     }
 
     // ignore "answer" if it simply matches all docs

Review Comment:
   Out of scope of this PR, but while looking at this code: will `answer.size()` always be called downstream of this? If so, then it's fine to force size calculation here; if _not_, I think there should be better ways to check this condition than naively calling `answer.size()` after all intersections are done.
   EDIT: to be clear I'm talking about the block below this comment:
   ```java
       if (answer != null && answer.size() == numDocs()) {
         answer = null;
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026817479


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array.
+ *

Review Comment:
   Gah yea good call - I updated the javadoc on the methods but will add that to the class too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028485479


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {
+  private MutableBitDocSet(FixedBitSet bits) {
+    super(bits);
+  }
+
+  /**
+   * Returns a mutable BitDocSet that is a copy of the provided BitDocSet.
+   *
+   * @param bitDocSet a BitDocSet
+   * @return copy of bitDocSet that is now mutable
+   */
+  public static MutableBitDocSet fromBitDocSet(BitDocSet bitDocSet) {
+    return new MutableBitDocSet(bitDocSet.getFixedBitSetClone());
+  }
+
+  /**
+   * Returns a new BitDocSet with the same bits if the DocSet provided is a MutableBitDocSet.
+   * Otherwise, just returns the provided DocSet.
+   *
+   * @param docSet DocSet to unwrap if it is a MutableBitDocSet
+   * @return Unwrapped DocSet that is not mutable
+   */
+  public static DocSet unwrapIfMutable(DocSet docSet) {
+    if (docSet instanceof MutableBitDocSet) {
+      return new BitDocSet(((MutableBitDocSet) docSet).getBits());
+    }
+    return docSet;
+  }
+
+  /**
+   * Returns the documents in this set that are not in the other set. This mutates the underlying
+   * bits so do not cache the returned bitset.
+   *
+   * @return a DocSet representing this AND NOT other
+   */
+  @Override
+  public DocSet andNot(DocSet other) {
+    if (other instanceof BitDocSet) {
+      bits.andNot(((BitDocSet) other).bits);
+    } else {
+      DocIterator iter = other.iterator();

Review Comment:
   Fixed in 26754f8e2f8db5ad6a5535eb10cb73d43d4739d3



##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.

Review Comment:
   Fixed in e4c92289514e91bfdefb727437f6e2a042e0577e



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028633733


##########
solr/core/src/java/org/apache/solr/search/BitDocSet.java:
##########
@@ -194,19 +194,30 @@ public void addAllTo(FixedBitSet target) {
 
   @Override
   public DocSet andNot(DocSet other) {
-    FixedBitSet newbits = bits.clone();
+    return new BitDocSet(this.andNot(getFixedBitSetClone(), other));
+  }
+
+  /**
+   * Helper method for andNot that takes FixedBitSet and DocSet. This returns the resulting bits
+   * andNoted together.
+   *
+   * @param bits bits to operate on
+   * @param other The DocSet to compare to
+   * @return Resulting andNoted FixedBitSet
+   */
+  protected FixedBitSet andNot(FixedBitSet bits, DocSet other) {

Review Comment:
   addressed in a1af3895af00b37b99d738556ce14cd75441f502



##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {
+  private MutableBitDocSet(FixedBitSet bits) {
+    super(bits);
+  }
+
+  /**
+   * Returns a mutable BitDocSet that is a copy of the provided BitDocSet.
+   *
+   * @param bitDocSet a BitDocSet
+   * @return copy of bitDocSet that is now mutable
+   */
+  public static MutableBitDocSet fromBitDocSet(BitDocSet bitDocSet) {
+    return new MutableBitDocSet(bitDocSet.getFixedBitSetClone());
+  }
+
+  /**
+   * Returns a new BitDocSet with the same bits if the DocSet provided is a MutableBitDocSet.
+   * Otherwise, just returns the provided DocSet.
+   *
+   * @param docSet DocSet to unwrap if it is a MutableBitDocSet
+   * @return Unwrapped DocSet that is not mutable
+   */
+  public static DocSet unwrapIfMutable(DocSet docSet) {
+    if (docSet instanceof MutableBitDocSet) {
+      return new BitDocSet(((MutableBitDocSet) docSet).getBits());
+    }
+    return docSet;
+  }
+
+  @Override
+  public DocIterator iterator() {
+    throw new UnsupportedOperationException();
+  }
+
+  /**
+   * Returns the intersection of this set with another set. This mutates the underlying bits so do
+   * not cache the returned bitset.
+   *
+   * @return a DocSet representing the intersection
+   */
+  @Override
+  public DocSet intersection(DocSet other) {
+    // intersection is overloaded in the smaller DocSets to be more
+    // efficient, so dispatch off of it instead.
+    if (!(other instanceof BitDocSet)) {
+      return other.intersection(this);
+    }
+
+    // Default... handle with bitsets.
+    FixedBitSet newbits = getBits();
+    newbits.and(other.getFixedBitSet());
+
+    // We can't return just this since `size` is cached and
+    // we are changing the cardinality of the underlying bits.
+    return new MutableBitDocSet(newbits);
+  }
+
+  @Override
+  public int intersectionSize(DocSet other) {
+    throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public boolean intersects(DocSet other) {
+    throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public int unionSize(DocSet other) {
+    throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public int andNotSize(DocSet other) {
+    throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public void addAllTo(FixedBitSet target) {
+    throw new UnsupportedOperationException();
+  }
+
+  /**
+   * Returns the documents in this set that are not in the other set. This mutates the underlying
+   * bits so do not cache the returned bitset.
+   *
+   * @return a DocSet representing this AND NOT other
+   */
+  @Override
+  public DocSet andNot(DocSet other) {
+    // We can't return just this since `size` is cached and
+    // we are changing the cardinality of the underlying bits.

Review Comment:
   addressed in a1af3895af00b37b99d738556ce14cd75441f502



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026723125


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,22 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {

Review Comment:
   Pushed some of these optimizations in 5e753aa44b848f4e45361864c84be23a0c774edd - I checked if answer is instance of BitDocSet since thats the only case we can optimize easily.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026687927


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,22 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {

Review Comment:
   > Also, iff smallestIdx is >= 0 then "end" is effectively one smaller for the purposes of this check.
   
   not sure I understand this comment. I'll dig in further. The other changes I'll push in a few minutes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026817305


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,28 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // This optimizes for the case where we have more than 2 filters and instead
+      // of copying the bitsets we make one mutable bitset. We only need to do this
+      // for BitDocSet since it clones the backing bitset for andNot and intersection.
+      if (end > 1 && answer instanceof BitDocSet) {

Review Comment:
   the line 1214 is checking if `end > 0` - which was just an expansion of the previous if statement that was there. Previously `end > 0` was checked before updating `answer` only and then the for loops would skip since i = 0 and end = 0 in that case. I wanted it to be clear - if `end <= 0` we don't need to do this whole block.
   
   The end > 1 here is to make sure we have 2 bitsets to work with?
   
   Unless I'm misunderstanding your command.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026722477


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+public class MutableBitDocSet extends BitDocSet {
+  public MutableBitDocSet(FixedBitSet bits) {
+    super(bits);
+  }
+
+  @Override
+  public DocSet andNot(DocSet other) {
+    if (other instanceof BitDocSet) {
+      bits.andNot(((BitDocSet) other).bits);
+    } else {
+      DocIterator iter = other.iterator();
+      while (iter.hasNext()) {
+        int doc = iter.nextDoc();
+        if (doc < bits.length()) {
+          bits.clear(doc);
+        }
+      }
+    }
+    return new MutableBitDocSet(bits);

Review Comment:
   addressed in bc97db7183cedabbb7615c081298f45f34750bb1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #1184:
URL: https://github.com/apache/solr/pull/1184#issuecomment-1320600677

   So I reran this update especially after f1fe35c but included the review changes as well.
   
   Memory allocation is no longer dominated by getProcessedFilter. I'd share screenshots, but custom scorers now dominate the flamegraph after cleaning this up.
   
   Some performance numbers from a cluster test:
   * median latency baseline from prod - something like 23ms. iteration prior to f1fe35c -> 18ms. second iteration after f1fe35c and cleanup -> 11ms
   * p75 in the 16ms range vs ~60ms range from prod
   * p99 in the 100ms range vs ~325ms range from prod
   * prod was hitting ~75% cpu. this is 30% CPU
   * GC counts were in the 250 range for prod at ~2.25s. during this test maxing around 75 range and around 500ms max
   * total estimated memory allocation in prod for 5 min period - 1.5TB. total memory allocated in test during 5 min period = 356GB
   * ~50% reduction in latency in ms across the board for percentiles under 99.99%.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #1184:
URL: https://github.com/apache/solr/pull/1184#issuecomment-1320218975

   @dsmiley here is the `MutableBitDocSet` version. I'll consolidate down to one PR once I do some testing and get some perf numbers. I just pushed it up separate to get some eyes on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #1184:
URL: https://github.com/apache/solr/pull/1184#issuecomment-1323759792

   Thanks @madrob and @magibney for the reviews!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
dsmiley commented on PR #1184:
URL: https://github.com/apache/solr/pull/1184#issuecomment-1324121770

   Awesome perf analysis by the way!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
magibney commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028526100


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {
+  private MutableBitDocSet(FixedBitSet bits) {
+    super(bits);
+  }
+
+  /**
+   * Returns a mutable BitDocSet that is a copy of the provided BitDocSet.
+   *
+   * @param bitDocSet a BitDocSet
+   * @return copy of bitDocSet that is now mutable
+   */
+  public static MutableBitDocSet fromBitDocSet(BitDocSet bitDocSet) {
+    return new MutableBitDocSet(bitDocSet.getFixedBitSetClone());
+  }
+
+  /**
+   * Returns a new BitDocSet with the same bits if the DocSet provided is a MutableBitDocSet.
+   * Otherwise, just returns the provided DocSet.
+   *
+   * @param docSet DocSet to unwrap if it is a MutableBitDocSet
+   * @return Unwrapped DocSet that is not mutable
+   */
+  public static DocSet unwrapIfMutable(DocSet docSet) {
+    if (docSet instanceof MutableBitDocSet) {
+      return new BitDocSet(((MutableBitDocSet) docSet).getBits());
+    }
+    return docSet;
+  }
+
+  @Override
+  public DocIterator iterator() {
+    throw new UnsupportedOperationException();
+  }
+
+  /**
+   * Returns the intersection of this set with another set. This mutates the underlying bits so do
+   * not cache the returned bitset.
+   *
+   * @return a DocSet representing the intersection
+   */
+  @Override
+  public DocSet intersection(DocSet other) {
+    // intersection is overloaded in the smaller DocSets to be more
+    // efficient, so dispatch off of it instead.
+    if (!(other instanceof BitDocSet)) {
+      return other.intersection(this);
+    }
+
+    // Default... handle with bitsets.
+    FixedBitSet newbits = getBits();
+    newbits.and(other.getFixedBitSet());
+
+    // We can't return just this since `size` is cached and
+    // we are changing the cardinality of the underlying bits.
+    return new MutableBitDocSet(newbits);
+  }
+
+  @Override
+  public int intersectionSize(DocSet other) {
+    throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public boolean intersects(DocSet other) {
+    throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public int unionSize(DocSet other) {
+    throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public int andNotSize(DocSet other) {
+    throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public void addAllTo(FixedBitSet target) {
+    throw new UnsupportedOperationException();
+  }
+
+  /**
+   * Returns the documents in this set that are not in the other set. This mutates the underlying
+   * bits so do not cache the returned bitset.
+   *
+   * @return a DocSet representing this AND NOT other
+   */
+  @Override
+  public DocSet andNot(DocSet other) {
+    // We can't return just this since `size` is cached and
+    // we are changing the cardinality of the underlying bits.

Review Comment:
   True; but doesn't that potentially leave `this` in an inconsistent state -- with a cached `size` that doesn't match the size according to the underlying bits? Shouldn't we clear/reset size to `-1`? And, if we do that, could we then just return `this`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1029373164


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1190,38 +1192,52 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
       }
 
       Query posQuery = QueryUtils.getAbs(q);
-      sets[end] = getPositiveDocSet(posQuery);
+      DocSet docSet = getPositiveDocSet(posQuery);
       // Negative query if absolute value different from original
       if (Objects.equals(q, posQuery)) {
-        neg[end] = false;
-        // keep track of the smallest positive set.
-        // This optimization is only worth it if size() is cached, which it would
-        // be if we don't do any set operations.
-        int sz = sets[end].size();
-        if (sz < smallestCount) {
-          smallestCount = sz;
-          smallestIndex = end;
-          answer = sets[end];
+        // keep track of the smallest positive set; use "answer" for this.
+        if (answer == null) {
+          answer = docSet;
+          continue;
         }
+        // note: assume that size() is cached.  It generally comes from the cache, so should be.
+        if (docSet.size() < answer.size()) {
+          // swap answer & docSet so that answer is smallest
+          DocSet tmp = answer;
+          answer = docSet;
+          docSet = tmp;
+        }
+        neg[end] = false;
       } else {
         neg[end] = true;
       }
+      sets[end++] = docSet;
+    } // end of queries
 
-      end++;
-    }
+    if (end > 0) {
+      // Are all of our normal cached filters negative?
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+      // This optimizes for the case where we have more than 2 filters and instead
+      // of copying the bitsets we make one mutable bitset. We should only do this
+      // for BitDocSet since it clones the backing bitset for andNot and intersection.
+      if (end > 1 && answer instanceof BitDocSet) {
+        answer = MutableBitDocSet.fromBitDocSet((BitDocSet) answer);
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // do negative queries first to shrink set size
+      for (int i = 0; i < end; i++) {
+        if (neg[i]) answer = answer.andNot(sets[i]);
+      }
+
+      for (int i = 0; i < end; i++) {
+        if (!neg[i]) answer = answer.intersection(sets[i]);
+      }
 
-    for (int i = 0; i < end; i++) {
-      if (!neg[i] && i != smallestIndex) answer = answer.intersection(sets[i]);
+      // Make sure to keep answer as an immutable DocSet if we made it mutable
+      answer = MutableBitDocSet.unwrapIfMutable(answer);
     }
 
     // ignore "answer" if it simply matches all docs

Review Comment:
   so in theory you are probably correct. in practice don't plan on tackling this here :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1029371826


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {
+  private MutableBitDocSet(FixedBitSet bits) {
+    super(bits);
+  }
+
+  /**
+   * Returns a mutable BitDocSet that is a copy of the provided BitDocSet.
+   *
+   * @param bitDocSet a BitDocSet
+   * @return copy of bitDocSet that is now mutable
+   */
+  public static MutableBitDocSet fromBitDocSet(BitDocSet bitDocSet) {
+    return new MutableBitDocSet(bitDocSet.getFixedBitSetClone());

Review Comment:
   fixed in 7a7b731da3d925af80decb435ff6461e30bae00e - decided to just pass through size.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026682252


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,22 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {

Review Comment:
   Yea these are all good thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
dsmiley commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026822293


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,22 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {

Review Comment:
   RE smallestIdx that I was saying, the logic here is confusing.  I think a refactor is in order to make this clearer.  I can do a push to your branch doing that, and thus adopting a small improvement here.  Without _something_, end could easily be equal to 1 and smallestIdx == 0 which is a scenario where we should do nothing because none of the pos/neg loops will do any work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
dsmiley commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028371762


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {

Review Comment:
   As a TODO, okay.  But not essential.  That method isn't used where we are using MutableBitDocSet.  
   To be clear, if we were to implement it, it would be dead code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #1184:
URL: https://github.com/apache/solr/pull/1184#issuecomment-1320455964

   I ran this through some benchmarks - same one as the screenshots on SOLR-16555 and got the following:
   
   <img width="1449" alt="Screenshot 2022-11-18 at 14 44 07" src="https://user-images.githubusercontent.com/3384157/202789352-1fb6af4c-28f3-44df-bc16-d506b177ca2e.png">
   <img width="792" alt="Screenshot 2022-11-18 at 14 44 24" src="https://user-images.githubusercontent.com/3384157/202789355-10977730-3713-4b79-83ee-69d74703ddf2.png">
   
   I found that `[FixedBitSet.ensureCapacity](https://github.com/apache/solr/pull/1184/commits/f1fe35ca91a9a219dcadc29186a963e8f665e5b5)` isn't even needed so can skip that too. I pushed that commit [f1fe35c](https://github.com/apache/solr/pull/1184/commits/f1fe35ca91a9a219dcadc29186a963e8f665e5b5) a few minutes ago.
   
   I'll update based on David's comments in a few minutes as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026961673


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1190,38 +1187,53 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
       }
 
       Query posQuery = QueryUtils.getAbs(q);
-      sets[end] = getPositiveDocSet(posQuery);
+      DocSet docSet = getPositiveDocSet(posQuery);
       // Negative query if absolute value different from original
       if (Objects.equals(q, posQuery)) {
-        neg[end] = false;
-        // keep track of the smallest positive set.
+        // keep track of the smallest positive set; use "answer" for this.
         // This optimization is only worth it if size() is cached, which it would
         // be if we don't do any set operations.

Review Comment:
   I think this comment can move down to just above line 1200



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] madrob commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
madrob commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028297158


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.

Review Comment:
   Can we add a java assert to some of the cache paths that verify it's not a MutableBitDocSet? It feels like an easy mistake to make in the future.



##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {
+  private MutableBitDocSet(FixedBitSet bits) {
+    super(bits);
+  }
+
+  /**
+   * Returns a mutable BitDocSet that is a copy of the provided BitDocSet.
+   *
+   * @param bitDocSet a BitDocSet
+   * @return copy of bitDocSet that is now mutable
+   */
+  public static MutableBitDocSet fromBitDocSet(BitDocSet bitDocSet) {
+    return new MutableBitDocSet(bitDocSet.getFixedBitSetClone());
+  }
+
+  /**
+   * Returns a new BitDocSet with the same bits if the DocSet provided is a MutableBitDocSet.
+   * Otherwise, just returns the provided DocSet.
+   *
+   * @param docSet DocSet to unwrap if it is a MutableBitDocSet
+   * @return Unwrapped DocSet that is not mutable
+   */
+  public static DocSet unwrapIfMutable(DocSet docSet) {
+    if (docSet instanceof MutableBitDocSet) {
+      return new BitDocSet(((MutableBitDocSet) docSet).getBits());
+    }
+    return docSet;
+  }
+
+  /**
+   * Returns the documents in this set that are not in the other set. This mutates the underlying
+   * bits so do not cache the returned bitset.
+   *
+   * @return a DocSet representing this AND NOT other
+   */
+  @Override
+  public DocSet andNot(DocSet other) {
+    if (other instanceof BitDocSet) {
+      bits.andNot(((BitDocSet) other).bits);
+    } else {
+      DocIterator iter = other.iterator();

Review Comment:
   This is mostly copied from BitDocSet, right? It feels like there could be a static method `andNot(DocSet left, DocSet right)` and this class calls `andNot(this.bits, other)` while BitDocSet calls `andNot(this.bits.clone(), other)` and we have the logic in one place.



##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {

Review Comment:
   We should also implement `union` for completeness.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026840803


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,28 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // This optimizes for the case where we have more than 2 filters and instead
+      // of copying the bitsets we make one mutable bitset. We only need to do this
+      // for BitDocSet since it clones the backing bitset for andNot and intersection.
+      if (end > 1 && answer instanceof BitDocSet) {
+        FixedBitSet fixedBitSet =
+            FixedBitSet.ensureCapacity(answer.getFixedBitSetClone(), maxDoc());
+        answer = new MutableBitDocSet(fixedBitSet);

Review Comment:
   Addressed in bdf47db



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026607957


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+public class MutableBitDocSet extends BitDocSet {
+  public MutableBitDocSet(FixedBitSet bits) {
+    super(bits);
+  }
+
+  @Override
+  public DocSet andNot(DocSet other) {
+    if (other instanceof BitDocSet) {
+      bits.andNot(((BitDocSet) other).bits);
+    } else {
+      DocIterator iter = other.iterator();
+      while (iter.hasNext()) {
+        int doc = iter.nextDoc();
+        if (doc < bits.length()) {
+          bits.clear(doc);
+        }
+      }
+    }
+    return new MutableBitDocSet(bits);

Review Comment:
   Need comment - can't return `this` since `BitSetDoc` stores size cached. Easier to just create new lightweight object.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
dsmiley commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1026790611


##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array.
+ *

Review Comment:
   Lets warn that this ought not to be cached, because it might inadvertently be modified.



##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array.
+ *
+ * @since solr 9.2
+ */
+public class MutableBitDocSet extends BitDocSet {

Review Comment:
   Let's not make this public.



##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,28 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // This optimizes for the case where we have more than 2 filters and instead
+      // of copying the bitsets we make one mutable bitset. We only need to do this
+      // for BitDocSet since it clones the backing bitset for andNot and intersection.
+      if (end > 1 && answer instanceof BitDocSet) {

Review Comment:
   at this spot, end is guaranteed to be greater than 1 based on you putting a big if-else around this section (line 1214); right?



##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1211,17 +1211,28 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
     }
 
     // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+    if (end > 0) {
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // This optimizes for the case where we have more than 2 filters and instead
+      // of copying the bitsets we make one mutable bitset. We only need to do this
+      // for BitDocSet since it clones the backing bitset for andNot and intersection.
+      if (end > 1 && answer instanceof BitDocSet) {
+        FixedBitSet fixedBitSet =
+            FixedBitSet.ensureCapacity(answer.getFixedBitSetClone(), maxDoc());
+        answer = new MutableBitDocSet(fixedBitSet);

Review Comment:
   I suggest adding a small static utility method on MutableBitDocSet `fromBitDocSet` that takes only BitDocSet and maxDoc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
magibney commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028792927


##########
solr/core/src/java/org/apache/solr/search/BitDocSet.java:
##########
@@ -194,19 +194,31 @@ public void addAllTo(FixedBitSet target) {
 
   @Override
   public DocSet andNot(DocSet other) {
-    FixedBitSet newbits = bits.clone();
+    FixedBitSet newbits = getFixedBitSetClone();
+    andNot(newbits, other);
+    return new BitDocSet(newbits);
+  }
+
+  /**
+   * Helper method for andNot that takes FixedBitSet and DocSet. This modifies the provided
+   * FixedBitSet to remove all bits contained in the DocSet argument -- equivalent to calling
+   * a.andNot(b), but modifies the state of the FixedBitSet instead of returning a new FixedBitSet.
+   *
+   * @param bits FixedBitSet to operate on
+   * @param other The DocSet to compare to
+   */
+  protected static void andNot(FixedBitSet bits, DocSet other) {
     if (other instanceof BitDocSet) {
-      newbits.andNot(((BitDocSet) other).bits);
+      bits.andNot(((BitDocSet) other).bits);
     } else {
       DocIterator iter = other.iterator();
       while (iter.hasNext()) {
         int doc = iter.nextDoc();
-        if (doc < newbits.length()) {
-          newbits.clear(doc);
+        if (doc < bits.length()) {
+          bits.clear(doc);

Review Comment:
   it occurs to me: now that this method no longer returns the FixedBitSet, it could easily return `boolean`, something along the lines of `true: may have been modified`, `false: definitely not modified`, detecting the latter for the non-BitDocSet case by calling `bits.getAndClear(int)`. Not sure it'd be worth it, but thought it might be worth mentioning ... (could use `false` return value to avoid resetting size, which I _think_ actually might make.a difference for some use cases: e.g., a cached positive filter and small (non-BitDocSet) negative filters that don't actually exclude anything). 



##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits for andNot and
+ * intersection. This allows for computing the combinations of sets without duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {
+  private MutableBitDocSet(FixedBitSet bits) {
+    super(bits);
+  }
+
+  /**
+   * Returns a mutable BitDocSet that is a copy of the provided BitDocSet.
+   *
+   * @param bitDocSet a BitDocSet
+   * @return copy of bitDocSet that is now mutable
+   */
+  public static MutableBitDocSet fromBitDocSet(BitDocSet bitDocSet) {
+    return new MutableBitDocSet(bitDocSet.getFixedBitSetClone());

Review Comment:
   It will often (indeed, usually?) be the case that we have `size` already calculated on the input `bitDocSet`. It could be worth passing this along using the `BitDocSet(FixedBitSet bits, int size)` ctor? This would only have a practical benefit in the case where you have one positive filter and a bunch of small negative filters that don't actually modify the set (in conjunction with the suggestion to make `andNot(FixedBitSet bits, DocSet other)` return boolean). But it costs nothing to preserve this information if it already exists.



##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1190,38 +1192,52 @@ public ProcessedFilter getProcessedFilter(DocSet setFilter, List<Query> queries)
       }
 
       Query posQuery = QueryUtils.getAbs(q);
-      sets[end] = getPositiveDocSet(posQuery);
+      DocSet docSet = getPositiveDocSet(posQuery);
       // Negative query if absolute value different from original
       if (Objects.equals(q, posQuery)) {
-        neg[end] = false;
-        // keep track of the smallest positive set.
-        // This optimization is only worth it if size() is cached, which it would
-        // be if we don't do any set operations.
-        int sz = sets[end].size();
-        if (sz < smallestCount) {
-          smallestCount = sz;
-          smallestIndex = end;
-          answer = sets[end];
+        // keep track of the smallest positive set; use "answer" for this.
+        if (answer == null) {
+          answer = docSet;
+          continue;
         }
+        // note: assume that size() is cached.  It generally comes from the cache, so should be.
+        if (docSet.size() < answer.size()) {
+          // swap answer & docSet so that answer is smallest
+          DocSet tmp = answer;
+          answer = docSet;
+          docSet = tmp;
+        }
+        neg[end] = false;
       } else {
         neg[end] = true;
       }
+      sets[end++] = docSet;
+    } // end of queries
 
-      end++;
-    }
+    if (end > 0) {
+      // Are all of our normal cached filters negative?
+      if (answer == null) {
+        answer = getLiveDocSet();
+      }
 
-    // Are all of our normal cached filters negative?
-    if (end > 0 && answer == null) {
-      answer = getLiveDocSet();
-    }
+      // This optimizes for the case where we have more than 2 filters and instead
+      // of copying the bitsets we make one mutable bitset. We should only do this
+      // for BitDocSet since it clones the backing bitset for andNot and intersection.
+      if (end > 1 && answer instanceof BitDocSet) {
+        answer = MutableBitDocSet.fromBitDocSet((BitDocSet) answer);
+      }
 
-    // do negative queries first to shrink set size
-    for (int i = 0; i < end; i++) {
-      if (neg[i]) answer = answer.andNot(sets[i]);
-    }
+      // do negative queries first to shrink set size
+      for (int i = 0; i < end; i++) {
+        if (neg[i]) answer = answer.andNot(sets[i]);
+      }
+
+      for (int i = 0; i < end; i++) {
+        if (!neg[i]) answer = answer.intersection(sets[i]);
+      }
 
-    for (int i = 0; i < end; i++) {
-      if (!neg[i] && i != smallestIndex) answer = answer.intersection(sets[i]);
+      // Make sure to keep answer as an immutable DocSet if we made it mutable
+      answer = MutableBitDocSet.unwrapIfMutable(answer);
     }
 
     // ignore "answer" if it simply matches all docs

Review Comment:
   Out of scope of this PR, but while looking at this code: will `answer.size()` always be called downstream of this? If so, then it's fine to force size calculation here; if _not_, I think there should be better ways to check this condition than naively calling `answer.size()` after all intersections are done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1029359273


##########
solr/core/src/java/org/apache/solr/search/BitDocSet.java:
##########
@@ -194,19 +194,31 @@ public void addAllTo(FixedBitSet target) {
 
   @Override
   public DocSet andNot(DocSet other) {
-    FixedBitSet newbits = bits.clone();
+    FixedBitSet newbits = getFixedBitSetClone();
+    andNot(newbits, other);
+    return new BitDocSet(newbits);
+  }
+
+  /**
+   * Helper method for andNot that takes FixedBitSet and DocSet. This modifies the provided
+   * FixedBitSet to remove all bits contained in the DocSet argument -- equivalent to calling
+   * a.andNot(b), but modifies the state of the FixedBitSet instead of returning a new FixedBitSet.
+   *
+   * @param bits FixedBitSet to operate on
+   * @param other The DocSet to compare to
+   */
+  protected static void andNot(FixedBitSet bits, DocSet other) {
     if (other instanceof BitDocSet) {
-      newbits.andNot(((BitDocSet) other).bits);
+      bits.andNot(((BitDocSet) other).bits);
     } else {
       DocIterator iter = other.iterator();
       while (iter.hasNext()) {
         int doc = iter.nextDoc();
-        if (doc < newbits.length()) {
-          newbits.clear(doc);
+        if (doc < bits.length()) {
+          bits.clear(doc);

Review Comment:
   eh not sure it would be worth it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on a diff in pull request #1184: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly

Posted by GitBox <gi...@apache.org>.
magibney commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028520036


##########
solr/core/src/java/org/apache/solr/search/BitDocSet.java:
##########
@@ -194,19 +194,30 @@ public void addAllTo(FixedBitSet target) {
 
   @Override
   public DocSet andNot(DocSet other) {
-    FixedBitSet newbits = bits.clone();
+    return new BitDocSet(this.andNot(getFixedBitSetClone(), other));
+  }
+
+  /**
+   * Helper method for andNot that takes FixedBitSet and DocSet. This returns the resulting bits
+   * andNoted together.
+   *
+   * @param bits bits to operate on
+   * @param other The DocSet to compare to
+   * @return Resulting andNoted FixedBitSet
+   */
+  protected FixedBitSet andNot(FixedBitSet bits, DocSet other) {

Review Comment:
   nit: this could be a static method?
   
   Also, a couple of minor issues with javadoc wording:
   1. it could be clearer that this method modifies the docset provided as the first arg and returns that same object (making this a static method might help clarify that point).
   2. "andNoted together" implies that "andNot" is a symmetric operation, which of course it's not. Maybe something like "modifies the first arg input BitDocSet to remove all bits contained in the second DocSet arg -- equivalent to calling `a.andNot(b)`, but modifies the state of the base docset instead of returning a new docSet"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org