You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/31 20:58:16 UTC

[GitHub] [lucene] zhaih opened a new pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

zhaih opened a new pull request #163:
URL: https://github.com/apache/lucene/pull/163


   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Lucene:
   
   * https://issues.apache.org/jira/projects/LUCENE
   
   You will need to create an account in Jira in order to create an issue.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * LUCENE-####: <short description of problem or changes>
   
   LUCENE must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   In `determinize` function (which truns NFA to DFA), we use `SortedIntSet` to store powerset and it uses a sorted data structure which could be very costly in some cases. And we actually don't need to keep the set always sorted.
   
   # Solution
   
   Replace usage of `SortedIntSet` by `StateSet`, a wrapper of `IntIntHashMap`. Note I haven't removed `SortedIntSet` but probably we should since it is only used in `determinize`?
   
   # Tests
    * Reused old tests that used to test `SortedIntSet`
    * I also applied the patch attached in [LUCENE-9981](https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-9981). Before the commit it runs 6min before throw an exception and after this commit it is 16 sec.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on pull request #163:
URL: https://github.com/apache/lucene/pull/163#issuecomment-864214067


   > Thanks @zhaih this looks great -- I think it's ready! Have you confirmed that `gradle check` passes?
   
   Thank you for reviewing it! Yeah `gradle check` passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r643959663



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;

Review comment:
       bq. why hasn't this been "published in a public revision" yet :)
   
   Don't know. Life gets in the way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642719501



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       Oh I see, we might need to modify the source so that we won't bring in a whole interface set used in HPPC. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642718657



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;

Review comment:
       Ah yeah that's a good idea, I forgot to integrate it here, will do.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand merged pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
mikemccand merged pull request #163:
URL: https://github.com/apache/lucene/pull/163


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] bruno-roustant commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r644575732



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       Yes removal is fast.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on pull request #163:
URL: https://github.com/apache/lucene/pull/163#issuecomment-853538280


   Thank you @mikemccand @dweiss and @bruno-roustant all for reviewing this PR! Since this PR is more of an optimization for adversarial cases so we don't want to sacrifice performance of our normal use cases. I will take some time to benchmark those changes (this one as well as few others that are not yet presented) first and see what numbers we'll get to decide how to move on with this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642819034



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       Feel free to cut and trim to your will. This is exactly why it's licensed the way it is. @bruno-roustant came up with some clever new hashing improvements recently - these are not published as a public revision but you can get them from the repository and compile it locally. See this for details:
   
   https://issues.carrot2.org/browse/HPPC-176




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642718504



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;
+      return new FrozenIntSet(arrayCache, hashCode, state);
+    }
+    return new FrozenIntSet(getArray(), hashCode, state);
+  }
+
+  @Override
+  int[] getArray() {
+    if (changed == false) {
+      assert arrayCache != null;
+      return arrayCache;
+    }
+    changed = false;
+    arrayCache = inner.keys().toArray();
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);

Review comment:
       I have posted some stats in the JIRA issue, it turns out we are doing way way much more add/delete than freeze. Since  `(#add + #delete)/#freeze > 10000` so a rough analysis is each `add` or `delete` use `O(log(n))` and `freeze` use `O(n * log(n))`, we see the cost of `#add + #delete` is still larger, though I'm not sure why it is this much faster.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642721094



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;
+      return new FrozenIntSet(arrayCache, hashCode, state);
+    }
+    return new FrozenIntSet(getArray(), hashCode, state);
+  }
+
+  @Override
+  int[] getArray() {
+    if (changed == false) {
+      assert arrayCache != null;
+      return arrayCache;
+    }
+    changed = false;
+    arrayCache = inner.keys().toArray();
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);

Review comment:
       Oh another thought is, we were using raw array for states less than 30, which have `add/delete` cost of `O(n)`, and I would expect most of the powerset will be constructed using these 30 states (since `C(30,4) > 10000` already), meaning we might be actually paying `O(n)` price per `add/delete` most times




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
mikemccand commented on pull request #163:
URL: https://github.com/apache/lucene/pull/163#issuecomment-865908030


   > > Thanks @zhaih this looks great -- I think it's ready! Have you confirmed that `gradle check` passes?
   > 
   > Thank you for reviewing it! Yeah `gradle check` passed.
   
   Great!  I'll try to push this today!  Thanks @zhaih!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
mikemccand commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r650412326



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);

Review comment:
       +1 to add the (missing?) `indexRemove` for this situation!

##########
File path: lucene/core/src/java/org/apache/lucene/util/BitMixer.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util;
+
+/**
+ * Bit mixing utilities. The purpose of these methods is to evenly distribute key space over int32
+ * range.
+ *
+ * <p>Forked from com.carrotsearch.hppc.BitMixer

Review comment:
       Maybe we should move this under sub-package `o.a.l.util.hppc` also?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r650443185



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import java.util.Arrays;
+import org.apache.lucene.util.BitMixer;
+import org.apache.lucene.util.IntIntHashMap;
+
+/** A thin wrapper of {@link IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private long hashCode;
+  private boolean hashUpdated = true;
+  private boolean arrayUpdated = true;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      keyChanged();
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      keyChanged();
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    return new FrozenIntSet(getArray(), longHashCode(), state);
+  }
+
+  private void keyChanged() {
+    hashUpdated = false;
+    arrayUpdated = false;
+  }
+
+  @Override
+  int[] getArray() {
+    if (arrayUpdated) {
+      return arrayCache;
+    }
+    arrayCache = new int[inner.size()];
+    int i = 0;
+    for (IntIntHashMap.IntCursor cursor : inner.keys()) {
+      arrayCache[i++] = cursor.value;
+    }
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);
+    arrayUpdated = true;
+    return arrayCache;
+  }
+
+  @Override
+  int size() {
+    return inner.size();
+  }
+
+  @Override
+  long longHashCode() {
+    if (hashUpdated) {
+      return hashCode;
+    }
+    hashCode = inner.size();
+    for (IntIntHashMap.IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);

Review comment:
       But wouldn't that make hashcode depend on order of keys?

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);

Review comment:
       Thanks Bruno for adding this capability! I'll fork the indexRemove and use it here.

##########
File path: lucene/core/src/java/org/apache/lucene/util/BitMixer.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util;
+
+/**
+ * Bit mixing utilities. The purpose of these methods is to evenly distribute key space over int32
+ * range.
+ *
+ * <p>Forked from com.carrotsearch.hppc.BitMixer

Review comment:
       Done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on pull request #163:
URL: https://github.com/apache/lucene/pull/163#issuecomment-853538280


   Thank you @mikemccand @dweiss and @bruno-roustant all for reviewing this PR! Since this PR is more of an optimization for adversarial cases so we don't want to sacrifice performance of our normal use cases. I will take some time to benchmark those changes (this one as well as few others that are not yet presented) first and see what numbers we'll get to decide how to move on with this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r650372453



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import java.util.Arrays;
+import org.apache.lucene.util.BitMixer;
+import org.apache.lucene.util.IntIntHashMap;
+
+/** A thin wrapper of {@link IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private long hashCode;
+  private boolean hashUpdated = true;
+  private boolean arrayUpdated = true;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      keyChanged();
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      keyChanged();
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    return new FrozenIntSet(getArray(), longHashCode(), state);
+  }
+
+  private void keyChanged() {
+    hashUpdated = false;
+    arrayUpdated = false;
+  }
+
+  @Override
+  int[] getArray() {
+    if (arrayUpdated) {
+      return arrayCache;
+    }
+    arrayCache = new int[inner.size()];
+    int i = 0;
+    for (IntIntHashMap.IntCursor cursor : inner.keys()) {
+      arrayCache[i++] = cursor.value;
+    }
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);
+    arrayUpdated = true;
+    return arrayCache;
+  }
+
+  @Override
+  int size() {
+    return inner.size();
+  }
+
+  @Override
+  long longHashCode() {
+    if (hashUpdated) {
+      return hashCode;
+    }
+    hashCode = inner.size();
+    for (IntIntHashMap.IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);

Review comment:
       I'd still think the simple aggregation 37 * hashCode + cursor.value with a final mix step would be a good enough (or maybe even better) choice here.

##########
File path: lucene/core/src/java/org/apache/lucene/util/BitMixer.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util;
+
+/**
+ * Bit mixing utilities. The purpose of these methods is to evenly distribute key space over int32
+ * range.
+ *
+ * <p>Forked from com.carrotsearch.hppc.BitMixer

Review comment:
       I'd make it a link back to the original project and include a comment which hash ref of the repository this was copied from, for reference and in case somebody wishes to sync/update in the future.

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import java.util.Arrays;
+import org.apache.lucene.util.BitMixer;
+import org.apache.lucene.util.IntIntHashMap;
+
+/** A thin wrapper of {@link IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private long hashCode;
+  private boolean hashUpdated = true;
+  private boolean arrayUpdated = true;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      keyChanged();
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      keyChanged();
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    return new FrozenIntSet(getArray(), longHashCode(), state);
+  }
+
+  private void keyChanged() {
+    hashUpdated = false;
+    arrayUpdated = false;
+  }
+
+  @Override
+  int[] getArray() {
+    if (arrayUpdated) {
+      return arrayCache;
+    }
+    arrayCache = new int[inner.size()];
+    int i = 0;
+    for (IntIntHashMap.IntCursor cursor : inner.keys()) {
+      arrayCache[i++] = cursor.value;
+    }
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);
+    arrayUpdated = true;
+    return arrayCache;
+  }
+
+  @Override
+  int size() {
+    return inner.size();
+  }
+
+  @Override
+  long longHashCode() {
+    if (hashUpdated) {
+      return hashCode;
+    }
+    hashCode = inner.size();
+    for (IntIntHashMap.IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);

Review comment:
       Duh. Of course! Sorry for not noticing this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642820266



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);

Review comment:
       +1. An alternative: 37 * hashCode + cursor.value with a final mix step.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] bruno-roustant commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r650761025



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import java.util.Arrays;
+import org.apache.lucene.util.hppc.BitMixer;
+import org.apache.lucene.util.hppc.IntIntHashMap;
+
+/** A thin wrapper of {@link IntIntHashMap} */

Review comment:
       Should we say something about the keys and values?
   key = state num, value = ref count

##########
File path: lucene/core/src/test/org/apache/lucene/util/automaton/TestIntSet.java
##########
@@ -31,14 +31,13 @@ public void testFreezeEqualityLargeSet() {
   }
 
   private void testFreezeEquality(int size) {
-    SortedIntSet sortedSet = new SortedIntSet(0);
+    StateSet sortedSet = new StateSet(0);

Review comment:
       Can we rename sortedSet? (confusing)

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java
##########
@@ -693,7 +694,7 @@ public static Automaton determinize(Automaton a, int maxDeterminizedStates) {
     final PointTransitionSet points = new PointTransitionSet();
 
     // like SortedMap<Integer,Integer>

Review comment:
       The comment should be replaced.

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import java.util.Arrays;
+import org.apache.lucene.util.hppc.BitMixer;
+import org.apache.lucene.util.hppc.IntIntHashMap;
+
+/** A thin wrapper of {@link IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private long hashCode;
+  private boolean hashUpdated = true;
+  private boolean arrayUpdated = true;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set

Review comment:
       Should be javadoc. Increments the ref count of a state?

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import java.util.Arrays;
+import org.apache.lucene.util.hppc.BitMixer;
+import org.apache.lucene.util.hppc.IntIntHashMap;
+
+/** A thin wrapper of {@link IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private long hashCode;
+  private boolean hashUpdated = true;
+  private boolean arrayUpdated = true;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      keyChanged();
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0

Review comment:
       Better with javadoc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] bruno-roustant commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r643116749



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;

Review comment:
       Could we rename "changed" to "keysChanged" as this only tracks keys and not values?

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {

Review comment:
       I find strange that this method is called externally by Operations. Since the hash is cached and we track when it needs to be recomputed with "changed", could we make it private and change the signature to "private int getOrComputeHash()"?

##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       According to the stats you shared @zhaih in the Jira issue, there are 3 times more get (inc/dec) & removal than entry additions. If there are less than 2M states then IntIntWormMap should be more efficient.
   I suggest that you benchmark with IntIntHashMap and IntIntWormMap (with your existing tests) and take the most appropriate.

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java
##########
@@ -676,7 +677,7 @@ public static Automaton determinize(Automaton a, int maxDeterminizedStates) {
     // a.writeDot("/l/la/lucene/core/detin.dot");
 
     // Same initial values and state will always have the same hashCode
-    FrozenIntSet initialset = new FrozenIntSet(new int[] {0}, 683, 0);
+    FrozenIntSet initialset = new FrozenIntSet(new int[] {0}, BitMixer.mix(0) + 1, 0);

Review comment:
       Why not simply 0 as initial hash code?

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;
+      return new FrozenIntSet(arrayCache, hashCode, state);
+    }
+    return new FrozenIntSet(getArray(), hashCode, state);
+  }
+
+  @Override
+  int[] getArray() {
+    if (changed == false) {
+      assert arrayCache != null;
+      return arrayCache;
+    }
+    changed = false;
+    arrayCache = inner.keys().toArray();
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);

Review comment:
       How many states are we manipulating? If it's not too many, then indeed maybe a simple array of state-counters is enough?
   If array[i] is the counter for state i, then incr(i) / decr(i) are 0(1), hash is still O(n) (skipping 0 values), freeze does not need to be sorted.

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);

Review comment:
       @dweiss ideally here I would use inner.indexRemove(keyIndex). Does this miss in HPPC?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r644146024



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;

Review comment:
       @mikemccand We actually might fall inside this `if`. So before we call `freeze`, we will perform a `get` using this `StateSet` by the `newStates` hashmap, there we might call `getArray()` if there's hashcode collision and `changed` should be set to `false` there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] bruno-roustant commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r643113793



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       According to the stats you shared @zhaih in the Jira issue, there are 3 times more get (inc/dec) & removal than entry additions. If there are less than 2M states then IntIntWormMap should be more efficient.
   I suggest that you benchmark with IntIntHashMap and IntIntWormMap (with your existing tests) and take the most appropriate.
   Edit: actually maybe a simple array is working? (see the question in the Operations comment)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642817638



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;

Review comment:
       Not just moving to a long but also using a better hash function. The typical accumulative default  (prime * current + elementValue) is fine but if the number of hashed elements is small (or if their values are small) this leads to poor distributions. Throw a murmur mix function in between (or at the end at least) and things typically look much better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r644142000



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;

Review comment:
       +1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r644141734



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java
##########
@@ -676,7 +677,7 @@ public static Automaton determinize(Automaton a, int maxDeterminizedStates) {
     // a.writeDot("/l/la/lucene/core/detin.dot");
 
     // Same initial values and state will always have the same hashCode
-    FrozenIntSet initialset = new FrozenIntSet(new int[] {0}, 683, 0);
+    FrozenIntSet initialset = new FrozenIntSet(new int[] {0}, BitMixer.mix(0) + 1, 0);

Review comment:
       Just to keep the hash code the same with the one used in `StateSet`, 0 should be the hash code used for 0 length array I think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r644138606



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       @bruno-roustant I tried WormMap yesterday (with hppc 0.9.0.RC2) and I didn't see benefits from the adversarial test case. Just to educate myself, is removal also a fast operation?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
mikemccand commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r643866949



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       > @bruno-roustant came up with some clever new hashing improvements recently - these are not published as a public revision but you can get them from the repository and compile it locally. See this for details:
   > 
   > https://issues.carrot2.org/browse/HPPC-176
   
   Whoa, this looks new "worm" hashing looks great!  That is frequently a great tradeoff (slower put, faster get)?  Hmm why hasn't this been "published in a public revision" yet :)

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;

Review comment:
       Hmm do we actually fall inside this `if`?  I would think we shouldn't ever hit this -- we should't call `freeze` unless something had in fact changed?  Or, if we are, something else might be wrong?  Or maybe I am simply confused ;)

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {

Review comment:
       Yeah this is odd -- `+1` to make the hash code a privately maintained thing.

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;
+      return new FrozenIntSet(arrayCache, hashCode, state);
+    }
+    return new FrozenIntSet(getArray(), hashCode, state);
+  }
+
+  @Override
+  int[] getArray() {
+    if (changed == false) {
+      assert arrayCache != null;
+      return arrayCache;
+    }
+    changed = false;
+    arrayCache = inner.keys().toArray();
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);

Review comment:
       > How many states are we manipulating? If it's not too many, then indeed maybe a simple array of state-counters is enough?
   > If array[i] is the counter for state i, then incr(i) / decr(i) are 0(1), hash is still O(n) (skipping 0 values), freeze does not need to be sorted.
   
   That's a neat idea!
   
   The tricky thing is that these are usually very sparsely occupied sets (each DFA state is "typically" just a few NFA states, I think), so we would have to pay the O(N) cost to walk that array and turn it into the frozen (sparse `int[]`) representation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
mikemccand commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642695443



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       Can you add a `// nocommit` here?  We can't take (external) dependencies in Lucene's `core` ... but hopefully we can fork the one HPPC class we need here (`IntIntHashMap`?).

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;

Review comment:
       @dweiss suggested upgrading the hash to `long` to further reduce chance of false collisions.

##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;
+      return new FrozenIntSet(arrayCache, hashCode, state);
+    }
+    return new FrozenIntSet(getArray(), hashCode, state);
+  }
+
+  @Override
+  int[] getArray() {
+    if (changed == false) {
+      assert arrayCache != null;
+      return arrayCache;
+    }
+    changed = false;
+    arrayCache = inner.keys().toArray();
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);

Review comment:
       Oh I see, we are still sorting on `freeze`, but not when accumulating the powerset.  Curious this is so much faster :)  I would expect it to be slower on the adversarial regexp, I think, because we pay the sort price on every freeze now, instead on insert/delete into this map?  Confusing :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on pull request #163:
URL: https://github.com/apache/lucene/pull/163#issuecomment-860880755


   Thank you @bruno-roustant for approving this, I've refined the javadoc according to your comment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r642816694



##########
File path: lucene/core/src/java/org/apache/lucene/util/automaton/StateSet.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.automaton;
+
+import com.carrotsearch.hppc.BitMixer;
+import com.carrotsearch.hppc.IntIntHashMap;
+import com.carrotsearch.hppc.cursors.IntCursor;
+import java.util.Arrays;
+
+/** A thin wrapper of {@link com.carrotsearch.hppc.IntIntHashMap} */
+final class StateSet extends IntSet {
+
+  private final IntIntHashMap inner;
+  private int hashCode;
+  private boolean changed;
+  private int[] arrayCache = new int[0];
+
+  StateSet(int capacity) {
+    inner = new IntIntHashMap(capacity);
+  }
+
+  // Adds this state to the set
+  void incr(int num) {
+    if (inner.addTo(num, 1) == 1) {
+      changed = true;
+    }
+  }
+
+  // Removes this state from the set, if count decrs to 0
+  void decr(int num) {
+    assert inner.containsKey(num);
+    int keyIndex = inner.indexOf(num);
+    int count = inner.indexGet(keyIndex) - 1;
+    if (count == 0) {
+      inner.remove(num);
+      changed = true;
+    } else {
+      inner.indexReplace(keyIndex, count);
+    }
+  }
+
+  void computeHash() {
+    if (changed == false) {
+      return;
+    }
+    hashCode = inner.size();
+    for (IntCursor cursor : inner.keys()) {
+      hashCode += BitMixer.mix(cursor.value);
+    }
+  }
+
+  /**
+   * Create a snapshot of this int set associated with a given state. The snapshot will not retain
+   * any frequency information about the elements of this set, only existence.
+   *
+   * <p>It is the caller's responsibility to ensure that the hashCode and data are up to date via
+   * the {@link #computeHash()} method before calling this method.
+   *
+   * @param state the state to associate with the frozen set.
+   * @return A new FrozenIntSet with the same values as this set.
+   */
+  FrozenIntSet freeze(int state) {
+    if (changed == false) {
+      assert arrayCache != null;
+      return new FrozenIntSet(arrayCache, hashCode, state);
+    }
+    return new FrozenIntSet(getArray(), hashCode, state);
+  }
+
+  @Override
+  int[] getArray() {
+    if (changed == false) {
+      assert arrayCache != null;
+      return arrayCache;
+    }
+    changed = false;
+    arrayCache = inner.keys().toArray();
+    // we need to sort this array since "equals" method depend on this
+    Arrays.sort(arrayCache);

Review comment:
       Yeah... I think a good empirical breakdown of what actually is happening may be much better than theoretical complexity analysis. This doesn't mean it's not worth experimenting with int-int hash maps but I have a gut feeling you can get pretty darn close with less memory used using existing sorted arrays.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
zhaih commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r644138606



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       @bruno-roustant Thank you for the advice! Unfortunately I tried WormMap yesterday (with hppc 0.9.0.RC2) and I didn't see benefits from the adversarial test case. Just to educate myself, is removal also a fast operation?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] bruno-roustant commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r644575732



##########
File path: lucene/core/build.gradle
##########
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
       Yes removal is fast.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org