Posted to issues@lucene.apache.org by "rmuir (via GitHub)" <gi...@apache.org> on 2023/05/19 16:11:51 UTC

[GitHub] [lucene] rmuir commented on a diff in pull request #12312: [DRAFT] GH#12176: TermInSetQuery extends AutomatonQuery

rmuir commented on code in PR #12312:
URL: https://github.com/apache/lucene/pull/12312#discussion_r1199137356


##########
lucene/core/src/java/org/apache/lucene/util/automaton/DaciukMihovAutomatonBuilder.java:
##########
@@ -308,17 +290,83 @@ private void replaceOrRegister(State state) {
     }
   }
 
-  /**
-   * Add a suffix of <code>current</code> starting at <code>fromIndex</code> (inclusive) to state
-   * <code>state</code>.
-   */
-  private void addSuffix(State state, CharSequence current, int fromIndex) {
-    final int len = current.length();
-    while (fromIndex < len) {
-      int cp = Character.codePointAt(current, fromIndex);
-      state = state.newState(cp);
-      fromIndex += Character.charCount(cp);
+  private static class CharacterBasedBuilder extends DaciukMihovAutomatonBuilder {
+    private final CharsRefBuilder scratch = new CharsRefBuilder();
+
+    @Override
+    protected void add(BytesRef current) {
+      // Convert the input UTF-8 bytes to CharsRef so we can use the code points as our transition
+      // labels.

Review Comment:
   Hmm, this won't work if someone has binary terms that aren't UTF-8 encoded. Since we are passing `binary=true` to the `AutomatonQuery` constructor, I don't think we should be building an int32 (code point) automaton here.

   Instead we just want to build a binary automaton, where the transitions are just bytes.

   `PrefixQuery` is an example of an `AutomatonQuery` that supports binary terms and builds a binary automaton directly in this way.
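
   To illustrate the concern, here is a minimal, self-contained sketch. It is not Lucene code and not the builder from this PR: `ByteLabelSketch`, `State`, `add`, `accepts`, and `survivesUtf8RoundTrip` are hypothetical names, and the structure is a plain trie rather than a minimized automaton. It only shows that labeling transitions with raw byte values (0..255) handles arbitrary binary terms, whereas decoding a term as UTF-8 first can lose information when the bytes are not valid UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Lucene. */
class ByteLabelSketch {
  /** A state whose outgoing transitions are labeled with unsigned byte values (0..255). */
  static final class State {
    final TreeMap<Integer, State> transitions = new TreeMap<>();
    boolean accept;
  }

  /** Adds one term byte by byte, making no assumption about its encoding. */
  static void add(State root, byte[] term) {
    State state = root;
    for (byte b : term) {
      int label = b & 0xFF; // treat the byte as unsigned
      state = state.transitions.computeIfAbsent(label, k -> new State());
    }
    state.accept = true;
  }

  /** Returns true if following the term's bytes from the root ends in an accept state. */
  static boolean accepts(State root, byte[] term) {
    State state = root;
    for (byte b : term) {
      state = state.transitions.get(b & 0xFF);
      if (state == null) {
        return false;
      }
    }
    return state.accept;
  }

  /** Returns true if decoding as UTF-8 and re-encoding yields the original bytes. */
  static boolean survivesUtf8RoundTrip(byte[] term) {
    String decoded = new String(term, StandardCharsets.UTF_8);
    return Arrays.equals(term, decoded.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    byte[] text = "lucene".getBytes(StandardCharsets.UTF_8);
    byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x01}; // not valid UTF-8

    State root = new State();
    add(root, text);
    add(root, binary);

    System.out.println("accepts text term:    " + accepts(root, text));
    System.out.println("accepts binary term:  " + accepts(root, binary));
    System.out.println("text round-trips:     " + survivesUtf8RoundTrip(text));
    System.out.println("binary round-trips:   " + survivesUtf8RoundTrip(binary));
  }
}
```

   In the sketch the byte-labeled structure accepts both terms, while the invalid byte sequence does not survive a UTF-8 decode and re-encode. A character-based conversion step in front of the automaton would run into the same problem.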



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
