You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/24 13:23:36 UTC

[GitHub] [lucene] rmuir opened a new pull request, #11813: Remove Operations.isFinite

rmuir opened a new pull request, #11813:
URL: https://github.com/apache/lucene/pull/11813

   This method is recursive: to avoid eating too much stack we apply a small limit. This means it can't really be used on any largish automata without hitting exception.
   
   But the benefit of knowing finite vs infinite in AutomatonTermsEnum is minor: let's not auto-compute this. FuzzyQuery still gets the finite optimization because its finite by definition. PrefixQuery is always infinite. Wildcard/Regex just assume infinite which is safe to do.
   
   Remove the auto-computation and the "trillean" Boolean parameter. If you dont know that your automaton is finite, pass false to CompiledAutomaton, it is safe.
   
   Move this method to AutomatonTestUtil so we can still use it in test asserts.
   
   Closes #11809


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #11813:
URL: https://github.com/apache/lucene/pull/11813#issuecomment-1256986744

   I'm planning on doing this 10.x-only, not out of laziness, but because there are already several related 10.x changes around this stuff: removal of det in #11049, removal of minimize in #11332, etc.
   
   Not opposed to backporting this stuff to 9.x but we'd need to be careful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir merged pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
rmuir merged PR #11813:
URL: https://github.com/apache/lucene/pull/11813


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11813:
URL: https://github.com/apache/lucene/pull/11813#discussion_r979251445


##########
lucene/core/src/java/org/apache/lucene/util/automaton/CompiledAutomaton.java:
##########
@@ -139,21 +136,21 @@ private static int findSinkState(Automaton automaton) {
   }
 
   /**
-   * Create this. If finite is null, we use {@link Operations#isFinite} to determine whether it is
-   * finite. If simplify is true, we run possibly expensive operations to determine if the automaton
-   * is one the cases in {@link CompiledAutomaton.AUTOMATON_TYPE}.
+   * Create this. If simplify is true, we run possibly expensive operations to determine if the
+   * automaton is one the cases in {@link CompiledAutomaton.AUTOMATON_TYPE}. Set finite to true if
+   * the automaton is finite, otherwise set to false if infinite or you don't know.

Review Comment:
   I assume wrong answers or assertions :) Only FuzzyTermsEnum uses this optimization to avoid a little CPU/upkeeping for the fuzzy case, its really an opto for that. BlockTree intersection doesn't even look at it, i think.
   
   We could probably tone down some of these CompiledAutomaton ctors to expose it less in the future. We just need a single expert ctor for Fuzzy? I was trying to minimize the scope of API changes here but can do more.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a diff in pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
mikemccand commented on code in PR #11813:
URL: https://github.com/apache/lucene/pull/11813#discussion_r979251604


##########
lucene/core/src/java/org/apache/lucene/util/automaton/CompiledAutomaton.java:
##########
@@ -139,21 +136,21 @@ private static int findSinkState(Automaton automaton) {
   }
 
   /**
-   * Create this. If finite is null, we use {@link Operations#isFinite} to determine whether it is
-   * finite. If simplify is true, we run possibly expensive operations to determine if the automaton
-   * is one the cases in {@link CompiledAutomaton.AUTOMATON_TYPE}.
+   * Create this. If simplify is true, we run possibly expensive operations to determine if the
+   * automaton is one the cases in {@link CompiledAutomaton.AUTOMATON_TYPE}. Set finite to true if
+   * the automaton is finite, otherwise set to false if infinite or you don't know.

Review Comment:
   OK no worries -- no need to do more here!  This change is already self-contained and a great progress!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a diff in pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
mikemccand commented on code in PR #11813:
URL: https://github.com/apache/lucene/pull/11813#discussion_r979254281


##########
lucene/core/src/java/org/apache/lucene/analysis/AutomatonToTokenStream.java:
##########
@@ -46,10 +45,6 @@ private AutomatonToTokenStream() {}
    * @return TokenStream representation of automaton.
    */
   public static TokenStream toTokenStream(Automaton automaton) {
-    if (Operations.isFinite(automaton) == false) {

Review Comment:
   Ahh super yes I agree!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a diff in pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
mikemccand commented on code in PR #11813:
URL: https://github.com/apache/lucene/pull/11813#discussion_r979249444


##########
lucene/core/src/java/org/apache/lucene/util/automaton/CompiledAutomaton.java:
##########
@@ -139,21 +136,21 @@ private static int findSinkState(Automaton automaton) {
   }
 
   /**
-   * Create this. If finite is null, we use {@link Operations#isFinite} to determine whether it is
-   * finite. If simplify is true, we run possibly expensive operations to determine if the automaton
-   * is one the cases in {@link CompiledAutomaton.AUTOMATON_TYPE}.
+   * Create this. If simplify is true, we run possibly expensive operations to determine if the
+   * automaton is one the cases in {@link CompiledAutomaton.AUTOMATON_TYPE}. Set finite to true if
+   * the automaton is finite, otherwise set to false if infinite or you don't know.

Review Comment:
   If a user accidentally claims the automaton was finite but it is not, what happens?



##########
lucene/core/src/java/org/apache/lucene/analysis/AutomatonToTokenStream.java:
##########
@@ -46,10 +45,6 @@ private AutomatonToTokenStream() {}
    * @return TokenStream representation of automaton.
    */
   public static TokenStream toTokenStream(Automaton automaton) {
-    if (Operations.isFinite(automaton) == false) {

Review Comment:
   Maybe we should add a warning that this may run forever on an infinite automaton?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11813:
URL: https://github.com/apache/lucene/pull/11813#discussion_r979252083


##########
lucene/core/src/java/org/apache/lucene/util/automaton/CompiledAutomaton.java:
##########
@@ -139,21 +136,21 @@ private static int findSinkState(Automaton automaton) {
   }
 
   /**
-   * Create this. If finite is null, we use {@link Operations#isFinite} to determine whether it is
-   * finite. If simplify is true, we run possibly expensive operations to determine if the automaton
-   * is one the cases in {@link CompiledAutomaton.AUTOMATON_TYPE}.
+   * Create this. If simplify is true, we run possibly expensive operations to determine if the
+   * automaton is one the cases in {@link CompiledAutomaton.AUTOMATON_TYPE}. Set finite to true if
+   * the automaton is finite, otherwise set to false if infinite or you don't know.

Review Comment:
   yeah, this recursive method is "in the query path", the only remaining recursive method is sortTopoStates, which is less exposed (i think suggesters only).
   
   ultimately it would be great to remove more of these "automatic" (sometimes costly) optimizations, maybe even remove CompiledAutomaton.  we've been making progress.
   
   but this Operations.isFinite is definitely the biggest problem and easiest win right now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11813:
URL: https://github.com/apache/lucene/pull/11813#discussion_r979254120


##########
lucene/core/src/java/org/apache/lucene/analysis/AutomatonToTokenStream.java:
##########
@@ -46,10 +45,6 @@ private AutomatonToTokenStream() {}
    * @return TokenStream representation of automaton.
    */
   public static TokenStream toTokenStream(Automaton automaton) {
-    if (Operations.isFinite(automaton) == false) {

Review Comment:
   I looked at the javadocs (scroll up just a bit more), it looks good?
   
   ```java
     /**
      * converts an automaton into a TokenStream. This is done by first Topo sorting the nodes in the
      * Automaton. Nodes that have the same distance from the start are grouped together to form the
      * position nodes for the TokenStream. The resulting TokenStream releases edges from the automaton
      * as tokens in order from the position nodes. This requires the automaton be a finite DAG.
      *
      * @param automaton automaton to convert. Must be a finite DAG.
      * @return TokenStream representation of automaton.
      */
     public static TokenStream toTokenStream(Automaton automaton) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11813:
URL: https://github.com/apache/lucene/pull/11813#discussion_r979255164


##########
lucene/core/src/java/org/apache/lucene/analysis/AutomatonToTokenStream.java:
##########
@@ -46,10 +45,6 @@ private AutomatonToTokenStream() {}
    * @return TokenStream representation of automaton.
    */
   public static TokenStream toTokenStream(Automaton automaton) {
-    if (Operations.isFinite(automaton) == false) {

Review Comment:
   I boosted it a little bit with an explanation mark and the words "infinite loop"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] magibney commented on pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
magibney commented on PR #11813:
URL: https://github.com/apache/lucene/pull/11813#issuecomment-1257009186

   fwiw I have a local branch (very recent) that changed the implementation of `Operations.isFinite()` (and `Operations.topoSortStates()`) to be non-recursive, afaict without sacrificing performance on common use cases. Hoping to push a PR soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a diff in pull request #11813: Remove Operations.isFinite

Posted by GitBox <gi...@apache.org>.
rmuir commented on code in PR #11813:
URL: https://github.com/apache/lucene/pull/11813#discussion_r979251049


##########
lucene/core/src/java/org/apache/lucene/analysis/AutomatonToTokenStream.java:
##########
@@ -46,10 +45,6 @@ private AutomatonToTokenStream() {}
    * @return TokenStream representation of automaton.
    */
   public static TokenStream toTokenStream(Automaton automaton) {
-    if (Operations.isFinite(automaton) == false) {

Review Comment:
   ok, ill fix this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org