Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/23 16:59:40 UTC

[GitHub] [lucene] taroplus opened a new issue, #11809: input automaton is too large for lengthy wildcard query

taroplus opened a new issue, #11809:
URL: https://github.com/apache/lucene/issues/11809

   ### Description
   
   Hello, I have a very long string to search with, essentially:
   
   ```
   String term = "very-lengthy-text-contains-dots-and-dashes";
   ```
   
   When I try to create a WildcardQuery like the one below, I get `java.lang.IllegalArgumentException: input automaton is too large: 1001`.
   
   ```
   WildcardQuery query = new WildcardQuery(new Term("field", term + "*"));
   ```
   
   The exception looks like this:
   ```
   java.lang.IllegalArgumentException: input automaton is too large: 1001
   
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1060)
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1066)
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1066)
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1066)
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1066)
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1066)
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1066)
   ```
   
   The actual string I have is below:
   ```
   "{group-bm-http-server-02083.node.dm.reg,group-bm-http-server-02082.node.dm.reg,group-bm-http-server-02081.node.dm.reg,group-bm-http-server-02080.node.dm.reg,group-bm-http-server-02079.node.dm.reg,group-bm-http-server-02078.node.dm.reg,group-bm-http-server-02077.node.dm.reg,group-bm-http-server-02076.node.dm.reg,group-bm-http-server-02073.node.dm.reg,group-bm-http-server-02070.node.dm.reg,group-bm-http-server-02067.node.dm.reg,group-bm-http-server-02064.node.dm.reg,group-bm-http-server-02029.node.dm.reg,group-bm-http-server-02028.node.dm.reg,group-bm-http-server-02027.node.dm.reg,group-bm-http-server-02026.node.dm.reg,group-bm-http-server-02025.node.dm.reg,group-bm-http-server-02023.node.dm.reg,group-bm-http-server-02022.node.dm.reg,group-bm-http-server-02021.node.dm.reg,group-bm-http-server-02020.node.dm.reg,group-bm-http-server-02019.node.dm.reg,group-bm-http-server-02018.node.dm.reg,group-bm-http-server-02016.node.dm.reg,group-bm-http-server-02015.node.dm.reg,group-bm-http-server-02014.node.dm.reg,group-bm-http-server-02009.node.dm.reg,group-bm-http-server-02007.node.dm.reg,group-bm-http-server-02004.node.dm.reg,group-bm-http-server-02003.node.dm.reg,group-bm-http-server-02002.node.dm.reg,group-bm-http-server-01311.node.dm.reg,group-bm-http-server-01309.node.dm.reg,group-bm-http-server-01307.node.dm.reg}"
   ```
   
   I know it's not an ordinary situation; however, I'm not sure why automaton compilation needs to recurse that deep.
   
   ### Version and environment details
   
   Lucene 8.11.1 / Java 8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #11809: input automaton is too large for lengthy wildcard query

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #11809:
URL: https://github.com/apache/lucene/issues/11809#issuecomment-1256988889

   Thanks for reporting this with an easy-to-reproduce test case, @taroplus.




[GitHub] [lucene] rmuir commented on issue #11809: input automaton is too large for lengthy wildcard query

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #11809:
URL: https://github.com/apache/lucene/issues/11809#issuecomment-1256465997

   I'm not sure whether this is still an issue on the `main` branch, as I don't have the full stack trace. However, I would recommend using TermInSetQuery instead of the large regex you have, which seems to represent a simple set of string values. It should be more performant.




[GitHub] [lucene] rmuir closed issue #11809: input automaton is too large for lengthy wildcard query

Posted by GitBox <gi...@apache.org>.
rmuir closed issue #11809: input automaton is too large for lengthy wildcard query
URL: https://github.com/apache/lucene/issues/11809




[GitHub] [lucene] taroplus commented on issue #11809: input automaton is too large for lengthy wildcard query

Posted by GitBox <gi...@apache.org>.
taroplus commented on issue #11809:
URL: https://github.com/apache/lucene/issues/11809#issuecomment-1256541453

   I tried with the latest commit, and it still happens. It's not a regex; it's just `*` after plain text. I'm only trying to run a prefix query (the same happens with PrefixQuery too).




[GitHub] [lucene] taroplus commented on issue #11809: input automaton is too large for lengthy wildcard query

Posted by GitBox <gi...@apache.org>.
taroplus commented on issue #11809:
URL: https://github.com/apache/lucene/issues/11809#issuecomment-1256518868

   The stack trace is long:
   ```
   java.lang.IllegalArgumentException: input automaton is too large: 1001
   
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1066)
           <thousands more of these>
   	at org.apache.lucene.util.automaton.Operations.isFinite(Operations.java:1049)
   	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:224)
   	at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:108)
   	at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:86)
   	at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:71)
   	at org.apache.lucene.search.WildcardQuery.<init>(WildcardQuery.java:56)
   ```
   I'll test with the latest master.




[GitHub] [lucene] rmuir commented on issue #11809: input automaton is too large for lengthy wildcard query

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #11809:
URL: https://github.com/apache/lucene/issues/11809#issuecomment-1256553025

   OK, thanks for reporting. I will dig into this more.
   
   The problem is that `isFinite` is implemented recursively, so we have a defensive check, which you are hitting because of the length of the string. See https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java#L1056-L1057
   
   For PrefixQuery, we shouldn't even be calculating `isFinite`: it's implicitly infinite.
   For WildcardQuery, we could avoid calculating `isFinite`: if we ever see the `*` operator, it's infinite; otherwise it's finite.
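
The WildcardQuery rule stated above can be sketched without any Lucene dependency. This is only a minimal illustration, assuming the pattern syntax that `WildcardQuery` documents (`*` for any character sequence, `?` for a single character, `\` as the escape character). The class and method names are hypothetical, and this is not the code that was committed to Lucene.

```java
// Illustrative sketch only; hypothetical names, not Lucene's implementation.
final class WildcardFiniteness {
    // Special characters assumed to match WildcardQuery's documented syntax.
    static final char WILDCARD_STRING = '*';
    static final char WILDCARD_ESCAPE = '\\';

    /**
     * Returns true if the pattern can match only finitely many terms,
     * i.e. it contains no unescaped '*'. A '?' matches exactly one
     * character, so it does not make the language infinite.
     */
    static boolean matchesFinitelyManyTerms(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == WILDCARD_ESCAPE && i + 1 < pattern.length()) {
                i++; // the next character is a literal; skip it
            } else if (c == WILDCARD_STRING) {
                return false; // an unescaped '*' allows unbounded suffixes
            }
        }
        return true;
    }
}
```

With this sketch, `matchesFinitelyManyTerms("abc*")` is false while `matchesFinitelyManyTerms("a?c")` is true, and the scan is linear in the pattern length with no recursion at all.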
   
   And of course, it would be great to implement this function without recursion at some point, but I'm not sure it's needed to solve your issue.
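
To illustrate what a non-recursive version could look like, here is a minimal sketch that uses an explicit stack instead of the call stack. It is not Lucene's `Operations.isFinite`: the automaton is modelled as a hypothetical adjacency array (`next[s]` lists the target states of state `s`, state 0 is the start), and it assumes every state can reach an accept state, so any cycle reachable from the start means the language is infinite.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; hypothetical graph model, not Lucene's Automaton API.
final class IterativeFiniteCheck {
    private static final byte UNVISITED = 0, ON_PATH = 1, DONE = 2;

    /** Returns false if a cycle is reachable from state 0, true otherwise. */
    static boolean isFinite(int[][] next) {
        if (next.length == 0) {
            return true;
        }
        byte[] color = new byte[next.length];
        int[] edge = new int[next.length]; // index of the next edge to explore per state
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        color[0] = ON_PATH;
        while (!stack.isEmpty()) {
            int s = stack.peek();
            if (edge[s] < next[s].length) {
                int t = next[s][edge[s]++];
                if (color[t] == ON_PATH) {
                    return false; // back edge to a state on the current path: a cycle
                }
                if (color[t] == UNVISITED) {
                    color[t] = ON_PATH;
                    stack.push(t);
                }
            } else {
                color[s] = DONE; // all outgoing edges explored
                stack.pop();
            }
        }
        return true;
    }

    /** Builds the chain 0 -> 1 -> ... -> length, optionally with a self-loop on the last state. */
    static int[][] chain(int length, boolean loopAtEnd) {
        int[][] next = new int[length + 1][];
        for (int i = 0; i < length; i++) {
            next[i] = new int[] {i + 1};
        }
        next[length] = loopAtEnd ? new int[] {length} : new int[0];
        return next;
    }
}
```

A long literal followed by `*` roughly corresponds to `chain(n, true)`: one state per character plus a loop at the end. Because the traversal depth lives in a heap-allocated deque, a chain of several thousand states is handled without deep recursion.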

