Posted to issues@lucene.apache.org by "Andy Webb (Jira)" <ji...@apache.org> on 2019/12/17 20:02:00 UTC
[jira] [Commented] (SOLR-13190) Fuzzy search treated as server error instead of client error when terms are too complex

    [ https://issues.apache.org/jira/browse/SOLR-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998533#comment-16998533 ] 

Andy Webb commented on SOLR-13190:
----------------------------------

Hi, we're seeing {{TooComplexToDeterminizeException}} in production - our current mitigation is to avoid asking Solr to spellcheck long queries.
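
A minimal sketch of that kind of client-side guard, assuming the cutoff is a plain character count. The threshold value and the function name are hypothetical and not taken from our production code; {{spellcheck}} is the standard Solr request parameter that switches the spellcheck component on or off.

```python
from urllib.parse import urlencode

# Hypothetical cutoff: the right value depends on the index and the queries.
MAX_SPELLCHECK_CHARS = 40


def search_params(query, max_chars=MAX_SPELLCHECK_CHARS):
    """Build request parameters, turning spellcheck off for long queries."""
    wants_spellcheck = len(query) <= max_chars
    return {"q": query, "spellcheck": "true" if wants_spellcheck else "false"}


def query_string(query):
    """Percent-encode the parameters for use after the '?' in a request URL."""
    return urlencode(search_params(query))
```

Short queries still get suggestions, while long ones skip the spellcheck component entirely and so never reach the fuzzy term expansion.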

The exception can be triggered in 8.3.0 as follows:
 # create a collection (e.g. {{default}}) using the {{_default}} config
 # add a single document with some random content in the {{\_text_}} field
 # send a spellcheck request such as {{/solr/default/spell?q=kjshgkjahdskjgadhsgkahsdkgskd%C4%A3shdjghaksdhdhdkadhgkjahsdkjgahskdghjjhgkasjdhgajhdskgjahsdgkahjsdkjghaksd}} 
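
The request in step 3 can be built programmatically as sketched below. The base URL is an assumption (a local Solr on its default port 8983), and the helper name is made up for illustration; the collection name and query are the ones from the steps above.

```python
from urllib.parse import quote

# Assumption: Solr is running locally on its default port.
SOLR_BASE = "http://localhost:8983"

# The query from step 3. The multi-byte character is U+0123, which becomes
# %C4%A3 once it is UTF-8 percent-encoded.
QUERY = (
    "kjshgkjahdskjgadhsgkahsdkgskd"
    "\u0123"
    "shdjghaksdhdhdkadhgkjahsdkjgahskdghjjhgkasjdhgajhdskgjahsdgkahjsdkjghaksd"
)


def spellcheck_url(collection, query, base=SOLR_BASE):
    """Build the /spell request URL, percent-encoding the query as UTF-8."""
    return f"{base}/solr/{collection}/spell?q={quote(query, safe='')}"
```

Fetching {{spellcheck_url("default", QUERY)}} with any HTTP client, against a collection set up as in steps 1 and 2, should reproduce the exception on 8.3.0.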

The presence of a multi-byte character seems to matter - without it the query can be several times longer before a {{StackOverflowError}} is thrown instead.
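
To check whether a given query contains such characters, something like the sketch below can help. It only reports which characters need more than one byte in UTF-8; the helper name is hypothetical. The stack trace in the issue description goes through {{ByteRunAutomaton}}, which suggests the automaton works on UTF-8 bytes rather than characters, but that is my reading and not something I've confirmed.

```python
def multibyte_chars(query):
    """Return (index, char, utf8_bytes) for every character in the query
    that encodes to more than one byte in UTF-8."""
    found = []
    for index, char in enumerate(query):
        encoded = char.encode("utf-8")
        if len(encoded) > 1:
            found.append((index, char, encoded))
    return found
```

For the query in step 3 this reports a single entry, the character U+0123, whose UTF-8 encoding is the two bytes {{C4 A3}} seen in the URL.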

Would you expect the PR on this ticket to resolve this? If so, we'd be very keen to see it merged. (I'll try spinning up a custom build of Solr with the patch applied to test this myself.)

thanks,

Andy

> Fuzzy search treated as server error instead of client error when terms are too complex
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-13190
>                 URL: https://issues.apache.org/jira/browse/SOLR-13190
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: master (9.0)
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We've seen a fuzzy search end up breaking the automaton and getting reported as a server error. This should be improved by
> 1) reporting it as a client error, because an operator should deal with it in much the same way as a query with too many boolean clauses
> 2) reporting which field is causing the error, since that currently must be deduced from adjacent query logs and can be difficult if there are multiple terms in the search
> This trigger was added to defend against adversarial regex but somehow hits fuzzy terms as well. I don't understand enough about the automaton mechanisms to really know how to approach a fix there, but improving the operability is a good first step.
> Relevant stack trace:
> {noformat}
> org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 13632 states and 21348 transitions would result in more than 10000 states.
> 	at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:746)
> 	at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
> 	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
> 	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
> 	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
> 	at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:143)
> 	at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
> 	at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
> 	at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
> 	at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
> 	at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
> 	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:667)
> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:442)
> 	at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
> 	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1604)
> 	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420)
> 	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567)
> 	at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1435)
> 	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:374)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
