Posted to solr-user@lucene.apache.org by Jian Mou <la...@gmail.com> on 2016/01/22 02:18:36 UTC

One complex wildcard query leads to Solr OOM

We are using Solr as our search engine, and recently noticed that some
user-input wildcard queries can send Solr into what looks like an endless
loop in

org.apache.lucene.util.automaton.Operations.determinize()

which also keeps eating memory until Solr finally fails with an OOM.

The wildcard query looks like this: **?????????-???????o·???è??**

Although we can validate the input parameter, I also wonder whether there
is any configuration that can disable complex wildcard queries like this,
which lead to severe performance problems.


Related stack trace:


[image: Inline image 1]



Thanks,

Jian

Re: One complex wildcard query leads to Solr OOM

Posted by Jack Krupansky <ja...@gmail.com>.
Just escape them with a backslash. Or put each term in quotes.
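A minimal sketch of both options, assuming the classic Lucene/Solr query syntax. Lucene already ships a static `QueryParserBase.escape(String)` for the first option; `escapeQueryChars` below is a hypothetical standalone stand-in that backslash-escapes the same set of special characters, so `*` and `?` reach the index as literal characters. `quoteTerms` is a hypothetical helper for the second option: it wraps each whitespace-separated term in double quotes, inside which the classic parser does not expand wildcards (only `\` and `"` still need escaping there).

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardEscaping {

    // Characters with special meaning in the classic Lucene/Solr query syntax
    // (the same set that Lucene's QueryParserBase.escape handles).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Hypothetical helper: backslash-escape every special character. */
    static String escapeQueryChars(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // '*' becomes "\*", '?' becomes "\?", etc.
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Hypothetical helper: wrap each whitespace-separated term in double quotes. */
    static String quoteTerms(String input) {
        List<String> quoted = new ArrayList<>();
        for (String term : input.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields no terms
            }
            // Inside a quoted phrase only backslash and double quote need escaping.
            String inner = term.replace("\\", "\\\\").replace("\"", "\\\"");
            quoted.add("\"" + inner + "\"");
        }
        return String.join(" ", quoted);
    }

    public static void main(String[] args) {
        System.out.println(escapeQueryChars("**??-a?"));
        System.out.println(quoteTerms("foo* ba?r"));
    }
}
```

Escaping keeps the user's input as one query string, while quoting turns each term into a phrase; either way the analyzer configured for the field still decides what happens to the literal `*` and `?` characters.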

-- Jack Krupansky

On Sun, Jan 24, 2016 at 5:21 AM, Jian Mou <la...@gmail.com> wrote:

> Hi Jack,
>
> Thanks! Do you know how to disable wildcards? What I want is that if the
> input contains wildcards, they are just treated as normal characters. In
> other words, I just want to disable wildcard search.
>
> Thanks,
> Jian

Re: One complex wildcard query leads to Solr OOM

Posted by Erik Hatcher <er...@gmail.com>.
One option is to use the dismax query parser (not edismax), as it does not support wildcard queries.
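A minimal sketch of what this looks like on the request side, assuming a plain HTTP GET against the standard `/select` handler. `defType`, `q` and `qf` are standard Solr request parameters; the base URL, the core name `mycore` and the field `title` are placeholders. The same `defType` could instead be set as a default in the request handler configuration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DismaxRequest {

    /**
     * Builds a /select URL that asks Solr to parse the user input with the
     * dismax query parser (defType=dismax) rather than the standard Lucene parser.
     * baseUrl and queryFields are caller-supplied placeholders.
     */
    static String buildSelectUrl(String baseUrl, String userInput, String queryFields) {
        return baseUrl + "/select"
                + "?defType=dismax"
                + "&q=" + URLEncoder.encode(userInput, StandardCharsets.UTF_8)
                + "&qf=" + URLEncoder.encode(queryFields, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical core "mycore" and field "title".
        System.out.println(buildSelectUrl("http://localhost:8983/solr/mycore", "**??-a?", "title"));
    }
}
```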
   
   Erik

> On Jan 24, 2016, at 05:21, Jian Mou <la...@gmail.com> wrote:
> 
> Hi Jack,
> 
> Thanks! Do you know how to disable wildcards? What I want is that if the
> input contains wildcards, they are just treated as normal characters. In
> other words, I just want to disable wildcard search.
> 
> Thanks,
> Jian

Re: One complex wildcard query leads to Solr OOM

Posted by Jian Mou <la...@gmail.com>.
Hi Jack,

Thanks! Do you know how to disable wildcards? What I want is that if the
input contains wildcards, they are just treated as normal characters. In
other words, I just want to disable wildcard search.

Thanks,
Jian

On Fri, Jan 22, 2016 at 1:55 PM, Jack Krupansky <ja...@gmail.com>
wrote:

> The Lucene WildcardQuery class does have an additional constructor that has
> a maxDeterminizedStates parameter to limit the size of the FSM generated by
> a wildcard query, and the QueryParserBase class does have a method to set
> that parameter, setMaxDeterminizedStates, but there is no Solr support for
> invoking that method.
>
> It is probably worth a Jira to get such support. Even then, the question is
> how Solr should respond to the exception that gets thrown when that limit
> is reached.
>
> Even if Solr had an option to disable complex wildcards, the question is
> what you want to happen when a complex wildcard is used - should an
> exception be thrown, or... what?
>
> I suppose it might be simplest to have a Solr option to limit the number of
> wildcard characters used in a term, like to 4 or 8 or something like that.
> IOW, have Solr check the term before the WildcardQuery is generated.
>
> -- Jack Krupansky

Re: One complex wildcard query leads to Solr OOM

Posted by Jack Krupansky <ja...@gmail.com>.
The Lucene WildcardQuery class does have an additional constructor that has
a maxDeterminizedStates parameter to limit the size of the FSM generated by
a wildcard query, and the QueryParserBase class does have a method to set
that parameter, setMaxDeterminizedStates, but there is no Solr support for
invoking that method.
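To make the maxDeterminizedStates idea concrete, here is a toy sketch of subset construction (NFA to DFA) for a wildcard pattern with a state budget. It is not Lucene's code: `determinize` and `TooComplexException` are hypothetical names, the latter playing the role of Lucene's TooComplexToDeterminizeException, and the state counts will not match Lucene's automata exactly. It does show the kind of blow-up that happens in Operations.determinize(): a pattern of `*a` followed by n `?` needs 2^(n+1) DFA states, so a handful of extra `?` characters multiplies the work, and a budget turns that into a fast exception instead of an OOM.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class WildcardDeterminize {

    /** Hypothetical exception thrown when the state budget is exceeded. */
    static class TooComplexException extends RuntimeException {
        TooComplexException(String message) { super(message); }
    }

    // Stand-in for "any character that does not appear literally in the pattern".
    private static final char OTHER = '\uFFFF';

    // NFA state i means "pattern[0..i) has been matched"; a '*' at i may be skipped for free.
    private static Set<Integer> closure(String pattern, Set<Integer> states) {
        Set<Integer> result = new TreeSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (s < pattern.length() && pattern.charAt(s) == '*' && result.add(s + 1)) {
                work.push(s + 1);
            }
        }
        return result;
    }

    // Advance every NFA state on input character c, then take the closure.
    private static Set<Integer> step(String pattern, Set<Integer> states, char c) {
        Set<Integer> next = new TreeSet<>();
        for (int s : states) {
            if (s >= pattern.length()) continue;         // past the end: no transition
            char p = pattern.charAt(s);
            if (p == '*') next.add(s);                   // '*' consumes c and stays put
            else if (p == '?' || p == c) next.add(s + 1);
        }
        return closure(pattern, next);
    }

    /** Subset construction; returns the DFA state count or throws once maxStates is exceeded. */
    static int determinize(String pattern, int maxStates) {
        Set<Character> alphabet = new TreeSet<>();
        for (char ch : pattern.toCharArray()) {
            if (ch != '*' && ch != '?') alphabet.add(ch);
        }
        alphabet.add(OTHER);

        Set<Set<Integer>> seen = new HashSet<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = closure(pattern, Set.of(0));
        seen.add(start);
        work.push(start);
        while (!work.isEmpty()) {
            Set<Integer> current = work.pop();
            for (char c : alphabet) {
                Set<Integer> next = step(pattern, current, c);
                if (seen.add(next)) {
                    if (seen.size() > maxStates) {
                        throw new TooComplexException(
                                "determinizing '" + pattern + "' needs more than " + maxStates + " states");
                    }
                    work.push(next);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n += 2) {
            String pattern = "*a" + "?".repeat(n);
            System.out.println(pattern + " -> " + determinize(pattern, 10_000) + " states");
        }
    }
}
```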

It is probably worth a Jira to get such support. Even then, the question is
how Solr should respond to the exception that gets thrown when that limit
is reached.

Even if Solr had an option to disable complex wildcards, the question is
what you want to happen when a complex wildcard is used - should an
exception be thrown, or... what?

I suppose it might be simplest to have a Solr option to limit the number of
wildcard characters used in a term, like to 4 or 8 or something like that.
IOW, have Solr check the term before the WildcardQuery is generated.
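A minimal sketch of such a pre-check, written as plain Java that would run on each term before the query reaches the parser. This is not an existing Solr option: `countWildcards`, `checkTerm` and the limits are hypothetical, splitting a full query into terms is left out, and a backslash is treated as escaping the next character, as in the classic query syntax. Rejecting with an exception is only one answer to "what should happen"; escaping the term and searching literally would be another.

```java
public class WildcardLimit {

    /** Counts unescaped '*' and '?' characters in a single query term. */
    static int countWildcards(String term) {
        int count = 0;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character; it is a literal, not a wildcard
            } else if (c == '*' || c == '?') {
                count++;
            }
        }
        return count;
    }

    /** Hypothetical check: reject terms with more than maxWildcards wildcards. */
    static void checkTerm(String term, int maxWildcards) {
        int n = countWildcards(term);
        if (n > maxWildcards) {
            throw new IllegalArgumentException(
                    "term has " + n + " wildcard characters; at most " + maxWildcards + " allowed");
        }
    }

    public static void main(String[] args) {
        checkTerm("foo*", 4); // accepted: one wildcard
        try {
            checkTerm("**?????????-???????", 8); // rejected: far more than 8 wildcards
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```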

-- Jack Krupansky

On Thu, Jan 21, 2016 at 8:18 PM, Jian Mou <la...@gmail.com> wrote:

> We are using Solr as our search engine, and recently noticed that some
> user-input wildcard queries can send Solr into what looks like an endless
> loop in
>
> org.apache.lucene.util.automaton.Operations.determinize()
>
> which also keeps eating memory until Solr finally fails with an OOM.
>
> The wildcard query looks like this: **?????????-???????o·???è??**
>
> Although we can validate the input parameter, I also wonder whether there
> is any configuration that can disable complex wildcard queries like this,
> which lead to severe performance problems.
>
>
> Related stack trace:
>
>
> [image: Inline image 1]
>
>
>
> Thanks,
>
> Jian
>