Posted to solr-user@lucene.apache.org by Michael Harkins <mp...@case.edu> on 2016/04/10 18:28:15 UTC

Limiting regex queries

Hey all,

I am using Lucene and Solr version 4.2, and was wondering what would
be the best way to disallow regex queries with very large repetition
counts, something like blah{1234567} or blah{1234, 123445678}.
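A minimal sketch of a pre-check that could run on the regex text before it reaches the query parser. It is plain Java, not Lucene's RegExp parser. It scans for {n}, {n,} and {n,m} quantifiers (tolerating the space in "{1234, 123445678}") and reports the largest bound. The class name, method names and the limit value are hypothetical, and a textual scan like this can be fooled by unusual escaping, so treat it as a first line of defense only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical pre-check: finds the largest {n}, {n,} or {n,m} repetition
 * bound in a regex string. A textual scan, not Lucene's RegExp parser.
 */
final class RegexRepeatGuard {

    // {n}, {n,} or {n,m}; whitespace is tolerated, as in "{1234, 123445678}".
    private static final Pattern BOUNDS =
            Pattern.compile("\\{\\s*(\\d+)\\s*(?:,\\s*(\\d*)\\s*)?\\}");

    private RegexRepeatGuard() {}

    /** Largest repetition bound in the regex, or 0 if it has none. */
    static long maxRepeatBound(String regex) {
        Matcher m = BOUNDS.matcher(regex);
        long max = 0;
        while (m.find()) {
            // A brace preceded by a backslash is a literal, not a quantifier.
            if (m.start() > 0 && regex.charAt(m.start() - 1) == '\\') {
                continue;
            }
            max = Math.max(max, parseBound(m.group(1)));
            String upper = m.group(2); // null for {n}, empty for {n,}
            if (upper != null && !upper.isEmpty()) {
                max = Math.max(max, parseBound(upper));
            }
        }
        return max;
    }

    /** True when every repetition bound is at or below the limit. */
    static boolean isAllowed(String regex, long limit) {
        return maxRepeatBound(regex) <= limit;
    }

    // More than 18 digits may overflow a long, so treat it as unbounded.
    private static long parseBound(String digits) {
        return digits.length() > 18 ? Long.MAX_VALUE : Long.parseLong(digits);
    }

    public static void main(String[] args) {
        System.out.println(maxRepeatBound("blah{1234567}"));          // 1234567
        System.out.println(isAllowed("blah{1234, 123445678}", 100));  // false
        System.out.println(isAllowed("ab{2,5}c", 100));               // true
    }
}
```

The idea would be to call something like isAllowed on each regex term (the part between the slashes in name:/.../) and reject the request when it returns false. The limit of 100 is an arbitrary illustration.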

Re: Limiting regex queries

Posted by Erick Erickson <er...@gmail.com>.
There's some ability to time-limit queries (Solr's
timeAllowed parameter) so they stop after a specified
time. That does not do any cost analysis ahead of time,
though.

Periodically there's some interest in a way to
short-circuit "expensive" queries through some
kind of query plan, but nothing has been committed yet.

Yeah, basically the underlying query has to enumerate
all the terms between the two values and create
a huge OR clause (it's more efficient than that, but
that's the conceptual task). I don't know of a good
automatic way to prevent that.
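Here is a toy model of the "enumerate the terms and build a huge OR clause" idea, using a TreeSet as a stand-in for the term dictionary. This is not how Lucene implements it. Lucene walks the term dictionary with a compiled automaton rather than materializing a list, and the names below are hypothetical. The model only shows that the work grows with the number of matching terms. For a regex like blah{1234, 123445678}, my understanding is that there is an additional up-front cost. The repetition is expanded into roughly one automaton copy per allowed repeat, so a huge upper bound is expensive before any term is even visited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy model: a TreeSet stands in for the sorted term dictionary. */
final class RangeAsOr {

    /** Every term in [lower, upper]; conceptually, each one becomes an OR clause. */
    static List<String> expand(NavigableSet<String> dictionary, String lower, String upper) {
        return new ArrayList<>(dictionary.subSet(lower, true, upper, true));
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary = new TreeSet<>();
        for (int i = 0; i < 100_000; i++) {
            dictionary.add(String.format("term%06d", i)); // term000000 .. term099999
        }
        // A narrow range yields a handful of clauses...
        System.out.println(expand(dictionary, "term000010", "term000014").size()); // 5
        // ...while a wide one yields one clause per matching term.
        System.out.println(expand(dictionary, "term000000", "term099999").size()); // 100000
    }
}
```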

Best,
Erick

On Sun, Apr 10, 2016 at 2:38 PM, Michael Harkins <mp...@case.edu> wrote:
> Well, the original architecture is out of my hands, but when someone
> sends in a query like that, if the number in the braces is large, my
> system basically shuts down: the CPU spikes and memory usage climbs
> sharply. The queries are for strings. The query itself was an
> accident, but I want to be able to prevent an accident from bringing
> down the index.

Re: Limiting regex queries

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Michael,

I suggest wrapping the query parser you're using now in a custom one.
That should let you handle the case where the query contains a very
large number, such as a huge repetition bound.

I did something like that with Edismax.

https://github.com/freedev/solr-synonyms-query-parser-plugin

Take a look at the createParser method.

When createParser is executed, you can choose whether to rewrite the query
parameters or use a custom list.
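Below is a sketch of the wrapping idea as a plain Java decorator, with Function<String, Q> standing in for whatever parser is currently in use. GuardedParser, the predicate and the toy parser are all hypothetical and are not Solr API. In Solr, the equivalent check would live in a custom QParserPlugin's createParser, which receives the raw query string, before it delegates to the existing parser. The point of this shape is that the rejection happens before any expensive query object is built.

```java
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical decorator: runs a cost check, then delegates to the real parser. */
final class GuardedParser<Q> implements Function<String, Q> {

    private final Function<String, Q> delegate; // the parser already in use
    private final Predicate<String> allowed;    // the check to run first

    GuardedParser(Function<String, Q> delegate, Predicate<String> allowed) {
        this.delegate = Objects.requireNonNull(delegate);
        this.allowed = Objects.requireNonNull(allowed);
    }

    @Override
    public Q apply(String queryString) {
        // Reject before the delegate ever builds a query object.
        if (!allowed.test(queryString)) {
            throw new IllegalArgumentException("Query rejected as too expensive: " + queryString);
        }
        return delegate.apply(queryString);
    }

    public static void main(String[] args) {
        // Toy delegate, plus a toy check that refuses any number of 5+ digits inside braces.
        GuardedParser<String> parser = new GuardedParser<>(
                q -> "parsed(" + q + ")",
                q -> !q.matches("(?s).*\\{[^}]*\\d{5,}[^}]*\\}.*"));

        System.out.println(parser.apply("name:/ab{2,5}c/")); // parsed(name:/ab{2,5}c/)
        try {
            parser.apply("name:/blah{1234567}/");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```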

Hope this helps,
Vincenzo


On Sun, Apr 10, 2016 at 11:38 PM, Michael Harkins <mp...@case.edu> wrote:

> Well, the original architecture is out of my hands, but when someone
> sends in a query like that, if the number in the braces is large, my
> system basically shuts down: the CPU spikes and memory usage climbs
> sharply. The queries are for strings. The query itself was an
> accident, but I want to be able to prevent an accident from bringing
> down the index.



-- 
Vincenzo D'Amore
email: v.damore@gmail.com
skype: free.dev
mobile: +39 349 8513251

Re: Limiting regex queries

Posted by Michael Harkins <mp...@case.edu>.
Well, the original architecture is out of my hands, but when someone
sends in a query like that, if the number in the braces is large, my
system basically shuts down: the CPU spikes and memory usage climbs
sharply. The queries are for strings. The query itself was an
accident, but I want to be able to prevent an accident from bringing
down the index.


> On Apr 10, 2016, at 12:34 PM, Erick Erickson <er...@gmail.com> wrote:
>
> OK, why is this a problem? This smells like an XY problem: you want
> to take some specific action, but it's not at all clear what the
> underlying problem is. There might be other ways of doing this.
>
> If you're running regexes against numeric values, indexing them in
> real numeric fields (Trie fields) and using range queries is a much
> better way to go.
>
> Best,
> Erick

Re: Limiting regex queries

Posted by Erick Erickson <er...@gmail.com>.
OK, why is this a problem? This smells like an XY problem: you want
to take some specific action, but it's not at all clear what the
underlying problem is. There might be other ways of doing this.

If you're running regexes against numeric values, indexing them in
real numeric fields (Trie fields) and using range queries is a much
better way to go.
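Here is a toy illustration of why real numeric fields matter, again using a TreeSet as a stand-in for a field's sorted term dictionary. The class and method names are hypothetical. Numbers indexed as plain strings sort lexicographically, so a "range" over them picks up values that are numerically outside it. Trie fields encode numbers so that term order matches numeric order, and a Solr range query such as price:[1000 TO 2000] then behaves as expected. This only applies if the values being matched really are numbers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy model: a TreeSet stands in for a field's sorted term dictionary. */
final class StringVsNumericRange {

    /** Range over values indexed as plain strings, which sort lexicographically. */
    static NavigableSet<String> lexicographicRange(List<Long> values, long lo, long hi) {
        TreeSet<String> terms = new TreeSet<>();
        for (long v : values) {
            terms.add(Long.toString(v));
        }
        return terms.subSet(Long.toString(lo), true, Long.toString(hi), true);
    }

    /** Range over values indexed as numbers, which sort numerically. */
    static NavigableSet<Long> numericRange(List<Long> values, long lo, long hi) {
        return new TreeSet<>(values).subSet(lo, true, hi, true);
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(5L, 150L, 1500L, 10000L, 25000L);
        System.out.println(lexicographicRange(values, 1000, 2000)); // [10000, 150, 1500]
        System.out.println(numericRange(values, 1000, 2000));       // [1500]
    }
}
```

The string version wrongly includes 150 and 10000 because "150" and "10000" both sort between "1000" and "2000".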

Best,
Erick

On Sun, Apr 10, 2016 at 9:28 AM, Michael Harkins <mp...@case.edu> wrote:
> Hey all,
>
> I am using Lucene and Solr version 4.2, and was wondering what would
> be the best way to disallow regex queries with very large repetition
> counts, something like blah{1234567} or blah{1234, 123445678}.