Posted to solr-user@lucene.apache.org by Wei <we...@gmail.com> on 2019/10/07 17:18:39 UTC

How to block expensive solr queries

Hi,

Recently we encountered a problem where Solr Cloud query latency suddenly
increased and many simple queries that match few documents timed out. After
digging a bit, I found that the root cause was some stats queries running at
the same time, such as

/solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true



unique_ids is a high-cardinality field, so this query is quite
expensive. But why does a small volume of such queries block other queries and
make simple queries time out?  I checked the Solr thread pool and saw that there
are plenty of idle threads available.  We are using Solr 7.6.2 with a 10-shard
cloud setup.

Is there a way to block certain Solr queries based on URL pattern? For
example, to ignore the stats.calcdistinct request in this case.


Thanks,

Wei

Re: How to block expensive solr queries

Posted by Mikhail Khludnev <mk...@apache.org>.
It's worth raising an issue to support timeAllowed for stats. Until
that's done, something like a Jetty filter is the only option.

On Tue, Oct 8, 2019 at 12:34 AM Wei <we...@gmail.com> wrote:

> Hi Mikhail,
>
> Yes, I have the timeAllowed parameter configured, but in this case it
> doesn't seem to prevent the stats request from blocking other normal
> queries.  Is it possible to drop the request before Solr executes it, maybe
> in a Jetty request filter?
>
> Thanks,
> Wei


-- 
Sincerely yours
Mikhail Khludnev
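
A minimal sketch of the check such a filter would perform, written in Python so it stays self-contained. It is not Solr or Jetty code: the names BLOCKED_PARAMS and is_blocked are hypothetical, and the same logic would have to be ported to a servlet filter or placed in a proxy in front of Solr. It only inspects the URL query string, so parameters sent in a POST body are not covered.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical denylist: parameter name -> values that should be rejected.
BLOCKED_PARAMS = {
    "stats.calcdistinct": {"true"},
}


def is_blocked(request_uri: str) -> bool:
    """Return True if the query string carries a denylisted parameter value."""
    query = urlsplit(request_uri).query
    params = parse_qs(query, keep_blank_values=True)
    for name, bad_values in BLOCKED_PARAMS.items():
        for value in params.get(name, []):
            if value.strip().lower() in bad_values:
                return True
    return False


if __name__ == "__main__":
    uri = ("/solr/mycollection/select?stats=true"
           "&stats.field=unique_ids&stats.calcdistinct=true")
    print(is_blocked(uri))
```

A filter built on this would answer a blocked request with an error status instead of passing it on to Solr.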

Re: How to block expensive solr queries

Posted by Wei <we...@gmail.com>.
Hi Mikhail,

Yes, I have the timeAllowed parameter configured, but in this case it
doesn't seem to prevent the stats request from blocking other normal
queries.  Is it possible to drop the request before Solr executes it, maybe
in a Jetty request filter?

Thanks,
Wei

On Mon, Oct 7, 2019 at 1:39 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Hello, Wei.
>
> Have you tried to abandon heavy queries with the timeAllowed parameter?
>
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
>
> It may or may not be able to stop stats; this test can clarify that:
>
> https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223

Re: How to block expensive solr queries

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Wei.

Have you tried to abandon heavy queries with the timeAllowed parameter?
https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter

It may or may not be able to stop stats; this test can clarify that:
https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
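
For illustration, a small Python sketch of how a client might attach timeAllowed to a request and detect that Solr cut the search short. The helper names and the localhost base URL are assumptions made for the example; the sketch only builds the URL and inspects an already parsed JSON response, where Solr reports a time-limited search as partialResults in the responseHeader.

```python
from urllib.parse import urlencode


def build_select_url(base: str, params: dict, time_allowed_ms: int) -> str:
    """Build a /select URL with timeAllowed (milliseconds) added to the params."""
    merged = dict(params)
    merged["timeAllowed"] = str(time_allowed_ms)
    return f"{base}/select?{urlencode(merged)}"


def is_partial(response: dict) -> bool:
    """True if the parsed JSON response is flagged as partial by Solr."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


if __name__ == "__main__":
    # Hypothetical host and collection, for illustration only.
    print(build_select_url("http://localhost:8983/solr/mycollection",
                           {"q": "*:*", "rows": 10}, 500))
```

Whether the limit also stops a stats computation is exactly the open question here, so a missing partialResults flag does not prove the request was cheap.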



-- 
Sincerely yours
Mikhail Khludnev

Re: How to block expensive solr queries

Posted by Wei <we...@gmail.com>.
On Wed, Oct 9, 2019 at 9:59 AM Wei <we...@gmail.com> wrote:

> Thanks all. I debugged a bit and see that timeAllowed does not limit stats
> calls. I also think it would be useful for Solr to support a whitelist or
> blacklist of operations, as Toke suggested. I will create a Jira issue for it.
> Currently it seems the only option to explore is adding a filter to Solr's
> embedded Jetty.  Does anyone have experience doing that? Do I also need to
> change SolrDispatchFilter?
>
> On Tue, Oct 8, 2019 at 3:50 AM Toke Eskildsen <to...@kb.dk> wrote:
>
>> On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
>> > /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
>> ...
>> > Is there a way to block certain solr queries based on url pattern?
>> > i.e. ignore the stats.calcdistinct request in this case.
>>
>> It sounds like it is possible for users to issue arbitrary queries
>> against your Solr installation. As you have noticed, that makes it easy
>> to cause a denial of service (intentional or not). Filtering out
>> stats.calcdistinct won't help with the next request for
>> group.ngroups=true, facet.field=unique_id&facet.limit=100000000,
>> rows=100000000 or something else.
>>
>> I recommend you flip your logic: only allow specific types of
>> requests and put limits on those. To my knowledge that is not a
>> built-in feature of Solr.
>>
>> - Toke Eskildsen, Royal Danish Library
>>
>>
>>

Re: How to block expensive solr queries

Posted by Toke Eskildsen <to...@kb.dk>.
On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
...
> Is there a way to block certain solr queries based on url pattern?
> i.e. ignore the stats.calcdistinct request in this case.

It sounds like it is possible for users to issue arbitrary queries
against your Solr installation. As you have noticed, that makes it easy
to cause a denial of service (intentional or not). Filtering out
stats.calcdistinct won't help with the next request for
group.ngroups=true, facet.field=unique_id&facet.limit=100000000,
rows=100000000 or something else.

I recommend you flip your logic: only allow specific types of
requests and put limits on those. To my knowledge that is not a
built-in feature of Solr.

- Toke Eskildsen, Royal Danish Library
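
One way to picture the "flip your logic" advice is an allowlist applied before a request ever reaches Solr, for example in a proxy or gateway. The sketch below is an assumption-laden illustration, not a Solr feature: ALLOWED_PARAMS, LIMITS, RejectedRequest and sanitize_query are hypothetical names, and the permitted parameters and caps are placeholders to be replaced by whatever the application really needs.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical policy: only these parameters are forwarded to Solr.
ALLOWED_PARAMS = {"q", "fq", "fl", "sort", "start", "rows", "wt"}
# Hypothetical upper bounds for numeric parameters.
LIMITS = {"rows": 100, "start": 10_000}


class RejectedRequest(ValueError):
    """Raised when a request falls outside the allowlist policy."""


def sanitize_query(query_string: str) -> str:
    """Reject unknown parameters and clamp numeric ones to their limits."""
    params = parse_qs(query_string, keep_blank_values=True)
    unknown = sorted(set(params) - ALLOWED_PARAMS)
    if unknown:
        raise RejectedRequest(f"parameters not allowed: {unknown}")
    for name, limit in LIMITS.items():
        if name not in params:
            continue
        values = params[name]
        ok = len(values) == 1 and values[0].isascii() and values[0].isdigit()
        if not ok:
            raise RejectedRequest(f"{name} must be a single non-negative integer")
        params[name] = [str(min(int(values[0]), limit))]
    return urlencode(params, doseq=True)


if __name__ == "__main__":
    print(sanitize_query("q=title:solr&rows=100000000"))
```

With this policy, stats.calcdistinct, group.ngroups and facet.limit are all refused without being listed individually, because none of them is on the allowlist, and an oversized rows value is clamped rather than passed through.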