Posted to solr-user@lucene.apache.org by Susheel Kumar <su...@gmail.com> on 2016/11/17 11:07:40 UTC

How to stop long running/memory eating query

Hello,

We found a query which was running forever and thus causing an OOM (
q=+AND++AND+Tom+AND+Jerry...).  Is there any way, as in the SQL/NoSQL
world, to watch the currently executing queries and kill them?
That would be a desirable feature in situations like this, to keep the whole
cluster from going down. Is there an existing JIRA for it, or can I create one?

Also, what are the different ways to examine such queries and stop them
from executing?

Thanks,
Susheel

Re: How to stop long running/memory eating query

Posted by Erick Erickson <er...@gmail.com>.

Right. Each shard has to sort over 30M documents and ship its 30M
candidates to the aggregator, which sorts them into the final top 10
(assuming rows=10). Gah...
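
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the usual distributed-search behavior in which every shard hands its top start + rows candidates to the aggregating node. The function name and the shard count of 4 are illustrative only and do not come from this thread.

```python
def deep_paging_cost(start, rows, num_shards):
    """Estimate how many sorted entries one distributed page request touches."""
    # Each shard must produce its own top (start + rows) candidates.
    per_shard = start + rows
    # The aggregator merges the candidates from every shard.
    merged = per_shard * num_shards
    return {
        "per_shard": per_shard,
        "merged_at_aggregator": merged,
        "returned": rows,
    }


# Hypothetical 4-shard collection, start in the 30M range, rows=10.
cost = deep_paging_cost(start=30_000_000, rows=10, num_shards=4)
print(cost)
```

The point of the arithmetic is the ratio: tens of millions of entries are sorted and shipped so that ten documents can be returned.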

You want to see either the cursorMark stuff or the export handler,
depending on whether the goal is to return one page at a time or the
entire set. Note that export has some restrictions (i.e. it only
returns docValues fields).

The cursorMark capability was explicitly added to handle this case,
although it does _not_ handle something like "go to last page". Rather,
it handles paging through to the last page (which, for a result set like
this, only a program of some sort would do).
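
A program that pages through with cursorMark might look roughly like the Python sketch below. It relies on the documented cursor contract: send cursorMark=* for the first page, use a sort that includes the uniqueKey field as a tie-breaker, and stop when nextCursorMark comes back unchanged. The uniqueKey name `id`, the collection URL and the helper names are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request


def http_fetch(base_url, params):
    """Send one select request to Solr and return the decoded JSON body."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_all_docs(base_url, query, sort, rows=100, fetch=http_fetch):
    """Yield every matching document, one cursor page at a time."""
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        params = {
            "q": query,
            "sort": sort,  # must include the uniqueKey field, e.g. "score desc,id asc"
            "rows": rows,
            "wt": "json",
            "cursorMark": cursor,
        }
        body = fetch(base_url, params)
        yield from body["response"]["docs"]
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

Usage would be something like `for doc in iter_all_docs("http://localhost:8983/solr/mycollection/select", "*:*", "score desc,id asc"): ...`, where the URL is a placeholder. The `fetch` argument is only there so the loop can be exercised without a running Solr.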

BTW, an interesting trick for "go to last page" is to reverse the sort
order, i.e. sort by score _ascending_. Then last becomes first...
In general, though, "go to last page" isn't all that useful
considering what it takes to support it.
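
As a small illustration of the reversed-sort trick, the Python sketch below flips the direction of each clause in a simple sort string and requests the first page of the result. The helper names are hypothetical, and the parsing only handles plain "field direction" clauses, not function sorts that contain commas or spaces.

```python
def reverse_sort(sort):
    """Flip every asc/desc direction in a simple Solr sort string."""
    flipped = []
    for clause in sort.split(","):
        field, direction = clause.strip().rsplit(" ", 1)
        new_direction = "desc" if direction.lower() == "asc" else "asc"
        flipped.append(f"{field} {new_direction}")
    return ",".join(flipped)


def last_page_params(query, sort, rows):
    """Build params that fetch the 'last page' as page one of the reversed sort."""
    return {"q": query, "sort": reverse_sort(sort), "rows": rows, "start": 0}


params = last_page_params("Tom AND Jerry", "score desc,id asc", rows=10)
print(params["sort"])
```

The documents come back in reverse order, so the client would flip the list before display. This also returns the final `rows` documents rather than a shorter, partially filled last page.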

https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

Best,
Erick

Re: How to stop long running/memory eating query

Posted by Susheel Kumar <su...@gmail.com>.

Hi Erick, you got it.  I left out the rest of the query; the parameter
that caused the issue is the start parameter.  Because of bad UI design
(a deep pagination issue), a user set start to 30+ million for this
query, which brought the whole cluster down.
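
One way to keep a UI from sending such a request is to validate the paging parameters in the application layer before they reach Solr. The Python sketch below is only an illustration: the function name and the cap of 10,000 are hypothetical and would need tuning for a given index.

```python
MAX_OFFSET = 10_000  # hypothetical cap; tune for your index and hardware


def check_paging(start, rows, max_offset=MAX_OFFSET):
    """Reject page requests that reach too deep into the result set."""
    if start < 0 or rows < 0:
        raise ValueError("start and rows must be non-negative")
    if start + rows > max_offset:
        raise ValueError(
            f"start + rows = {start + rows} exceeds the limit of {max_offset}; "
            "use cursor-based paging for deep results"
        )
    return start, rows


print(check_paging(0, 10))
```

A request like `check_paging(30_000_000, 10)` raises `ValueError` instead of being forwarded to the cluster.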

Thanks

Re: How to stop long running/memory eating query

Posted by Erick Erickson <er...@gmail.com>.

That query frankly doesn't seem like it'd lead to OOM or run for a
very long time unless there are (at least) hundreds of terms and a
_lot_ of documents. Or you're trying to return a zillion rows. Or
you're faceting on a high cardinality field. Or....

The terms should be kept in MMapDirectory space.

My guess is that you aren't showing the part that's really causing the
problem. Perhaps try peeling parts of the query off until you find the
culprit?

And if you're sorting, faceting or the like, docValues will help
prevent OOM problems.

Best,
Erick

RE: How to stop long running/memory eating query

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.

Mikhail,

If the query is not asynchronous, it would certainly be OK to stop the long-running query when the client socket is disconnected. I know that is a feature of the niche indexer used in the products of www.indexengines.com, because I wrote it. We did not have asynchronous queries, and because of the content and query-time deduplication, some queries could take hours - that's 72 billion objects on a 2U box for you. Hope they've added better index-time deduplication by now.

Thanks,

-dan

Re: How to stop long running/memory eating query

Posted by Mikhail Khludnev <mk...@apache.org>.

There is a circuit breaker, the timeAllowed parameter:
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
If I'm right, it does not interrupt faceting.
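
For illustration, a request using timeAllowed could be built and checked as in the Python sketch below. timeAllowed is given in milliseconds, and a response cut short by it carries partialResults=true in its responseHeader. The collection URL, the 2000 ms budget and the helper names are assumptions.

```python
import urllib.parse


def select_url(base_url, query, time_allowed_ms):
    """Build a select URL that asks Solr to stop searching after a time budget."""
    params = {"q": query, "wt": "json", "timeAllowed": time_allowed_ms}
    return base_url + "?" + urllib.parse.urlencode(params)


def is_partial(body):
    """Return True if Solr flagged the response as cut short."""
    return bool(body.get("responseHeader", {}).get("partialResults", False))


# Placeholder host and collection; 2000 ms is an arbitrary example budget.
url = select_url("http://localhost:8983/solr/mycollection/select", "Tom AND Jerry", 2000)
print(url)
```

A client would check `is_partial(...)` on the decoded JSON body and treat a true value as "incomplete results", not as a normal answer.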

-- 
Sincerely yours
Mikhail Khludnev