Posted to solr-user@lucene.apache.org by kshitij tyagi <ks...@gmail.com> on 2020/12/18 11:23:20 UTC

solrCloud client socketTimeout initiates retries

Hi,

We have a SolrCloud setup and are using CloudSolrClient. What we are seeing
is that if a socket timeout occurs, the same request is sent to another Solr
server.

So if I set socketTimeout to a very low value, say 100 ms, and my query takes
around 200 ms, the client tries the second server, then the next, and so
on (basically all available servers, with the same query).

I see that there is a *numServersToTry* setting in the LBSolrClient class, but
I am not able to set it through CloudSolrClient. Setting it would let us
restrict the behavior above.
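
To make the behavior concrete, here is a minimal sketch in plain Java. It is a toy model, not SolrJ code: the class RetryCapSketch, the method queryWithCap, and the Predicate standing in for "did this server answer before the socket timeout" are all hypothetical names chosen for illustration. It only shows how a cap on the number of servers tried limits how far a slow query spreads.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Toy model of a client that resends the same query to the next server after a timeout.
 * NOT SolrJ code: every name here is hypothetical and exists only for illustration.
 */
public class RetryCapSketch {

    /**
     * Tries servers in order until one answers in time or the cap is reached.
     *
     * @param servers         candidate servers, in the order they would be tried
     * @param answersInTime   returns true if the given server responds before the timeout
     * @param numServersToTry maximum number of servers to contact
     * @return the number of servers that actually received the query
     */
    public static int queryWithCap(List<String> servers,
                                   Predicate<String> answersInTime,
                                   int numServersToTry) {
        int attempts = 0;
        for (String server : servers) {
            if (attempts >= numServersToTry) {
                break; // cap reached: stop spreading the slow query
            }
            attempts++;
            if (answersInTime.test(server)) {
                break; // success: no further retries needed
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("solr1", "solr2", "solr3", "solr4");
        // A 200 ms query against a 100 ms timeout never answers in time.
        Predicate<String> alwaysTimesOut = server -> false;

        System.out.println("uncapped: " + queryWithCap(servers, alwaysTimesOut, Integer.MAX_VALUE));
        System.out.println("capped:   " + queryWithCap(servers, alwaysTimesOut, 1));
    }
}
```

Without a cap, the query that always times out reaches all four servers in this model; with a cap of 1 it reaches only the first.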

Should a Jira issue be created so that CloudSolrClient supports numServersToTry?
Or is there another way to control how requests are sent to other Solr servers?

Regards,
kshitij

Re: solrCloud client socketTimeout initiates retries

Posted by kshitij tyagi <ks...@gmail.com>.
Hi Erick,

Thanks. Yes, we will be upgrading to 8.8 soon.
Until we upgrade, we are increasing the socket timeout, which helps to some
extent for the time being.

Regards,
kshitij

On Fri, Dec 18, 2020 at 7:48 PM Erick Erickson <er...@gmail.com>
wrote:

> Right, there are several alternatives. Try going here:
> http://jirasearch.mikemccandless.com/search.py?index=jira
>
> and search for “circuit breaker” and you’ll find a bunch
> of JIRAs. Unfortunately, some are in 8.8.
>
> That said, some of the circuit breakers are in much earlier
> releases. Would it suffice to set the circuit breakers
> until you can upgrade?
>
> One problem with your solution is that the query keeps
> on running, admittedly on only one replica of each shard.
> With circuit breakers, the query itself is stopped, thus freeing
> up resources.
>
> Additionally, if you see a pattern (for instance, certain
> wildcard patterns) you could intercept that before sending.
>
> Best,
> Erick

Re: solrCloud client socketTimeout initiates retries

Posted by Erick Erickson <er...@gmail.com>.
Right, there are several alternatives. Try going here:
http://jirasearch.mikemccandless.com/search.py?index=jira

and search for “circuit breaker” and you’ll find a bunch
of JIRAs. Unfortunately, some are in 8.8.

That said, some of the circuit breakers are in much earlier
releases. Would it suffice to set the circuit breakers
until you can upgrade?

One problem with your solution is that the query keeps
on running, admittedly on only one replica of each shard.
With circuit breakers, the query itself is stopped, thus freeing
up resources.

Additionally, if you see a pattern (for instance, certain
wildcard patterns) you could intercept that before sending.
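
As a rough illustration of intercepting a pattern on the client before sending, here is a minimal sketch in plain Java. The class name QueryGuardSketch, the method looksExpensive, and the specific rule (a term that starts with * or ?) are assumptions made for the example; the pattern you actually need depends on which queries you see misbehaving.

```java
import java.util.regex.Pattern;

/**
 * Client-side pre-check that flags query strings matching a known-expensive pattern
 * before they are sent. The names and the rule itself are hypothetical examples.
 */
public class QueryGuardSketch {

    // A wildcard (* or ?) that begins a term: at the start of the query, after
    // whitespace, after "field:", or after "(". It must be followed by a word
    // character, so the match-all query *:* is not flagged.
    private static final Pattern LEADING_WILDCARD =
            Pattern.compile("(^|[\\s:(])[*?]\\w");

    /** Returns true if the query contains a term with a leading wildcard. */
    public static boolean looksExpensive(String query) {
        return LEADING_WILDCARD.matcher(query).find();
    }

    public static void main(String[] args) {
        String[] samples = {"title:*foo", "title:foo*", "*:*", "name:?ello world"};
        for (String q : samples) {
            System.out.println(q + " -> " + (looksExpensive(q) ? "reject" : "send"));
        }
    }
}
```

A caller would run this check first and only hand the query to the Solr client when it returns false.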

Best,
Erick

> On Dec 18, 2020, at 8:52 AM, kshitij tyagi <ks...@gmail.com> wrote:
> 
> Hi Erick,
> 
> I agree, but in a huge cluster the retries keep on happening. Can't we have
> this feature implemented in the client?
> I was referring to this Jira issue:
> https://issues.apache.org/jira/browse/SOLR-10479
> We have seen some malicious queries reach the system that take
> significant time, and when these queries propagate to other Solr servers
> they choke the entire cluster.
> 
> Regards,
> kshitij


Re: solrCloud client socketTimeout initiates retries

Posted by kshitij tyagi <ks...@gmail.com>.
Hi Erick,

I agree, but in a huge cluster the retries keep on happening. Can't we have
this feature implemented in the client?
I was referring to this Jira issue:
https://issues.apache.org/jira/browse/SOLR-10479
We have seen some malicious queries reach the system that take
significant time, and when these queries propagate to other Solr servers
they choke the entire cluster.

Regards,
kshitij





On Fri, Dec 18, 2020 at 7:12 PM Erick Erickson <er...@gmail.com>
wrote:

> Why do you want to do this? This sounds like an XY problem: you
> think you’re going to solve some problem X by doing Y. Y in this case
> is setting numServersToTry, but you haven’t explained what X,
> the problem you’re trying to solve, is.
>
> Offhand, this seems like a terrible idea. If your requests are timing
> out, what purpose is served by _not_ trying the next one on the
> list? With, of course, a much longer timeout interval…
>
> The code is structured that way on the theory that you want the request
> to succeed and the system needs to be tolerant of momentary
> glitches due to network congestion, reading indexes into memory, etc.
> Bypassing that assumption needs some justification.
>
> Best,
> Erick

Re: solrCloud client socketTimeout initiates retries

Posted by Erick Erickson <er...@gmail.com>.
Why do you want to do this? This sounds like an XY problem: you
think you’re going to solve some problem X by doing Y. Y in this case
is setting numServersToTry, but you haven’t explained what X,
the problem you’re trying to solve, is.

Offhand, this seems like a terrible idea. If your requests are timing
out, what purpose is served by _not_ trying the next one on the
list? With, of course, a much longer timeout interval…

The code is structured that way on the theory that you want the request
to succeed and the system needs to be tolerant of momentary
glitches due to network congestion, reading indexes into memory, etc.
Bypassing that assumption needs some justification.

Best,
Erick

> On Dec 18, 2020, at 6:23 AM, kshitij tyagi <ks...@gmail.com> wrote:
> 
> Hi,
> 
> We have a SolrCloud setup and are using CloudSolrClient. What we are seeing
> is that if a socket timeout occurs, the same request is sent to another Solr
> server.
>
> So if I set socketTimeout to a very low value, say 100 ms, and my query takes
> around 200 ms, the client tries the second server, then the next, and so
> on (basically all available servers, with the same query).
>
> I see that there is a *numServersToTry* setting in the LBSolrClient class, but
> I am not able to set it through CloudSolrClient. Setting it would let us
> restrict the behavior above.
>
> Should a Jira issue be created so that CloudSolrClient supports numServersToTry?
> Or is there another way to control how requests are sent to other Solr servers?
> 
> Regards,
> kshitij