Posted to issues@lucene.apache.org by "Munendra S N (Jira)" <ji...@apache.org> on 2020/03/06 06:10:00 UTC

[jira] [Commented] (SOLR-13765) Deadlock on Solr cloud request causing 'Too many open files' error

    [ https://issues.apache.org/jira/browse/SOLR-13765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053061#comment-17053061 ] 

Munendra S N commented on SOLR-13765:
-------------------------------------

This issue was resolved in SOLR-13793 by limiting the number of forwarded requests to the total number of replicas.


{code:java}
Not sure what the purpose of finding an inactive node to handle request in HttpSolrCall.getRemoteCoreUrl is, but taking that out seems to fix the problem
{code}
Context for this change is provided in SOLR-4553 and SOLR-13793.
[~ichattopadhyaya] [~vincewu] should this issue be resolved?
This fix has also been backported to 7_7, so if there is a 7.7.3 release, it will contain the fix.
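
To illustrate the idea behind the fix, here is a minimal, self-contained sketch of a forwarding loop bounded by the replica count. It is not the actual Solr code from SOLR-13793: the class ForwardLimitSketch, the Replica and Outcome types, and the route method are all hypothetical names made up for this example, and forwarding is simulated by stepping through an in-memory list rather than sending HTTP requests.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Solr.
class ForwardLimitSketch {

    /** Stand-in for a replica that is either able to serve requests or down. */
    static final class Replica {
        final String name;
        final boolean active;

        Replica(String name, boolean active) {
            this.name = name;
            this.active = active;
        }
    }

    /** Result of routing: who served the request (null if nobody) and how many forwards it took. */
    static final class Outcome {
        final String servedBy;
        final int forwards;

        Outcome(String servedBy, int forwards) {
            this.servedBy = servedBy;
            this.forwards = forwards;
        }
    }

    /**
     * Starts at replicas.get(start). An inactive replica forwards the request to the
     * next replica, but only while the number of forwards so far is below the total
     * number of replicas. Without that cap, two inactive replicas would forward to
     * each other forever, each hop holding a socket open.
     */
    static Outcome route(List<Replica> replicas, int start) {
        int index = start;
        int forwards = 0;
        while (true) {
            Replica current = replicas.get(index);
            if (current.active) {
                return new Outcome(current.name, forwards);
            }
            if (forwards >= replicas.size()) {
                // Forward budget exhausted: fail the request instead of looping.
                return new Outcome(null, forwards);
            }
            index = (index + 1) % replicas.size();
            forwards++;
        }
    }

    public static void main(String[] args) {
        // The scenario from the issue: one shard, two replicas, both down.
        List<Replica> bothDown = Arrays.asList(
                new Replica("replica1", false),
                new Replica("replica2", false));
        Outcome outcome = route(bothDown, 0);
        System.out.println("servedBy=" + outcome.servedBy + ", forwards=" + outcome.forwards);
    }
}
```

In this sketch the request fails once the number of forwards reaches the replica count, so the two nodes cannot bounce it back and forth indefinitely.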

> Deadlock on Solr cloud request causing 'Too many open files' error
> ------------------------------------------------------------------
>
>                 Key: SOLR-13765
>                 URL: https://issues.apache.org/jira/browse/SOLR-13765
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 7.7.2
>            Reporter: Lei Wu
>            Priority: Major
>
> Hi there,
> We are seeing a deadlock on Solr cloud requests.
> Say we have a collection with one shard and two replicas for that shard. For whatever reason, the cluster appears to be active but each individual replica is down. When a request comes in, Solr (replica 1) tries to find a remote node (replica 2) to handle the request, since the local core (replica 1) is down. When the other node (replica 2) receives the request, it does the same and forwards the request back to the original node (replica 1). This causes a deadlock and eventually uses up all the sockets, resulting in a `Too many open files` error.
> Not sure what the purpose of finding an inactive node to handle request in HttpSolrCall.getRemoteCoreUrl is, but taking that out seems to fix the problem



