Posted to issues@solr.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2022/03/14 23:22:00 UTC

[jira] [Created] (SOLR-16099) HTTP Client threads can hang in Jetty's InputStreamResponseListener when using HTTP2 - impacts intra-node communication

Chris M. Hostetter created SOLR-16099:
-----------------------------------------

             Summary: HTTP Client threads can hang in Jetty's InputStreamResponseListener when using HTTP2 - impacts intra-node communication
                 Key: SOLR-16099
                 URL: https://issues.apache.org/jira/browse/SOLR-16099
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud, SolrJ
            Reporter: Chris M. Hostetter


There appears to be a Jetty HttpClient bug that makes it possible for a request thread to hang indefinitely while waiting to parse the response from remote Jetty servers. The thread hangs because it calls {{.wait()}} on a monitor lock that _should_ be notified by another (internal Jetty client) thread when a chunk of data is available from the wire, but in some cases this evidently does not happen.
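
To make the failure mode concrete, here is a minimal sketch of a reader that waits on a monitor for data, and of what happens when the producer's notification never arrives. It is a toy model only: {{ChunkQueue}}, {{offer}} and {{poll}} are hypothetical names invented for this illustration and are not Jetty's {{InputStreamResponseListener}} code. The sketch uses a bounded {{wait(timeout)}} so that the reader re-checks its condition even if the wake-up is lost; with an unbounded {{wait()}} the same reader would stay in WAITING state, as in the stack trace reported in this issue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class LostNotifyDemo {

    /** Toy stand-in for a blocking response stream. NOT Jetty's implementation. */
    static class ChunkQueue {
        private final Queue<String> chunks = new ArrayDeque<>();

        /** Adds a chunk. When notifyReader is false, the wake-up is deliberately skipped. */
        synchronized void offer(String chunk, boolean notifyReader) {
            chunks.add(chunk);
            if (notifyReader) {
                notifyAll();
            }
        }

        /** Waits up to timeoutMillis for a chunk. Returns null on timeout or interrupt. */
        synchronized String poll(long timeoutMillis) {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (chunks.isEmpty()) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return null; // never call wait(0): that would block without a timeout
                }
                try {
                    wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
            return chunks.poll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ChunkQueue queue = new ChunkQueue();
        String[] result = new String[1];

        Thread reader = new Thread(() -> result[0] = queue.poll(500));
        reader.start();

        Thread.sleep(100);              // give the reader time to enter wait()
        queue.offer("chunk-1", false);  // data arrives, but the notification is lost
        reader.join();

        // The bounded wait expires, the loop re-checks the queue, and the chunk is found.
        System.out.println("reader got: " + result[0]);
    }
}
```

Whether a bounded wait is an appropriate fix inside Jetty itself is a separate question; the sketch only shows why a missed notification turns an unbounded {{wait()}} into an indefinite hang.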

In the case of {{distrib=true}} requests processed by Solr (aggregating multiple per-shard responses from other nodes) this can manifest with stack traces that look like the following (taken from Solr 8.8.2)...
{noformat}
 "thread",{
   "id":14253,
   "name":"httpShardExecutor-7-thread-13819-...",
   "state":"WAITING",
   "lock":"org.eclipse.jetty.client.util.InputStreamResponseListener@12b59075",
   "lock-waiting":{
     "name":"org.eclipse.jetty.client.util.InputStreamResponseListener@12b59075",
     "owner":null},
   "synchronizers-locked":["java.util.concurrent.ThreadPoolExecutor$Worker@1ec1aed0"],
   "cpuTime":"65.4882ms",
   "userTime":"60.0000ms",
   "stackTrace":["java.base@11.0.14/java.lang.Object.wait(Native Method)",
     "java.base@11.0.14/java.lang.Object.wait(Unknown Source)",
     "org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:318)",
     "org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:90)",
     "org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:99)",
     "org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:217)",
     "org.apache.solr.common.util.JavaBinCodec._init(JavaBinCodec.java:211)",
     "org.apache.solr.common.util.JavaBinCodec.initRead(JavaBinCodec.java:202)",
     "org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:195)",
     "org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:51)",
     "org.apache.solr.client.solrj.impl.Http2SolrClient.processErrorsAndResponse(Http2SolrClient.java:696)",
     "org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:412)",
     "org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:761)",
     "org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)",
     "org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)",
     "org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:371)",
     "org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:132)",
     "org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:41)",
     "java.base@11.0.14/java.util.concurrent.FutureTask.run(Unknown Source)",
     "java.base@11.0.14/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)",
     "java.base@11.0.14/java.util.concurrent.FutureTask.run(Unknown Source)",
     "com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)",
     "org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)",
     "org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$175/0x0000000840243c40.run(Unknown Source)",
     "java.base@11.0.14/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)",
     "java.base@11.0.14/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)",
     "java.base@11.0.14/java.lang.Thread.run(Unknown Source)"]},
{noformat}
...these {{httpShardExecutor}} threads can stay hung indefinitely, tying up system resources (unless they get a spurious {{notify()}} from the JVM). (In one case, it seems to have caused a request to hang for {*}10.3 hours{*}.)

Anecdotally:
 * There is some evidence that this problem did _*NOT*_ affect Solr 8.6.3, but does affect later versions
 ** suggesting the bug didn't exist in Jetty until _after_ 9.4.27.v20200227
 * Forcing the Jetty HttpClient to use HTTP1.1 transport seems to prevent this problem from happening
 ** In Solr this can be done by setting the {{"solr.http1"}} system property
 ** Or using the {{Http2SolrClient.Builder.useHttp1_1()}} method in client application code
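
As a rough illustration of the system-property workaround, the sketch below shows how a boolean JVM system property such as {{"solr.http1"}} is typically set and read. It assumes the property is evaluated as a standard Java boolean property (for example via {{Boolean.getBoolean}}) when the client is built; the class {{Http1PropertyCheck}} and its method are hypothetical helpers for this example and are not part of Solr.

```java
public class Http1PropertyCheck {

    /**
     * Returns true only when -Dsolr.http1=true (case-insensitive) is in effect.
     * ASSUMPTION: Solr reads the property with the same semantics.
     */
    static boolean http1Requested() {
        return Boolean.getBoolean("solr.http1");
    }

    public static void main(String[] args) {
        // On the command line this would be: java -Dsolr.http1=true ...
        // Setting it programmatically only helps if it happens before the client is created.
        System.setProperty("solr.http1", "true");
        System.out.println("HTTP/1.1 requested: " + http1Requested());
    }
}
```

Under that assumption, the property has to be present with the value {{true}}; a bare {{-Dsolr.http1}} with no value would read as false. In client application code, the builder method named in the list above is the more direct switch.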


