You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Will Miller <wm...@fbbrands.com> on 2017/04/03 20:46:55 UTC

Ping handler queuing/problem?

I have 3 different SolrCloud clusters that share a single set (3) of zookeeper servers. Each SolrCloud cluster has its own set of collections stored on Zookeeper. Twice in the past week all 3 clusters have had about a 1 minute period where all requests stopped coming in. Solr recovers and further requests are fine.

Inbound connections to the SolrCloud clusters are via load balancers. One of the clusters uses the /Ping request handler, the other two use TCP checks on port 8983 (yes, this is not ideal). The logs from the 2 clusters that use TCP for health check don't have anything notable in the logs except a lack of inbound requests. The logs from the cluster that has the health check to the Ping handler shows something odd at the time of the issue:

I cleaned out all requests and mutl-line errors and sorted alphabetically to line up the timestamps. You can see the Ping requests come in every 5 seconds then the Ping response for 9 of them seem to queue up inside of Solr which then causes the VMs to come out of the load balancer and stop taking requests:

INFO  - 2017-04-03 15:08:03.255; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:03.255; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=0
INFO  - 2017-04-03 15:08:08.256; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:13.254; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:18.247; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:23.250; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:28.256; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:33.263; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:38.274; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:43.266; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:48.279; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:53.289; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=10922
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=15914
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=20925
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=25932
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=30938
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=35941
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=40934
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=5909
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=899
INFO  - 2017-04-03 15:08:54.204; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=45948
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter; Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:59.207; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:59.207; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=0
INFO  - 2017-04-03 15:09:04.215; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:09:04.229; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=14
INFO  - 2017-04-03 15:09:09.229; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:09:09.229; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore; [products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=0

I can provide more log entries before and after showing commits, requests and index updates if needed. The same pattern can be seen across all 8 VMs though at roughly the same time. This feels like it could be load balancer or network related but wanted to run it by everyone to see if this potentially could be something on Solr or Zookeeper side.

The ping request handler is:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
   <lst name="invariants">
     <str name="qt">/select</str><!-- handler to delegate to -->
     <str name="q">*:*</str>
   </lst>
</requestHandler>

Thanks,
Will