You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "Michael Gibney (Jira)" <ji...@apache.org> on 2022/03/29 17:47:00 UTC
[jira] [Comment Edited] (SOLR-15045) 2x latency of synchronous commits due to serial execution on local and distributed leaders
[ https://issues.apache.org/jira/browse/SOLR-15045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514249#comment-17514249 ]
Michael Gibney edited comment on SOLR-15045 at 3/29/22, 5:46 PM:
-----------------------------------------------------------------
I would be comfortable now to backport this to {{branch_9x}} (targeting 9.1 release). [~dsmiley], you had initially suggested letting this "bake" on {{main}}; do you see any reason to continue to wait?
was (Author: mgibney):
I would you be comfortable now to backport this to {{branch_9x}} (targeting 9.1 release). [~dsmiley], you had initially suggested letting this "bake" on {{main}}; do you see any reason to continue to wait?
> 2x latency of synchronous commits due to serial execution on local and distributed leaders
> ------------------------------------------------------------------------------------------
>
> Key: SOLR-15045
> URL: https://issues.apache.org/jira/browse/SOLR-15045
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 8.5.2
> Environment: Operating system: Linux (centos 7.7.1908)
> Reporter: Raj Yadav
> Assignee: Michael Gibney
> Priority: Major
> Fix For: main (10.0)
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Hi All,
> When we issue commit through curl command, not all the shards are getting `start commit` requests at the same time.
> *Solr Setup Detail : (Running in solrCloud mode)*
> It has 6 shards, and each shard has only one replica (which is also a
> leader) and the replica type is NRT.
> Each shards are hosted on the separate physical host.
> Zookeeper => We are using external zookeeper ensemble (3 separate node
> cluster)
> *Shard and Host name*
> shard1_0=>solr_199
> shard1_1=>solr_200
> shard2_0=> solr_254
> shard2_1=> solr_132
> shard3_0=>solr_133
> shard3_1=>solr_198
> *Request rate on the system is currently zero and only hourly indexing*
> *running on it.*
> We are using curl command to issue commit.
> {code:java}
> curl
> "http://solr_254:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"{code}
> (Using solr_254 host to issue commit)
> On using the above command all the shards have started processing commit (i.e
> getting `start commit` request) except the one used in curl command (i.e
> shard2_0 which is hosted on solr_254). Individually each shards takes around
> 10 to 12 min to process hard commit (most of this time is spent on reloading
> external files).
> As per logs, shard2_0 is getting `start commit` request after 10 minutes
> (approx). This leads to following timeout error.
> {code:java}
> 2020-12-06 18:47:47.013 ERROR
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_132:9744/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_254%3A9744%2Fsolr%2Fmy_collection_shard2_0_replica_n11%2F
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753)
> at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369)
> at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
> at
> org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
> at
> org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.TimeoutException
> at
> org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398)
> ... 13 more{code}
> Above timeout error is between solr_254 and solr_132. Similar errors are
> there between solr_254 and other 4 shards
> Since query load is zero, mostly CPU utilization is around 3%.
> After issuing curl commit command, CPU goes up to 14% on all shards except
> shard2_0 (host: solr_254, the one used in curl command).
> And after 10 minutes (i.e after getting the `start commit` request) CPU on
> shard2_0 also goes up to 14%.
> As I mentioned earlier each shards take around 10-12 mins to process commit
> and due to delay in starting commit process on one shard (shard2_0) our
> overall commit time is doubled now. (22-24 minutes approx).
> *We are observing this delay in both hard and soft commit.*
> In our solr-5.4.0(having similar setup), we use the similar curl command to issue commit, and there all the shards are getting `start commit` request at same time. Including the one used in curl command.
>
> *Impact After deleting external files:*
> In order to nullify the impact of external files, I had deleted external
> files from all the shards and issued commit through the curl command. Commit
> operation got completed in 3 seconds. Individual shards took 1.5 seconds to
> complete the commit operation. But there was a delay of around 1.5 seconds
> on the shard whose hostname was used to issue the commit. Hence overall
> commit time is 3 seconds.
> During this operation, there was no timeout or any other kind of error
> (except `external file not found` error which is expected).
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org