You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Sebastian Lutter <se...@pixolution.de> on 2020/12/03 11:12:17 UTC

ConcurrentUpdateSolrClient stall prevention bug in Solr 8.4+

Hi!

I run a three nodes Solr 8.5.1 cluster and experienced a bug when
updating the index: (adding document)

{
  "responseHeader":{
    "rf":3,
    "status":500,
    "QTime":22938},
  "error":{
    "msg":"Task queue processing has stalled for 20205 ms with 0 remaining elements to process.",
    "trace":"java.io.IOException: Task queue processing has stalled for 20205 ms with 0 remaining elements to process.\n\tat org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.blockUntilFinished(ConcurrentUpdateHttp2SolrClient.java:501)\n\tat org.apache.solr.update.StreamingSolrClients.blockUntilFinished(StreamingSolrClients.java:87)\n\tat org.apache.solr.update.SolrCmdDistributor.blockAndDoRetries(SolrCmdDistributor.java:265)\n\tat org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:251)\n\tat org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:201)\n\tat org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:72)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat java.base/java.lang.Thread.run(Unknown Source)\n",
    "code":500}}

I have no routine yet to reproduce the issue, this happened in a
production environment.

While searching for the error I found the following other resources
dealing with the same issue:

* https://github.com/clarin-eric/VLO/issues/291
<https://github.com/clarin-eric/VLO/issues/291>

* https://lucene.472066.n3.nabble.com/Delete-on-8-5-1-td4457382.html
<https://lucene.472066.n3.nabble.com/Delete-on-8-5-1-td4457382.html>

*
https://www.mail-archive.com/solr-user@lucene.apache.org/msg152993.html
<https://www.mail-archive.com/solr-user@lucene.apache.org/msg152993.html>


I think the problem is in Solr since this feature has been added:

* https://issues.apache.org/jira/browse/SOLR-13975
<https://issues.apache.org/jira/browse/SOLR-13975>


It happens sporadically, the next time the exact same update request
works without problems. Also the update of the index seems to work well
when the error happens (index is properly updated with the documents or
deletes).


Does anyone else experience the same problem? Given the other people
that got the same error I think this is a bug in Solr 8.4+


best

Sebastian


-- 
Sebastian Lutter
CTO

https://pixolution.org

Phone: +49(0)30/60984960 | Fax: +49(0)30/60984962
gpg-key: 3770E7637DC4F67A

Read us  : http://pixolution.org/blog/
Follow us: http://twitter.com/pixolution
Try us   : http://demo.pixolution.org

pixolution GmbH | c/o Büro 2.0 | Weigandufer 45 | 12059 Berlin
HRB 120049 | Amtsgericht Charlottenburg, Berlin
Geschäftsführer / executive board: David Backstein | Prof. Dr. Kai-Uwe
Barthel

Confidentiality: This e-mail contains confidential information intended
only for the addressee.
If you are not the intended recipient you may not disclose, copy, use or
otherwise distribute the content of this email.


Re: ConcurrentUpdateSolrClient stall prevention bug in Solr 8.4+

Posted by Erick Erickson <er...@gmail.com>.
Exactly _how_ are you indexing? In particular, how often are commits happening?

If you’re committing too often, Solr can block until some of the background 
merges are complete. This can happen particularly when you are doing hard 
commits in rapid succession, either through, say, committing from the client 
(which I recommend against in almost all cases) or haveing your <autocommit> 
intervals set too short.

Your autocommit settings should be as long as your application can tolerate,
committing is expensive.

Here’s some background:
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The other possibility is if you have very long GC pauses, so I’d also
monitor the GC activity and see if you have stop-the-world GC
pauses exceeding 20 seconds coincident with this problem.

Best,
Erick

> On Dec 3, 2020, at 6:12 AM, Sebastian Lutter <se...@pixolution.de> wrote:
> 
> Hi!
> 
> I run a three nodes Solr 8.5.1 cluster and experienced a bug when updating the index: (adding document)
> 
> {
>   "responseHeader":{
>     "rf":3,
>     "status":500,
>     "QTime":22938},
>   "error":{
>     "msg":"Task queue processing has stalled for 20205 ms with 0 remaining elements to process.",
>     "trace":"java.io.IOException: Task queue processing has stalled for 20205 ms with 0 remaining elements to process.\n\tat org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.blockUntilFinished(ConcurrentUpdateHttp2SolrClient.java:501)\n\tat org.apache.solr.update.StreamingSolrClients.blockUntilFinished(StreamingSolrClients.java:87)\n\tat org.apache.solr.update.SolrCmdDistributor.blockAndDoRetries(SolrCmdDistributor.java:265)\n\tat org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:251)\n\tat org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:201)\n\tat org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:72)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat java.base/java.lang.Thread.run(Unknown Source)\n",
>     "code":500}}
> 
> I have no routine yet to reproduce the issue, this happened in a production environment.
> 
> While searching for the error I found the following other resources dealing with the same issue:
> 
> * https://github.com/clarin-eric/VLO/issues/291
> 
> * https://lucene.472066.n3.nabble.com/Delete-on-8-5-1-td4457382.html
> 
> * https://www.mail-archive.com/solr-user@lucene.apache.org/msg152993.html
> 
> 
> 
> I think the problem is in Solr since this feature has been added: 
> 
> * https://issues.apache.org/jira/browse/SOLR-13975
> 
> 
> 
> It happens sporadically, the next time the exact same update request works without problems. Also the update of the index seems to work well when the error happens (index is properly updated with the documents or deletes).
> 
> 
> 
> Does anyone else experience the same problem? Given the other people that got the same error I think this is a bug in Solr 8.4+
> 
> 
> 
> best
> 
> Sebastian
> 
> 
> 
> -- 
> Sebastian Lutter
> CTO
> 
> 
> https://pixolution.org
> 
> 
> Phone: +49(0)30/60984960 | Fax: +49(0)30/60984962
> gpg-key: 3770E7637DC4F67A
> 
> Read us  : 
> http://pixolution.org/blog/
> 
> Follow us: 
> http://twitter.com/pixolution
> 
> Try us   : 
> http://demo.pixolution.org
> 
> 
> pixolution GmbH | c/o Büro 2.0 | Weigandufer 45 | 12059 Berlin
> HRB 120049 | Amtsgericht Charlottenburg, Berlin
> Geschäftsführer / executive board: David Backstein | Prof. Dr. Kai-Uwe
> Barthel
> 
> Confidentiality: This e-mail contains confidential information intended
> only for the addressee.
> If you are not the intended recipient you may not disclose, copy, use or
> otherwise distribute the content of this email.
> 
> <OpenPGP_0x3770E7637DC4F67A.asc>