Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2015/05/07 18:22:07 UTC

"I was asked to wait on state recovering for shard.... but I still do not see the request state"

Hi
I have a cluster of 16 shards, 3 replicas.

I keep hitting situations where a whole shard breaks.
The leader is in the down state and logs:
I was asked to wait on state recovering for shard.... but i still do not see
the requested state. I see state: recovering live:true leader from
ZK:http://...

The replicas are in the recovering state, keep failing recovery, and write
the same exception to the log.

Any ideas?

I use Solr 4.10.3.

Thanks.




--
View this message in context: http://lucene.472066.n3.nabble.com/I-was-asked-to-wait-on-state-recovering-for-shard-but-I-still-do-not-see-the-request-state-tp4204348.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

Posted by Mark Miller <ma...@gmail.com>.
Only INFO level, so I suspect not bad...

If that Overseer closed, another node should have picked up where it left
off. See that in another log?

Generally an Overseer close means a node or cluster restart.

This can cause a lot of DOWN state publishing. If it's a cluster restart, a
lot of those DOWN publishes are not processed until the cluster is started
back up - which can lead to the Overseer being overwhelmed and things not
responding fast enough. You should be able to see an active Overseer
working on publishing those states though (it shows that at INFO logging
level).

If the Overseer is simply down and another did not take over, that is just
some kind of bug. If it's overwhelmed, 5.x is much, much faster, and
SOLR-7281 should also help, but that is no real help for 4.x at this
point.

Anyway, the key question is what the active Overseer is doing. Is there no
active Overseer? Or is it busy trying to push through a backlog of
operations?
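
One way to check this without reading logs, sketched in Python under a few assumptions: Solr 4.8 and later should expose an OVERSEERSTATUS action on the Collections API that reports the current Overseer leader and its queue sizes. The base URL, the helper names, and the response keys used below (`leader`, `overseer_queue_size`) are assumptions to verify against the output of your own cluster.

```python
import json
import urllib.parse
import urllib.request


def overseer_status_url(base_url):
    """Build the Collections API URL for the OVERSEERSTATUS action."""
    query = urllib.parse.urlencode({"action": "OVERSEERSTATUS", "wt": "json"})
    return base_url.rstrip("/") + "/admin/collections?" + query


def summarize(status):
    """Reduce a parsed OVERSEERSTATUS response to the two facts of interest.

    The key names are assumed; missing keys are reported as None.
    """
    return {
        "leader": status.get("leader"),
        "queue_size": status.get("overseer_queue_size"),
    }


def fetch_overseer_status(base_url, timeout=10):
    """Query a live Solr node and summarize the reply (needs a running cluster)."""
    url = overseer_status_url(base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize(json.loads(resp.read().decode("utf-8")))
```

Calling `fetch_overseer_status("http://localhost:8983/solr")` (a hypothetical host) and getting no leader back would point at the "no active Overseer" case, while a large and growing queue size would point at a backlog.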

- Mark

On Wed, Feb 3, 2016 at 8:46 PM hawk <an...@hotmail.com> wrote:

> Thanks Mark.
>
> I was able to search "Overseer" in the solr logs around the time frame of
> the condition. This particular message was from the leader node of the
> shard.
>
> 160201 11:26:36.380 localhost-startStop-1 Overseer (id=null) closing
>
> Also I found this message in the zookeeper logs.
>
> 11:26:35,218 [myid:02] - INFO [ProcessThread(sid:2
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
> processing sessionid:0x15297c0fe2e3f2d type:create cxid:0x3
> zxid:0xf0001be48
> txntype:-1 reqpath:n/a Error Path:/overseer Error:KeeperErrorCode =
> NodeExists for /overseer
>
> Any thoughts what these messages suggest?
>
>
>
>

Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

Posted by hawk <an...@hotmail.com>.
Thanks Mark.

I was able to search "Overseer" in the solr logs around the time frame of
the condition. This particular message was from the leader node of the
shard.

160201 11:26:36.380 localhost-startStop-1 Overseer (id=null) closing

Also I found this message in the zookeeper logs.

11:26:35,218 [myid:02] - INFO [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
processing sessionid:0x15297c0fe2e3f2d type:create cxid:0x3 zxid:0xf0001be48
txntype:-1 reqpath:n/a Error Path:/overseer Error:KeeperErrorCode =
NodeExists for /overseer

Any thoughts what these messages suggest?





Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

Posted by Mark Miller <ma...@gmail.com>.
You generally get this when the Overseer is either bogged down or not
processing events at all.

The Overseer is way, way faster at processing events in 5.x.

If you search your logs for .Overseer you can see what it's doing. It is
probably either doing nothing at the time or bogged down processing state
updates.
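
A minimal sketch of that log search in Python, assuming plain-text Solr logs with one entry per line. The function names and the file path in the usage note are hypothetical, and a plain substring match on "Overseer" is used so that related loggers are caught as well.

```python
def overseer_lines(lines):
    """Return the log lines that mention the Overseer, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if "Overseer" in line]


def overseer_lines_in_file(path):
    """Scan one log file; undecodable bytes are replaced rather than fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return overseer_lines(handle)
```

For example, `overseer_lines_in_file("/var/log/solr/solr.log")` (path assumed) lists every Overseer entry in order, which makes it easier to see whether the Overseer was idle, closing, or working through state updates around the time of the failure.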

Along with 5x Overseer processing being much more efficient, SOLR-7281 is
going to take out a lot of state publishing on shutdown that can end up
getting processed on the next startup.

- Mark

On Wed, Feb 3, 2016 at 6:39 PM hawk <an...@hotmail.com> wrote:

> Here are more details around the event.
>
> 160201 11:57:22.272 http-bio-8082-exec-18 [] webapp=/solr path=/update
> params={waitSearcher=true&distrib.from=http://x:x/solr/xxxx/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
> {commit=} 0 134
>
> 160201 11:57:25.993 RecoveryThread Error while trying to recover.
> core=xxxxx
> java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
> asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
> still do not see the requested state. I see state: recovering live:true
> leader from ZK: http://x:x/solr/xxxx/
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>         at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
>         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
>         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
> Caused by:
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
> asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
> still do not see the requested state. I see state: recovering live:true
> leader from ZK: http://x:x/solr/xxxx/
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:550)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> 160201 11:57:25.993 RecoveryThread Recovery failed - trying again... (7)
> core=xxxx
>
> 160201 11:57:25.994 RecoveryThread Wait 256.0 seconds before trying to
> recover again (8)
>
> 160201 11:57:30.370 http-bio-8082-exec-3
> org.apache.solr.common.SolrException: no servers hosting shard:
>         at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
>         at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
>
>

Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

Posted by hawk <an...@hotmail.com>.
Here are more details around the event.

160201 11:57:22.272 http-bio-8082-exec-18 [] webapp=/solr path=/update
params={waitSearcher=true&distrib.from=http://x:x/solr/xxxx/&update.distrib=FROMLEADER&openSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
{commit=} 0 134

160201 11:57:25.993 RecoveryThread Error while trying to recover. core=xxxxx
java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
still do not see the requested state. I see state: recovering live:true
leader from ZK: http://x:x/solr/xxxx/
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: I was
asked to wait on state recovering for shard2 in xxx on xxx:xx_solr but I
still do not see the requested state. I see state: recovering live:true
leader from ZK: http://x:x/solr/xxxx/
	at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:550)
	at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
	at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

160201 11:57:25.993 RecoveryThread Recovery failed - trying again... (7)
core=xxxx

160201 11:57:25.994 RecoveryThread Wait 256.0 seconds before trying to
recover again (8)

160201 11:57:30.370 http-bio-8082-exec-3
org.apache.solr.common.SolrException: no servers hosting shard: 
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)




Re: "I was asked to wait on state recovering for shard.... but I still do not see the request state"

Posted by hawk <an...@hotmail.com>.
I have a similar issue on 4.10.1

160131 21:07:36.932 http-bio-8082-exec-2802
org.apache.solr.common.SolrException: I was asked to wait on state
recovering for shard2 in my_cases on localhost2:8080_solr but I still do not
see the requested state. I see state: recovering live:true leader from ZK:
http://localhost1:8080/solr/my_cases/
	at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:999)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:245)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)


