Posted to solr-user@lucene.apache.org by forest_soup <ta...@gmail.com> on 2015/03/30 10:51:19 UTC

Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

https://issues.apache.org/jira/browse/SOLR-6359

I also posted the questions to the JIRA ticket.

We have a SolrCloud cluster of 5 Solr servers running Solr 4.7.0. There is one
collection with 80 shards (2 replicas per shard) on those 5 servers. We
built a patched version by merging the patch
(https://issues.apache.org/jira/secure/attachment/12702473/SOLR-6359.patch)
into the 4.7.0 stream. After applying the patch to our servers and uploading
the changed config to ZooKeeper, we restarted one of the 5 Solr servers and
ran into some issues on that server. Details are below.
The solrconfig.xml we changed:
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numRecordsToKeep">10000</int>
<int name="maxNumLogsToKeep">100</int>
</updateLog>

After we restarted one Solr server while the other 4 servers were still
running, we saw the exceptions below on the restarted one:
ERROR - 2015-03-16 20:48:48.214; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception writing document id Q049bGx0bWFpbDIxL089bGxwX3VzMQ==41703656!B68BF5EC5A4A650D85257E0A00724A3B to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:703)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:857)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:556)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:96)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:804)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:659)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1525)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:236)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
    ... 37 more
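
Note that the top-level message says "possible analysis error", while the nested cause in the trace is an AlreadyClosedException from the IndexWriter. When scanning many such log entries, the innermost "Caused by:" line is usually the one worth grouping on. A minimal sketch of pulling it out of a trace is below; the function name and the abbreviated sample are illustrative only and are not part of Solr.

```python
def root_cause(trace_text):
    """Return the text of the last 'Caused by:' entry in a Java stack trace, or None."""
    cause = None
    for line in trace_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Caused by:"):
            # Later 'Caused by:' entries are nested deeper, so keep overwriting.
            cause = stripped[len("Caused by:"):].strip()
    return cause


# Abbreviated copy of the trace above (document id and most frames elided).
sample = """org.apache.solr.common.SolrException: Exception writing document id ... to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
"""

print(root_cause(sample))
```

The sketch assumes each "Caused by:" entry sits on a single line, so a trace that was hard-wrapped by a mail client would need to be rejoined first.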

It looks like https://issues.apache.org/jira/browse/SOLR-4605, but I guess
that's not the case.

Is it due to transaction log replay of the old log entries? Could you please
help explain the root cause and how to avoid it?

Doing a rolling restart does not solve the issue. So we have to take a full
outage: stop all 5 Solr servers, then start one, wait until all its cores
become "active", then start the next one.

Do you have any better idea for resolving this failure quickly?
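
The "wait until all cores become active" step in the procedure above can be checked mechanically rather than by eye. The sketch below assumes the cluster state has already been fetched and parsed as JSON in the general shape of Solr 4.x's clusterstate.json (collection, then "shards", then "replicas", each replica carrying "state" and "node_name"). Fetching it from ZooKeeper is not shown, and the node names and sample data are hypothetical.

```python
def inactive_replicas(cluster_state, node_name):
    """Return sorted (collection, shard, replica) triples hosted on node_name
    whose state is anything other than 'active'."""
    pending = []
    for coll_name, coll in cluster_state.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                if replica.get("node_name") != node_name:
                    continue  # replica lives on a different server
                if replica.get("state") != "active":
                    pending.append((coll_name, shard_name, replica_name))
    return sorted(pending)


# Hypothetical snapshot; real clusters here would have 80 shards.
example_state = {
    "collection1": {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"state": "active", "node_name": "solr1:8080_solr"},
                "core_node2": {"state": "recovering", "node_name": "solr2:8080_solr"},
            }},
            "shard2": {"replicas": {
                "core_node3": {"state": "down", "node_name": "solr1:8080_solr"},
            }},
        }
    }
}

print(inactive_replicas(example_state, "solr1:8080_solr"))
```

A restart script could poll until this list is empty for the server just started before moving on to the next one. The recorded state may lag behind reality for a node that has dropped out, so it is probably worth checking the live nodes list as well.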

Thanks!



--
View this message in context: http://lucene.472066.n3.nabble.com/Restart-solr-failed-after-applied-the-patch-in-https-issues-apache-org-jira-browse-SOLR-6359-tp4196251.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

Posted by forest_soup <ta...@gmail.com>.
Yes, I also suspect the patch. After I reverted to the original .jar file,
the issue no longer occurs.




Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

Posted by forest_soup <ta...@gmail.com>.
Thanks Ramkumar!

Understood. We will try 100, 10. 

But given the original steps under which we hit the exception, can we say
that the patch has some issue?
1, Put the patch on all 5 running Solr servers (Tomcat) by replacing
tomcat/webapps/solr/WEB-INF/lib/solr-core-4.7.0.jar with the patched
solr-core-4.7-SNAPSHOT.jar that I built. We kept them all running.
2, Uploaded the solrconfig.xml to ZooKeeper with the changes below:
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numRecordsToKeep">10000</int>
<int name="maxNumLogsToKeep">100</int>
</updateLog>
3, Restarted Solr server 1 (Tomcat). After the restart, it showed the
exception from my first post.
4, Restarted Solr server 1 again; it still had the same issue.
5, Reverted the patch by replacing
tomcat/webapps/solr/WEB-INF/lib/solr-core-4.7-SNAPSHOT.jar with the original
4.7.0 one.
6, Restarted Solr server 1 again; the issue no longer occurred.
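
For the suggested retest with (100, 10), only the two values in the updateLog block from step 2 need to change before the config is uploaded again. The sketch below shows one way to rewrite them with Python's standard library. The helper name is made up for illustration, it works on the updateLog fragment only rather than a full solrconfig.xml, and uploading the result to ZooKeeper is not shown.

```python
import xml.etree.ElementTree as ET


def set_update_log_limits(update_log_xml, num_records, max_logs):
    """Return the <updateLog> fragment with the two retention settings replaced."""
    root = ET.fromstring(update_log_xml)
    wanted = {"numRecordsToKeep": num_records, "maxNumLogsToKeep": max_logs}
    for elem in root.findall("int"):
        name = elem.get("name")
        if name in wanted:
            elem.text = str(wanted[name])
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")


# The fragment from step 2, unchanged.
snippet = """<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numRecordsToKeep">10000</int>
<int name="maxNumLogsToKeep">100</int>
</updateLog>"""

print(set_update_log_limits(snippet, 100, 10))
```

The ${solr.ulog.dir:} placeholder is ordinary element text as far as the XML parser is concerned, so it passes through untouched.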

So we are wondering whether we will hit this in version 5.1: after we upgrade
Solr and do a rolling restart, will the issue appear and force a full
restart, which causes a service outage?

Thanks! 




Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.
It shouldn't be any different without the patch, or with the patch and
(100, 10) as parameters, which is why I wanted you to check with 100, 10. If
you see the same issue with that, then the patch is probably not the
problem; maybe it is the patched build in general.
On 30 Mar 2015 13:01, "forest_soup" <ta...@gmail.com> wrote:

> But if the values can only be 100 and 10, is there any difference from
> running without the patch? Can we increase those two values? Thanks!

Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

Posted by forest_soup <ta...@gmail.com>.
But if the values can only be 100 and 10, is there any difference from
running without the patch? Can we increase those two values? Thanks!




Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

Posted by "Ramkumar R. Aiyengar" <an...@gmail.com>.
I doubt this has anything to do with the patch. Do you observe the same
behaviour if you reduce the values for the config to defaults? (100, 10)