Posted to solr-user@lucene.apache.org by Tim Vaillancourt <ti...@elementspace.com> on 2014/02/05 22:04:25 UTC

4.3.1 SC - IndexWriter issues causing replication + failures

Hey guys,

I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2
shards over 4 Solr instances (which results in 1 core per Solr instance).

After some time in Production without issues, we are seeing errors related
to the IndexWriter all over our logs and an infinite loop of failing
replication from Leader on our 2 replicas.

We see a flood of: "org.apache.lucene.store.AlreadyClosedException: this
IndexWriter is closed" stacktraces, then the Solr replica tries to
replicate/recover, then fails replication and then the following 2 errors
show up:

1) "SolrIndexWriter was not closed prior to finalize(), indicates a bug --
POSSIBLE RESOURCE LEAK!!!"
2) "Error closing IndexWriter, trying rollback" (which results in a
null-pointer exception).
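
To see how these three messages line up in time, a rough sketch like the one below could tally them per hour from solr.log. The timestamp layout is assumed from the log excerpt quoted later in this thread, and the function name count_by_hour is only illustrative; adjust both to match your own logging setup.

```python
import re
from collections import Counter
from typing import Iterable

# Message fragments quoted in this thread; adjust if your log wording differs.
SIGNATURES = (
    "this IndexWriter is closed",
    "SolrIndexWriter was not closed prior to finalize()",
    "Error closing IndexWriter, trying rollback",
)

# Assumed timestamp layout, e.g. [2014-01-27 18:28:49.368]; captures up to the hour.
HOUR = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def count_by_hour(lines: Iterable[str]) -> Counter:
    """Count each signature per hour, keyed by (hour, signature).

    A line without a timestamp (such as a continuation line of a
    stacktrace) is attributed to the most recent timestamp seen.
    """
    counts: Counter = Counter()
    hour = "unknown"
    for line in lines:
        match = HOUR.search(line)
        if match:
            hour = match.group(1)
        for signature in SIGNATURES:
            if signature in line:
                counts[(hour, signature)] += 1
    return counts
```

Passing an open file works because the function only iterates over lines, for example: with open("solr.log", errors="replace") as fh: print(sorted(count_by_hour(fh).items())). This only shows when each message appears; it does not explain why the IndexWriter was closed.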

I'm guessing the best way forward would be to upgrade to latest, but that
is an undertaking that will take significant time/testing. In the meantime,
is there anything I can do to mitigate or understand the issue more?

Does anyone know what the IndexWriter errors refer to?

Below is a URL to a .txt file with summarized portions of my solr.log. Any
help is really appreciated as always!!

http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt

Thanks all,

Tim

Re: 4.3.1 SC - IndexWriter issues causing replication + failures

Posted by Tim Vaillancourt <ti...@elementspace.com>.
Some more info to provide:

-Replication almost never completes following the "this IndexWriter is
closed" stacktraces.
-When replication begins after the "this IndexWriter is closed" error, the
replica fills the disk to 100% with index files under data/ over the next
few hours. There are so many files in the data directory that it can't be
listed, and it takes a very long time to delete. It seems the frequent
replications are filling the disk with new files whose sum is roughly 3
times larger than the real index. Is it leaking file handles or forgetting
what it has already downloaded?
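
Since a plain listing of data/ does not finish, one way to see where the space is going might be to stream the directory instead of listing it. The sketch below uses os.scandir, which yields entries lazily, to report a file count and byte total for each top-level entry. It assumes the extra files sit in separate subdirectories of data/, which is not confirmed anywhere in this thread, and the names usage_by_subdir and _walk are only illustrative.

```python
import os
from typing import Dict, Tuple


def _walk(path: str) -> Tuple[int, int]:
    """Return (file count, total bytes) for everything below path."""
    files = size = 0
    stack = [path]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            files += 1
                            size += entry.stat(follow_symlinks=False).st_size
                    except FileNotFoundError:
                        # The file vanished while we were scanning; skip it.
                        continue
        except FileNotFoundError:
            # The directory was removed mid-scan (e.g. by a cleanup); skip it.
            continue
    return files, size


def usage_by_subdir(data_dir: str) -> Dict[str, Tuple[int, int]]:
    """Map each top-level entry of data_dir to (file count, total bytes)."""
    usage: Dict[str, Tuple[int, int]] = {}
    with os.scandir(data_dir) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                usage[entry.name] = _walk(entry.path)
            elif entry.is_file(follow_symlinks=False):
                usage[entry.name] = (1, entry.stat(follow_symlinks=False).st_size)
    return usage
```

Running this a few times while a replication is in progress and comparing the results would show which entries are growing. It never builds a full listing in memory, so it should cope with very large directories, though it will still take a while on a disk in this state.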

Is this a better question for the Lucene list? It seems (see below) that
this stacktrace is occurring in the Lucene layer rather than Solr, but
maybe someone could confirm?

"ERROR [2014-01-27 18:28:49.368] [org.apache.solr.common.SolrException]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.DocumentsWriter.ensureOpen(DocumentsWriter.java:199)
    at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:338)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:419)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1508)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:519)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:655)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    ... <chopped>"
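
For anyone wanting to check the Lucene-versus-Solr question across many stacktraces rather than by eye, a small sketch along these lines could pull the frames out of a trace and report which package the top frame belongs to. The helper names frames and throwing_layer are made up for illustration, and the package prefixes are simply the ones visible in the trace above.

```python
import re
from typing import List

# Matches "at <qualified.method>(" and tolerates a line break after "at",
# since mail clients often wrap long frames onto the next line.
FRAME = re.compile(r"\bat\s+([\w.$]+)\(")


def frames(trace: str) -> List[str]:
    """Return the fully qualified method of each frame, top of stack first."""
    return FRAME.findall(trace)


def throwing_layer(trace: str) -> str:
    """Classify the top frame as 'lucene', 'solr', 'other', or 'unknown'."""
    found = frames(trace)
    if not found:
        return "unknown"
    top = found[0]
    if top.startswith("org.apache.lucene."):
        return "lucene"
    if top.startswith("org.apache.solr."):
        return "solr"
    return "other"
```

For the trace above, the top frame is org.apache.lucene.index.DocumentsWriter.ensureOpen, so the exception is raised inside Lucene, on a call that comes from Solr's DirectUpdateHandler2.addDoc. That only says where the exception is thrown; it does not say which layer closed the IndexWriter in the first place.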

Thanks!

Tim

