Posted to solr-user@lucene.apache.org by Casey Callendrello <ca...@weebly.com> on 2012/08/24 20:24:23 UTC

Solr 4.0 beta deadlock / file descriptor spike

Hi there,
I have been doing some load testing with Solr 4 beta (now trunk). My
configuration is fairly simple - two servers, replicating via SolrCloud.
SolrCloud is configured as recommended in the wiki:

<updateRequestProcessorChain name="standard">
       <processor class="solr.LogUpdateProcessorFactory" />
       <processor class="solr.DistributedUpdateProcessorFactory" />
       <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Twice now I've seen sudden thread and file-descriptor spikes along with
a complete deadlock, simultaneously on both machines. My max FD limit is
set to 1024, and (excepting the spikes) I never see usage over 375 FDs.
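
For anyone wanting to watch the same numbers, here is a minimal sketch of
comparing a process's open FD count against its soft limit. It assumes
Linux with /proc mounted; fd_usage and near_limit are hypothetical helper
names for illustration, not part of Solr, and the 0.8 threshold is an
arbitrary choice.

```python
import os


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process, read from /proc (Linux only).

    soft_limit is None if the limit line is missing or not numeric.
    """
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                fields = line.split()
                if fields[3].isdigit():
                    soft_limit = int(fields[3])
                break
    return open_fds, soft_limit


def near_limit(open_fds, soft_limit, threshold=0.8):
    """True when usage has reached the given fraction of the soft limit."""
    if not soft_limit:
        return False
    return open_fds / soft_limit >= threshold


if __name__ == "__main__":
    used, limit = fd_usage()  # pass the Solr JVM's pid instead of "self"
    print(f"open fds: {used}, soft limit: {limit}")
    if near_limit(used, limit):
        print("warning: approaching the FD limit")
```

Reading another user's process this way needs matching permissions (or
root). Polling it once a minute during a load test would show whether
usage creeps up slowly or jumps all at once.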

The first FD spike was with an older trunk revision. It coincided with
a corrupt transaction log. I've lost the logs, unfortunately, but Solr
tried to re-process the same log over and over, leaking FDs until it died.

The upgraded version has not reported the corrupt transaction log issue
prior to deadlock. However, according to the log files, the deadlock
persists for about 5 minutes prior to FD exhaustion. The last log line
is simply "INFO: end_commit_flush".

Upon restart, I see a frightening number of corrupt transaction log
exceptions and "New transaction log already exists" exceptions.

Any thoughts?
Contact me for the thread dump; it's 1 MiB.

Thanks,
--Casey C.

Re: Solr 4.0 beta deadlock / file descriptor spike

Posted by Casey Callendrello <ca...@weebly.com>.
For the record, this was caused by a rookie mistake: FD exhaustion.

--Casey