Posted to solr-user@lucene.apache.org by Jérôme Etévé <je...@gmail.com> on 2009/11/02 12:13:22 UTC

Lock problems: Lock obtain timed out

Hi,

  I've got a few machines that post documents concurrently to a Solr
instance. They do not issue commits themselves; instead, I've got
autocommit set up on the Solr server side:
   <autoCommit>
      <maxDocs>50000</maxDocs> <!-- commit after at most 50,000 uncommitted docs -->
      <maxTime>60000</maxTime> <!-- never go more than 60s without committing -->
   </autoCommit>
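
For reference, the clients post batches roughly like this, with no commit
parameter (host, port and payload here are placeholders, not our real ones):

   curl 'http://solr-host:8983/solr/update' -H 'Content-Type: text/xml' \
        --data-binary '<add><doc><field name="id">42</field></doc></add>'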

This usually works fine, but sometimes the server gets stuck in a
deadlocked state. Here are the errors I get from the log (they repeat
forever until I delete the index and restart from scratch):

02-Nov-2009 10:35:27 org.apache.solr.update.SolrIndexWriter finalize
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates
a bug -- POSSIBLE RESOURCE LEAK!!!
...
[ multiple messages like this ]
...
02-Nov-2009 10:35:27 org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/solrdata/jobs/index/lucene-703db99881e56205cb910a2e5fd816d3-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1538)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1395)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)


I'm wondering what the reason for this could be (a commit taking more
than 60 seconds, for instance?), and whether I should use better
locking or autocommit options.

Here's the locking conf I've got at the moment:
   <writeLockTimeout>1000</writeLockTimeout>
   <commitLockTimeout>10000</commitLockTimeout>
   <lockType>native</lockType>

I'm using Solr trunk from 12 Oct 2009 within Tomcat.

Thanks for any help.

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jerome@eteve.net

Re: Lock problems: Lock obtain timed out

Posted by Chris Hostetter <ho...@fucit.org>.
: Can anyone think of a reason why these locks would hang around for more than
: 2 hours?
: 
: I have been monitoring them and they look like they are very short lived.

Typically the lock files are only left around for more than a few seconds 
when there was a fatal crash of some kind ... an OOM error for example, or, 
as already mentioned in this thread...

: >> > > SEVERE: java.io.IOException: No space left on device

...if you check your solr logs for messages in the immediate time frame 
following the lastModified time of the lock file, you'll probably find 
something interesting.
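
For example, something like this gives you the timestamp to start looking 
from (the path is just the one quoted earlier in this thread; substitute 
your own index dir):

   stat -c '%y' /home/solrdata/jobs/index/*-write.lock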


-Hoss


Re: Lock problems: Lock obtain timed out

Posted by Ian Connor <ia...@gmail.com>.
Can anyone think of a reason why these locks would hang around for more than
2 hours?

I have been monitoring them and they look like they are very short lived.

On Tue, Jan 26, 2010 at 10:15 AM, Ian Connor <ia...@gmail.com> wrote:

> We traced one of the lock files, and it had been around for 3 hours. A
> restart removed it - but is 3 hours normal for one of these locks?
>
> Ian.
>
>
> On Mon, Jan 25, 2010 at 4:14 PM, mike anderson <sa...@gmail.com> wrote:
>
>> I am getting this exception as well, but disk space is not my problem. What
>> else can I do to debug this? The solr log doesn't appear to lend any other
>> clues...
>>
>> Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990
>> Jan 25, 2010 4:02:22 PM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/solr8984/index/lucene-98c1cb272eb9e828b1357f68112231e0-write.lock
>> at org.apache.lucene.store.Lock.obtain(Lock.java:85)
>> at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
>> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
>> at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
>> at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
>> at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
>> at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
>> at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
>> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>> at org.mortbay.jetty.Server.handle(Server.java:285)
>> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>> at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>> at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>> at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>>
>>
>> Should I consider changing the lock timeout settings (currently set to
>> defaults)? If so, I'm not sure what to base these values on.
>>
>> Thanks in advance,
>> mike
>>
>>
>> On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog <go...@gmail.com> wrote:
>>
>> > This will not ever work reliably. You should have 2x total disk space
>> > for the index. Optimize, for one, requires this.
>> >
>> > On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé <je...@gmail.com>
>> > wrote:
>> > > Hi,
>> > >
>> > > It seems this situation is caused by some "No space left on device"
>> > > exceptions:
>> > > SEVERE: java.io.IOException: No space left on device
>> > >        at java.io.RandomAccessFile.writeBytes(Native Method)
>> > >        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>> > >        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
>> > >        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
>> > >
>> > >
>> > > I'd better try setting my maxMergeDocs and mergeFactor to more
>> > > adequate values for my app (I'm indexing ~15 GB of data on a 20 GB
>> > > device, so I guess there's a problem when Solr tries to merge the
>> > > index segments being built).
>> > >
>> > > At the moment, they are set to   <mergeFactor>100</mergeFactor> and
>> > > <maxMergeDocs>2147483647</maxMergeDocs>
>> > >
>> > > Jerome.
>> > >
>> > > --
>> > > Jerome Eteve.
>> > > http://www.eteve.net
>> > > jerome@eteve.net
>> > >
>> >
>> >
>> >
>> > --
>> > Lance Norskog
>> > goksron@gmail.com
>> >
>>
>


-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor

Re: Lock problems: Lock obtain timed out

Posted by Ian Connor <ia...@gmail.com>.
We traced one of the lock files, and it had been around for 3 hours. A
restart removed it - but is 3 hours normal for one of these locks?

Ian.

On Mon, Jan 25, 2010 at 4:14 PM, mike anderson <sa...@gmail.com> wrote:

> I am getting this exception as well, but disk space is not my problem. What
> else can I do to debug this? The solr log doesn't appear to lend any other
> clues...
>
> Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990
> Jan 25, 2010 4:02:22 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/solr8984/index/lucene-98c1cb272eb9e828b1357f68112231e0-write.lock
> at org.apache.lucene.store.Lock.obtain(Lock.java:85)
> at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
> at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
> at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
> at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
> at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
> at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> at org.mortbay.jetty.Server.handle(Server.java:285)
> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
> at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
>
> Should I consider changing the lock timeout settings (currently set to
> defaults)? If so, I'm not sure what to base these values on.
>
> Thanks in advance,
> mike
>
>
> On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog <go...@gmail.com> wrote:
>
> > This will not ever work reliably. You should have 2x total disk space
> > for the index. Optimize, for one, requires this.
> >
> > On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé <je...@gmail.com>
> > wrote:
> > > Hi,
> > >
> > > It seems this situation is caused by some "No space left on device"
> > > exceptions:
> > > SEVERE: java.io.IOException: No space left on device
> > >        at java.io.RandomAccessFile.writeBytes(Native Method)
> > >        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
> > >        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
> > >        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
> > >
> > >
> > > I'd better try setting my maxMergeDocs and mergeFactor to more
> > > adequate values for my app (I'm indexing ~15 GB of data on a 20 GB
> > > device, so I guess there's a problem when Solr tries to merge the
> > > index segments being built).
> > >
> > > At the moment, they are set to   <mergeFactor>100</mergeFactor> and
> > > <maxMergeDocs>2147483647</maxMergeDocs>
> > >
> > > Jerome.
> > >
> > > --
> > > Jerome Eteve.
> > > http://www.eteve.net
> > > jerome@eteve.net
> > >
> >
> >
> >
> > --
> > Lance Norskog
> > goksron@gmail.com
> >
>

Re: Lock problems: Lock obtain timed out

Posted by mike anderson <sa...@gmail.com>.
I am getting this exception as well, but disk space is not my problem. What
else can I do to debug this? The solr log doesn't appear to lend any other
clues...

Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990
Jan 25, 2010 4:02:22 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/solr8984/index/lucene-98c1cb272eb9e828b1357f68112231e0-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


Should I consider changing the lock timeout settings (currently set to
defaults)? If so, I'm not sure what to base these values on.
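
For reference, the settings in question live in solrconfig.xml -- the
values below are the ones quoted earlier in this thread, which as far as
I can tell match the defaults:

  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <lockType>native</lockType>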

Thanks in advance,
mike


On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog <go...@gmail.com> wrote:

> This will not ever work reliably. You should have 2x total disk space
> for the index. Optimize, for one, requires this.
>
> On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé <je...@gmail.com>
> wrote:
> > Hi,
> >
> > It seems this situation is caused by some "No space left on device"
> > exceptions:
> > SEVERE: java.io.IOException: No space left on device
> >        at java.io.RandomAccessFile.writeBytes(Native Method)
> >        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
> >        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
> >        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
> >
> >
> > I'd better try setting my maxMergeDocs and mergeFactor to more
> > adequate values for my app (I'm indexing ~15 GB of data on a 20 GB
> > device, so I guess there's a problem when Solr tries to merge the
> > index segments being built).
> >
> > At the moment, they are set to   <mergeFactor>100</mergeFactor> and
> > <maxMergeDocs>2147483647</maxMergeDocs>
> >
> > Jerome.
> >
> > --
> > Jerome Eteve.
> > http://www.eteve.net
> > jerome@eteve.net
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Lock problems: Lock obtain timed out

Posted by Lance Norskog <go...@gmail.com>.
This will not ever work reliably. You should have 2x total disk space
for the index. Optimize, for one, requires this.
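Merges copy the segments they are combining before the old ones get
deleted, and an optimize rewrites the whole index, so a ~15 GB index can
transiently need roughly double that. Quick check (your mount point may
differ):

  df -h /home/solrdata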

On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé <je...@gmail.com> wrote:
> Hi,
>
> It seems this situation is caused by some "No space left on device" exceptions:
> SEVERE: java.io.IOException: No space left on device
>        at java.io.RandomAccessFile.writeBytes(Native Method)
>        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
>        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
>
>
> I'd better try setting my maxMergeDocs and mergeFactor to more
> adequate values for my app (I'm indexing ~15 GB of data on a 20 GB
> device, so I guess there's a problem when Solr tries to merge the
> index segments being built).
>
> At the moment, they are set to   <mergeFactor>100</mergeFactor> and
> <maxMergeDocs>2147483647</maxMergeDocs>
>
> Jerome.
>
> --
> Jerome Eteve.
> http://www.eteve.net
> jerome@eteve.net
>



-- 
Lance Norskog
goksron@gmail.com

Re: Lock problems: Lock obtain timed out

Posted by Jérôme Etévé <je...@gmail.com>.
Hi,

It seems this situation is caused by some "No space left on device" exceptions:
SEVERE: java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes(Native Method)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)


I'd better try setting my maxMergeDocs and mergeFactor to more
adequate values for my app (I'm indexing ~15 GB of data on a 20 GB
device, so I guess there's a problem when Solr tries to merge the
index segments being built).

At the moment, they are set to   <mergeFactor>100</mergeFactor> and
<maxMergeDocs>2147483647</maxMergeDocs>
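
As a first step I'll probably bring these back toward the defaults,
something like this (exact values are my guess, not a tested
recommendation):

  <mergeFactor>10</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>

A lower mergeFactor triggers smaller, more frequent merges, which should
keep the transient disk usage during a merge down.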

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jerome@eteve.net

Re: Lock problems: Lock obtain timed out

Posted by Chris Hostetter <ho...@fucit.org>.
: 02-Nov-2009 10:35:27 org.apache.solr.update.SolrIndexWriter finalize
: SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates
: a bug -- POSSIBLE RESOURCE LEAK!!!

Can you post some context showing what the logs look like just before 
these errors?

I'm not sure what might be causing the lock collision, but your guess about
commits taking too long and overlapping is a good one -- what do the log
messages about the commits say around the time these errors start? A commit
logs when it finishes and how long it took, so it's easy to spot.
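
e.g. something along these lines, if you're logging to catalina.out (file
name and location depend on how your Tomcat is set up):

  grep -i commit catalina.out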

Increasing your writeLockTimeout is probably a good idea, but I'm still 
confused as to why the whole server would lock up until you delete the 
index and restart; at worst I would expect the update/commit attempts that 
time out getting the lock to complain loudly, but then the "slow" one 
would eventually finish and subsequent attempts would work ok.

...very odd.
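
If you do bump the timeout, something like this (the value is picked out
of thin air -- the point is just to comfortably exceed your slowest
commit):

  <writeLockTimeout>20000</writeLockTimeout>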

-Hoss