Posted to solr-user@lucene.apache.org by Brian Whitman <br...@echonest.com> on 2009/01/02 17:20:36 UTC
debugging long commits
We have a distributed setup that has been experiencing glacially slow commit
times on only some of the shards. (10s on a good shard, 263s on a slow
shard.) Each shard for this index has about 10GB of lucene index data and
the documents are segregated by an md5 hash, so the distribution of
document/data types should be equal across all shards. I've turned off our
postcommit hooks to isolate the problem, so it's not a snapshot run amok or
anything. I also moved the indexes over to new machines and the same indexes
that were slow in production are also slow on the test machines.
During the slow commit, the jetty process is at 100% CPU / 50% RAM on an 8GB
quad-core machine. The slow commit happens every time after I add at least
one document. (If I don't add any documents the commit is immediate.)
What can I do to look into this problem?
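[Editor's note: one way to narrow this down is to time a commit against each shard's update handler and compare; the slow shards stand out by an order of magnitude. A minimal sketch follows. The shard URLs are placeholders, and the injectable `opener` exists only so the timing logic can be exercised offline; Solr 1.x accepts a `<commit/>` body POSTed to `/update`.]

```python
import time
import urllib.request

def time_commit(shard_url, opener=urllib.request.urlopen):
    """Send a <commit/> to one shard and return elapsed seconds.

    shard_url is a hypothetical base URL like "http://shard1:8983/solr".
    `opener` defaults to urllib's urlopen but can be swapped for a fake
    when testing the logic without a running Solr.
    """
    start = time.time()
    opener(shard_url + "/update", data=b"<commit/>")
    return time.time() - start

def slowest_shards(shard_urls, opener=urllib.request.urlopen):
    """Time a commit on every shard and rank them, slowest first."""
    timings = {url: time_commit(url, opener) for url in shard_urls}
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)
```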
Re: debugging long commits
Posted by Mike Klaas <mi...@gmail.com>.
Hi Brian,
You might want to follow up on the Lucene list (java-user@lucene.apache.org).
Something was causing problems with merging, so you ended up with too many
segments (hence the slow commits). I doubt that you lost anything--usually a
merge doesn't modify the index until it is complete. But I am not familiar
enough with this code in Lucene to be sure.
-Mike
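[Editor's note, not part of Mike's reply: for an ArrayIndexOutOfBoundsException during merging, Lucene ships a CheckIndex utility that scans each segment and, with -fix, rewrites the index dropping any segment that fails the check (losing the documents in it), so it should be run on a copy first. A sketch of building the invocation; the jar and index paths are placeholders.]

```python
def checkindex_cmd(lucene_jar, index_dir, fix=False):
    """Build the command line for Lucene's CheckIndex tool.

    lucene_jar and index_dir are placeholder paths. Passing fix=True adds
    -fix, which drops corrupt segments and their documents; run it on a
    copy of the index first.
    """
    cmd = ["java", "-cp", lucene_jar,
           "org.apache.lucene.index.CheckIndex", index_dir]
    if fix:
        cmd.append("-fix")
    return cmd
```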
On 2-Jan-09, at 10:17 AM, Brian Whitman wrote:
> I think I'm getting close with this (sorry for the self-replies)
>
> I tried an optimize (which we never do) and it took 30m and said
> this a lot:
>
> Exception in thread "Lucene Merge Thread #4"
> org.apache.lucene.index.MergePolicy$MergeException:
> java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
> at org.apache.lucene.util.BitVector.get(BitVector.java:91)
> at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
> at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
> at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:633)
> at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:585)
> at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
> at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:499)
> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4291)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3932)
> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)
> Jan 2, 2009 6:05:49 PM org.apache.solr.common.SolrException log
> SEVERE: java.io.IOException: background merge hit exception: _ks4:C2504982
> _oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929 _tej:C22201 _1agw:C1717926
> into _1agy [optimize]
> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2346)
> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
> at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcesso
> ...
>
>
> But then it finished. And now commits are OK again.
>
> Anyone know what the merge hit exception means and if i lost anything?
Re: debugging long commits
Posted by Brian Whitman <br...@echonest.com>.
I think I'm getting close with this (sorry for the self-replies)
I tried an optimize (which we never do) and it took 30m and said this a lot:
Exception in thread "Lucene Merge Thread #4"
org.apache.lucene.index.MergePolicy$MergeException:
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
at org.apache.lucene.util.BitVector.get(BitVector.java:91)
at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:633)
at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:585)
at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:499)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4291)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3932)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)
Jan 2, 2009 6:05:49 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: background merge hit exception: _ks4:C2504982
_oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929 _tej:C22201 _1agw:C1717926
into _1agy [optimize]
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2346)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcesso
...
But then it finished. And now commits are OK again.
Anyone know what the merge hit exception means and if I lost anything?
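[Editor's note on reading the SEVERE line, not part of the original mail: each token like `_ks4:C2504982` names a segment followed by (on my reading of IndexWriter's segment string) its document count, so the failed optimize was merging seven segments totaling roughly 5.6M documents into `_1agy`. A small sketch that pulls those pairs out of such a line:]

```python
import re

def parse_merge_spec(line):
    """Extract (segment, doc_count) pairs from a Lucene merge description
    like "_ks4:C2504982 _oaw:C514635 ... into _1agy [optimize]".

    The letter after ':' encodes the segment's file format; the digits
    are taken to be its document count (my reading, hedged above).
    Segments without a ':Cnnn' suffix, like the merge target, are skipped.
    """
    return [(name, int(count))
            for name, count in re.findall(r"(_\w+):[A-Za-z](\d+)", line)]
```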
Re: debugging long commits
Posted by Brian Whitman <br...@echonest.com>.
Not sure if these help. Here's the stack trace and jmap -histo output during
a long (bad) commit:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.0-b16 mixed mode):
"Attach Listener" daemon prio=10 tid=0x00002aabf9954400 nid=0x5e1c runnable
[0x0000000000000000..0x0000000042048d20]
java.lang.Thread.State: RUNNABLE
"Lucene Merge Thread #0" daemon prio=10 tid=0x00002aabf9c7f400 nid=0x5e00
runnable [0x0000000041f47000..0x0000000041f47ba0]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.store.RAMOutputStream.flush(RAMOutputStream.java:147)
at org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:56)
at org.apache.lucene.index.MultiLevelSkipListWriter.writeSkip(MultiLevelSkipListWriter.java:146)
at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:588)
at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:499)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4291)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3932)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)
"pool-2-thread-1" prio=10 tid=0x00002aabf9b94c00 nid=0x5dc3 waiting on
condition [0x0000000041a42000..0x0000000041a42aa0]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aaabecd12d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at java.util.concurrent.DelayQueue.take(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
"DestroyJavaVM" prio=10 tid=0x00002aabfc006c00 nid=0x5da3 waiting on
condition [0x0000000000000000..0x000000004022ad10]
java.lang.Thread.State: RUNNABLE
"Timer-2" daemon prio=10 tid=0x00002aabfc007c00 nid=0x5dc1 in Object.wait()
[0x0000000041e46000..0x0000000041e46d20]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabef91338> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Unknown Source)
- locked <0x00002aaabef91338> (a java.util.TaskQueue)
at java.util.TimerThread.run(Unknown Source)
"Checkpointer" daemon prio=10 tid=0x00002aabf995b800 nid=0x5dc0 in
Object.wait() [0x0000000041d45000..0x0000000041d45da0]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabebf9700> (a java.lang.Object)
at java.lang.Object.wait(Object.java:485)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:152)
- locked <0x00002aaabebf9700> (a java.lang.Object)
at java.lang.Thread.run(Unknown Source)
"Cleaner-1" daemon prio=10 tid=0x00002aabf995b000 nid=0x5dbf in
Object.wait() [0x0000000041c44000..0x0000000041c44c20]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabebee158> (a java.lang.Object)
at java.lang.Object.wait(Object.java:485)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:152)
- locked <0x00002aaabebee158> (a java.lang.Object)
at java.lang.Thread.run(Unknown Source)
"INCompressor" daemon prio=10 tid=0x00002aabf9a3cc00 nid=0x5dbe in
Object.wait() [0x0000000041b43000..0x0000000041b43ca0]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabeef3500> (a java.lang.Object)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)
- locked <0x00002aaabeef3500> (a java.lang.Object)
at java.lang.Thread.run(Unknown Source)
"pool-1-thread-1" prio=10 tid=0x00002aabfa015800 nid=0x5dbc waiting on
condition [0x0000000041941000..0x0000000041941ba0]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aaabef91a08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
"Timer-1" daemon prio=10 tid=0x00002aabfa0f1000 nid=0x5dbb in Object.wait()
[0x0000000041840000..0x0000000041840a20]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabef19a48> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Unknown Source)
- locked <0x00002aaabef19a48> (a java.util.TaskQueue)
at java.util.TimerThread.run(Unknown Source)
"btpool0-9 - Acceptor0 SelectChannelConnector@0.0.0.0:80" prio=10
tid=0x00002aabf9807800 nid=0x5dba runnable
[0x000000004173f000..0x000000004173faa0]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00002aaabebcb2a8> (a sun.nio.ch.Util$1)
- locked <0x00002aaabebcb290> (a java.util.Collections$UnmodifiableSet)
- locked <0x00002aaabebc8b00> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:411)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:167)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)
"btpool0-8 - Acceptor1 SelectChannelConnector@0.0.0.0:80" prio=10
tid=0x00002aabf980d000 nid=0x5db9 runnable
[0x000000004163e000..0x000000004163ed20]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00002aaabebcb5f8> (a sun.nio.ch.Util$1)
- locked <0x00002aaabebcb5e0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00002aaabebc4d30> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:411)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:167)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)
"btpool0-7" prio=10 tid=0x00002aabf9594800 nid=0x5db8 in Object.wait()
[0x000000004153c000..0x000000004153dda0]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter)
at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4554)
- locked <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter)
at org.apache.lucene.index.IndexWriter.finishMerges(IndexWriter.java:2873)
- locked <0x00002aaad9d0c3f0> (a org.apache.solr.update.SolrIndexWriter)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1718)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1674)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1648)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:153)
at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:175)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:359)
at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:205)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)
"btpool0-6" prio=10 tid=0x00002aabf98fdc00 nid=0x5db7 in Object.wait()
[0x000000004143c000..0x000000004143cc20]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf15c7b0> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf15c7b0> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
"btpool0-5" prio=10 tid=0x00002aabf9b10000 nid=0x5db6 in Object.wait()
[0x000000004133b000..0x000000004133bca0]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf160ab8> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf160ab8> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
"btpool0-4" prio=10 tid=0x00002aabf9b97400 nid=0x5db5 in Object.wait()
[0x000000004123a000..0x000000004123ab20]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf160d08> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf160d08> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
"btpool0-3" prio=10 tid=0x00002aabf9b96000 nid=0x5db4 in Object.wait()
[0x0000000041139000..0x0000000041139ba0]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf160f58> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf160f58> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
"btpool0-2" prio=10 tid=0x00002aabf9a07000 nid=0x5db3 in Object.wait()
[0x0000000041038000..0x0000000041038a20]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf1611a8> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf1611a8> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
"btpool0-1" prio=10 tid=0x00002aabf9324400 nid=0x5db2 in Object.wait()
[0x0000000040f37000..0x0000000040f37aa0]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf1613f8> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf1613f8> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
"btpool0-0" prio=10 tid=0x00002aabf9a83800 nid=0x5db1 in Object.wait()
[0x0000000040e36000..0x0000000040e36d20]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabf161648> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:541)
- locked <0x00002aaabf161648> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
"Timer-0" daemon prio=10 tid=0x00002aabf9af6000 nid=0x5db0 in Object.wait()
[0x0000000040d35000..0x0000000040d35da0]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabebacf80> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Unknown Source)
- locked <0x00002aaabebacf80> (a java.util.TaskQueue)
at java.util.TimerThread.run(Unknown Source)
"Low Memory Detector" daemon prio=10 tid=0x00002aabf933fc00 nid=0x5dac
runnable [0x0000000000000000..0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread1" daemon prio=10 tid=0x00002aabf933d000 nid=0x5dab waiting
on condition [0x0000000000000000..0x0000000040a31360]
java.lang.Thread.State: RUNNABLE
"CompilerThread0" daemon prio=10 tid=0x00002aabf9339000 nid=0x5daa waiting
on condition [0x0000000000000000..0x00000000409303e0]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00002aabf9337400 nid=0x5da9 waiting
on condition [0x0000000000000000..0x00000000408306b0]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00002aabf9314400 nid=0x5da8 in
Object.wait() [0x000000004072f000..0x000000004072faa0]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabeb86f50> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
- locked <0x00002aaabeb86f50> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
"Reference Handler" daemon prio=10 tid=0x00002aabf9312800 nid=0x5da7 in
Object.wait() [0x000000004062e000..0x000000004062ed20]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaabeb86ec8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
- locked <0x00002aaabeb86ec8> (a java.lang.ref.Reference$Lock)
"VM Thread" prio=10 tid=0x00002aabf917a000 nid=0x5da6 runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004011c800 nid=0x5da4
runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000004011e000 nid=0x5da5
runnable
"VM Periodic Task Thread" prio=10 tid=0x00002aabf9342000 nid=0x5dad waiting
on condition
JNI global references: 971
Heap
PSYoungGen total 1395264K, used 965841K [0x00002aab8d4b0000, 0x00002aabf7f50000, 0x00002aabf7f50000)
eden space 1030400K, 93% used [0x00002aab8d4b0000,0x00002aabc83d4788,0x00002aabcc2f0000)
from space 364864K, 0% used [0x00002aabe1b00000,0x00002aabe1b10000,0x00002aabf7f50000)
to space 352320K, 0% used [0x00002aabcc2f0000,0x00002aabcc2f0000,0x00002aabe1b00000)
PSOldGen total 3495296K, used 642758K [0x00002aaab7f50000, 0x00002aab8d4b0000, 0x00002aab8d4b0000)
object space 3495296K, 18% used [0x00002aaab7f50000,0x00002aaadf301a78,0x00002aab8d4b0000)
PSPermGen total 21248K, used 19258K [0x00002aaaaff50000, 0x00002aaab1410000, 0x00002aaab7f50000)
object space 21248K, 90% used [0x00002aaaaff50000,0x00002aaab121e8d8,0x00002aaab1410000)
num #instances #bytes class name
----------------------------------------------
1: 6459678 491568792 [C
2: 6456059 258242360 java.lang.String
3: 6282264 251290560 org.apache.lucene.index.TermInfo
4: 6282189 201030048 org.apache.lucene.index.Term
5: 70220 39109632 [I
6: 6082 25264288 [B
7: 149 20355504 [J
8: 133 20354208 [Lorg.apache.lucene.index.Term;
9: 133 20354208 [Lorg.apache.lucene.index.TermInfo;
10: 160230 8972880 java.nio.HeapByteBuffer
11: 160218 8972208 java.nio.HeapCharBuffer
12: 160210 8971760 org.apache.lucene.index.FieldsReader$FieldForMerge
13: 30440 4095480 <constMethodKlass>
14: 30440 3660128 <methodKlass>
15: 2605 3026184 <constantPoolKlass>
16: 22065 3025120 [Ljava.lang.Object;
17: 1297 2411792 [Ljava.util.HashMap$Entry;
18: 48691 2309696 <symbolKlass>
19: 2604 1981728 <instanceKlassKlass>
20: 2194 1889888 <constantPoolCacheKlass>
21: 27444 1317312 java.util.HashMap$Entry
22: 24954 998160 java.util.AbstractList$Itr
23: 18834 753360 org.apache.lucene.index.FieldInfo
24: 2846 523664 java.lang.Class
25: 13021 520840 java.util.ArrayList
26: 12471 399072 org.apache.lucene.document.Document
27: 3895 372216 [[I
28: 3904 309592 [S
29: 534 249632 <methodDataKlass>
30: 3451 220864 org.apache.lucene.index.SegmentReader$Norm
31: 1547 136136 org.apache.lucene.store.FSDirectory$FSIndexInput
32: 213 120984 <objArrayKlassKlass>
33: 737 112024 java.lang.reflect.Method
34: 1575 100800 java.lang.ref.Finalizer
35: 1345 86080 org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor
36: 1188 76032 java.util.HashMap
...
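[Editor's note: the top of the histogram (roughly 1.2 GB across char[], String, Term, and TermInfo instances) looks consistent with Lucene's term index being held in memory while the merge runs, though that is my reading and not established in the thread. To aggregate dumps like this programmatically, a small parser sketch, with sample rows taken from the output above:]

```python
def parse_histo(lines):
    """Parse 'num: instances bytes classname' rows from jmap -histo output.

    Returns (classname, instances, bytes) tuples; lines that don't match
    the four-column shape (headers, separators) are skipped.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and parts[0].rstrip(":").isdigit():
            rows.append((parts[3], int(parts[1]), int(parts[2])))
    return rows

sample = [
    "1: 6459678 491568792 [C",
    "2: 6456059 258242360 java.lang.String",
    "3: 6282264 251290560 org.apache.lucene.index.TermInfo",
    "4: 6282189 201030048 org.apache.lucene.index.Term",
]
biggest = max(parse_histo(sample), key=lambda r: r[2])
```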
On Fri, Jan 2, 2009 at 11:20 AM, Brian Whitman <br...@echonest.com> wrote:
> We have a distributed setup that has been experiencing glacially slow
> commit times on only some of the shards. (10s on a good shard, 263s on a
> slow shard.) Each shard for this index has about 10GB of lucene index data
> and the documents are segregated by an md5 hash, so the distribution of
> document/data types should be equal across all shards. I've turned off our
> postcommit hooks to isolate the problem, so it's not a snapshot run amok or
> anything. I also moved the indexes over to new machines and the same indexes
> that were slow in production are also slow on the test machines.
> During the slow commit, the jetty process is at 100% CPU / 50% RAM on an 8GB
> quad-core machine. The slow commit happens every time after I add at least
> one document. (If I don't add any documents the commit is immediate.)
>
> What can I do to look into this problem?