Posted to solr-user@lucene.apache.org by michallos <mi...@gmail.com> on 2013/11/29 14:08:34 UTC

Constantly increasing time of full data import

Hi,

In our SolrCloud-based application we run a full data import (used as a delta
import) every minute
(http://HOST:PORT/solr/collection1/dataimport/?command=full-import&commit=true&optimize=false&clean=false&synchronous=true).
SolrCloud is deployed on 4 nodes.
When Solr starts, this full import takes 15-20 seconds, but after 2 days it
takes 5 minutes, and the time keeps increasing.
Each delta import indexes about 1000 documents. Our index contains about 400M
documents (120GB in size). After a server restart the situation repeats.
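
For anyone not familiar with this kind of setup: it is an ordinary DIH
full-import whose entity query only selects rows changed since the last run,
so with clean=false it behaves like a delta. A generic sketch of such a
data-config (placeholder table and column names, not our actual config):

<dataConfig>
  <dataSource type="JdbcDataSource" driver="..." url="..." user="..." password="..."/>
  <document>
    <!-- full-import with clean=false only adds/overwrites the selected docs;
         the query limits the selection to rows changed since the previous import -->
    <entity name="docs"
            query="SELECT id, title, body FROM documents
                   WHERE updated_at &gt; '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="body" name="body"/>
    </entity>
  </document>
</dataConfig>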

Do you know what may be the cause of this behaviour?

Best,
Michal




Re: Constantly increasing time of full data import

Posted by michallos <mi...@gmail.com>.
One more stack trace that is active during indexing. This call task is also
executed on the same single-threaded executor that registers the new searcher:

"searcherExecutor-48-thread-1" prio=10 tid=0x00007f24c0715000 nid=0x3de6
runnable [0x00007f24b096d000]
   java.lang.Thread.State: RUNNABLE
	at
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:111)
	at
org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:131)
	at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:311)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
	at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1494)
	at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1363)
	at
org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:118)
	at
org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:465)
	at org.apache.solr.search.LRUCache.warm(LRUCache.java:188)
	at
org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:2035)
	at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1676)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)

   Locked ownable synchronizers:
	- <0x00007f2880335d38> (a java.util.concurrent.ThreadPoolExecutor$Worker)

Maybe the warming queries are blocking the commit? But... why does it increase
under not-so-high load (1000-2000 requests per hour) and not increase under
very low load?
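
For context, the work in that stack trace is cache autowarming: each autowarmed
entry is re-executed against the new searcher on that single searcherExecutor
thread. In solrconfig.xml it is driven by settings like these (the class names
are the standard ones, the sizes and autowarmCount values are only examples,
not our real ones):

<!-- entries copied from the old searcher are regenerated by re-running
     the cached queries against the new, larger index -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="128"/>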

Best,
Michał




Re: Constantly increasing time of full data import

Posted by michallos <mi...@gmail.com>.
I took a few thread dumps and here are the results:
- the servers that are indexing are stuck on this stack trace:

"cmdDistribExecutor-3-thread-17669" prio=10 tid=0x00007f1aae4a6800
nid=0x44a9 runnable [0x00007f1a6c0f6000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
	at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
	at
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
	at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
	at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
	at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
	at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
	at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
	at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
	at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
	at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
	at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:712)
	at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
	at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
	at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
	at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
	at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
	at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)

   Locked ownable synchronizers:
	- <0x00007f1db2911048> (a java.util.concurrent.ThreadPoolExecutor$Worker)


The other servers are stuck on:

"http-solr03-app-pci.dc2%2F10.210.20.219-8080-177" daemon prio=10
tid=0x00007f24cc117800 nid=0x422c waiting on condition [0x00007f098d383000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007f2719890408> (a
java.util.concurrent.FutureTask)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
	at java.util.concurrent.FutureTask.get(FutureTask.java:187)
	at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:621)
	at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
	at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
	at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
	at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
	at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
	at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
	at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
	at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
	at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
	at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
	at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
	at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:190)
	at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
	at
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:92)
	at
org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
	at
org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
	at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at
org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
	at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
	at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:829)
	at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:598)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
	at java.lang.Thread.run(Thread.java:724)

   Locked ownable synchronizers:
	- None

What I can see is that DirectUpdateHandler2:621 is:
waitSearcher[0].get();


Is it possible that this is caused by a delay in creating searchers, or by the
single-threaded searcher executor (SolrCore:1315)?
Why does the time stop increasing when there is no load, but never drop?
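
If I read the code right, that get() is the commit waiting for the new searcher
to finish autowarming and get registered, because an explicit commit defaults
to waitSearcher=true. A standalone commit can be told not to wait, along the
lines of:

http://HOST:PORT/solr/collection1/update?commit=true&waitSearcher=false

(I am not sure whether DIH lets you pass waitSearcher through on the dataimport
request, so treat that URL only as an illustration of the parameter.)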

Any ideas?

BTW - I can't do any profiling/sampling etc., mainly because of security. I
don't know what kind of production environment you work with, but attaching a
sampler/profiler to the JVM seems a little irresponsible.




Re: Constantly increasing time of full data import

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2013-12-09 at 11:29 +0100, michallos wrote:
> "on production" - no I can't profile it (because of huge overhead) ... Maybe
> with dynamic tracing but we can't do it right now.

https://blogs.oracle.com/nbprofiler/entry/visualvm_1_3_released
First section: "Sampler In The Core Tool".

> After server restart, delta time reset to 15-20 seconds so it is not caused
> by the mergeFactor.

Unless your merges are cascading, so that the number of concurrent merges
keeps growing. But with fast storage and a lot of RAM for the write cache,
that does not sound probable.

> We have SSD and 70GB RAM (it is enough for us).

Sounds like more than enough for a 120GB index.

- Toke Eskildsen, State and University Library, Denmark


Re: Constantly increasing time of full data import

Posted by michallos <mi...@gmail.com>.
"on production" - no I can't profile it (because of huge overhead) ... Maybe
with dynamic tracing but we can't do it right now.

After server restart, delta time reset to 15-20 seconds so it is not caused
by the mergeFactor.
We have SSD and 70GB RAM (it is enough for us).




Re: Constantly increasing time of full data import

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2013-12-03 at 17:09 +0100, michallos wrote:
> This occurs only on production environment so I can't profile it :-)

Sure you can :-)

If you use jvisualvm and stay away from the "Profiler" tab, you should be
fine. The "Sampler" performs non-intrusive profiling - not as thorough as
real profiling, but it might help.

So far it sounds like a classic merge issue, though. That would probably not
show up in the profiler. Have you tweaked the mergeFactor?
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

With 16 shards/node (guessing: same storage backend for all shards on a
single node, different storage backends across the nodes) and a 15-second
commit interval, a segment will be created roughly every second
(oversimplifying, as they will cluster, which makes matters worse for spinning
drives). If the mergeFactor is 10, this means that a merge will be going on
every 10 seconds. Merges are bulk IO, and on spinning drives they get
penalized by concurrent random access.

Consider doing some non-intrusive IO load logging (bulk throughput as well as
IO/sec) on a node. If you see the bulk speed go down considerably when the
IO/sec rises, then you have your problem.
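
On Linux, plain iostat is enough for that kind of logging, e.g. (the 5 is just
the sampling interval in seconds):

iostat -xm 5

-x adds per-device request rates and utilization, -m reports throughput in
MB/s, so both the bulk rate and the IO/sec are visible in one place.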

Some solutions are (the first two are sketched in solrconfig.xml terms below):
- Increase your maxTime for autoCommit
- Increase the mergeFactor
- Use SSDs
- Maybe lower the number of shards to reduce the amount of thrashing
  triggered by concurrent merges
- More RUM (and more RAM)
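
In solrconfig.xml those two knobs look roughly like this (the values are only
examples, tune them for your hardware and commit rate):

<indexConfig>
  <!-- allow more segments to accumulate before a merge is triggered -->
  <mergeFactor>20</mergeFactor>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard-commit every 60 seconds instead of every 15 -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>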


Regards,
Toke Eskildsen, State and University Library, Denmark


Re: Constantly increasing time of full data import

Posted by michallos <mi...@gmail.com>.
This occurs only on the production environment, so I can't profile it :-)
Any clues?

DirectUpdateHandler2 config:

 <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
</autoCommit>

<updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
</updateLog>




Re: Constantly increasing time of full data import

Posted by Ryan Cutter <ry...@gmail.com>.
Michal,

I don't have much experience with DIH so I'll leave that to someone else
but I would suggest you profile Solr during imports.  That might show you
where the bottleneck is.

Generally, it's reasonable to expect Solr updates to get slower as the indexes
grow and as you put more load on the system.  It's possible you're seeing
something outside the norm - I just don't know what you were expecting or what
your hardware is capable of.

You might want to post more info (autoCommit settings, etc) as well.

Thanks, Ryan


On Mon, Dec 2, 2013 at 4:22 AM, michallos <mi...@gmail.com> wrote:

> Update: I can see that times increases when the search load is higher.
> During
> nights and weekends full load times doesn't increase. So it is not caused
> by
> the number of documents being loaded (during weekends we have the same
> number of new documents) but number of queries / minute.
>
> Anyone observe such strange behaviour? It is critical for us.
>
> Best,
> Michal
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Constantly-increasing-time-of-full-data-import-tp4103873p4104370.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Constantly increasing time of full data import

Posted by michallos <mi...@gmail.com>.
Update: I can see that the times increase when the search load is higher.
During nights and weekends the full-import times don't increase. So it is not
caused by the number of documents being loaded (during weekends we have the
same number of new documents) but by the number of queries per minute.

Has anyone observed such strange behaviour? It is critical for us.

Best,
Michal




Re: Constantly increasing time of full data import

Posted by michallos <mi...@gmail.com>.
One more piece of information that may be important: this index is divided
into 64 logical shards (replication factor 4).


