Posted to solr-user@lucene.apache.org by Stephen Lewis <sl...@panopto.com> on 2016/04/27 00:49:24 UTC

Tuning solr for large index with rapid writes

Hello,

I'm looking for some guidance on the best steps for tuning a solr cloud
cluster which is heavy on writes. We are currently running a solr cloud
fleet composed of one core, one shard, and three nodes. The cloud is hosted
in AWS, and each solr node is on its own linux r3.2xl instance with 8 cpu
and 61 GiB mem, and a 2TB EBS volume attached. Our index is currently 550
GiB over 420M documents, and growing quite rapidly. We are currently doing
a bit more than 1000 document writes/deletes per second.

Recently, we've hit some trouble with our production cloud. We have had the
process on individual instances die a few times, and we see the following
error messages being logged (expanded logs at the bottom of the email):

ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
null:org.eclipse.jetty.io.EofException

WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.servlet.ServletHandler;
/solr/panopto/select
java.lang.IllegalStateException: Committed

WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response;
Committed before 500 {trace=org.eclipse.jetty.io.EofException


Another time we saw this happen, we had java OOM errors (expanded logs at
the bottom):

WARN  - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler;
Error for /solr/panopto/select
java.lang.OutOfMemoryError: Java heap space
ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
...
Caused by: java.lang.OutOfMemoryError: Java heap space


When the cloud goes into recovery during live indexing, it takes about 4-6
hours for a node to recover, but when we turn off indexing, recovery only
takes about 90 minutes.

Moreover, we see that deletes are extremely slow. We do batch deletes of
about 300 documents based on two value filters, and this takes about one
minute:
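
For illustration, the delete is roughly of this shape (a minimal SolrJ
sketch only; the URL, field names, and values are placeholders rather than
our actual schema, and on older SolrJ versions the client class is
HttpSolrServer rather than HttpSolrClient):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class BatchDeleteSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder node URL and core name.
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/panopto");
        try {
            // Delete-by-query on two filter values; a request like this
            // currently takes on the order of a minute against our index.
            client.deleteByQuery("folder_id:1234 AND source_type:recording");
            // We lean on autoCommit rather than issuing an explicit commit here.
        } finally {
            client.close();
        }
    }
}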

Research online suggests that a larger disk cache
<https://wiki.apache.org/solr/SolrPerformanceProblems> could be helpful,
but I also see from an older page
<http://wiki.apache.org/lucene-java/ImproveSearchingSpeed> on tuning for
Lucene that turning down the swappiness on our Linux instances may be
preferred to simply increasing space for the disk cache.

Moreover, to scale in the past, we've simply rolled our cluster while
increasing the memory on the new machines, but I wonder if we're hitting
the limit for how much we should scale vertically. My impression is that
sharding will allow us to warm searchers faster and maintain a more
effective cache as we scale. Will we really be helped by sharding, or is it
only a matter of total CPU/Memory in the cluster?

Thanks!

Stephen

(206)753-9320
stephen-lewis.net

Logs:

ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
null:org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
at
org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
at
org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)

WARN  - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler;
Error for /solr/panopto/select
java.lang.OutOfMemoryError: Java heap space
ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:793)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:434)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space

WARN  - 2016-04-26 00:56:43.873; org.eclipse.jetty.server.Response;
Committed before 500 {trace=org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
at
org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
at
org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
,code=500}

Re: Tuning solr for large index with rapid writes

Posted by Stephen Lewis <sl...@panopto.com>.
Thanks for the good suggestions on read traffic. I have been simulating
reads by parsing our ELB logs and replaying them from a fleet of test
servers acting as frontends using Siege <https://www.joedog.org/siege-home/>.
We are hoping to tune mostly based on our exact use case, so this seems the
most effective route. I see why, for the average user experience, 0-hit
queries would provide some better data. Our plan is to start with exact
user patterns and then branch out and refine our metrics from there.

For writes, I am using an index rebuild tool which we have written. We use
it for building a new index, or refreshing an existing one, when our data
model, document structure, schema, etc. change. It was actually turning this
rebuild on against our main cluster that started edging us toward the
performance limits on writes.
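
Roughly, the rebuild just streams documents to Solr in batches over SolrJ,
along these lines (an illustrative sketch only; the URL, field names, batch
size, and commitWithin value are placeholders, not our real code):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RebuildSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/panopto");
        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {           // stand-in for reading from our source of record
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);          // placeholder fields
            doc.addField("body", "rebuilt content " + i);
            batch.add(doc);
            if (batch.size() == 250) {               // batch size is illustrative
                client.add(batch, 60000);            // commitWithin 60s instead of explicit commits
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            client.add(batch, 60000);
        }
        client.close();
    }
}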

Since I last wrote, we discovered we were garbage-collection limited in our
current cluster. We noticed that when doing writes, especially the large
volume of writes our background rebuild generates, we generally do okay,
but eventually the GC would do a deep pass and we'd see 504 gateway
timeouts. We updated to the GC settings from Shawn Heisey's page
<https://wiki.apache.org/solr/ShawnHeisey>, and we have only seen timeouts
a couple of times since then (these don't kill the rebuild; the failed
requests simply get retried later). I see from you here and on another
thread right now that GC seems to be an area of active discussion.

Best,
Stephen

On Mon, May 2, 2016 at 9:20 AM, Erick Erickson <er...@gmail.com>
wrote:

> Bram:
>
> That works. I try to monitor the number of 0-hit
> queries when I generate a test set on the theory that
> those are _usually_ groups of random terms I've
> selected that aren't a good model. So it's often
> a sequence like "generate my list, see which
> ones give 0 results and remove them". Rinse,
> repeat.
>
> Like you said, imperfect but _loads_ better than
> trying to create them without real user queries
> as guidance...
>
> Best,
> Erick
>



-- 
Stephen

(206)753-9320
stephen-lewis.net

Re: Tuning solr for large index with rapid writes

Posted by Erick Erickson <er...@gmail.com>.
Bram:

That works. I try to monitor the number of 0-hit
queries when I generate a test set on the theory that
those are _usually_ groups of random terms I've
selected that aren't a good model. So it's often
a sequence like "generate my list, see which
ones give 0 results and remove them". Rinse,
repeat.
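
In SolrJ terms that check is just a rows=0 query per candidate, something
like this rough sketch (the client setup and where the candidate strings
come from are placeholders, not anything from this thread):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DropZeroHitQueries {
    // Keep only the candidate queries that actually match something.
    static List<String> keepNonZeroHit(HttpSolrClient client, List<String> candidates)
            throws Exception {
        List<String> kept = new ArrayList<>();
        for (String q : candidates) {
            SolrQuery query = new SolrQuery(q);
            query.setRows(0);                          // only numFound matters, skip fetching docs
            QueryResponse rsp = client.query(query);
            if (rsp.getResults().getNumFound() > 0) {
                kept.add(q);                           // 0-hit queries get dropped from the test set
            }
        }
        return kept;
    }
}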

Like you said, imperfect but _loads_ better than
trying to create them without real user queries
as guidance...

Best,
Erick

On Sat, Apr 30, 2016 at 4:19 AM, Bram Van Dam <br...@intix.eu> wrote:
>> If I'm reading this right, you have 420M docs on a single shard?
>> Yep, you were reading it right.
>
> As Erick mentioned, it's hard to give concrete sizing advice, but we've
> found 120M to be the magic number. When a shard contains more than 120M
> documents, performance goes down rapidly & GC pauses grow a lot longer.
> Up until 250M things remain acceptable. But then performance starts to
> drop very quickly after that.
>
>  - Bram
>

Re: Tuning solr for large index with rapid writes

Posted by Bram Van Dam <br...@intix.eu>.
> If I'm reading this right, you have 420M docs on a single shard?
> Yep, you were reading it right. 

As Erick mentioned, it's hard to give concrete sizing advice, but we've
found 120M to be the magic number. When a shard contains more than 120M
documents, performance goes down rapidly & GC pauses grow a lot longer.
Up until 250M things remain acceptable. But then performance starts to
drop very quickly after that.

 - Bram


Re: Tuning solr for large index with rapid writes

Posted by Bram Van Dam <br...@intix.eu>.
On 29/04/16 16:33, Erick Erickson wrote:
> You have one huge advantage when doing prototyping, you can
> mine your current logs for real user queries. It's actually
> surprisingly difficult to generate, say, 10,000 "realistic" queries. And
> IMO you need something approaching that number to ensure that
> your queries don't hit the caches etc....

Our approach is to log queries for a while, boil them down to their
different use cases (full-text search, simple facets, complex 2D ranges
with stats, etc.), and then generate realistic parameter values for each
search field used in those queries. It's not perfect, but it gives you a
large number of reasonably realistic queries.

Also, you can bypass the query cache by adding {!cache=false} to your query.
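
For example, on a filter query that looks something like this (a small
SolrJ sketch; the query and field name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;

public class UncachedFilterSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("body:invoice");            // placeholder main query
        // cache=false keeps this filter out of the filterCache, so repeated
        // benchmark runs don't get an artificial speedup from cached filters.
        q.addFilterQuery("{!cache=false}customer_id:1234");     // placeholder field
        System.out.println(q);                                  // prints the encoded request params
    }
}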

 - Bram



Re: Tuning solr for large index with rapid writes

Posted by Erick Erickson <er...@gmail.com>.
Good luck!

You have one huge advantage when doing prototyping, you can
mine your current logs for real user queries. It's actually
surprisingly difficult to generate, say, 10,000 "realistic" queries. And
IMO you need something approaching that number to ensure that
your queries don't hit the caches etc....

Anyway, sounds like you're off and running.

Best,
Erick

On Wed, Apr 27, 2016 at 10:12 AM, Stephen Lewis <sl...@panopto.com> wrote:
>> If I'm reading this right, you have 420M docs on a single shard?
> Yep, you were reading it right. Thanks for your guidance. We will do
> various prototyping following "the sizing exercise".
>
> Best,
> Stephen
>

Re: Tuning solr for large index with rapid writes

Posted by Stephen Lewis <sl...@panopto.com>.
> If I'm reading this right, you have 420M docs on a single shard?
Yep, you were reading it right. Thanks for your guidance. We will do
various prototyping following "the sizing exercise".

Best,
Stephen

On Tue, Apr 26, 2016 at 6:17 PM, Erick Erickson <er...@gmail.com>
wrote:

> If I'm reading this right, you have 420M docs on a single shard? If that's
> true
> you are pushing the envelope of what I've seen work and be performant. Your
> OOM errors are the proverbial 'smoking gun' that you're putting too many
> docs
> on too few nodes.
>
> You say that the document count is "growing quite rapidly". My expectation
> is
> that your problems will only get worse as you cram more docs into your
> shard.
>
> You're correct that adding more memory (and consequently more JVM
> memory?) only gets you so far before you start running into GC trouble;
> when you hit full GC pauses they'll get longer and longer, which is its own
> problem. And you don't want to have huge JVM memory at the expense
> of operating system memory, due to the fact that Lucene uses MMapDirectory; see
> Uwe's excellent blog:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> I'd _strongly_ recommend you do "the sizing exercise". There are lots of
> details here:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> You've already done some of this inadvertently, unfortunately it sounds
> like
> it's in production. If I were going to guess, I'd say the maximum number of
> docs on any shard should be less than half what you currently have. So you
> need to figure out how many docs you expect to host in this collection
> eventually
> and have N/200M shards. At least.
>
> There are various strategies when the answer is "I don't know": you
> might add new collections when you max out and then use "collection
> aliasing" to query them, etc.
>
> Best,
> Erick
>
>



-- 
Stephen

(206)753-9320
stephen-lewis.net

Re: Tuning solr for large index with rapid writes

Posted by Erick Erickson <er...@gmail.com>.
If I'm reading this right, you have 420M docs on a single shard? If that's true
you are pushing the envelope of what I've seen work and be performant. Your
OOM errors are the proverbial 'smoking gun' that you're putting too many docs
on too few nodes.

You say that the document count is "growing quite rapidly". My expectation is
that your problems will only get worse as you cram more docs into your shard.

You're correct that adding more memory (and consequently more JVM
memory?) only gets you so far before you start running into GC trouble;
when you hit full GC pauses they'll get longer and longer, which is its own
problem. And you don't want to have huge JVM memory at the expense
of operating system memory, due to the fact that Lucene uses MMapDirectory; see
Uwe's excellent blog:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I'd _strongly_ recommend you do "the sizing exercise". There are lots of
details here:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

You've already done some of this inadvertently, unfortunately it sounds like
it's in production. If I were going to guess, I'd say the maximum number of
docs on any shard should be less than half what you currently have. So you
need to figure out how many docs you expect to host in this collection
eventually
and have N/200M shards. At least.

There are various strategies when the answer is "I don't know": you
might add new collections when you max out and then use "collection
aliasing" to query them, etc.
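
For instance, with SolrJ's collections API that pattern looks roughly like
this (a sketch only; the host, collection names, configset, and shard/replica
counts are illustrative, and the static factory methods shown here assume a
SolrJ 6.x-style CollectionAdminRequest API):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class RollingCollectionsSketch {
    public static void main(String[] args) throws Exception {
        // Point at any node in the cluster (placeholder host).
        HttpSolrClient client = new HttpSolrClient("http://solr-node-1:8983/solr");

        // Create the next collection with room to grow: 4 shards x 2 replicas
        // (counts here are purely illustrative, not a sizing recommendation).
        CollectionAdminRequest.createCollection("panopto_2016_06", "panopto_conf", 4, 2)
                .process(client);

        // One alias spanning old and new collections, so clients keep querying
        // a single name while new data lands in the newest collection.
        CollectionAdminRequest.createAlias("panopto_all", "panopto_2016_05,panopto_2016_06")
                .process(client);

        client.close();
    }
}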

Best,
Erick

On Tue, Apr 26, 2016 at 3:49 PM, Stephen Lewis <sl...@panopto.com> wrote:
> Hello,
>
> I'm looking for some guidance on the best steps for tuning a solr cloud
> cluster which is heavy on writes. We are currently running a solr cloud
> fleet composed of one core, one shard, and three nodes. The cloud is hosted
> in AWS, and each solr node is on its own linux r3.2xl instance with 8 cpu
> and 61 GiB mem, and a 2TB EBS volume attached. Our index is currently 550
> GiB over 420M documents, and growing quite rapidly. We are currently doing
> a bit more than 1000 document writes/deletes per second.
>
> Recently, we've hit some trouble with our production cloud. We have had the
> process on individual instances die a few times, and we see the following
> error messages being logged (expanded logs at the bottom of the email):
>
> ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
> null:org.eclipse.jetty.io.EofException
>
> WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.servlet.ServletHandler;
> /solr/panopto/select
> java.lang.IllegalStateException: Committed
>
> WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response;
> Committed before 500 {trace=org.eclipse.jetty.io.EofException
>
>
> Another time we saw this happen, we had java OOM errors (expanded logs at
> the bottom):
>
> WARN  - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler;
> Error for /solr/panopto/select
> java.lang.OutOfMemoryError: Java heap space
> ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
> null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> ...
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
>
> When the cloud goes into recovery during live indexing, it takes about 4-6
> hours for a node to recover, but when we turn off indexing, recovery only
> takes about 90 minutes.
>
> Moreover, we see that deletes are extremely slow. We do batch deletes of
> about 300 documents based on two value filters, and this takes about one
> minute:
>
> Research online suggests that a larger disk cache
> <https://wiki.apache.org/solr/SolrPerformanceProblems> could be helpful,
> but I also see from an older page
> <http://wiki.apache.org/lucene-java/ImproveSearchingSpeed> on tuning for
> Lucene that turning down the swappiness on our Linux instances may be
> preferred to simply increasing space for the disk cache.
>
> Moreover, to scale in the past, we've simply rolled our cluster while
> increasing the memory on the new machines, but I wonder if we're hitting
> the limit for how much we should scale vertically. My impression is that
> sharding will allow us to warm searchers faster and maintain a more
> effective cache as we scale. Will we really be helped by sharding, or is it
> only a matter of total CPU/Memory in the cluster?
>
> Thanks!
>
> Stephen
>
> (206)753-9320
> stephen-lewis.net
>