Posted to solr-user@lucene.apache.org by cwhi <ch...@gmail.com> on 2014/01/15 22:43:53 UTC

SolrException Error when indexing new documents at scale in SolrCloud -

I have a SolrCloud installation with about 2 million documents indexed in it. 
It's been buzzing along without issue for the past 8 days, but today started
throwing errors on document adds that eventually resulted in out of memory
exceptions.  There is nothing funny going on.  There are a few infrequent
searches on the index every few minutes, and documents are being added in
batch (batches of 1000-5000) every few minutes as well.

The exceptions I'm receiving don't seem very informative.  The first
exception looks like this:

org.apache.solr.common.SolrException
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
	at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
-- snip --

I've now experienced this with two SolrCloud instances in a row.  The
SolrCloud instance has 3 shards, each on a separate machine (each machine is
also running ZooKeeper).  Each of the machines has 4 GB of RAM, with ~1.5
GB allocated to Solr.  Solr seems to be maxing out the CPU while indexing, so I
don't know if that's related.

If anybody could help me in sorting out these issues, it would be greatly
appreciated.  I pulled the Solr log file and have uploaded it at
https://www.dropbox.com/s/co3r4esjnsas0tl/solr.log

Also, a short snippet of the first exception is available on pastebin at
http://pastebin.com/pWZrkGEr

Thanks



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrException-Error-when-indexing-new-documents-at-scale-in-SolrCloud-tp4111551.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrException Error when indexing new documents at scale in SolrCloud -

Posted by Erick Erickson <er...@gmail.com>.
When the JVM is out of memory you get OOM exceptions, whatever
the operating system.....

I'd guess that you're not actually in the same environment on both
machines. The Solr admin page will tell you how much memory Solr
_thinks_ it has allocated to the JVM; it's worth checking just to be
sure.

Best,
Erick

On Thu, Jan 16, 2014 at 9:30 AM, cwhi <ch...@gmail.com> wrote:
> Hi Shawn,
>
> Thanks for the helpful and thorough response.  While I understand all of the
> factors that you've outlined for memory requirements (in fact, I'd
> previously read your page on Solr performance problems), it is baffling to
> me why two identical SolrCloud instances, each sharded across 3 machines
> with identical hardware, would run into these memory issues at such
> different memory limits (one SolrCloud instance started seeing OOM issues
> at 2 million indexed documents, the other started seeing OOM issues between
> 20 and 30 million indexed documents).
>
> When I said approximately 1.5 GB, I meant that this is how much heap
> space I allocated when launching java with -Xmx, and I can see the java
> process using that full amount of RAM.
>
> From a usage perspective, the load doesn't seem all that heavy.  I'm
> indexing about 600k documents an hour (each of which has ~20 short numeric
> or string fields).  I have the autoSoftCommit parameter set for once a
> second, and the autoCommit time set for every 5 minutes, with openSearcher
> set to false.  Finally, I have maxWarmingSearchers at 2.  Besides indexing
> those documents, I've been doing a few small queries just to check how many
> documents have been indexed, and a few other small queries, sorting by a
> single attribute.  These searches are very infrequent though, maybe 5 or 6
> an hour.
>
> Seems like a strange issue indeed.  My expectation is that Solr would hit a
> point where it becomes horribly slow after some threshold where things don't
> fit in the cache, but I'd never expect it to simply crash like it's doing.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/SolrException-Error-when-indexing-new-documents-at-scale-in-SolrCloud-tp4111551p4111680.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrException Error when indexing new documents at scale in SolrCloud -

Posted by cwhi <ch...@gmail.com>.
Hi Shawn,

Thanks for the helpful and thorough response.  While I understand all of the
factors that you've outlined for memory requirements (in fact, I'd
previously read your page on Solr performance problems), it is baffling to
me why two identical SolrCloud instances, each sharded across 3 machines
with identical hardware, would run into these memory issues at such
different memory limits (one SolrCloud instance started seeing OOM issues
at 2 million indexed documents, the other started seeing OOM issues between
20 and 30 million indexed documents). 

When I said approximately 1.5 GB, I meant that this is how much heap
space I allocated when launching java with -Xmx, and I can see the java
process using that full amount of RAM.  
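(For reference, a heap cap like that is passed on the java command line when Solr is launched; in a stock Solr 4.x install that is the bundled Jetty start.jar. The paths and values below are illustrative, not taken from the poster's setup:)

```
# Illustrative only: launch the Solr 4.x example with a ~1.5 GB heap cap
cd solr/example
java -Xms512m -Xmx1536m -jar start.jar
```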

From a usage perspective, the load doesn't seem all that heavy.  I'm
indexing about 600k documents an hour (each of which has ~20 short numeric
or string fields).  I have the autoSoftCommit parameter set for once a
second, and the autoCommit time set for every 5 minutes, with openSearcher
set to false.  Finally, I have maxWarmingSearchers at 2.  Besides indexing
those documents, I've been doing a few small queries just to check how many
documents have been indexed, and a few other small queries, sorting by a
single attribute.  These searches are very infrequent though, maybe 5 or 6
an hour.
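For context, a commit configuration like the one described would look roughly like this in solrconfig.xml (a sketch reconstructed from the description above, not the poster's actual file):

```xml
<!-- inside <updateHandler> -->
<autoCommit>
  <maxTime>300000</maxTime>          <!-- hard commit every 5 minutes: flushes to disk... -->
  <openSearcher>false</openSearcher> <!-- ...without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>            <!-- soft commit every second: makes new docs searchable -->
</autoSoftCommit>

<!-- elsewhere in solrconfig.xml -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```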

Seems like a strange issue indeed.  My expectation is that Solr would hit a
point where it becomes horribly slow after some threshold where things don't
fit in the cache, but I'd never expect it to simply crash like it's doing.



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrException-Error-when-indexing-new-documents-at-scale-in-SolrCloud-tp4111551p4111680.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrException Error when indexing new documents at scale in SolrCloud -

Posted by Shawn Heisey <so...@elyograg.org>.
On 1/15/2014 3:10 PM, cwhi wrote:
> Thanks for the quick reply.  I did notice the exception you pointed out and
> had some thoughts about it maybe being the client library I'm using to
> connect to Solr (C# SolrNet) disconnecting too early, but that doesn't
> explain it eventually running out of memory altogether.  A large index
> shouldn't cause Solr to run out of memory, since it would just go to disk on
> queries to process requests instead of holding the entire index in memory.

If you're seeing OutOfMemoryError problems, that has nothing at all to 
do with total memory on the system or the OS disk cache.  The OS disk 
cache is what holds all or part of the actual on-disk index data in 
memory.  You're right that it would just go to the disk in order to 
process requests - but disks are *REALLY* slow compared to RAM, so 
whenever you have to actually hit the disk, performance drops drastically.

OutOfMemoryErrors have to do with the Java heap.  Solr (Lucene, 
really) doesn't hold the actual index in memory, but there are certain 
query patterns that do cause a lot of heap memory to be consumed and not 
released, in the interests of performance.  One of those things is 
sorting, another is facets. I've heard that filters and field collapsing 
will do much the same thing.  If you are doing heavy indexing or issuing 
frequent index commits, a lot of heap memory can be required by that as 
well.

> I'm also not sure that the index size is the cause, because I have another
> SolrCloud instance running where I saw this behaviour at ~20 million, rather
> than 2 million documents (same type of documents, so much larger on disk).
> The machines these are running on are identical Amazon EC2 instances as
> well, so that rules out the larger index succeeding for longer due to better
> hardware.

When you use memory-hungry features, the amount of heap that's required 
will typically go up with the number of total documents.  The 
discrepancy here is probably due to how Solr is being used and how 
everything is configured.

You said that approximately 1.5GB is allocated to Solr.  Is this an 
actual Java heap setting, or are you seeing that number in a graph 
somewhere? If you look at the JVM-Memory graph on the Solr 4.x 
dashboard, you'll see three numbers.  There is the currently utilized 
heap memory, the amount of memory that Java has currently allocated from 
the operating system, and the maximum amount that it CAN allocate.  The 
middle number (*NOT* the first number) is the amount of system memory 
that the Java instance is using (not including a bunch of megabytes for 
Java itself), and you can assume that the third number is the amount 
that will be used in the long term.
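Those three numbers correspond to what the standard java.lang.Runtime methods report, so the dashboard can be cross-checked from any JVM (a minimal, Solr-independent sketch):

```java
public class JvmMemory {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long allocated = rt.totalMemory();       // allocated from the OS: the middle number
        long used = allocated - rt.freeMemory(); // currently utilized heap: the first number
        long max = rt.maxMemory();               // ceiling set by -Xmx: the third number
        System.out.printf("used=%dMB allocated=%dMB max=%dMB%n",
                used >> 20, allocated >> 20, max >> 20);
    }
}
```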

I've condensed a bunch of memory and performance related info into this 
wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn


Re: SolrException Error when indexing new documents at scale in SolrCloud -

Posted by cwhi <ch...@gmail.com>.
Hi Shawn,

Thanks for the quick reply.  I did notice the exception you pointed out and
had some thoughts about it maybe being the client library I'm using to
connect to Solr (C# SolrNet) disconnecting too early, but that doesn't
explain it eventually running out of memory altogether.  A large index
shouldn't cause Solr to run out of memory, since it would just go to disk on
queries to process requests instead of holding the entire index in memory.  

I'm also not sure that the index size is the cause, because I have another
SolrCloud instance running where I saw this behaviour at ~20 million, rather
than 2 million documents (same type of documents, so much larger on disk). 
The machines these are running on are identical Amazon EC2 instances as
well, so that rules out the larger index succeeding for longer due to better
hardware.



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrException-Error-when-indexing-new-documents-at-scale-in-SolrCloud-tp4111551p4111561.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrException Error when indexing new documents at scale in SolrCloud -

Posted by Shawn Heisey <so...@elyograg.org>.
On 1/15/2014 2:43 PM, cwhi wrote:
> I have a SolrCloud installation with about 2 million documents indexed in it.
> It's been buzzing along without issue for the past 8 days, but today started
> throwing errors on document adds that eventually resulted in out of memory
> exceptions.  There is nothing funny going on.  There are a few infrequent
> searches on the index every few minutes, and documents are being added in
> batch (batches of 1000-5000) every few minutes as well.
>
> The exceptions I'm receiving don't seem very informative.  The first
> exception looks like this:
>
> org.apache.solr.common.SolrException
> 	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> 	at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> 	at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> 	at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> -- snip --
>
> I've now experienced this with two SolrCloud instances in a row.  The
> SolrCloud instance has 3 shards, each on a separate machine (each machine is
> also running ZooKeeper).  Each of the machines has 4 GB of RAM, with ~1.5
> GB allocated to Solr.  Solr seems to be maxing out the CPU while indexing, so I
> don't know if that's related.
>
> If anybody could help me in sorting out these issues, it would be greatly
> appreciated.  I pulled the Solr log file and have uploaded it at
> https://www.dropbox.com/s/co3r4esjnsas0tl/solr.log
>
> Also, a short snippet of the first exception is available on pastebin at
> http://pastebin.com/pWZrkGEr

I think the relevant part of your exception is this:

Caused by: org.eclipse.jetty.io.EofException
<snip>
Caused by: java.net.SocketException: Connection reset

When Jetty throws the EofException, it's almost always caused by the 
client disconnecting the TCP connection before the HTTP transaction is 
complete.  The "Connection reset" message pretty much confirms it, IMHO.

What I think *might* be happening here is that you have a low SO_TIMEOUT 
configured on whatever is making your HTTP connections, and the update 
requests are not completing before that timeout expires, so the client 
closes the TCP connection before transfer is done.  Most of the time, 
SO_TIMEOUT should either be left at infinity or configured with an 
insanely high value measured in minutes, not seconds.
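How SolrNet exposes this setting is outside what's shown in the thread, but the general idea can be illustrated with plain java.net (the host and collection names here are hypothetical placeholders):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class UpdateTimeout {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; substitute your own host and collection.
        URL update = new URL("http://solr-host:8983/solr/collection1/update");
        HttpURLConnection conn = (HttpURLConnection) update.openConnection();
        // SO_TIMEOUT equivalent: how long a read may block before the client
        // gives up and resets the connection. For bulk updates, think in
        // minutes, not seconds (0 would mean no read timeout at all).
        conn.setReadTimeout(10 * 60 * 1000); // 10 minutes
        // Connecting, by contrast, should still fail fast.
        conn.setConnectTimeout(15 * 1000);
        System.out.println(conn.getReadTimeout());
    }
}
```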

A potential underlying problem is that your index has gotten too big 
and the OS disk cache is no longer able to cache it effectively.  When 
this happens, Solr performance will drop significantly.  It's very 
common for Solr to be completely fine up to a certain threshold and then 
suffer horrible performance problems once that threshold is crossed.

Thanks,
Shawn