Posted to solr-user@lucene.apache.org by Kumar Limbu <ku...@gmail.com> on 2013/05/04 10:06:59 UTC

Why is SolrCloud doing a full copy of the index?

We have Solr set up on 3 machines with only a single shard. We are using Solr
4.0 and currently have around 7 million documents in our index. The size of
our index is around 25 GB. We have a ZooKeeper ensemble of 3 ZooKeeper
instances.
 
Let's call the servers in our setup (A), (B) and (C). All updates to Solr
go via server (C). Searches are performed on servers (A) and (B). Updates
are normally propagated incrementally from server (C) to the other two
servers. Intermittently we have noticed that servers (A) and (B) make a
full copy of the index from server (C). This is not ideal because
performance suffers when it happens. It occurs quite randomly and can
affect either of the other two nodes, i.e. (A) and (B).
 
On server (C), which is the leader, we see errors like the following. We
suspect this might be why a full index copy occurs on the other nodes, but
we haven't been able to find out why this error is occurring. There is no
connectivity issue between the servers.
 
See the stack trace below:

SEVERE: shard update error StdNode:
http://serverA/solr/rn0/:org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at: http://serverA/solr/rn0
        at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
        at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
        at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
        at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.http.NoHttpResponseException: The target server failed
to respond
        at
org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
        at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
        at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
        at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
        at
org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
        at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
        at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
        at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
        at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
        ... 11 more
                                
If anyone can help us troubleshoot this problem, we would really appreciate
it. If there are any questions regarding our setup, or further information
is needed regarding the error, please let me know.





Re: Why is SolrCloud doing a full copy of the index?

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, there was a problem where replication would do a full copy
unnecessarily; it was fixed in 4.2 (I think). Frankly, I don't know for
certain whether it affected 4.0.

It's also possible that your servlet containers have a short enough
timeout that you're occasionally just getting connection timeouts, so
lengthening that interval might be worthwhile - see the sketch below -
but that's a stab in the dark.
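
If you're on the Jetty that ships with the Solr 4.x example, the idle
timeout lives in etc/jetty.xml. A minimal sketch, assuming the stock
example config (the stock value is 50000 ms; 120000 here is just an
illustration, not a recommendation):

    <Call name="addConnector">
      <Arg>
        <New class="org.eclipse.jetty.server.bio.SocketConnector">
          <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
          <!-- raise the idle timeout so slow responses aren't dropped mid-request -->
          <Set name="maxIdleTime">120000</Set>
        </New>
      </Arg>
    </Call>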

Best
Erick

On Sat, May 4, 2013 at 4:06 AM, Kumar Limbu <ku...@gmail.com> wrote:
> We have Solr set up on 3 machines with only a single shard. We are using Solr
> 4.0 and currently have around 7 million documents in our index. The size of
> our index is around 25 GB. We have a ZooKeeper ensemble of 3 ZooKeeper
> instances.
>
> Let's call the servers in our setup (A), (B) and (C). All updates to Solr
> go via server (C). Searches are performed on servers (A) and (B). Updates
> are normally propagated incrementally from server (C) to the other two
> servers. Intermittently we have noticed that servers (A) and (B) make a
> full copy of the index from server (C). This is not ideal because
> performance suffers when it happens. It occurs quite randomly and can
> affect either of the other two nodes, i.e. (A) and (B).
>
> On server (C), which is the leader, we see errors like the following. We
> suspect this might be why a full index copy occurs on the other nodes, but
> we haven't been able to find out why this error is occurring. There is no
> connectivity issue between the servers.
>
> [full stack trace snipped; see the original message above]
>
> If anyone can help us troubleshoot this problem, we would really appreciate
> it. If there are any questions regarding our setup, or further information
> is needed regarding the error, please let me know.
>
>
>
>

Re: Why is SolrCloud doing a full copy of the index?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

I just looked at the SPM monitoring we have for the Solr servers that run
search-lucene.com.  One of them does 1-2 garbage collections per minute;
another is closer to 10.  These are both small servers with small JVM heaps.
Here is a graph of one of them:

https://apps.sematext.com/spm/s/104ppwguao

I just looked at some other Java servers we have running, not Solr, and
I see close to 60 small collections per minute.

So these numbers will vary a lot depending on the heap size and other
JVM settings, as well as the actual code/usage. :)
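
If you want to measure your own rates, one option (assuming a HotSpot
JVM with the standard JDK tools on the path) is jstat:

    # sample GC counters every 5 seconds; 12345 is a placeholder pid,
    # replace it with the pid of your Solr JVM
    jstat -gcutil 12345 5000
    # the YGC (young) and FGC (full) columns are cumulative collection
    # counts; the collections-per-minute rate is how fast they climb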

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, May 6, 2013 at 4:39 PM, Shawn Heisey <so...@elyograg.org> wrote:
> On 5/6/2013 1:39 PM, Michael Della Bitta wrote:
>>
>> Hi Shawn,
>>
>> Thanks a lot for this entry!
>>
>> I'm wondering, when you say "Garbage collections that happen more often
>> than ten or so times per minute may be an indication that the heap size is
>> too small," do you mean *any* collections, or just full collections?
>
>
> My gut reaction is any collection, but in extremely busy environments a rate
> of ten per minute might be a very slow day on a setup that's working
> perfectly.
>
> As I wrote that particular bit, I was thinking that any number I put there
> was probably wrong for some large subset of users, but I wanted to finish
> putting down my thoughts and improve it later.
>
> Thanks,
> Shawn
>

Re: Why is SolrCloud doing a full copy of the index?

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/6/2013 1:39 PM, Michael Della Bitta wrote:
> Hi Shawn,
>
> Thanks a lot for this entry!
>
> I'm wondering, when you say "Garbage collections that happen more often
> than ten or so times per minute may be an indication that the heap size is
> too small," do you mean *any* collections, or just full collections?

My gut reaction is any collection, but in extremely busy environments a 
rate of ten per minute might be a very slow day on a setup that's 
working perfectly.

As I wrote that particular bit, I was thinking that any number I put 
there was probably wrong for some large subset of users, but I wanted to 
finish putting down my thoughts and improve it later.

Thanks,
Shawn


Re: Why is SolrCloud doing a full copy of the index?

Posted by Michael Della Bitta <mi...@appinions.com>.
Hi Shawn,

Thanks a lot for this entry!

I'm wondering, when you say "Garbage collections that happen more often
than ten or so times per minute may be an indication that the heap size is
too small," do you mean *any* collections, or just full collections?


Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Sat, May 4, 2013 at 1:55 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 5/4/2013 11:45 AM, Shawn Heisey wrote:
> > Advance warning: this is a long reply.
>
> I have condensed some relevant performance problem information into the
> following wiki page:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Anyone who has additional information for this page, feel free to add
> it.  I hope I haven't made too many mistakes!
>
> Thanks,
> Shawn
>
>

Re: Why is SolrCloud doing a full copy of the index?

Posted by Kumar Limbu <ku...@gmail.com>.
Thanks for the replies. They are really appreciated.

Based on the replies, it seems that upgrading to the latest version of Solr
will probably resolve this issue.

We also update quite frequently, every 5 minutes. We will try setting this
to a higher interval and see if that helps.

We will also try increasing the servlet timeout and see if that resolves the
issue.

Among the other suggestions, we already tried increasing the zkClientTimeout
from 15 seconds to 30 seconds, but that didn't seem to help. What value do
you recommend trying?

A few more details about our system: we are running this on a 64-bit server
with 16GB of RAM, and we use SSD disks.

Also, since we are already using 4.0 in our production environment with the
aforementioned 3-server setup, how should we go about upgrading to the
latest version (4.3)? Do we need to do a full reindex of our data, or is the
index format compatible between these versions?

We will try out the suggestions and will post later if any of them help us
resolve the issue.

Again, thanks for the replies.




Re: Why is SolrCloud doing a full copy of the index?

Posted by Erick Erickson <er...@gmail.com>.
Second the thanks....

Erick

On Sat, May 4, 2013 at 6:08 PM, Lance Norskog <go...@gmail.com> wrote:
> Great! Thank you very much Shawn.
>
>
> On 05/04/2013 10:55 AM, Shawn Heisey wrote:
>>
>> On 5/4/2013 11:45 AM, Shawn Heisey wrote:
>>>
>>> Advance warning: this is a long reply.
>>
>> I have condensed some relevant performance problem information into the
>> following wiki page:
>>
>> http://wiki.apache.org/solr/SolrPerformanceProblems
>>
>> Anyone who has additional information for this page, feel free to add
>> it.  I hope I haven't made too many mistakes!
>>
>> Thanks,
>> Shawn
>>
>

Re: Why is SolrCloud doing a full copy of the index?

Posted by Lance Norskog <go...@gmail.com>.
Great! Thank you very much Shawn.

On 05/04/2013 10:55 AM, Shawn Heisey wrote:
> On 5/4/2013 11:45 AM, Shawn Heisey wrote:
>> Advance warning: this is a long reply.
> I have condensed some relevant performance problem information into the
> following wiki page:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Anyone who has additional information for this page, feel free to add
> it.  I hope I haven't made too many mistakes!
>
> Thanks,
> Shawn
>


Re: Why is SolrCloud doing a full copy of the index?

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/4/2013 11:45 AM, Shawn Heisey wrote:
> Advance warning: this is a long reply.

I have condensed some relevant performance problem information into the
following wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

Anyone who has additional information for this page, feel free to add
it.  I hope I haven't made too many mistakes!

Thanks,
Shawn


Re: Why is SolrCloud doing a full copy of the index?

Posted by Kristopher Kane <kk...@gmail.com>.
> 
> Advance warning: this is a long reply.
> 

Awesome Shawn.  Thanks!




Re: Why is SolrCloud doing a full copy of the index?

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/4/2013 2:06 AM, Kumar Limbu wrote:
> We have Solr set up on 3 machines with only a single shard. We are using Solr
> 4.0 and currently have around 7 million documents in our index. The size of
> our index is around 25 GB. We have a ZooKeeper ensemble of 3 ZooKeeper
> instances.
>  
> Let's call the servers in our setup (A), (B) and (C). All updates to Solr
> go via server (C). Searches are performed on servers (A) and (B). Updates
> are normally propagated incrementally from server (C) to the other two
> servers. Intermittently we have noticed that servers (A) and (B) make a
> full copy of the index from server (C). This is not ideal because
> performance suffers when it happens. It occurs quite randomly and can
> affect either of the other two nodes, i.e. (A) and (B).
>  
> On server (C), which is the leader, we see errors like the following. We
> suspect this might be why a full index copy occurs on the other nodes, but
> we haven't been able to find out why this error is occurring. There is no
> connectivity issue between the servers.

Advance warning: this is a long reply.

The first thing that jumped out at me was the Solr version.  Version 4.0
was brand new in October of last year.  It's a senior citizen now.  It
has a lot of bugs, particularly in SolrCloud stability.  I would
recommend upgrading to at least 4.2.1.

Version 4.3.0 (the fourth release since 4.0) is quite literally about to be
unveiled.  It is already on a lot of download mirrors; the announcement is
due any time now.

Now for things to consider that don't involve upgrading, but might still
be issues after upgrading:

You might be able to make your system more stable by increasing your
zkClientTimeout.  A typical example value for this setting is 15 seconds.
Next we will discuss why you might be exceeding the timeout, but first,
a sketch of where the setting lives:
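
In the legacy solr.xml format that 4.0 uses, the value is set on the
<cores> element (the 30-second value below is just an example, not a
recommendation):

    <cores adminPath="/admin/cores" defaultCoreName="collection1"
           host="${host:}" hostPort="${jetty.port:}"
           zkClientTimeout="${zkClientTimeout:30000}">
      <core name="collection1" instanceDir="collection1"/>
    </cores>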

Slow operations, especially commits, can be responsible for exceeding
timeouts.  One thing you can do to decrease commit time is to lower the
autowarmCount on your Solr caches.  You can also decrease the frequency
of your commits.  A sketch of both follows.
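
For illustration, against a stock solrconfig.xml (the numbers are
placeholders, not recommendations):

    <!-- a lower autowarmCount means new searchers open faster after a
         commit; the same applies to the other caches in <query> -->
    <filterCache class="solr.FastLRUCache"
                 size="512" initialSize="512" autowarmCount="16"/>

    <!-- batch writes with a hard autoCommit instead of committing on
         every update; openSearcher=false keeps these commits cheap -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>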

A 25GB index is relatively large, and it requires a lot of memory for
proper operation.  The reason is that Solr is very reliant on the
operating system disk cache, which uses free memory.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

With a 25GB index, you want to have between 15 and 25GB of memory over
and above the memory that your programs use.  You would probably want to
give the Java heap for Solr between 4 and 8GB.  For a dedicated Solr
server with your index, a really good amount of total system memory
would be 32GB, with 24GB being a reasonable starting point.
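
To make the arithmetic concrete (rough numbers, assuming a dedicated
box): 32GB total, minus a 6GB Java heap and roughly 1GB for the OS and
other processes, leaves about 25GB for the disk cache - enough to hold
the entire 25GB index.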

It should go without saying that you need a 64-bit server, a 64-bit
operating system, and 64-bit Java for all this to work correctly.  32-bit
software is not good at dealing with large amounts of memory, and 32-bit
Java cannot have a heap size larger than 2GB.

If you upgrade to 4.2.1 or later and reindex, your index size will drop
due to compression of certain pieces.  Those pieces don't normally
affect minimum memory requirements very much, so your free memory
requirement will still probably be at least 15GB.

Unless you are using a commercial JVM with low-pause characteristics
(like Zing), a heap of 4GB or larger can give you problems with
stop-the-world GC pauses.  A large heap is unfortunately required with a
large index.  The default collector that Java gives you is a *terrible*
choice for large heaps in general and Solr in particular.  Even changing
to the CMS collector may not be enough - more tuning is required.  A
starting-point sketch is below.
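
Something like this is a common starting point (a sketch only - these
are standard HotSpot flags, but the right values depend entirely on
your index and query load, so verify against your own GC logs):

    java -Xms6g -Xmx6g \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:+CMSParallelRemarkEnabled \
         -XX:CMSInitiatingOccupancyFraction=75 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -verbose:gc -Xloggc:gc.log \
         -jar start.jar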

Thanks,
Shawn