Posted to solr-user@lucene.apache.org by ilay raja <il...@gmail.com> on 2013/12/19 11:44:51 UTC

Solr cloud (4.6.0) instances going down

Hi,

  I have deployed SolrCloud with an external ZooKeeper ensemble (5
instances). I am running the Solr instances on two servers with a single-shard
index. There are 6 replicas. I often see Solr going down during high search
load or whenever I index documents. I tried tuning the hard commit
(set to 15 mins) and the soft commit (12 mins), and set zkClientTimeout to 30
secs. I sometimes see OOM, socket exceptions, and EOF exceptions in the Solr
logs while an instance is going down. Also, ZooKeeper recovery for the
Solr instance goes into a loop. My use case combines heavy search (100
queries per sec) with heavy indexing (10K docs per minute). What is the best
way to keep SolrCloud instances stable with an external ensemble? Should we
try running ZooKeeper internally, since it looks like the ZooKeeper handshake
might be an issue as well? Is SolrCloud stable for production, or are there
still open issues? Please guide me.

RE: Solr cloud (4.6.0) instances going down

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

This is the development mailing list. Please ask such questions on solr-user@lucene.apache.org

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

From: ilay raja [mailto:ilay.msp@gmail.com] 
Sent: Thursday, December 19, 2013 11:45 AM
To: solr-user@lucene.apache.org; solr-dev@lucene.apache.org
Subject: Solr cloud (4.6.0) instances going down

 



Re: Solr cloud (4.6.0) instances going down

Posted by ilay raja <il...@gmail.com>.
On Thu, Dec 19, 2013 at 4:14 PM, ilay raja <il...@gmail.com> wrote:


Re: Solr cloud (4.6.0) instances going down

Posted by Yago Riveiro <ya...@gmail.com>.
I have had a lot of problems with the stability of my cloud.

To improve stability:

- Move ZooKeeper to another disk; the I/O from solr.home can kill your ensemble.

- Raise zkClientTimeout to 60s.

- Don't use a very big heap if you don't need one; try values around 4g and increase until the OOMs stop.

- Use the heap tuning recommendations from http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning; they fixed 99% of my problems with ZooKeeper.

- Log GC times. I discovered pauses of 32s on my boxes, which is a total killer for ZooKeeper and results in tons of expired sessions.
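
The last point can be sketched in a few lines of Python. This is only an illustration, not something from the thread: it assumes the Solr JVM was started with the HotSpot option -XX:+PrintGCApplicationStoppedTime, which logs lines of the form "Total time for which application threads were stopped: N seconds". The function name, the sample log lines, and the choice of threshold are hypothetical.

```python
import re

# Matches the line HotSpot prints when -XX:+PrintGCApplicationStoppedTime is set.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:\.\d+)?) seconds"
)


def long_pauses(log_lines, threshold_s):
    """Return the stop-the-world pauses (in seconds) that are >= threshold_s."""
    pauses = []
    for line in log_lines:
        match = STOPPED_RE.search(line)
        if match:
            pause = float(match.group(1))
            if pause >= threshold_s:
                pauses.append(pause)
    return pauses


# Hypothetical sample lines, for illustration only.
sample = [
    "2013-12-19T11:44:51.123+0000: 12.345: Total time for which "
    "application threads were stopped: 0.0021540 seconds",
    "2013-12-19T11:50:02.456+0000: 323.678: Total time for which "
    "application threads were stopped: 32.1034210 seconds",
]

# Use the zkClientTimeout value (30 s in the original post) as the threshold:
# any pause this long is enough on its own to expire the ZooKeeper session.
print(long_pauses(sample, threshold_s=30.0))
```

With the sample above, only the 32-second pause is reported. In practice you would feed the function the lines of your real GC log and pick a threshold somewhat below your zkClientTimeout.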


-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, December 19, 2013 at 5:45 PM, Shawn Heisey wrote:




Re: Solr cloud (4.6.0) instances going down

Posted by Shawn Heisey <so...@elyograg.org>.
On 12/19/2013 3:44 AM, ilay raja wrote:
>   I have deployed SolrCloud with an external ZooKeeper ensemble (5
> instances). I am running the Solr instances on two servers with a single-shard
> index. There are 6 replicas. I often see Solr going down during high search
> load or whenever I index documents. I tried tuning the hard commit
> (set to 15 mins) and the soft commit (12 mins), and set zkClientTimeout to 30
> secs. I sometimes see OOM, socket exceptions, and EOF exceptions in the Solr
> logs while an instance is going down. Also, ZooKeeper recovery for the
> Solr instance goes into a loop. My use case combines heavy search (100
> queries per sec) with heavy indexing (10K docs per minute). What is the best
> way to keep SolrCloud instances stable with an external ensemble? Should we
> try running ZooKeeper internally, since it looks like the ZooKeeper handshake
> might be an issue as well? Is SolrCloud stable for production, or are there
> still open issues? Please guide me.

You definitely do not want to run ZooKeeper embedded in Solr.  The
simple reason is that if you stop Solr, you also stop ZooKeeper.
ZooKeeper works best if it remains up all the time, so an
external ensemble is highly recommended.

It's probably a good idea to set the max heap when starting ZooKeeper.
One of my ZK Java instances is using 65MB of resident memory, so unless
it's a very large cloud, a low value like 128MB would probably be enough.

I've heard that heavy I/O on the disk with the zookeeper data can cause
problems for zookeeper.  This is the one danger that can come from
putting both Solr and an external zookeeper on the same host, which is
usually a very safe thing to do.  Unless you've got very fast I/O, it's
recommended that the zookeeper data is put on separate disk spindles
from anything else.  When Solr has performance problems, it's usually
from heavy I/O, and if heavy I/O is causing problems with zookeeper,
then the problem just compounds itself.
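
One rough way to check whether the ZooKeeper data and the Solr home share a filesystem is to compare the device IDs that stat reports for the two directories. The sketch below is only an illustration: the function names and both paths are hypothetical, and a different device ID does not by itself prove separate physical spindles (two partitions on one disk also report different IDs).

```python
import os


def same_device(path_a, path_b):
    """Return True if both paths live on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def report(zk_data_dir, solr_home):
    """Describe whether the two directories share a filesystem."""
    if not (os.path.exists(zk_data_dir) and os.path.exists(solr_home)):
        return "cannot check: one of the paths does not exist"
    if same_device(zk_data_dir, solr_home):
        return "WARNING: ZooKeeper data and Solr home share a filesystem"
    return "OK: ZooKeeper data and Solr home are on different filesystems"


# Hypothetical locations; substitute your real ZooKeeper dataDir and solr.home.
print(report("/var/lib/zookeeper", "/var/solr"))
```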

You haven't indicated how big the Java heap for Solr is.  Severe
stability problems can result from GC pauses, so it's extremely
important to tune your garbage collection unless your Solr max heap is
very small (less than 1GB).  Here's my personal wiki page with
settings that work for me; they seem to work for others too:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Severe GC pause problems can also result from the Solr java heap being
too small.  Here's a more involved wiki page on performance issues that
I have seen:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn