Posted to solr-user@lucene.apache.org by Doğacan Güney <do...@gmail.com> on 2011/03/14 19:52:55 UTC

Solr performance issue

Hello everyone,

First of all here is our Solr setup:

- Solr nightly build 986158
- Running Solr inside the default Jetty that comes with the Solr build
- 1 write-only master, 4 read-only slaves (quad-core 5640 with 24GB of RAM)
- Index replicated (on optimize) to slaves via Solr Replication (see the
sketch after this list)
- Size of index is around 2.5GB
- No incremental writes; the index is created from scratch (delete old
documents -> commit new documents -> optimize) every 6 hours
- Avg # of requests per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each slave is around 2
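
For context, "replicated (on optimize)" refers to Solr's ReplicationHandler,
configured in solrconfig.xml. A minimal sketch of that kind of setup (the
host name and poll interval are illustrative, not the poster's actual
configuration):

  <!-- master solrconfig.xml: publish a new index version after each optimize -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">optimize</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml: poll the master and pull changed index files -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master:8983/solr/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>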

We have been using this set-up for months without any problem. However, last
week we started to experience very weird performance problems:

- Avg time per request increased from 25ms to 200-300ms (even higher if we
don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% CPU)

When we profile Solr we see two very strange things:

1 - This is the jconsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you can see, GC runs every 10-15 seconds and collects more than 1GB of
memory. (Actually, if you watch for more than 10 minutes you see spikes up to
4GB consistently.)

2 - This is the newrelic output :

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you can see, Solr spends a ridiculously long time in the
SolrDispatchFilter.doFilter() method.


Apart from these, when we clean the index directory, re-replicate, and
restart each slave one by one, we see some relief in the system, but after
some time the servers start to melt down again. Although deleting the index
and re-replicating doesn't solve the problem, we think these problems are
somehow related to replication, because the symptoms started after a
replication and the system heals itself for a while after each replication. I
also see lucene-write.lock files on the slaves (we don't have write.lock
files on the master), which I think we shouldn't see.


If anyone can offer any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney

Re: Solr performance issue

Posted by Markus Jelsma <ma...@openindex.io>.
You might also want to add the following switches for your GC log.

> JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
> -Xloggc:/var/log/tomcat6/gc.log"

-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
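
Taken together with the quoted line above, a complete set of GC logging
switches might look like this (the log path is only an example; point it
wherever your servlet container keeps its logs):

  JAVA_OPTS="$JAVA_OPTS -verbose:gc \
    -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
    -XX:+PrintGCApplicationConcurrentTime \
    -XX:+PrintGCApplicationStoppedTime \
    -Xloggc:/var/log/jetty/gc.log"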


Re: Solr performance issue

Posted by Markus Jelsma <ma...@openindex.io>.
> Nope, no OOM errors.

That's a good start!

> Insanity count is 0 and fieldCache has 12 entries. We do use some boosting
> functions.
>
> Btw, I am monitoring via JConsole with 8GB of heap and it still climbs to
> 8GB every 20 seconds or so; GC runs and it falls back down to 1GB.

Hmm, maybe the garbage collector takes up a lot of CPU time. Could you check
your garbage collector log? It must first be enabled via JVM options:

JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/var/log/tomcat6/gc.log"

Also, what JVM version are you using and what are your other JVM settings? Are
Xms and Xmx set to the same value? I see you're using the throughput collector.
You might want to use CMS (the low-pause collector) because it partially runs
concurrently and has fewer stop-the-world interruptions.

http://download.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html
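
As an illustration only (the heap sizes are placeholders, not a
recommendation), equal Xms/Xmx combined with the CMS collector might look
like:

  JAVA_OPTS="$JAVA_OPTS -Xms4000m -Xmx4000m -XX:+UseConcMarkSweepGC"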

Again, this may not be the issue ;)

> 
> > Btw, our current revision was just a random choice, but up until two weeks
> > ago it had been rock-solid, so we have been reluctant to update to another
> > version. Would you recommend upgrading to the latest trunk?

I don't know what changes have been made since your revision. Please consult 
the CHANGES.txt for that.


Re: Solr performance issue

Posted by Jonathan Rochkind <ro...@jhu.edu>.
I've definitely had cases in 1.4.1 where, even though I didn't get an OOM
error, Solr was being weirdly slow, and increasing the JVM heap size fixed it.
I can't explain why it happened, or exactly how you'd know this was going on;
I didn't see anything odd in the logs to indicate it. I just tried increasing
the JVM heap to see what happened, and it worked great.

The one case I remember specifically is when I was using the StatsComponent
with a stats.facet. Pathologically slow; increasing the heap magically made it
negligible again.
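
For concreteness, a StatsComponent request of the kind described looks
something like this (the field names are made up for illustration):

  http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price&stats.facet=category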


Re: Solr performance issue

Posted by Shawn Heisey <so...@elyograg.org>.
The host is dual quad-core; each Xen VM has been given two CPUs. Not counting
dom0, two of the hosts have 10/8 CPUs allocated, two of them have 8/8. The
dom0 VM is also allocated two CPUs.

I'm not really sure how that works out when it comes to Java running on the
VM, but if at all possible, it is likely that Xen would try to keep both VM
CPUs on the same physical CPU and the VM's memory allocation on the same NUMA
node. If that's the case, it would meet what you've stated as the
recommendation for incremental mode.


On 3/15/2011 9:10 AM, Markus Jelsma wrote:
> CMS is very good for multicore CPUs. Use incremental mode only when you have
> a single CPU with only one or two cores.


Re: Solr performance issue

Posted by Markus Jelsma <ma...@openindex.io>.
CMS is very good for multicore CPUs. Use incremental mode only when you have
a single CPU with only one or two cores.

On Tuesday 15 March 2011 16:03:38 Shawn Heisey wrote:
> My solr+jetty+java6 install seems to work well with these GC options.
> It's a dual processor environment:
> 
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> 
> I've never had a real problem with memory, so I've not done any kind of
> auditing.  I probably should, but time is a limited resource.
> 
> Shawn

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Solr performance issue

Posted by Doğacan Güney <do...@gmail.com>.
2011/3/14 Markus Jelsma <ma...@openindex.io>

> Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your
> system suffer from
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ?
>
>
We increased the thread limit (which was 10000 before), but it did not help.

Anyway, we will try to disable sharding tomorrow. Maybe this can give us a
better picture.

Thanks for the help, everyone.


> I'm not sure, I haven't seen a similar issue in a sharded environment,
> probably because it was a controlled environment.



-- 
Doğacan Güney

Re: Solr performance issue

Posted by Markus Jelsma <ma...@openindex.io>.
Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your system
suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ?

I'm not sure, I haven't seen a similar issue in a sharded environment,
probably because it was a controlled environment.
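
Doğacan notes elsewhere in the thread that their thread limit was already
10000; for the Jetty bundled with Solr builds of this era, that limit lives
in the thread pool section of etc/jetty.xml. A sketch, assuming Jetty 6 (the
class name and values should be checked against your install):

  <Set name="ThreadPool">
    <New class="org.mortbay.thread.QueuedThreadPool">
      <Set name="minThreads">10</Set>
      <Set name="maxThreads">10000</Set>
    </New>
  </Set>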



Re: Solr performance issue

Posted by Shawn Heisey <so...@elyograg.org>.
My solr+jetty+java6 install seems to work well with these GC options.  
It's a dual processor environment:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

I've never had a real problem with memory, so I've not done any kind of 
auditing.  I probably should, but time is a limited resource.
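
For anyone running the Jetty bundled with the Solr download, those flags go
on the startup command line; a sketch (the heap size is a placeholder):

  java -Xmx2g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -jar start.jar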

Shawn


On 3/14/2011 2:29 PM, Markus Jelsma wrote:
> That depends on your GC settings and generation sizes. And, instead of
> UseParallelGC you'd better use UseParNewGC in combination with CMS.
>
> See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html


Re: Solr performance issue

Posted by Doğacan Güney <do...@gmail.com>.
Hello,

2011/3/14 Markus Jelsma <ma...@openindex.io>

> That depends on your GC settings and generation sizes. And, instead of
> UseParallelGC you'd better use UseParNewGC in combination with CMS.
>
>
JConsole now shows a different profile output, but load is still high and
performance is still bad.

Btw, here is the thread profile from newrelic:

https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm

Note that we do use a form of sharding, so maybe all the time spent waiting in
handleRequestBody results from sharding?


-- 
Doğacan Güney

Re: Solr performance issue

Posted by Markus Jelsma <ma...@openindex.io>.
That depends on your GC settings and generation sizes. And, instead of 
UseParallelGC you'd better use UseParNewGC in combination with CMS.

See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
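
In flag form, the suggested substitution amounts to something like this (the
young-generation size is a placeholder, not a tuned value):

  JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmn512m"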


Re: Solr performance issue

Posted by Doğacan Güney <do...@gmail.com>.
Hello,

The problem turned out to be some sort of sharding/searching weirdness. We
modified some code in sharding, but I don't think that is related. In any
case, we just added a new server that only shards (it doesn't do any searching
and doesn't contain any index), and performance is very, very good.
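
In other words, queries now hit a dedicated coordinator that holds no index
itself and only fans requests out to the index-holding slaves via the shards
parameter, roughly like this (the host names are made up for illustration):

  http://coordinator:8983/solr/select?q=foo&fl=id&shards=slave1:8983/solr,slave2:8983/solr,slave3:8983/solr,slave4:8983/solr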

Thanks for all the help.



-- 
Doğacan Güney

Re: Solr performance issue

Posted by Alexey Serba <as...@gmail.com>.
> Btw, I am monitoring via JConsole with 8GB of heap and it still climbs to
> 8GB every 20 seconds or so; GC runs and it falls back down to 1GB.

Hmm, the JVM churning through 8GB every 20 seconds sounds like a lot.

Do you return all results (ids) for your queries? Any tricky
faceting/sorting/function queries?
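
The question matters because returning every matching id materializes the
whole result set on each request. A query of the shape described earlier in
the thread, capped to a page of ids, would look something like this (the
parameters are illustrative):

  http://slave1:8983/solr/select?q=foo&fl=id&rows=20&facet=true&facet.field=category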

Re: Solr performance issue

Posted by Jonathan Rochkind <ro...@jhu.edu>.
It's actually, as I understand it, expected JVM behavior to see the heap
rise to close to its limit before it gets GC'd; that's how Java GC
works.  Whether that should happen every 20 seconds or what, I don't know.

Another option is setting better JVM garbage collection arguments, so GC
doesn't "stop the world" so often. I have had good luck with my Solr
using this:  -XX:+UseParallelGC

Re: Solr performance issue

Posted by Doğacan Güney <do...@gmail.com>.
Hello again,

2011/3/14 Markus Jelsma <ma...@openindex.io>

> > Hello,
> >
> > 2011/3/14 Markus Jelsma <ma...@openindex.io>
> >
> > > Hi Doğacan,
> > >
> > > Are you, at some point, running out of heap space? In my experience,
> > > that's the common cause of increased load and excessively high response
> > > times (or timeouts).
> >
> > How much of a heap size would be enough? Our index size is growing slowly,
> > but we did not have this problem a couple of weeks ago, when the index was
> > maybe 100MB smaller.
>
> It isn't easy to say how much heap space is needed. It usually needs to be
> increased when you run out of memory and get those nasty OOM errors; are you
> getting them?
> Replication events will increase heap usage due to cache-warming queries and
> autowarming.
>
>
Nope, no OOM errors.


> >
> > We left most of the caches in solrconfig as default and only increased
> > filterCache to 1024. We only ask for "id"s (which are unique) and no other
> > fields during queries (though we do faceting). Btw, 1.6GB of our index is
> > stored fields (we store everything for now, even though we do not fetch
> > them during queries), and about 1GB is the index itself.
>
> Hmm, it seems 4000m would be enough indeed. What about the fieldCache: are
> there a lot of entries? Is there an insanity count? Do you use boost
> functions?
>
>
Insanity count is 0 and fieldCache has 12 entries. We do use some boosting
functions.

Btw, I am monitoring via JConsole with 8GB of heap and it still climbs to 8GB
every 20 seconds or so; GC runs and it falls back down to 1GB.

Btw, our current revision was just a random choice, but up until two weeks ago
it had been rock-solid, so we have been reluctant to update to another
version. Would you recommend upgrading to the latest trunk?





-- 
Doğacan Güney

Re: Solr performance issue

Posted by Markus Jelsma <ma...@openindex.io>.
> Hello,
> 
> 2011/3/14 Markus Jelsma <ma...@openindex.io>
> 
> > Hi Doğacan,
> > 
> > Are you, at some point, running out of heap space? In my experience,
> > that's the common cause of increased load and excessively high response
> > times (or timeouts).
> 
> How much heap would be enough? Our index size is growing slowly, but we
> did not have this problem a couple of weeks ago, when the index was maybe
> 100mb smaller.

Telling how much heap space is needed isn't easy. It usually needs to be
increased when you run out of memory and get those nasty OOM errors; are you
getting them?
Replication events will increase heap usage due to cache-warming queries and
autowarming.
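
If warming turns out to be the trigger, one option is to take it off the
serving path: fire a few representative queries at a freshly replicated
slave before it rejoins the load balancer. A rough SolrJ sketch (client
classes from the 1.4/3.x line); the URL, queries and facet field below are
made-up placeholders, not anything from your setup:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class WarmSlave {
        public static void main(String[] args) throws Exception {
            // Hypothetical slave URL: the node that just finished replicating.
            SolrServer slave = new CommonsHttpSolrServer("http://slave1:8983/solr");

            // Representative production queries; each one populates the
            // caches the real traffic will need (filterCache, fieldCache).
            String[] warmers = { "*:*", "category:books", "category:music" };
            for (String q : warmers) {
                SolrQuery query = new SolrQuery(q);
                query.setRows(0);                 // only the cache side effects matter
                query.setFacet(true);             // touch the facet fields too
                query.addFacetField("category");  // made-up facet field
                long t0 = System.currentTimeMillis();
                slave.query(query);
                System.out.println(q + " warmed in "
                        + (System.currentTimeMillis() - t0) + "ms");
            }
        }
    }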

> 
> We left most of the caches in solrconfig as default and only increased
> filterCache to 1024. We only ask for "id"s (which
> are unique) and no other fields during queries (though we do faceting).
> Btw, 1.6gb of our index is stored fields (we store
> everything for now, even though we do not retrieve them during queries),
> and about 1gb is the index itself.

Hmm, it seems 4000m would be enough indeed. What about the fieldCache: are
there a lot of entries? Is there an insanity count? Do you use boost functions?

It might not have anything to do with memory at all, but I'm just asking.
There may be a bug in your revision causing this.

> 
> Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
> improvement in load. I can try monitoring with jconsole with 8gb of heap
> to see if it helps.
> 
> > Cheers,
> > 
> > > Hello everyone,
> > >
> > > First of all, here is our Solr setup:
> > >
> > > - Solr nightly build 986158
> > > - Running Solr inside the default Jetty that comes with the Solr build
> > > - 1 write-only master, 4 read-only slaves (quad-core 5640 with 24gb of
> > > RAM)
> > > - Index replicated (on optimize) to slaves via Solr Replication
> > > - Size of index is around 2.5gb
> > > - No incremental writes; the index is created from scratch (delete old
> > > documents -> commit new documents -> optimize) every 6 hours
> > > - Avg # of requests per second is around 60 (for a single slave)
> > > - Avg time per request is around 25ms (before having problems)
> > > - Load on each slave is around 2
> > >
> > > We have been using this set-up for months without any problem. However,
> > > last week we started to experience very weird performance problems:
> > >
> > > - Avg time per request increased from 25ms to 200-300ms (even higher if
> > > we don't restart the slaves)
> > > - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600%
> > > cpu)
> > >
> > > When we profile Solr we see two very strange things:
> > >
> > > 1 - This is the jconsole output:
> > >
> > > https://skitch.com/meralan/rwwcf/mail-886x691
> > >
> > > As you can see, gc runs every 10-15 seconds and collects more than 1gb
> > > of memory. (Actually, if you wait more than 10 minutes you see spikes
> > > up to 4gb consistently.)
> > >
> > > 2 - This is the newrelic output:
> > >
> > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > >
> > > As you can see, Solr spends a ridiculously long time in the
> > > SolrDispatchFilter.doFilter() method.
> > >
> > > Apart from these, when we clean the index directory, re-replicate, and
> > > restart each slave one by one, we see some relief in the system, but
> > > after some time the servers start to melt down again. Although deleting
> > > the index and re-replicating doesn't solve the problem, we think these
> > > problems are somehow related to replication, because the symptoms
> > > started after a replication and the system briefly heals itself after a
> > > replication. I also see lucene-write.lock files on the slaves (we don't
> > > have write.lock files on the master), which I think we shouldn't see.
> > >
> > > If anyone can give any sort of ideas, we will appreciate it.
> > >
> > > Regards,
> > > Dogacan Guney

Re: Solr performance issue

Posted by Doğacan Güney <do...@gmail.com>.
Hello,

2011/3/14 Markus Jelsma <ma...@openindex.io>

> Hi Doğacan,
>
> Are you, at some point, running out of heap space? In my experience, that's
> the common cause of increased load and excessively high response times (or
> timeouts).
>
>
How much heap would be enough? Our index size is growing slowly, but we did
not have this problem a couple of weeks ago, when the index was maybe 100mb
smaller.

We left most of the caches in solrconfig as default and only increased
filterCache to 1024. We only ask for "id"s (which
are unique) and no other fields during queries (though we do faceting). Btw,
1.6gb of our index is stored fields (we store
everything for now, even though we do not retrieve them during queries), and
about 1gb is the index itself.
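
To be concrete, our requests look roughly like this in SolrJ terms (a
sketch; the query string and facet field are placeholders rather than our
real ones):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QueryShape {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer slave =
                    new CommonsHttpSolrServer("http://slave1:8983/solr");

            SolrQuery q = new SolrQuery("foo bar");   // placeholder user query
            q.setFields("id");            // only the unique key, no other stored fields
            q.setFacet(true);             // faceting is on for every request
            q.addFacetField("category");  // placeholder facet field
            q.setRows(20);

            QueryResponse rsp = slave.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound()
                    + ", qtime: " + rsp.getQTime() + "ms");
        }
    }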

Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
improvement in load. I can try monitoring with jconsole with 8gb of heap to
see if it helps.


> Cheers,
>
> > Hello everyone,
> >
> > First of all, here is our Solr setup:
> >
> > - Solr nightly build 986158
> > - Running Solr inside the default Jetty that comes with the Solr build
> > - 1 write-only master, 4 read-only slaves (quad-core 5640 with 24gb of
> > RAM)
> > - Index replicated (on optimize) to slaves via Solr Replication
> > - Size of index is around 2.5gb
> > - No incremental writes; the index is created from scratch (delete old
> > documents -> commit new documents -> optimize) every 6 hours
> > - Avg # of requests per second is around 60 (for a single slave)
> > - Avg time per request is around 25ms (before having problems)
> > - Load on each slave is around 2
> >
> > We have been using this set-up for months without any problem. However,
> > last week we started to experience very weird performance problems:
> >
> > - Avg time per request increased from 25ms to 200-300ms (even higher if
> > we don't restart the slaves)
> > - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% cpu)
> >
> > When we profile Solr we see two very strange things:
> >
> > 1 - This is the jconsole output:
> >
> > https://skitch.com/meralan/rwwcf/mail-886x691
> >
> > As you can see, gc runs every 10-15 seconds and collects more than 1gb
> > of memory. (Actually, if you wait more than 10 minutes you see spikes up
> > to 4gb consistently.)
> >
> > 2 - This is the newrelic output:
> >
> > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> >
> > As you can see, Solr spends a ridiculously long time in the
> > SolrDispatchFilter.doFilter() method.
> >
> > Apart from these, when we clean the index directory, re-replicate, and
> > restart each slave one by one, we see some relief in the system, but
> > after some time the servers start to melt down again. Although deleting
> > the index and re-replicating doesn't solve the problem, we think these
> > problems are somehow related to replication, because the symptoms
> > started after a replication and the system briefly heals itself after a
> > replication. I also see lucene-write.lock files on the slaves (we don't
> > have write.lock files on the master), which I think we shouldn't see.
> >
> > If anyone can give any sort of ideas, we will appreciate it.
> >
> > Regards,
> > Dogacan Guney
>



-- 
Doğacan Güney

Re: Solr performance issue

Posted by Markus Jelsma <ma...@openindex.io>.
Hi Doğacan,

Are you, at some point, running out of heap space? In my experience, that's
the common cause of increased load and excessively high response times (or
timeouts).
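
One cheap way to catch that early, rather than diagnosing it after the load
spikes, is to register a collection-usage threshold on the heap pools and
log whenever the old generation is still nearly full after a collection. A
sketch with the standard java.lang.management API; the 90% figure is an
arbitrary example, and it has to run inside the Solr JVM (or be adapted to
remote JMX):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryNotificationInfo;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryType;
    import javax.management.Notification;
    import javax.management.NotificationEmitter;
    import javax.management.NotificationListener;

    public class HeapAlarm {
        public static void main(String[] args) throws Exception {
            // Arm every heap pool that supports it: notify when the pool is
            // still >90% full *after* a collection, the usual OOM precursor.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() == MemoryType.HEAP
                        && pool.isCollectionUsageThresholdSupported()) {
                    long max = pool.getUsage().getMax();  // -1 if undefined
                    if (max > 0) {
                        pool.setCollectionUsageThreshold((long) (max * 0.9));
                    }
                }
            }
            NotificationEmitter emitter =
                    (NotificationEmitter) ManagementFactory.getMemoryMXBean();
            emitter.addNotificationListener(new NotificationListener() {
                public void handleNotification(Notification n, Object handback) {
                    if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                            .equals(n.getType())) {
                        System.err.println("heap still >90% full after gc: " + n);
                    }
                }
            }, null, null);
            Thread.sleep(Long.MAX_VALUE);  // keep the demo process alive
        }
    }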

Cheers,

> Hello everyone,
>
> First of all, here is our Solr setup:
>
> - Solr nightly build 986158
> - Running Solr inside the default Jetty that comes with the Solr build
> - 1 write-only master, 4 read-only slaves (quad-core 5640 with 24gb of RAM)
> - Index replicated (on optimize) to slaves via Solr Replication
> - Size of index is around 2.5gb
> - No incremental writes; the index is created from scratch (delete old
> documents -> commit new documents -> optimize) every 6 hours
> - Avg # of requests per second is around 60 (for a single slave)
> - Avg time per request is around 25ms (before having problems)
> - Load on each slave is around 2
>
> We have been using this set-up for months without any problem. However,
> last week we started to experience very weird performance problems:
>
> - Avg time per request increased from 25ms to 200-300ms (even higher if we
> don't restart the slaves)
> - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% cpu)
>
> When we profile Solr we see two very strange things:
>
> 1 - This is the jconsole output:
>
> https://skitch.com/meralan/rwwcf/mail-886x691
>
> As you can see, gc runs every 10-15 seconds and collects more than 1gb of
> memory. (Actually, if you wait more than 10 minutes you see spikes up to
> 4gb consistently.)
>
> 2 - This is the newrelic output:
>
> https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
>
> As you can see, Solr spends a ridiculously long time in the
> SolrDispatchFilter.doFilter() method.
>
> Apart from these, when we clean the index directory, re-replicate, and
> restart each slave one by one, we see some relief in the system, but after
> some time the servers start to melt down again. Although deleting the index
> and re-replicating doesn't solve the problem, we think these problems are
> somehow related to replication, because the symptoms started after a
> replication and the system briefly heals itself after a replication. I also
> see lucene-write.lock files on the slaves (we don't have write.lock files
> on the master), which I think we shouldn't see.
>
> If anyone can give any sort of ideas, we will appreciate it.
>
> Regards,
> Dogacan Guney