Posted to solr-user@lucene.apache.org by Simon Wistow <si...@thegestalt.org> on 2010/11/02 00:27:24 UTC

Possible memory leaks with frequent replication

We've been trying to get a setup in which a slave replicates from a 
master every few seconds (ideally every second but currently we have it 
set at every 5s).
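
For reference, the slave side of that setup in solrconfig.xml looks roughly 
like the sketch below (the masterUrl is a placeholder, not our real host):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <!-- placeholder URL for the master's replication handler -->
        <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
        <!-- poll for a new index version every 5 seconds (HH:mm:ss) -->
        <str name="pollInterval">00:00:05</str>
      </lst>
    </requestHandler>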

Everything seems to work fine until, periodically, the slave just stops 
responding, apparently because it has run out of memory:

org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.OutOfMemoryError: Java heap space


(our monitoring seems to confirm this).

Looking around, my suspicion is that new Readers take longer to warm than 
the gap between replications, and so they just build up until all memory 
is consumed (which, I suppose, isn't really memory 'leaking' per se, more 
just resource consumption).

That said, we've tried turning off caching on the slave and that didn't 
help either, so it's possible I'm wrong.
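
(By "turning off caching" I mean, roughly, zero-sized caches with no 
autowarming in the slave's solrconfig.xml; a sketch, not our literal 
config:

    <filterCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="0" initialSize="0"/>

With autowarmCount="0", a new searcher at least has no cache entries to 
copy over while warming.)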

Is there anything we can do about this? I'm reluctant to increase the 
heap space since I suspect that will mean that there's just a longer 
period between failures. Might Zoie help here? Or should we just query 
against the Master?


Thanks,

Simon

Re: Possible memory leaks with frequent replication

Posted by Lance Norskog <go...@gmail.com>.
Do you use EmbeddedSolr in the query server? There is a memory leak
that shows up after a large number of replications.

On Wed, Nov 3, 2010 at 8:28 AM, Jonathan Rochkind <ro...@jhu.edu> wrote:
> [...]



-- 
Lance Norskog
goksron@gmail.com

Re: Possible memory leaks with frequent replication

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Ah, but reading Peter's email (the message I referenced) more carefully, it 
seems that Solr already DOES provide an info-level log message warning you 
about overlapping warming. Awesome. (But again, I'm pretty sure it does NOT 
throw an exception or return an HTTP error in that condition, based on my 
and others' experience.)


 > To check if your Solr environment is suffering from this, turn on INFO
 > level logging, and look for: 'PERFORMANCE WARNING: Overlapping
 > onDeckSearchers=x'.

Sweet, good to know, and I'll definitely add this to my debugging 
toolbox. Peter's listserv message really ought to be a wiki page, I 
think.  Any reason for me not to just add it as a new page with the title 
"Commit frequency and auto-warming" or something like that?  Unless it's 
already in the wiki somewhere I haven't found, and assuming the wiki will 
let an ordinary user-created account add a new page.
Jonathan Rochkind wrote:
> [...]

Re: Possible memory leaks with frequent replication

Posted by Jonathan Rochkind <ro...@jhu.edu>.
I hadn't looked at the code, am not familiar with the Solr code, and can't 
say what that code does.

But I have experienced issues that I _believe_ were caused by too-frequent 
commits causing overlapping searcher preparation. And I've definitely seen 
Solr documentation suggesting this is an issue. Let me find it now to see 
if the experts think these documented suggestions are still correct or not:

"On the other hand, autowarming (populating) a new collection could take 
a lot of time, especially since it uses only one thread and one CPU. If 
your settings fire off snapinstaller too frequently, then a Solr slave 
could be in the undesirable condition of handing-off queries to one 
(old) collection, and, while warming a new collection, a second “new” 
one could be snapped and begin warming!

If we attempted to solve such a situation, we would have to invalidate 
the first “new” collection in order to use the second one, then when a 
“third” new collection would be snapped and warmed, we would have to 
invalidate the “second” new collection, and so on ad infinitum. A 
completely warmed collection would never make it to full term before it 
was aborted. This can be prevented with a properly tuned configuration 
so new collections do not get installed too rapidly."

http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs

I think I've seen that same advice on another wiki page, not specifically 
about replication but about commit frequency balanced with auto-warming, 
leading to overlapping warming, leading to spiraling RAM/CPU usage, but 
NOT to an exception being thrown or an HTTP error delivered.

I can't find it on the wiki, but here's a listserv post with someone 
reporting findings that match my understanding: 
http://osdir.com/ml/solr-user.lucene.apache.org/2010-09/msg00528.html

How does this advice square with the code Lance found?  Is my 
understanding of how frequent commits can interact with the time it takes 
to warm a new collection correct? I'd appreciate any additional info.




Lance Norskog wrote:
> Isn't that what this code does?
> [...]

Re: Possible memory leaks with frequent replication

Posted by Lance Norskog <go...@gmail.com>.
Isn't that what this code does?

      onDeckSearchers++;
      if (onDeckSearchers < 1) {
        // should never happen... just a sanity check
        log.error(logid+"ERROR!!! onDeckSearchers is " + onDeckSearchers);
        onDeckSearchers=1;  // reset
      } else if (onDeckSearchers > maxWarmingSearchers) {
        onDeckSearchers--;
        String msg="Error opening new searcher. exceeded limit of maxWarmingSearchers="
            + maxWarmingSearchers + ", try again later.";
        log.warn(logid+""+ msg);
        // HTTP 503==service unavailable, or 409==Conflict
        throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE, msg, true);
      } else if (onDeckSearchers > 1) {
        log.info(logid+"PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
      }


On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind <ro...@jhu.edu> wrote:
> Would be interesting/useful if Solr noticed this going on, and gave you some
> kind of error in the log [...]



-- 
Lance Norskog
goksron@gmail.com

Re: Possible memory leaks with frequent replication

Posted by Jonathan Rochkind <ro...@jhu.edu>.
It's definitely a known 'issue' that you can't replicate (or make any 
other kind of index change, including a commit) at a faster frequency 
than your warming queries take to complete, or you'll wind up with 
something like what you've seen.

It's in some documentation I saw somewhere, for sure.

The advice to 'just query against the master' is kind of odd because, 
then... why have a slave at all, if you aren't going to query against 
it?  I guess just for backup purposes.

But even with just one Solr, or querying the master, if you commit at a 
rate such that commits arrive before the warming queries can complete, 
you're going to have the same issue.

The only answer I know of is "Don't commit (or replicate) at a faster 
rate than your warming takes to complete."  You can reduce your warming 
queries/operations (see the sketch below), or reduce your 
commit/replicate frequency.
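
A sketch of what that trimming looks like in solrconfig.xml (the cache 
sizes and the warming query here are hypothetical values, not a 
recommendation):

    <query>
      <!-- autowarm fewer entries; 0 disables cache autowarming entirely -->
      <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
      <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

      <!-- keep the static newSearcher warming queries few and cheap -->
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst><str name="q">*:*</str><str name="rows">10</str></lst>
        </arr>
      </listener>
    </query>

The less work each new searcher does here, the shorter the window in 
which a replication can overlap it.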

It would be interesting/useful if Solr noticed this going on and gave you 
some kind of error in the log (or even an exception, when started with a 
certain parameter for testing): "Overlapping warming queries, you're 
committing too fast" or something. Because it's easy to make this happen 
without realizing it, and then your Solr does what Simon describes: it 
runs out of RAM and/or uses a whole lot of CPU and disk I/O.

Lance Norskog wrote:
> You should query against the indexer. I'm impressed that you got 5s
> replication to work reliably.

Re: Possible memory leaks with frequent replication

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Nov 2, 2010 at 12:32 PM, Simon Wistow <si...@thegestalt.org> wrote:
> On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said:
>> You should query against the indexer. I'm impressed that you got 5s
>> replication to work reliably.
>
> That's our current solution - I was just wondering if there was anything
> I was missing.

You could also try dialing down maxWarmingSearchers to 1. That should
prevent multiple searchers from warming at the same time, which may be
what is running you out of memory.
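
That's the maxWarmingSearchers element in solrconfig.xml; in the example 
configs it lives in the <query> section:

    <maxWarmingSearchers>1</maxWarmingSearchers>

With that set, an attempt to open a second warming searcher fails fast 
with a 503 "try again later" (the code Lance quoted) rather than letting 
warming searchers pile up.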

-Yonik
http://www.lucidimagination.com

Re: Possible memory leaks with frequent replication

Posted by Simon Wistow <si...@thegestalt.org>.
On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said:
> You should query against the indexer. I'm impressed that you got 5s
> replication to work reliably.

That's our current solution - I was just wondering if there was anything 
I was missing. 

Thanks!
 

Re: Possible memory leaks with frequent replication

Posted by Lance Norskog <go...@gmail.com>.
You should query against the indexer. I'm impressed that you got 5s
replication to work reliably.

On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow <si...@thegestalt.org> wrote:
> [...]



-- 
Lance Norskog
goksron@gmail.com