Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2018/07/03 11:26:12 UTC

RE: 7.3 appears to leak

Hello Erick,

Even the silliest ideas may help us, but unfortunately this is not the case. All our Solr nodes run binaries built from the same source on our central build server, with the same libraries thanks to provisioning. Only the schema and config differ, and the <lib/> directive is the same everywhere.

Are there any other ideas, speculations, whatever, on why only our main text collection leaks a SolrIndexSearcher instance on commit since 7.3.0 and in every later version?

Many thanks!
Markus
 
-----Original message-----
> From:Erick Erickson <er...@gmail.com>
> Sent: Friday 29th June 2018 19:34
> To: solr-user <so...@lucene.apache.org>
> Subject: Re: 7.3 appears to leak
> 
> This is truly puzzling then, I'm clueless. It's hard to imagine this
> is lurking out there and nobody else notices, but you've eliminated
> the custom code. And this is also very peculiar:
> 
> * it occurs only in our main text search collection, all other
> collections are unaffected;
> * despite what I said earlier, it is so far unreproducible outside
> production, even when mimicking production as well as we can;
> 
> Here's a tedious idea. Restart Solr with the -v option, I _think_ that
> shows you each and every jar file Solr loads. Is it "somehow" possible
> that your main collection is loading some jar from somewhere that's
> different than you expect? 'cause silly ideas like this are all I can
> come up with.
> 
> Erick
> 
> On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
> <ma...@openindex.io> wrote:
> > Hello Erick,
> >
> > The custom search handler doesn't interact with SolrIndexSearcher; this is really all it does:
> >
> >   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
> >     super.handleRequestBody(req, rsp);
> >
> >     if (rsp.getToLog().get("hits") instanceof Integer) {
> >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
> >     }
> >     if (rsp.getToLog().get("hits") instanceof Long) {
> >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
> >     }
> >   }
> >
> > I am not sure this qualifies as one more to go.
> >
> > Re: compiler warnings on resources, yes! This and tests failing due to resource leaks have always warned me when I forgot to release something or decrement a reference. The above method and the token filters (which I really can't disable) are all that is left.
> >
> > I am quite desperate about this problem, so although I am unwilling to disable stuff, I can do it if I must. But I see no reason, yet, to remove the search handler or the token filters; I mean, how could those leak a SolrIndexSearcher?
> >
> > Let me know :)
> >
> > Many thanks!
> > Markus
> >
> > -----Original message-----
> >> From:Erick Erickson <er...@gmail.com>
> >> Sent: Friday 29th June 2018 18:46
> >> To: solr-user <so...@lucene.apache.org>
> >> Subject: Re: 7.3 appears to leak
> >>
> >> bq. The only custom stuff left is an extension of SearchHandler that
> >> only writes numFound to the response headers.
> >>
> >> Well, one more to go ;). It's incredibly easy to overlook
> >> innocent-seeming calls that increment the underlying reference count
> >> of some objects but don't decrement them, usually through a close
> >> call. Which isn't necessarily a close if the underlying reference
> >> count is still > 0.
> >>
> >> You may infer that I've been there and done that ;). Sometimes the
> >> compiler warnings about "resource leak" can help pinpoint those too.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma
> >> <ma...@openindex.io> wrote:
> >> > Hello Yonik,
> >> >
> >> > I took one node of the 7.2.1 cluster out of the load balancer so it would only receive shard queries. This way I could kind of 'safely' disable our custom components one by one, while keeping functionality in place by letting the other 7.2.1 nodes continue with the full configuration.
> >> >
> >> > I am now at a point where literally all custom components are deleted or commented out in the config for the node running 7.4. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers, and all the token filters in our schema.
> >> >
> >> > You were right, it was leaking exactly one SolrIndexSearcher instance on each commit. But, with all our stuff gone, the leak is still there! I triple-checked it! Of course, the bastard is still not reproducible locally.
> >> >
> >> > So, what is next? I have no clues left.
> >> >
> >> > Many, many thanks,
> >> > Markus
> >> >
> >> > -----Original message-----
> >> >> From:Markus Jelsma <ma...@openindex.io>
> >> >> Sent: Thursday 28th June 2018 23:52
> >> >> To: solr-user@lucene.apache.org
> >> >> Subject: RE: 7.3 appears to leak
> >> >>
> >> >> Hello Yonik,
> >> >>
> >> >> If leaking a whole SolrIndexSearcher would cause this problem, then the only suspect among our custom components is our copy/paste-and-enhance version of the elevator component, which would then be the root of all problems. It is a direct copy of the 7.2 source where only things like getAnalyzedQuery, the ElevationObj and the loop over the map entries are changed.
> >> >>
> >> >> There are no changes to code related to the searcher. Other components where we obtain a RefCounted searcher work without issues; we always decrement the reference after using it. But those components are not in use in this collection.
> >> >>
> >> >> The source has changed a lot with 7.4, but we still use the old code. I will investigate the component thoroughly, and even revert to the old vanilla 7.2 component on one production machine for a brief period. That should not be a problem if I don't let our load balancer access it directly, so it only serves shard queries.
> >> >>
> >> >> I will get back to this topic tomorrow!
> >> >>
> >> >> Many thanks,
> >> >> Markus
> >> >>
> >> >>
> >> >>
> >> >> -----Original message-----
> >> >> > From:Yonik Seeley <ys...@gmail.com>
> >> >> > Sent: Thursday 28th June 2018 23:30
> >> >> > To: solr-user@lucene.apache.org
> >> >> > Subject: Re: 7.3 appears to leak
> >> >> >
> >> >> > > * SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry instances are both leaked on commit;
> >> >> >
> >> >> > If these are actually filterCache entries being leaked, it stands to
> >> >> > reason that a whole searcher is being leaked somewhere.
> >> >> >
> >> >> > -Yonik
> >> >> >
> >> >>
> >>
> 
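Erick's warning about calls that increment a reference count without a matching decrement, and Markus's note that they always decrement after use, describe the same discipline. The sketch illustrates it with a hypothetical, simplified holder (RefCountedHolder, useCorrectly and useLeaky are illustrative names, not Solr's org.apache.solr.util.RefCounted). The resource is only released when the count drops to zero, so a single missing decref() keeps it alive indefinitely. That would be consistent with one searcher surviving per commit. In Solr itself, SolrCore.getSearcher() returns a RefCounted<SolrIndexSearcher> that the caller is expected to decref(), typically in a finally block.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {

    // Hypothetical, simplified holder; NOT Solr's org.apache.solr.util.RefCounted.
    static final class RefCountedHolder<T> {
        private final T resource;
        private final AtomicInteger refcount = new AtomicInteger(1); // the owner's initial reference
        private volatile boolean closed = false;

        RefCountedHolder(T resource) {
            this.resource = resource;
        }

        T get() {
            return resource;
        }

        RefCountedHolder<T> incref() {
            refcount.incrementAndGet();
            return this;
        }

        // Only the decref() that brings the count to zero actually releases the resource.
        void decref() {
            if (refcount.decrementAndGet() == 0) {
                closed = true; // a real holder would close the searcher/reader here
            }
        }

        int getRefcount() {
            return refcount.get();
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Correct: every incref() is paired with a decref() in a finally block.
    static int useCorrectly(RefCountedHolder<String> holder) {
        RefCountedHolder<String> ref = holder.incref();
        try {
            return ref.get().length();
        } finally {
            ref.decref();
        }
    }

    // Leaky: the reference is taken but never released.
    static int useLeaky(RefCountedHolder<String> holder) {
        RefCountedHolder<String> ref = holder.incref();
        return ref.get().length(); // missing decref(): the count stays one too high
    }

    public static void main(String[] args) {
        RefCountedHolder<String> good = new RefCountedHolder<>("searcher-1");
        useCorrectly(good);
        good.decref(); // owner drops its reference, e.g. when a newer searcher replaces it
        System.out.println("good closed: " + good.isClosed());

        RefCountedHolder<String> bad = new RefCountedHolder<>("searcher-2");
        useLeaky(bad);
        bad.decref();
        System.out.println("bad closed: " + bad.isClosed() + ", refcount: " + bad.getRefcount());
    }
}
```

Running main prints `good closed: true` and then `bad closed: false, refcount: 1`. The leaky caller's extra reference is never returned, so that holder never closes, even after its owner lets go.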

Re: 7.3 appears to leak

Posted by Kydryavtsev Andrey <we...@yandex.ru>.
If it is not possible to find the resource leak by code analysis and there are no better ideas, I can suggest a brute-force approach:
- Clone Solr's sources from the appropriate branch: https://github.com/apache/lucene-solr/tree/branch_7_3
- Log every increment/decrement on the searcher's RefCounted holder in a way that captures every caller (use Thread.currentThread().getStackTrace() or something similar): https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
- Build custom artefacts and deploy them to production.
- After the memory leak has happened, analyse the logs to see which part of the functionality doesn't decrement the searcher's count after incrementing it. If searchers are leaked, there should be such code, I guess.
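Andrey's brute-force idea can be sketched as follows. This is a simplified stand-in, not a patch to Solr's actual RefCounted class. Every incref() and decref() records the new count together with the current thread's stack, obtained via Thread.currentThread().getStackTrace() as he suggests. TracingHolder, pairedCaller and leakyCaller are hypothetical names. In a patched build the events would go to the log rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class TracingRefCountSketch {

    // Hypothetical tracing holder; NOT Solr's RefCounted, just the same incref/decref idea.
    static final class TracingHolder<T> {
        private final T resource;
        private final AtomicInteger refcount = new AtomicInteger(1); // the owner's initial reference
        // A patched build would write these lines to the log instead of keeping them in memory.
        final List<String> events = Collections.synchronizedList(new ArrayList<>());

        TracingHolder(T resource) {
            this.resource = resource;
        }

        T get() {
            return resource;
        }

        int getRefcount() {
            return refcount.get();
        }

        TracingHolder<T> incref() {
            int now = refcount.incrementAndGet();
            events.add("INCREF -> " + now + " at " + callerTrace());
            return this;
        }

        void decref() {
            int now = refcount.decrementAndGet();
            events.add("DECREF -> " + now + " at " + callerTrace());
        }

        // Current thread's stack, skipping getStackTrace() and callerTrace() themselves
        // (assuming the usual frame layout), so it starts at incref()/decref() and their caller.
        private static String callerTrace() {
            return Arrays.stream(Thread.currentThread().getStackTrace())
                    .skip(2)
                    .map(StackTraceElement::toString)
                    .collect(Collectors.joining(" <- "));
        }
    }

    // Paired use: the reference taken is released in a finally block.
    static void pairedCaller(TracingHolder<String> holder) {
        TracingHolder<String> ref = holder.incref();
        try {
            ref.get();
        } finally {
            ref.decref();
        }
    }

    // Leaky use: the reference is taken but never released.
    static void leakyCaller(TracingHolder<String> holder) {
        holder.incref().get(); // no decref(): this INCREF never gets a matching DECREF
    }

    public static void main(String[] args) {
        TracingHolder<String> holder = new TracingHolder<>("searcher");
        pairedCaller(holder);
        leakyCaller(holder);
        holder.events.forEach(System.out::println);
        // Anything above the owner's single initial reference is unaccounted for.
        System.out.println("unreleased references: " + (holder.getRefcount() - 1));
    }
}
```

In the printed events, the leaky caller's INCREF has no matching DECREF, and the count ends one above its starting value. In production, that unmatched stack trace is what the log analysis would look for. Capturing a full stack on every call is expensive, so this only suits a temporary diagnostic build.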

This is not something anyone would like to do, but it is what it is.



Thank you,

Andrey Kudryavtsev

