Posted to solr-user@lucene.apache.org by Eric Bus <er...@websight.nl> on 2013/10/22 10:00:18 UTC

SOLR/Tomcat6 keeping references to deleted tlog files

Hi,

I've been running a SolrCloud setup on SOLR 4.4, consisting of 3 nodes, for some time. The cloud hosts about 40 small collections that receive updates once a day. The collections use different shard and replication configurations (varying from 2 shards without replication to 2 shards with 3 replicas).

After running Tomcat for a couple of weeks, I notice the number of open files is dramatically increasing. Most of those files are deleted tlog files that SOLR keeps open:

eric@node1:/ # lsof -np 16810 | grep deleted | wc -l
36345

Those files are no longer on disk, but SOLR still has a handle open. My disk use is going through the roof. 6GB is currently 'in use' by deleted but still open files. When I restart Tomcat, the space is freed and it starts all over again. All of my nodes experience this behavior.
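
To put a number on both the handle count and the disk space involved, something like the following could be run against the Tomcat PID. It is only a rough sketch, assuming a Linux host where /proc/<pid>/fd is readable (same user as Tomcat, or root). The function name and the "tlog" name filter are my own assumptions for illustration, not anything SOLR provides.

```python
import os


def deleted_open_files(pid, name_filter="tlog"):
    """Return (count, total_bytes) of deleted-but-still-open files for pid.

    Linux-only: each entry in /proc/<pid>/fd is a symlink whose target
    ends in ' (deleted)' once the underlying file has been unlinked.
    name_filter is an assumed substring of the file name; pass None to
    count every deleted file.
    """
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    total_bytes = 0
    for fd in os.listdir(fd_dir):
        link = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(link)
            if not target.endswith(" (deleted)"):
                continue
            if name_filter and name_filter not in os.path.basename(target):
                continue
            # stat() follows the /proc link to the open file, so the size
            # is still available even though the path is gone.
            total_bytes += os.stat(link).st_size
            count += 1
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return count, total_bytes


if __name__ == "__main__":
    import sys

    n, size = deleted_open_files(int(sys.argv[1]))
    print(f"{n} deleted tlog files still open, {size} bytes held")
```

Saved under a name of your choosing and called with the PID from the lsof command above (16810 here), it prints one summary line that can be logged from cron.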

At first I thought it had something to do with a lack of commits. But it happens on all my collections, even the ones with fast autoCommit:

    <autoCommit>
      <maxDocs>5000</maxDocs>
      <maxTime>120000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

My update process always triggers a commit or rollback and updates are showing up correctly.

I read something about SOLR having TCP connections in CLOSE_WAIT. The only CLOSE_WAIT connections I see are between the nodes, and there are only about 10 of them. Those connections can't be causing 36k open files, right?

Any suggestions/tips? At the moment, I have to restart my leader every couple of weeks and that's not really something I would like to do :)

Best regards,
Eric Bus


Re: SOLR/Tomcat6 keeping references to deleted tlog files

Posted by Erick Erickson <er...@gmail.com>.
Hmmmm, sounds like you've put some time into sleuthing here, cool!

Do you notice that your open file handles are increasing roughly
linearly with time? Assuming a relatively constant indexing rate, that's
what I'd expect if Solr is just failing to close the tlog somehow.
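
One way to check this, sketched under the assumption that you record the handle count periodically (say, hourly from cron) as (unix_seconds, count) pairs: fit a straight line through the samples and look at the slope. The function name and sample format are hypothetical, just for illustration.

```python
def handles_per_hour(samples):
    """Least-squares slope of handle count against time, in handles/hour.

    samples: list of (unix_seconds, handle_count) pairs covering at
    least two distinct timestamps.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    # Slope is per second; scale to per hour.
    return cov / var_t * 3600
```

If the slope computed over the first half of the samples is close to the slope over the second half, the growth is roughly linear, which would fit a handle leaking on each tlog rollover.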

I'm assuming no custom code here, but thought I'd check to be sure.

But what I'd do is wait a few more hours and see if some of the people deep
into SolrCloud answer (Yonik, Shalin, Noble, Mark, etc.); those folks are
scattered all over the world. Absent a response from them, this sounds like
a JIRA in the making to me.

Best,
Erick

P.S.
This is really a bit unrelated, but unless you're only indexing documents
very slowly, your maxDocs setting is rather low, FWIW. This should have no
bearing on the increasing file handles, though; it's just a side comment.

