Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2012/03/01 11:36:52 UTC

[SolrCloud] leaking file descriptors

 Hi,

 Yesterday we had an issue with too many open files; it was caused by a 
 misspelled username and has been fixed. But there is still a problem 
 with open files.

 We cannot successfully index a few million documents from MapReduce to 
 a 5-node SolrCloud cluster. One of the problems is that after a while 
 ClassNotFoundErrors and other similar weirdness begin to appear. The 
 problem does not go away when indexing is stopped.

 With lsof I found that Solr still holds roughly 9k files open 8 hours 
 after indexing failed. Of those 9k, roughly 7.5k are deleted files that 
 still have a file descriptor open for the tomcat6 user; all of them are 
 segment files:

 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvd
 java      10049 tomcat6  DEL       REG                9,0    515607 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvx
 java      10049 tomcat6  DEL       REG                9,0    515504 /opt/solr/openindex_a/data/index.20120228101550/_34s.fdx
 java      10049 tomcat6  DEL       REG                9,0    515735 /opt/solr/openindex_a/data/index.20120228101550/_34s_nrm.cfs
 java      10049 tomcat6  DEL       REG                9,0    515595 /opt/solr/openindex_a/data/index.20120228101550/_34v_nrm.cfs
 java      10049 tomcat6  DEL       REG                9,0    515592 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.tim
 java      10049 tomcat6  DEL       REG                9,0    515591 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.prx
 java      10049 tomcat6  DEL       REG                9,0    515590 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.frq
 .... and many more
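A listing like this can be summarised by counting the DEL rows per segment (per the lsof manual, DEL in the FD column marks a deleted Linux map file). The Python sketch below is only an illustration under stated assumptions: unwrapped lsof rows laid out as COMMAND PID USER FD TYPE DEVICE NODE NAME, a NAME that is the last field and contains no spaces, and Lucene-style file names where the segment name comes before the first '.' or the second '_'. The function name count_deleted_segments is hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_deleted_segments(lsof_output: str, index_dir: str) -> Counter:
    """Count lsof rows with FD == "DEL" under index_dir, grouped by segment.

    Assumes whitespace-separated rows of the form
    COMMAND PID USER FD TYPE DEVICE NODE NAME, with NAME last and free of spaces.
    """
    prefix = index_dir.rstrip("/") + "/"
    counts: Counter = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header, wrapped path fragments and any non-DEL rows.
        if len(fields) < 5 or fields[3] != "DEL":
            continue
        name = fields[-1]
        if not name.startswith(prefix):
            continue
        # Assumed naming: segment, optional "_suffix", then extension, e.g.
        # "_34s.tvx" -> "_34s", "_34v_0.tim" -> "_34v", "_34s_nrm.cfs" -> "_34s".
        stem = PurePosixPath(name).name.split(".")[0]
        counts["_" + stem.lstrip("_").split("_")[0]] += 1
    return counts
```

For example, the text output of `lsof -p 10049` could be passed in together with /opt/solr/openindex_a/data to see which segments account for most of the 7.5k deleted entries.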

 Did I misconfigure anything? The setup is pretty standard (no changes to 
 the IndexDefaults section) and runs a recent Solr trunk revision. Is there 
 a bug somewhere?

 Thanks,
 Markus

Re: [SolrCloud] leaking file descriptors

Posted by Markus Jelsma <ma...@openindex.io>.

On Thursday 01 March 2012 13:03:18 Bernd Fehling wrote:
> What is netstat telling you about the connections on the servers?
> 
> Any connections in "CLOSE_WAIT" (passive close) hanging?

I can't give exact numbers right now, but there were a lot of them between 
all the cores and the indexing clients.

> 
> Saw this on my servers last week.
> Used a little program to spoof a local connection on those servers' ports
> and was able to trick the TCP stack into closing those connections.
> It also immediately released all open fds marked DEL and cleaned
> everything up without restarting.

Interesting! But that sounds like a sneaky workaround :)

> 
> Regards
> Bernd

-- 
Markus Jelsma - CTO - Openindex

Re: [SolrCloud] leaking file descriptors

Posted by Bernd Fehling <be...@uni-bielefeld.de>.
What is netstat telling you about the connections on the servers?

Any connections in "CLOSE_WAIT" (passive close) hanging?
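One way to answer this is to count CLOSE_WAIT sockets per remote host. CLOSE_WAIT means the peer has already closed its side of the connection while the local process has not yet closed its socket. The Python sketch below assumes Linux `netstat -tn` output, where the foreign address is the fifth column and the state the sixth; close_wait_by_peer is a hypothetical helper, not part of any tool.

```python
from collections import Counter


def close_wait_by_peer(netstat_output: str) -> Counter:
    """Count CLOSE_WAIT TCP connections in `netstat -tn` text, keyed by peer host."""
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header lines and non-TCP rows
        if fields[5] != "CLOSE_WAIT":
            continue
        # Drop the port; rsplit keeps IPv6 addresses such as ::ffff:10.0.0.7 intact.
        peer_host = fields[4].rsplit(":", 1)[0]
        counts[peer_host] += 1
    return counts
```

A high count toward the indexing clients would match the situation Bernd describes below.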

Saw this on my servers last week.
Used a little program to spoof a local connection on those servers' ports
and was able to trick the TCP stack into closing those connections.
It also immediately released all open fds marked DEL and cleaned
everything up without restarting.

Regards
Bernd

Re: [SolrCloud] leaking file descriptors

Posted by Sami Siren <ss...@gmail.com>.
Do you have autocommit enabled? I tested this by indexing 1M docs with
the default example config and saw the number of used file descriptors
climb to 2400 (it did not come down even after the final commit at the
end). Then I disabled autocommit and reindexed, and the descriptor count
stayed pretty much flat at around 400-500.
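To repeat this comparison (autocommit on versus off), one could sample the Solr JVM's descriptor count while indexing. The Python sketch below assumes Linux /proc and permission to read the target's /proc/<pid>/fd directory (for example running as tomcat6 or root); the helper names are hypothetical. It only counts real descriptors: per the lsof manual, DEL rows are deleted memory-mapped files, which appear in /proc/<pid>/maps rather than /proc/<pid>/fd, so they are not included here.

```python
import os
import time


def count_open_fds(pid: int) -> int:
    """Return how many descriptors /proc/<pid>/fd lists for the process."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def count_deleted_fds(pid: int) -> int:
    """Count descriptors whose target file has been unlinked.

    The kernel appends " (deleted)" to the link target of such descriptors.
    """
    fd_dir = f"/proc/{pid}/fd"
    deleted = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.endswith(" (deleted)"):
            deleted += 1
    return deleted


def sample_fds(pid: int, interval_s: float, samples: int) -> list[tuple[int, int]]:
    """Take `samples` readings of (open fds, deleted fds), `interval_s` apart."""
    history = []
    for i in range(samples):
        history.append((count_open_fds(pid), count_deleted_fds(pid)))
        if i + 1 < samples:
            time.sleep(interval_s)
    return history
```

For instance, sample_fds(10049, 60.0, 30) would take one reading per minute for half an hour; a count that keeps rising with autocommit enabled and stays flat without it would mirror the numbers above.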

--
 Sami Siren