Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2012/03/01 11:36:52 UTC
[SolrCloud] leaking file descriptors
Hi,
Yesterday we had an issue with too many open files, which turned out to be
caused by a misspelled username and has been solved. But there is still a
problem with open files.
We cannot successfully index a few million documents from MapReduce to
a 5-node Solr cloud cluster. One of the problems is that after a while
ClassNotFoundErrors and other similar weirdness begin to appear. This
does not resolve itself when indexing is stopped.
With lsof I found that Solr still keeps roughly 9k files open 8 hours after
indexing failed. Of those 9k, roughly 7.5k are deleted files that
still have a file descriptor open for the tomcat6 user. These are all
segment files:
/opt/solr/openindex_a/data/index.20120228101550/_34s.tvd
java 10049 tomcat6 DEL REG 9,0 515607 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvx
java 10049 tomcat6 DEL REG 9,0 515504 /opt/solr/openindex_a/data/index.20120228101550/_34s.fdx
java 10049 tomcat6 DEL REG 9,0 515735 /opt/solr/openindex_a/data/index.20120228101550/_34s_nrm.cfs
java 10049 tomcat6 DEL REG 9,0 515595 /opt/solr/openindex_a/data/index.20120228101550/_34v_nrm.cfs
java 10049 tomcat6 DEL REG 9,0 515592 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.tim
java 10049 tomcat6 DEL REG 9,0 515591 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.prx
java 10049 tomcat6 DEL REG 9,0 515590 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.frq
.... and many more
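For anyone who wants to summarize a listing like the one above, here is a minimal Python sketch that tallies DEL records by file extension. It assumes each lsof record sits on a single line in the form COMMAND PID USER FD TYPE DEVICE NODE NAME and that paths contain no spaces. The function name count_deleted_by_extension is made up for illustration; the sample lines are copied from the listing.

```python
from collections import Counter
import os


def count_deleted_by_extension(lsof_output: str) -> Counter:
    """Tally lsof records whose FD column is DEL, grouped by file extension."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumed layout: COMMAND PID USER FD TYPE DEVICE NODE NAME
        # (paths with spaces would need smarter parsing than split()).
        if len(fields) < 8 or fields[3] != "DEL":
            continue
        extension = os.path.splitext(fields[-1])[1] or "(none)"
        counts[extension] += 1
    return counts


# A few records copied from the listing in this message.
sample = """\
java 10049 tomcat6 DEL REG 9,0 515607 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvx
java 10049 tomcat6 DEL REG 9,0 515504 /opt/solr/openindex_a/data/index.20120228101550/_34s.fdx
java 10049 tomcat6 DEL REG 9,0 515735 /opt/solr/openindex_a/data/index.20120228101550/_34s_nrm.cfs
java 10049 tomcat6 DEL REG 9,0 515595 /opt/solr/openindex_a/data/index.20120228101550/_34v_nrm.cfs
"""

counts = count_deleted_by_extension(sample)
for extension, number in counts.most_common():
    print(extension, number)
```

In practice the input would be the captured lsof output for the Solr process rather than the hard-coded sample.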
Did I misconfigure anything? This is a pretty standard configuration (no
changes to the IndexDefaults section) on a recent Solr trunk revision. Is
there a bug somewhere?
Thanks,
Markus
Re: [SolrCloud] leaking file descriptors
Posted by Markus Jelsma <ma...@openindex.io>.
On Thursday 01 March 2012 13:03:18 Bernd Fehling wrote:
> What is netstat telling you about the connections on the servers?
>
> Any connections in "CLOSE_WAIT" (passive close) hanging?
I can't give exact numbers right now, but there were a lot of them between
all the cores and the indexing clients.
>
> Saw this on my servers last week.
> I used a small program to spoof a local connection on those servers' ports
> and was able to trick the TCP stack into closing those connections.
> It also immediately released all open fds marked DEL and cleaned
> everything up without restarting.
Interesting! But it sounds like a sneaky workaround :)
>
> Regards
> Bernd
--
Markus Jelsma - CTO - Openindex
Re: [SolrCloud] leaking file descriptors
Posted by Bernd Fehling <be...@uni-bielefeld.de>.
What is netstat telling you about the connections on the servers?
Any connections in "CLOSE_WAIT" (passive close) hanging?
Saw this on my servers last week.
I used a small program to spoof a local connection on those servers' ports
and was able to trick the TCP stack into closing those connections.
It also immediately released all open fds marked DEL and cleaned
everything up without restarting.
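To answer the netstat question with numbers, a rough Python sketch like the following can count connections per TCP state. It assumes the text comes from `netstat -tan` on Linux, where the state is the sixth column. The function name count_tcp_states is hypothetical and the sample addresses are invented, not taken from any server in this thread.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in `netstat -tan` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed data row layout:
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State
        # Header lines do not start with "tcp" and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Invented sample output, for illustration only.
sample = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.5:8080           10.0.0.9:51234          CLOSE_WAIT
tcp        1      0 10.0.0.5:8080           10.0.0.9:51240          CLOSE_WAIT
tcp        0      0 10.0.0.5:8080           10.0.0.7:40112          ESTABLISHED
"""

states = count_tcp_states(sample)
print("CLOSE_WAIT:", states["CLOSE_WAIT"])
```

A large and growing CLOSE_WAIT count would point at sockets the application has not closed after the peer hung up.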
Regards
Bernd
Re: [SolrCloud] leaking file descriptors
Posted by Sami Siren <ss...@gmail.com>.
Do you have autocommit enabled? I tested this by indexing 1M docs with
the default example config and saw the number of used file descriptors
climb to 2400 (it did not come down even after the final commit at the
end). Then I disabled autocommit and reindexed, and the descriptor count
stayed pretty much flat at around 400-500.
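A comparison like this can be repeated by sampling the descriptor count of the servlet container while indexing runs. The Python sketch below is one possible way to do that, assuming a Linux procfs where each open descriptor appears as an entry under /proc/<pid>/fd. The function names and the proc_root parameter are made up for illustration, and reading another user's process usually requires running as that user or as root.

```python
import os
import time


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Return the number of entries under <proc_root>/<pid>/fd."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def sample_fd_counts(pid: int, samples: int, interval_s: float,
                     proc_root: str = "/proc") -> list:
    """Collect `samples` descriptor counts, sleeping `interval_s` seconds between them."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid, proc_root))
        if i < samples - 1:
            time.sleep(interval_s)
    return counts
```

For example, sample_fd_counts(10049, samples=60, interval_s=60) would record one count per minute for an hour for the PID shown in the original lsof listing. A flat series matches the run without autocommit; a steadily rising one matches the run with it.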
--
Sami Siren