Posted to dev@lucene.apache.org by "Eric Bus (JIRA)" <ji...@apache.org> on 2013/11/07 10:27:17 UTC
[jira] [Updated] (SOLR-5427) SolrCloud leaking (many) filehandles to deleted files
[ https://issues.apache.org/jira/browse/SOLR-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Bus updated SOLR-5427:
---------------------------
Description:
I'm running SolrCloud on three nodes and have been experiencing strange problems on them. The main problem is that my disk is filling up because old tlog files are not being released by Solr.
I suspect this problem is caused by a large number of open connections between the nodes in CLOSE_WAIT status. After running a node for only 2 days, it already has 33 such connections and about 11,000 deleted files that are still open.
I'm running about 100 cores on each node. Could this be increasing the rate at which things go wrong? I suspect that on a setup with only 1 collection and 3 shards, the problem stays hidden for quite some time.
lsof -p 15452 -n | grep -i tcp | grep CLOSE_WAIT
java 15452 root 45u IPv6 70692577 0t0 TCP 11.1.0.12:46533->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root 48u IPv6 70692579 0t0 TCP 11.1.0.12:46535->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root 205u IPv6 72759434 0t0 TCP 11.1.0.12:41744->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root 378u IPv6 72359115 0t0 TCP 11.1.0.12:44767->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root 381u IPv6 72359116 0t0 TCP 11.1.0.12:44768->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root 5252u IPv6 72759445 0t0 TCP 11.1.0.12:41751->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root 6193u IPv6 74021651 0t0 TCP 11.1.0.12:39170->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *150u IPv6 74021648 0t0 TCP 11.1.0.12:53865->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *152u IPv6 72759424 0t0 TCP 11.1.0.12:41737->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *526u IPv6 74027995 0t0 TCP 11.1.0.12:53965->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *986u IPv6 72768637 0t0 TCP 11.1.0.12:42246->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *626u IPv6 72749983 0t0 TCP 11.1.0.12:41297->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *476u IPv6 72768633 0t0 TCP 11.1.0.12:42243->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *567u IPv6 72768622 0t0 TCP 11.1.0.12:42234->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *732u IPv6 72768599 0t0 TCP 11.1.0.12:42230->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *799u IPv6 72759427 0t0 TCP 11.1.0.12:41739->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *259u IPv6 72768626 0t0 TCP 11.1.0.12:42237->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *272u IPv6 72768997 0t0 TCP 11.1.0.12:42263->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *493u IPv6 72759407 0t0 TCP 11.1.0.12:41729->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *693u IPv6 74020909 0t0 TCP 11.1.0.12:53853->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *740u IPv6 72749996 0t0 TCP 11.1.0.12:41306->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *749u IPv6 73975230 0t0 TCP 11.1.0.12:38825->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *750u IPv6 73974619 0t0 TCP 11.1.0.12:53499->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *771u IPv6 72759420 0t0 TCP 11.1.0.12:41734->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *793u IPv6 72768653 0t0 TCP 11.1.0.12:42256->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *900u IPv6 72768618 0t0 TCP 11.1.0.12:42233->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *045u IPv6 72766477 0t0 TCP 11.1.0.12:41181->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *233u IPv6 73975035 0t0 TCP 11.1.0.12:38812->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *476u IPv6 74025479 0t0 TCP 11.1.0.12:39225->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *512u IPv6 74030407 0t0 TCP 11.1.0.12:39312->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *533u IPv6 74021649 0t0 TCP 11.1.0.12:40102->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *716u IPv6 74020899 0t0 TCP 11.1.0.12:53850->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *837u IPv6 73975224 0t0 TCP 11.1.0.12:38819->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *009u IPv6 74020894 0t0 TCP 11.1.0.12:53849->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *112u IPv6 74021642 0t0 TCP 11.1.0.12:53861->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *118u IPv6 74020764 0t0 TCP 11.1.0.12:39147->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *119u IPv6 74021645 0t0 TCP 11.1.0.12:40100->11.1.0.12:http-alt (CLOSE_WAIT)
java 15452 root *145u IPv6 74020893 0t0 TCP 11.1.0.12:53848->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *189u IPv6 73975034 0t0 TCP 11.1.0.12:38811->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *246u IPv6 73975226 0t0 TCP 11.1.0.12:38821->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *285u IPv6 74020912 0t0 TCP 11.1.0.12:53854->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *364u IPv6 73974620 0t0 TCP 11.1.0.12:38804->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *368u IPv6 74020903 0t0 TCP 11.1.0.12:53852->11.1.0.13:http-alt (CLOSE_WAIT)
java 15452 root *546u IPv6 74021647 0t0 TCP 11.1.0.12:39167->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *668u IPv6 73975222 0t0 TCP 11.1.0.12:38817->11.1.0.11:http-alt (CLOSE_WAIT)
java 15452 root *717u IPv6 73975249 0t0 TCP 11.1.0.12:53530->11.1.0.13:http-alt (CLOSE_WAIT)
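To make a listing like the one above easier to compare between nodes, the lsof output can be summarized rather than read line by line. The following is only a rough sketch, not something Solr or lsof provides: the function names summarize_close_wait and count_deleted are made up for illustration, and both assume the Linux lsof output layout, where the connection state is the last field and unlinked files carry a trailing "(deleted)" marker.

```shell
# Hypothetical helpers (names are illustrative, not part of lsof or Solr).
# Both read `lsof -n` output on stdin.

# Print "count remote_endpoint" for sockets in CLOSE_WAIT, busiest peer first.
# Assumes the second-to-last field has the form local:port->remote:port.
summarize_close_wait() {
  awk '$NF == "(CLOSE_WAIT)" {
         split($(NF-1), ends, "->")   # ends[1] = local side, ends[2] = remote side
         count[ends[2]]++
       }
       END { for (r in count) print count[r], r }' | sort -rn
}

# Count descriptors that still point at unlinked files.
# Assumes lsof appends "(deleted)" to the file name, as it does on Linux.
count_deleted() {
  grep -c '(deleted)$'
}
```

With the PID from the report this could be run as "lsof -p 15452 -n | summarize_close_wait" and "lsof -p 15452 -n | count_deleted". Grouping by remote endpoint shows at a glance whether the stuck connections go mostly to the node itself or to its peers.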
was:
I'm running SolrCloud on three nodes and have been experiencing strange problems on them. The main problem is that my disk is filling up because old tlog files are not being released by Solr.
I suspect this problem is caused by a large number of open connections between the nodes in CLOSE_WAIT status. After running a node for only 2 days, it already has 33 such connections and about 11,000 deleted files that are still open.
I'm running about 100 cores on each node. Could this be increasing the rate at which things go wrong? I suspect that on a setup with only 1 collection and 3 shards, the problem stays hidden for quite some time.
> SolrCloud leaking (many) filehandles to deleted files
> -----------------------------------------------------
>
> Key: SOLR-5427
> URL: https://issues.apache.org/jira/browse/SOLR-5427
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.3, 4.4, 4.5
> Environment: Debian Linux 6.0 running on VMWare
> Tomcat 6
> Reporter: Eric Bus