Posted to dev@lucene.apache.org by "Eric Bus (JIRA)" <ji...@apache.org> on 2013/11/07 10:27:17 UTC

[jira] [Updated] (SOLR-5427) SolrCloud leaking (many) filehandles to deleted files

     [ https://issues.apache.org/jira/browse/SOLR-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Bus updated SOLR-5427:
---------------------------

    Description: 
I'm running SolrCloud on three nodes. I've been experiencing strange problems on these nodes. The main problem is that my disk is filling up, because old tlog files are not being released by Solr.

I suspect this problem is caused by a lot of open connections between the nodes in CLOSE_WAIT status. After running a node for only 2 days, the node already has 33 connections and about 11,000 deleted files that are still open.

I'm running about 100 cores on each node. Could this be increasing the rate at which things go wrong? I suspect that on a setup with only 1 collection and 3 shards, the problem stays hidden for quite some time.

lsof -p 15452 -n | grep -i tcp | grep CLOSE_WAIT

java    15452 root   45u  IPv6           70692577        0t0      TCP 11.1.0.12:46533->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root   48u  IPv6           70692579        0t0      TCP 11.1.0.12:46535->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root  205u  IPv6           72759434        0t0      TCP 11.1.0.12:41744->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root  378u  IPv6           72359115        0t0      TCP 11.1.0.12:44767->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root  381u  IPv6           72359116        0t0      TCP 11.1.0.12:44768->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root 5252u  IPv6           72759445        0t0      TCP 11.1.0.12:41751->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root 6193u  IPv6           74021651        0t0      TCP 11.1.0.12:39170->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *150u  IPv6           74021648        0t0      TCP 11.1.0.12:53865->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *152u  IPv6           72759424        0t0      TCP 11.1.0.12:41737->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *526u  IPv6           74027995        0t0      TCP 11.1.0.12:53965->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *986u  IPv6           72768637        0t0      TCP 11.1.0.12:42246->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *626u  IPv6           72749983        0t0      TCP 11.1.0.12:41297->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *476u  IPv6           72768633        0t0      TCP 11.1.0.12:42243->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *567u  IPv6           72768622        0t0      TCP 11.1.0.12:42234->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *732u  IPv6           72768599        0t0      TCP 11.1.0.12:42230->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *799u  IPv6           72759427        0t0      TCP 11.1.0.12:41739->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *259u  IPv6           72768626        0t0      TCP 11.1.0.12:42237->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *272u  IPv6           72768997        0t0      TCP 11.1.0.12:42263->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *493u  IPv6           72759407        0t0      TCP 11.1.0.12:41729->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *693u  IPv6           74020909        0t0      TCP 11.1.0.12:53853->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *740u  IPv6           72749996        0t0      TCP 11.1.0.12:41306->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *749u  IPv6           73975230        0t0      TCP 11.1.0.12:38825->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *750u  IPv6           73974619        0t0      TCP 11.1.0.12:53499->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *771u  IPv6           72759420        0t0      TCP 11.1.0.12:41734->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *793u  IPv6           72768653        0t0      TCP 11.1.0.12:42256->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *900u  IPv6           72768618        0t0      TCP 11.1.0.12:42233->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *045u  IPv6           72766477        0t0      TCP 11.1.0.12:41181->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *233u  IPv6           73975035        0t0      TCP 11.1.0.12:38812->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *476u  IPv6           74025479        0t0      TCP 11.1.0.12:39225->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *512u  IPv6           74030407        0t0      TCP 11.1.0.12:39312->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *533u  IPv6           74021649        0t0      TCP 11.1.0.12:40102->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *716u  IPv6           74020899        0t0      TCP 11.1.0.12:53850->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *837u  IPv6           73975224        0t0      TCP 11.1.0.12:38819->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *009u  IPv6           74020894        0t0      TCP 11.1.0.12:53849->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *112u  IPv6           74021642        0t0      TCP 11.1.0.12:53861->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *118u  IPv6           74020764        0t0      TCP 11.1.0.12:39147->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *119u  IPv6           74021645        0t0      TCP 11.1.0.12:40100->11.1.0.12:http-alt (CLOSE_WAIT)
java    15452 root *145u  IPv6           74020893        0t0      TCP 11.1.0.12:53848->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *189u  IPv6           73975034        0t0      TCP 11.1.0.12:38811->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *246u  IPv6           73975226        0t0      TCP 11.1.0.12:38821->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *285u  IPv6           74020912        0t0      TCP 11.1.0.12:53854->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *364u  IPv6           73974620        0t0      TCP 11.1.0.12:38804->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *368u  IPv6           74020903        0t0      TCP 11.1.0.12:53852->11.1.0.13:http-alt (CLOSE_WAIT)
java    15452 root *546u  IPv6           74021647        0t0      TCP 11.1.0.12:39167->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *668u  IPv6           73975222        0t0      TCP 11.1.0.12:38817->11.1.0.11:http-alt (CLOSE_WAIT)
java    15452 root *717u  IPv6           73975249        0t0      TCP 11.1.0.12:53530->11.1.0.13:http-alt (CLOSE_WAIT)
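
A listing like the one above is easier to read when it is grouped per remote endpoint, and the number of deleted-but-still-open files can be counted from the same lsof output. The following is only a rough sketch: the function names are made up for illustration, and the deleted-file count assumes that lsof appends "(deleted)" to the name of an unlinked file, as it does on Linux.

```shell
#!/bin/sh
# Both helpers read `lsof -n` output on stdin, so they can be tried on a saved listing.

# Count CLOSE_WAIT sockets per remote endpoint, most frequent first.
close_wait_by_peer() {
    awk '/TCP/ && /\(CLOSE_WAIT\)/ {
        # The second-to-last field looks like local:port->remote:port.
        split($(NF-1), ends, "->")
        count[ends[2]]++
    }
    END { for (peer in count) print count[peer], peer }' | sort -rn
}

# Count open files whose name ends in "(deleted)" (assumed Linux lsof behaviour).
deleted_open_count() {
    awk '/\(deleted\)$/ { n++ } END { print n + 0 }'
}

# Usage, with the PID from this report:
#   lsof -p 15452 -n | close_wait_by_peer
#   lsof -p 15452 -n | deleted_open_count
```

If the deleted-file count keeps growing while the CLOSE_WAIT count stays roughly flat, that would suggest the two symptoms are not directly linked.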


  was:
I'm running SolrCloud on three nodes. I've been experiencing strange problems on these nodes. The main problem is that my disk is filling up, because old tlog files are not being released by Solr.

I suspect this problem is caused by a lot of open connections between the nodes in CLOSE_WAIT status. After running a node for only 2 days, the node already has 33 connections and about 11,000 deleted files that are still open.

I'm running about 100 cores on each node. Could this be increasing the rate at which things go wrong? I suspect that on a setup with only 1 collection and 3 shards, the problem stays hidden for quite some time.


> SolrCloud leaking (many) filehandles to deleted files
> -----------------------------------------------------
>
>                 Key: SOLR-5427
>                 URL: https://issues.apache.org/jira/browse/SOLR-5427
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.3, 4.4, 4.5
>         Environment: Debian Linux 6.0 running on VMWare
> Tomcat 6
>            Reporter: Eric Bus



--
This message was sent by Atlassian JIRA
(v6.1#6144)
