Posted to dev@lucene.apache.org by "Bernd Fehling (Created) (JIRA)" <ji...@apache.org> on 2012/03/27 11:01:34 UTC

[jira] [Created] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication
--------------------------------------------------------------------------------------------

                 Key: SOLR-3280
                 URL: https://issues.apache.org/jira/browse/SOLR-3280
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 3.5, 3.6, 4.0
            Reporter: Bernd Fehling


Sometimes too many CLOSE_WAIT connections, some of them stale, are left over on the slave server during or after replication.
Normally GC should clean these up, but this is not always the case.
Also, if a CLOSE_WAIT connection hangs, the new replication won't load.

The dirty workaround so far is to fake a TCP connection to that connection as root and close it.
After that the new replication loads, the old index and searcher are released, and the system
returns to normal operation.

Background:
The SnapPuller uses Apache httpclient 3.x with the MultiThreadedHttpConnectionManager.
After a connection has been used, the manager keeps it, in CLOSE_WAIT, for further requests.
This happens when releaseConnection is called. But if a connection is stuck, it is no longer available and a new
connection from the pool is used.

Solution:
After calling releaseConnection, clean up with closeIdleConnections(0).
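
The proposed cleanup can be pictured with a small stand-alone model. The sketch below is neither httpclient code nor the attached patch: TinyPool and its methods are hypothetical stand-ins, written only against the JDK, that mimic the two calls named above. Releasing a connection parks it in the pool; a cleanup with an idle timeout of 0 then closes every parked connection, so a stuck one cannot linger.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IdlePoolSketch {

    /** Hypothetical stand-in for a pooling connection manager (not the httpclient class). */
    static final class TinyPool {

        /** An idle socket together with the time it was released. */
        private static final class Idle {
            final Socket socket;
            final long sinceMillis;

            Idle(Socket socket, long sinceMillis) {
                this.socket = socket;
                this.sinceMillis = sinceMillis;
            }
        }

        private final List<Idle> idle = new ArrayList<Idle>();

        /** Parks the connection for reuse instead of closing it. */
        synchronized void release(Socket socket) {
            idle.add(new Idle(socket, System.currentTimeMillis()));
        }

        /** Closes and drops connections idle for at least idleTimeoutMillis; returns how many were closed. */
        synchronized int closeIdleConnections(long idleTimeoutMillis) {
            long cutoff = System.currentTimeMillis() - idleTimeoutMillis;
            int closed = 0;
            for (Iterator<Idle> it = idle.iterator(); it.hasNext();) {
                Idle entry = it.next();
                if (entry.sinceMillis <= cutoff) {
                    try {
                        entry.socket.close();
                    } catch (IOException ignored) {
                        // Best effort: the entry is dropped from the pool either way.
                    }
                    it.remove();
                    closed++;
                }
            }
            return closed;
        }

        synchronized int idleCount() {
            return idle.size();
        }
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool();
        pool.release(new Socket()); // unconnected socket, so no network is needed
        int closed = pool.closeIdleConnections(0);
        System.out.println("closed=" + closed + " idle=" + pool.idleCount());
    }
}
```

With a timeout of 0, every released connection counts as idle, which is the point of the proposed call: nothing is kept around for reuse, at the cost of opening a fresh connection for the next request.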


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Posted by "Bernd Fehling (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246196#comment-13246196 ] 

Bernd Fehling commented on SOLR-3280:
-------------------------------------

Sorry, I can't pin it down any further: a "network hiccup", or the computing center reconfiguring something on the network. I don't know. There is nothing in the Solr logs; it just hangs. The old index stays in use and keeps serving requests.
I found this through the server's system logs, because the space used by the index in the data directory had been double its normal size for more than 1 day. One slave had this in August and October last year (Solr 3.3), the other slave in October (Solr 3.3) and January this year (Solr 3.5). After I saw the CLOSE_WAIT with netstat and forced it to close, the system went back to normal operation: it started a new searcher with the new index, closed the old searcher, and deleted the old index.
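
A check like the netstat inspection described above can be scripted. The sketch below counts CLOSE_WAIT entries toward one remote port in text shaped like Linux `netstat -tn` output. The class name, the sample addresses, and port 8983 are assumptions for illustration only; the column layout assumed is Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State.

```java
import java.util.Arrays;
import java.util.List;

public class CloseWaitCount {

    /** Counts lines whose state is CLOSE_WAIT and whose foreign address ends in ":remotePort". */
    static int countCloseWait(List<String> netstatLines, int remotePort) {
        int count = 0;
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Assumed columns: Proto Recv-Q Send-Q Local Foreign State
            if (cols.length < 6 || !"CLOSE_WAIT".equals(cols[5])) {
                continue;
            }
            if (cols[4].endsWith(":" + remotePort)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from the affected servers.
        List<String> sample = Arrays.asList(
            "tcp        1      0 10.0.0.2:51234     10.0.0.1:8983      CLOSE_WAIT",
            "tcp        0      0 10.0.0.2:51240     10.0.0.1:8983      ESTABLISHED",
            "tcp        1      0 10.0.0.2:40000     10.0.0.9:80        CLOSE_WAIT");
        System.out.println(countCloseWait(sample, 8983));
    }
}
```

Run periodically on a slave, a count that stays above zero long after replication has finished would be a hint that a connection is stuck in the way described here.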


[jira] [Assigned] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Posted by "Bernd Fehling (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Fehling reassigned SOLR-3280:
-----------------------------------

    Assignee: Robert Muir  (was: Bernd Fehling)


[jira] [Commented] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Posted by "Sami Siren (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246154#comment-13246154 ] 

Sami Siren commented on SOLR-3280:
----------------------------------

bq. but if something goes wrong (which happens very rarely on my systems) the connection will hang in CLOSE_WAIT and the new index is not swapped in.

Do you have any idea what this something is? Anything in the logs?


bq. The patch just releases the connection, whether it hangs or not, and keeps everything operational. So there is no harm or performance impact for replication.

Yeah, I agree. The performance impact is minimal.


[jira] [Commented] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Posted by "Sami Siren (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246130#comment-13246130 ] 

Sami Siren commented on SOLR-3280:
----------------------------------

I did some testing around replication: 2 nodes on the same LAN, heavy replication and heavy indexing, and I did not see any sockets in CLOSE_WAIT state after running it for about 1 hour.

Perhaps you have a firewall between master and slave that somehow wrongly drops "idle" connections?


[jira] [Updated] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Posted by "Bernd Fehling (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Fehling updated SOLR-3280:
--------------------------------

    Attachment: SOLR-3280.patch

This patch will fix the CLOSE_WAIT issue.


[jira] [Commented] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Posted by "Bernd Fehling (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246139#comment-13246139 ] 

Bernd Fehling commented on SOLR-3280:
-------------------------------------

Nope, no firewall. I have 1 master and 2 slaves on the same LAN. After replication has finished, the connection on the master is closed and the connection on the slave is in CLOSE_WAIT with 1 byte in the Receive-Queue. If everything goes well, the connection will be reused by MultiThreadedHttpConnectionManager, but if something goes wrong (which happens very rarely on my systems) the connection will hang in CLOSE_WAIT and the new index is not swapped in.
If you use jvisualvm on that slave and go to the MBeans tab, you can see "solr/" in the tree, but you can't open it because there is no sub-tree.
The patch just releases the connection, whether it hangs or not, and keeps everything operational. So there is no harm or performance impact for replication.
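
The TCP state described above can be reproduced on the loopback interface with plain JDK sockets. This is only a sketch of the socket behavior, not of SnapPuller or httpclient, and the class and method names are made up. The "master" side closes first; until the "slave" side closes its own socket, the operating system keeps that socket in CLOSE_WAIT, and a read on it reports end of stream.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {

    /** Returns what the client reads after the peer has closed; -1 means end of stream. */
    static int readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {

            // "Master" side: accept the connection and close it right away.
            try (Socket accepted = server.accept()) {
                // Nothing to send; leaving this block closes the accepted socket.
            }

            // "Slave" side: the peer has closed, but this socket is still open.
            // From here until client.close() the OS reports it as CLOSE_WAIT.
            client.setSoTimeout(2000);
            return client.getInputStream().read();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read returned " + readAfterPeerClose());
    }
}
```

The socket only leaves CLOSE_WAIT once the local side closes it, which is why a pooled connection that is never reused or cleaned up can stay in that state indefinitely.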


[jira] [Updated] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Posted by "Bernd Fehling (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Fehling updated SOLR-3280:
--------------------------------

      Priority: Minor  (was: Major)
    Issue Type: Bug  (was: Improvement)


[jira] [Assigned] (SOLR-3280) Too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

Posted by "Bernd Fehling (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Fehling reassigned SOLR-3280:
-----------------------------------

    Assignee: Bernd Fehling