Posted to dev@directory.apache.org by "Jörg Henne (JIRA)" <ji...@apache.org> on 2006/10/12 10:50:36 UTC

[jira] Commented: (DIRSERVER-586) Reliable hang of DS during query

    [ http://issues.apache.org/jira/browse/DIRSERVER-586?page=comments#action_12441669 ] 
            
Jörg Henne commented on DIRSERVER-586:
--------------------------------------

After a thorough debugging session I've come to the conclusion that this is, in fact, not a problem of either MINA or DS, but a problem caused by Windows XP's application level gateway, which is part of the Windows internet firewall. Sorry for blaming non-culprits for this mess... :-/

Just in case anybody cares, I'll give a quick roundup of what I found:
I started by generating traces of calls and data flow on both the client and the server side by adding appropriate debugging code to MINA's SocketIoProcessor and Sun's LDAP Connection object (the latter by downloading and modifying the sources). I generated separate traces per connection, i.e. text files named after the local port number of the client. Early on I noticed that the port numbers on the client and the server didn't match, because the Windows internet firewall proxies those calls through the application level gateway (i.e. there are in fact two connections, one from the client to the gateway and one from the gateway to the server, both of which can be seen using netstat or Sysinternals' TCPView). I wasn't terribly worried about this, because things should work even with the gateway in place.
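
For illustration only: the per-connection trace files described above could be produced by a small helper along the following lines. This is a sketch, not the debugging code that was actually added to MINA or to Sun's LDAP classes. The class name ConnectionTrace, the "conn-<port>.txt" naming scheme and the line format are all assumptions, and it uses java.nio.file, so it needs a newer JDK than the 1.5 mentioned in this issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical trace helper; not part of MINA, ApacheDS or the JDK. */
final class ConnectionTrace {

    private ConnectionTrace() {}

    /** Trace file name derived from the client's local port (assumed naming scheme). */
    static String fileNameFor(int clientLocalPort) {
        return "conn-" + clientLocalPort + ".txt";
    }

    /** Formats one trace line: direction, byte count and a hex dump of the payload. */
    static String formatLine(String direction, byte[] data, int length) {
        StringBuilder sb = new StringBuilder(direction)
                .append(' ').append(length).append(" bytes:");
        for (int i = 0; i < length; i++) {
            sb.append(String.format(" %02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    /** Appends one line to the trace file belonging to the given connection. */
    static void append(Path dir, int clientLocalPort, String direction,
                       byte[] data, int length) {
        Path file = dir.resolve(fileNameFor(clientLocalPort));
        String line = formatLine(direction, data, length) + System.lineSeparator();
        try {
            Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling append(...) with the same port number on the client side and the server side gives two files that can be compared line by line. With a proxying gateway in between, the server would see the gateway's port rather than the client's, so the file names no longer pair up, which is the kind of mismatch described above.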

One interesting thing I noticed is that under high networking loads, i.e. about 20 active and open connections, the application level gateway seems to "lose it", which is indicated by new connections being made directly, bypassing the application level gateway. In other words: for some new connections the port numbers did suddenly match up. Note to the guys with the black hats: you may want to try to bypass the application level gateway by inundating it with connections for a brief period.

Anyway, back to the problem: once those "direct" connections start to occur, some other, previously existing connections seem to go dead: the client sends something, but the server never receives anything, causing the client to time out.

The weird thing about the application level gateway is that it is not only used for connections crossing a protected gateway, but for all connections, even local loopbacks. In other words: if you have even one interface with an active firewall in your system (which I do, for the wireless interface), even if this interface is down, all TCP connections go through the application level gateway.
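
For illustration only: whether a loopback connection is being proxied can be checked by comparing the port the client bound locally with the peer port the server sees. The sketch below is not the code used in this debugging session. The class name LoopbackPortProbe and its methods are hypothetical, and it assumes a JDK recent enough to have InetAddress.getLoopbackAddress() and try-with-resources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical probe; compares port numbers on both ends of a loopback connection. */
final class LoopbackPortProbe {

    private LoopbackPortProbe() {}

    /** Returns {port the client bound locally, peer port the server observed}. */
    static int[] probe() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; the backlog holds the pending
        // connection until accept() is called further down.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            return new int[] { client.getLocalPort(), accepted.getPort() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A connection is direct when both ends agree on the client's port. */
    static boolean isDirect(int[] ports) {
        return ports[0] == ports[1];
    }

    public static void main(String[] args) {
        int[] ports = probe();
        System.out.println("client local port:        " + ports[0]);
        System.out.println("peer port seen by server: " + ports[1]);
        System.out.println(isDirect(ports)
                ? "ports match: connection looks direct"
                : "ports differ: something is proxying the connection");
    }
}
```

On a machine without any proxying layer the two numbers should be identical. Differing numbers on a pure loopback connection would point to an intermediary such as the application level gateway discussed here.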

Well, and of course the punchline of all that is: once you completely turn off the Windows internet firewall by shutting down the respective service, everything works fine and rock-solid again. *sigh*

> Reliable hang of DS during query
> --------------------------------
>
>                 Key: DIRSERVER-586
>                 URL: http://issues.apache.org/jira/browse/DIRSERVER-586
>             Project: Directory ApacheDS
>          Issue Type: Bug
>         Environment: DS 0.9.3, Windows, JDK 1.5
>            Reporter: Jörg Henne
>         Assigned To: Emmanuel Lecharny
>         Attachments: bugreport.zip, TestHang.java
>
>
> When running the attached test, the directory server hangs after executing a slew of operations when searching for objects.
> First of all, some background on the test case:
> The attached test case (in the form of an exported Eclipse project) is, unfortunately, based on quite a few classes. They are part of a project I am currently working on: an object-to-LDAP mapper with an approach similar to Castor for XML or Hibernate for RDBMS, albeit a lot more modest in complexity (I'll, hopefully, one day be able to open-source it - for now it is still much too immature). I have supplied all that stuff mainly for your reference.
> To run the test case, please make sure that the constant "URL" in LDAPDirectoryTest points to a valid directory. The URL the context points to must exist. The test will, however, subsequently create lots of nodes below it.
> The hang seems to be related to some kind of deadlock, since it doesn't occur when the whole test is run via a single context only. To achieve this, set the constant "ONE_CONTEXT" to true (otherwise each LDAPDirectory uses its own set of contexts).
> If you have any problems running the test, please don't hesitate to contact me.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira