You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2017/07/17 15:42:52 UTC

[Bug 61313] New: JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

            Bug ID: 61313
           Summary: JNDIRealm LDAP server failover to alternateURL takes
                    very long 15m32s
           Product: Tomcat 8
           Version: 8.5.16
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Catalina
          Assignee: dev@tomcat.apache.org
          Reporter: peter.maloney@brockmann-consult.de
  Target Milestone: ----

Created attachment 35146
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35146&action=edit
my hacky patch

I have worked with ‎csutherl‎ on #tomcat on irc.freenode.net who decided this
is a bug and should be reported here rather than the ML.

JDK version tested was oracle JDK 1.8.0_66.

I have set connectionURL and alternateURL to try to get LDAP server failover to
work.

If both servers are up, it "works well".
If only the connectionURL server is down (firewall is set to REJECT) then a
newly restarted tomcat works fine, but if it goes down while tomcat is already
running, the next LDAP lookup takes 15m32s.
If only the alternateURL server is down (firewall is set to REJECT) then a
newly restarted tomcat works fine, but if it goes down while tomcat is already
running, and the connectionURL server was down before but up again now (so the
JNDIRealm's instance variable "context" is currently set using alternateURL)
the next LDAP lookup takes 15m32s.

Setting connectionTimeout has no effect on the time. I have verified it ends up
in the Hashtable returned by getDirectoryContextEnvironment().

And if I apply my hacky patch (attached, applies to tomcat85 git repo, tag
TOMCAT_8_5_16) for the method "JNDIRealm.open()" so it works like a fresh
tomcat startup, and never returns the old context (closes it and sets it to
null, then lets the rest of the code run), it always "works well", taking 4-7s
on a fresh tomcat, or less than 0.1s on a warmed up tomcat.

server.xml snippets:

    <Realm className="org.apache.catalina.realm.MemoryRealm" digest="MD5" />
    </Realm>

    <Realm className="org.apache.catalina.realm.JNDIRealm"
        connectionURL="ldap://auth1:389"
        connectionTimeout="1000"
        connectionAttempt="0"
        alternateURL="ldap://auth2:389"
        userPattern="uid={0},ou=People,dc=example,dc=com"
        userRoleAttribute="gidNumber"
        roleBase="ou=Group,dc=example,dc=com"
        roleName="cn"
        roleSearch="(|(gidNumber={2})(memberUid={1}))"
        />

an example webapp that can be used for testing is attached

firewall test code (assuming otherwise blank firewall with policy ACCEPT):
    t() { iptables -D INPUT 1 ; iptables -I INPUT 1 -p tcp -s 10.3.0.21 -j "$1"
; iptables -nvL; }

    # run this on machines that should work
    t LOG
    # run this on machines that should fail
    t REJECT

the test:
    # prerequisite for the test is a server named "auth1" and another "auth2"
which run on port 389. In our case it's slapd, and they have start_tls enabled
but not required.
    # on both LDAP servers:
    t LOG

    # then start tomcat, then
    # on the first LDAP server:
    t REJECT

    # on a test machine (content of the user and password here shouldn't
matter... we aren't testing authentication, only time taken)
    time curl --user someuser:somepassword
http://exampledomain:8080/dummy-service/test/ldap

    (ignore response, but look at time taken)

result:
    Almost always takes 15m32.4s plus up to 4s or so, but usually within a few
ms.
    When it "works well" (described in detail above), the first run takes about
4-7s, and after that it takes around 0.02s to 0.1s.

expected:
    It should always be quick, within some small multiple of the connection
timeout, preferrably 1x. So in this case, it should take about 1s extra, or at
most a few seconds, so 1.1s on a warmed up tomcat.

Side comment: other LDAP clients support an arbitrary number of urls, or one
line with all the urls together... I find it limiting to have only 2 that you
can set here. We have 3 LDAP servers.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61313] JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Mark Thomas <ma...@apache.org> ---
Thanks for the work on this and especially for the patches.

There is already retry code in the public authenticate() method.

Fundamentally this hits up against a problem that the previous code was trying
to deal with - how do you tell if a NamingException is something that requires
fail over? The short answer is you can't always tell and the previous code was
not failing over unless there was a clear indication (via the Exception type)
that such a fail over was required.

I have changed the logic so it always attempts to fail over. This won't be
perfect either but I'd rather it always attempted to fail over and therefore
always worked when it could and we get a few extra error messages than it
didn't try to fail over and therefore failed when a fail over could have
resulted in correct operation.

I've also included the readTimeout improvements.

Fixed in:
- trunk for 9.0.0.M25 onwards
- 8.5.x for 8.5.19 onwards
- 8.0.x for 8.0.46 onwards
- 7.0.x for 7.0.80 onwards

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61313] JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

Peter <pe...@brockmann-consult.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #4 from Peter <pe...@brockmann-consult.de> ---
Using Felix's patch, letting the tomcat warm up with some ok requests and both
auth servers online, then taking the first down, then doing a request (to our
regular app, or the dummy), it takes 6s to fail and says:

<!doctype html><html lang="en"><head><title>HTTP Status 401 –
Unauthorized</title><style type="text/css">h1
{font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}
h2
{font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}
h3
{font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}
body {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;}
b {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} p
{font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}
a {color:black;} a.name {color:black;} .line
{height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP
Status 401 – Unauthorized</h1><hr class="line" /><p><b>Type</b> Status
Report</p><p><b>Description</b> The request has not been applied because it
lacks valid authentication credentials for the target resource.</p><hr
class="line" /><h3>Apache Tomcat/8.5.16</h3></body></html>

And then subsequent requests work fine. (but the dummy always says the above,
so I guess you can't test that)

Doing the failure the other way around works the same.

If I set the timeout to 500 in the server.xml, it takes 1s instead.

So the timeout works great, but it doesn't seem to fall back to the
alternate/connection URL. So I think the patch is good, but incomplete.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61313] JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

--- Comment #1 from Peter <pe...@brockmann-consult.de> ---
Created attachment 35147
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35147&action=edit
an example webapp that can be used for testing

If it says "The origin server did not find a current representation for the
target resource or is not willing to disclose that one exists." that's fine...
it just means the user doesn't exist or something, but it got to the LDAP
server. I have verified it is fast with my patch and slow without it.

If it says "The request has not been applied because it lacks valid
authentication credentials for the target resource." it means it can't contact
any LDAP server, which is probably fine as long as it's fast. I verified that
it's fast with my patch and slow without it.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61313] JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

--- Comment #6 from Peter <pe...@brockmann-consult.de> ---
Created attachment 35151
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35151&action=edit
second possibly hacky patch, includes Felix's patch

Here's another possibly hacky patch. This one looks less hacky, but it only
covers the one place where the exception I observed happens. I chose that place
because it is the first call of a protected method in the stack; the others
before are public. And I don't know if it could loop forever, or pointlessly
retry when there is another kind of exception which is also a NamingException
(this one is a read timeout... I don't know what else there is).

With this patch, the problem is fixed in my testing... a failed server that was
already in a previous context will cause only a delay about 0.5s plus the
readTimeout. And both servers down will result in the "HTTP Status 401 –
Unauthorized" error and delay will also be about 0.5s plus the readTimeout.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61313] JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

--- Comment #8 from Peter <pe...@brockmann-consult.de> ---
Thanks. I tested commit 30e89f9efb571a19d2d3239db7b5e0c17f86b812 (tomcat85
repo) which works like the second possibly hacky patch.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61313] JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #3 from Mark Thomas <ma...@apache.org> ---
Please test Felix's proposed patch and report back.

Note: if it works, the documentation will need to be updated as well.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61313] JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

--- Comment #5 from Peter <pe...@brockmann-consult.de> ---
The code just closes the context and returns null, and does not retry, so I
guess that explains why it fails.


line 1326 JNDIRealm.java
        } catch (NamingException e) {

            // Log the problem for posterity
            containerLog.error(sm.getString("jndiRealm.exception"), e);

            // Close the connection so that it gets reopened next time
            if (context != null)
                close(context);

            // Return "not authenticated" for this request
            if (containerLog.isDebugEnabled())
                containerLog.debug("Returning null principal.");
            return null;

        }


SEVERE: org.apache.catalina.realm.JNDIRealm: Exception performing
authentication
javax.naming.NamingException: LDAP response read timed out, timeout
used:500ms.; remaining name 'uid=urban1,ou=People,dc=bc,dc=local'
        at com.sun.jndi.ldap.Connection.readReply(Connection.java:490)
        at com.sun.jndi.ldap.LdapClient.ldapBind(LdapClient.java:365)
        at com.sun.jndi.ldap.LdapClient.authenticate(LdapClient.java:214)
        at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2788)
        at com.sun.jndi.ldap.LdapCtx.ensureOpen(LdapCtx.java:2696)
        at com.sun.jndi.ldap.LdapCtx.ensureOpen(LdapCtx.java:2670)
        at com.sun.jndi.ldap.LdapCtx.doSearch(LdapCtx.java:1941)
        at com.sun.jndi.ldap.LdapCtx.doSearchOnce(LdapCtx.java:1933)
        at com.sun.jndi.ldap.LdapCtx.c_getAttributes(LdapCtx.java:1325)
        at
com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:235)
        at
com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:141)
        at
com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:129)
        at
javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:142)
        at
org.apache.catalina.realm.JNDIRealm.getUserByPattern(JNDIRealm.java:1558)
        at
org.apache.catalina.realm.JNDIRealm.getUserByPattern(JNDIRealm.java:1624)
        at org.apache.catalina.realm.JNDIRealm.getUser(JNDIRealm.java:1499)
        at
org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1377)
        at
org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1295)
        at
org.apache.catalina.authenticator.BasicAuthenticator.doAuthenticate(BasicAuthenticator.java:73)
        at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:560)
        at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
        at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
        at
org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:624)
        at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
        at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
        at
org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:799)
        at
org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
        at
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
        at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1455)
        at
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61313] JNDIRealm LDAP server failover to alternateURL takes very long 15m32s

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61313

Felix Schumacher <fe...@internetallee.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35146|0                           |1
        is obsolete|                            |

--- Comment #2 from Felix Schumacher <fe...@internetallee.de> ---
Created attachment 35148
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35148&action=edit
Add an option to set a read timeout to ldap connections used by JNDI Realm

Java LDAP API seems to have an option for read timeouts. This should shorten
the timeouts that you see when the firewall looses the connection state and
starts dropping the packets.

I haven't tested the patch yet and I think it would be even better to have an
timeout on connections that aren't used for a long time, too. If the LDAP API
has no such timeout, we could emulate it by keeping a timestamp up to date in
close and open calls.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org