You are viewing a plain text version of this content. The canonical link for it is here.
Posted to apache-bugdb@apache.org by Matthias Loepfe <Ma...@AdNovum.CH> on 1999/07/28 21:51:37 UTC

general/4787: tcp keepalive and ExitAfterIdleTimeout

>Number:         4787
>Category:       general
>Synopsis:       tcp keepalive and ExitAfterIdleTimeout
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          change-request
>Submitter-Id:   apache
>Arrival-Date:   Wed Jul 28 14:10:00 PDT 1999
>Last-Modified:
>Originator:     Matthias.Loepfe@AdNovum.CH
>Organization:
apache
>Release:        1.3.6
>Environment:
SunOS mauro 5.6 Generic_105181-11 sun4u sparc SUNW,Ultra-5_10
>Description:
We are using apache in a filrewall environment. More precise between two
firewall (in the DMZ). There we had the problem that the firewall does open
(as the wire get cut) idle tcp connections after certain amount of time (about
60 to 90 minutes).

If there are open tcp connection in a child process which are idle for a long
time, these processes hang forever, because the fw does not 'reset' and the
tcp implementation does not recognise it.

This ca happen in two situations (for each of which I propose a patch):

1. because (at least on Solaris) the SO_KEEPALIVE which is set on the listener
   socket gets NOT inherited by the new accepted socket. That means if you have
   for example an application which does a 'server push' which can be idle for
   a long time, you get hanging child. The same can happen if the module does
   not properly use the *timeout calls.

2. You use some for of 'static' connections to some backend services (e.g DB).
   In this situation the server will hang if it does not use SO_KEEPALIVE and
   the child must wait a very long time (about 8 min) on the next call of the 
   backend service, because it takes that long to recognise and shutdown a 
   broken tcp connection.

To solve the case 1 I set the SO_KEEPALIVE just after the accept() call.
For the seconf case I added a new functionality 'ExitAfterIdleTimeout' which
terminates an idle child after a certain amount of time (less than the fw 
timeout). This solve the case 2 an it provides a mean which lets the number
of idle child slowly decrement from MaxIdle to MinIdle Childs (in a period of
low load).
>How-To-Repeat:
Play around with firewalls and have patient
>Fix:
see the following patches:

*** apache_1.3.6/src/main/http_core.c	Sat Mar 20 00:54:08 1999
--- apache_1.3.6-2.3.9/src/main/http_core.c	Tue Jul 27 22:38:35 1999
***************
*** 2650,2655 ****
--- 2650,2668 ----
  }
  #endif
  
+ #ifndef NO_ADN_ADDON
+ extern int ap_exit_after_idle_timeout; /* http_main.c */
+ static const char *set_idle_timeout (cmd_parms *cmd, void *dummy, char *arg) 
+ {
+     const char *err = ap_check_cmd_context(cmd, GLOBAL_ONLY);
+     if (err != NULL) {
+         return err;
+     }
+     ap_exit_after_idle_timeout = atoi (arg);
+     return NULL;
+ }
+ #endif /* NO_ADN_ADDON */
+ 
  /* Note --- ErrorDocument will now work from .htaccess files.  
   * The AllowOverride of Fileinfo allows webmasters to turn it off
   */
***************
*** 2875,2880 ****
--- 2888,2897 ----
    (void*)XtOffsetOf(core_dir_config, limit_req_body),
    OR_ALL, TAKE1,
    "Limit (in bytes) on maximum size of request message body" },
+ #ifndef NO_ADN_ADDON /* idle server shutdown */
+ { "ExitAfterIdleTimeout", set_idle_timeout, NULL, RSRC_CONF, TAKE1,
+   "Maximum number of seconds a particular child server stays idle before dying. (0 => never)" },
+ #endif /* NO_ADN_ADDON */
  { NULL }
  };
  



*** apache_1.3.6/src/main/http_main.c	Tue Jul 27 22:22:08 1999
--- apache_1.3.6-2.3.9/src/main/http_main.c	Tue Jul 27 22:38:37 1999
***************
*** 253,258 ****
--- 253,263 ----
  API_VAR_EXPORT ap_ctx *ap_global_ctx;
  #endif /* EAPI */
  
+ #ifndef NO_ADN_ADDON /* idle server shutdown */
+ /* configured in http_config.c */
+ int ap_exit_after_idle_timeout = 0;
+ #endif /* NO_ADN_ADDON */
+ 
  /*
   * The max child slot ever assigned, preserved across restarts.  Necessary
   * to deal with MaxClients changes across SIGUSR1 restarts.  We use this
***************
*** 3714,3719 ****
--- 3719,3731 ----
  
  	(void) ap_update_child_status(my_child_num, SERVER_READY, (request_rec *) NULL);
  
+ #ifndef NO_ADN_ADDON /* idle server shutdown */
+ 	if (ap_exit_after_idle_timeout) {
+ 		signal(SIGALRM, (void (*)())just_die);
+ 		alarm(ap_exit_after_idle_timeout);
+ 	}
+ #endif /* NO_ADN_ADDON */
+ 
  	/*
  	 * Wait for an acceptable connection to arrive.
  	 */
***************
*** 3760,3766 ****
--- 3772,3796 ----
  		clen = sizeof(sa_client);
  		csd = accept(sd, &sa_client, &clen);
  		if (csd >= 0 || errno != EINTR)
+ #ifndef NO_ADN_ADDON /* accepted socket keepalive */
+  		/** code changes by AdNovum (sgw) 4/28/98 **/
+ 		/** It seems like the KEEPALIVE on the listener socket **/
+ 		/** is not properly propagated over an accept(). **/
+ 		/** In order to fix that problem we set it again here. **/
+ 		{
+ 				int one = 1;
+                 if (setsockopt(csd, SOL_SOCKET,SO_KEEPALIVE,
+                 			   (char *)&one,sizeof(int)) < 0) {
+         			ap_log_unixerr("setsockopt", "(SO_KEEPALIVE)", NULL,
+         						server_conf);
+         			exit(1);
+     			}
+ 				break;
+ 				/** end of AdNovum fix for KEEPALICE **/
+ 		}
+ #else /* NO_ADN_ADDON */
  		    break;
+ #endif /* NO_ADN_ADDON */
  		if (deferred_die) {
  		    /* we didn't get a socket, and we were told to die */
  		    clean_child_exit(0);
***************
*** 3847,3852 ****
--- 3877,3887 ----
  
  	SAFE_ACCEPT(accept_mutex_off());	/* unlock after "accept" */
  
+ #ifndef NO_ADN_ADDON /* idle server shutdown */
+ 	/* reset any alarm signals */
+ 	alarm(0);
+ #endif /* NO_ADN_ADDON */
+ 
  	/* We've got a socket, let's at least process one request off the
  	 * socket before we accept a graceful restart request.
  	 */
***************
*** 4009,4014 ****
--- 4044,4055 ----
      }
  
      if (one_process) {
+ 
+ #ifndef NO_ADN_ADDON /* debug support */
+     /* added by AdNovum (tl) to prevent timeouts on open tcp sessions **/
+     s->keep_alive = 0;
+ #endif /* NO_ADN_ADDON */
+ 
  	signal(SIGHUP, just_die);
  	signal(SIGINT, just_die);
  	signal(SIGQUIT, SIG_DFL);
>Audit-Trail:
>Unformatted:
[In order for any reply to be added to the PR database, you need]
[to include <ap...@Apache.Org> in the Cc line and make sure the]
[subject line starts with the report component and number, with ]
[or without any 'Re:' prefixes (such as "general/1098:" or      ]
["Re: general/1098:").  If the subject doesn't match this       ]
[pattern, your message will be misfiled and ignored.  The       ]
["apbugs" address is not added to the Cc line of messages from  ]
[the database automatically because of the potential for mail   ]
[loops.  If you do not include this Cc, your reply may be ig-   ]
[nored unless you are responding to an explicit request from a  ]
[developer.  Reply only with text; DO NOT SEND ATTACHMENTS!     ]