You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by rb...@covalent.net on 2001/06/04 18:20:46 UTC

New graceful restart mechanism.

This patch implements the pipe-of-death mechanism that was discussed last
week.  This only modifies the prefork MPM.  If this patch is accepted,
then the other MPMs are trivial to modify.  The one thing I didn't do,
which could easily be done now, is to remove the process_status field from
the scoreboard.  That field is not used once this patch is applied, and
can safely be removed.

The basic premise, is that signals are just not very good when it comes to
gracefully shutting down the server.  They don't work well, either for
threaded or non-threaded MPMs.  So, we have implemented a series of
pipe-of-death functions.  The pod is no longer selected on, allowing for
SINGLE_LISTEN_UNSERIALIZED_ACCEPT again.  Instead, the first thing a
thread does after accepting a connection, is to check the pod.  If there
is data there, the process dies (after serving the current request), if
there is no data, it doesn't die.  This could be improved a bit, by using
OOB data to wake up the thread, and we would only check the pod if we
received OOB data.  This is an optimization however, and I am not sure how
that would work in real-life.

I have tested this, but of course it could use far more people banging on
it before I commit for good.  Assuming nobody complains, I will commit it
tomorrow or the next day.

Ryan

? build.log
? build.err
Index: include/mpm_common.h
===================================================================
RCS file: /home/cvs/httpd-2.0/include/mpm_common.h,v
retrieving revision 1.20
diff -u -d -b -w -u -r1.20 mpm_common.h
--- include/mpm_common.h	2001/03/22 12:32:21	1.20
+++ include/mpm_common.h	2001/06/04 15:59:24
@@ -164,6 +164,14 @@

 #define AP_MPM_HARD_LIMITS_FILE APACHE_MPM_DIR "/mpm_default.h"

+#ifdef AP_MPM_USES_POD
+AP_DECLARE(apr_file_t *) ap_mpm_pod_open(apr_pool_t *p);
+AP_DECLARE(apr_status_t) ap_mpm_pod_check(void);
+AP_DECLARE(void) ap_mpm_pod_close(void);
+AP_DECLARE(void) ap_mpm_pod_signal(apr_pool_t *p);
+AP_DECLARE(void) ap_mpm_pod_killpg(apr_pool_t *p, int num);
+#endif
+
 #ifdef __cplusplus
 }
 #endif
Index: server/mpm_common.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm_common.c,v
retrieving revision 1.47
diff -u -d -b -w -u -r1.47 mpm_common.c
--- server/mpm_common.c	2001/05/16 20:51:28	1.47
+++ server/mpm_common.c	2001/06/04 16:00:37
@@ -338,4 +338,61 @@
 }
 #endif /* def NEED_INITGROUPS */

+#ifdef AP_MPM_USES_POD
+static apr_file_t *pod_in = NULL, *pod_out = NULL;
+
+AP_DECLARE(apr_status_t) ap_mpm_pod_open(apr_pool_t *p)
+{
+    apr_file_t *pod;

+    pod = apr_file_pipe_create(&pod_in, &pod_out, p);
+    apr_file_pipe_timeout_set(out, 0);
+    return APR_SUCCESS;
+}
+
+AP_DECLARE(apr_status_t) ap_mpm_pod_check(void)
+{
+    char c;
+    apr_size_t len = 1;
+    apr_status_t rv;
+
+    rv = apr_file_read(pod_in, &c, &len);
+
+    if ((rv == APR_SUCCESS) && (len == 1)) {
+        return APR_SUCCESS;
+    }
+    return 1;
+}
+
+AP_DECLARE(void) ap_mpm_pod_close(void)
+{
+    apr_file_close(pod_out);
+    apr_file_close(pod_in);
+}
+
+AP_DECLARE(void) ap_mpm_pod_signal(apr_pool_t *p)
+{
+    apr_socket_t *sock = NULL;
+    apr_sockaddr_t *sa = NULL;
+
+    if ((rv = apr_file_write(pipe_of_death_out, &char_of_death, &one))
+                                 != APR_SUCCESS) {
+        if (APR_STATUS_IS_EINTR(rv)) continue;
+            ap_log_error(APLOG_MARK, APLOG_WARNING, rv, ap_server_conf,
+                         "write pipe_of_death");
+        }
+    }
+
+    apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, ap_listeners->sd->port, 0, p);
+    apr_socket_create(&sock, sa->family, SOCK_STREAM, p);
+    apr_connect(sock, sa);
+}
+
+AP_DECLARE(void) ap_mpm_pod_killpg(apr_pool_t *p, int num)
+{
+    int i;
+    for (i = 0; i < num; i++) {
+        ap_mpm_pod_signal(p);
+    }
+}
+#endif
Index: server/mpm/prefork/mpm.h
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm/prefork/mpm.h,v
retrieving revision 1.11
diff -u -d -b -w -u -r1.11 mpm.h
--- server/mpm/prefork/mpm.h	2001/04/13 19:00:38	1.11
+++ server/mpm/prefork/mpm.h	2001/06/04 16:00:49
@@ -68,6 +68,7 @@

 #define MPM_NAME "Prefork"

+#define AP_MPM_USES_POD 1
 #define AP_MPM_NEEDS_RECLAIM_CHILD_PROCESSES 1
 #define MPM_SYNC_CHILD_TABLE() (ap_sync_scoreboard_image())
 #define MPM_CHILD_PID(i) (ap_scoreboard_image->parent[i].pid)
Index: server/mpm/prefork/prefork.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm/prefork/prefork.c,v
retrieving revision 1.175
diff -u -d -b -w -u -r1.175 prefork.c
--- server/mpm/prefork/prefork.c	2001/05/15 02:38:15	1.175
+++ server/mpm/prefork/prefork.c	2001/06/04 16:00:49
@@ -182,6 +182,8 @@
 char tpf_server_name[INETD_SERVNAME_LENGTH+1];
 #endif /* TPF */

+int die_now = 0;
+
 #ifdef GPROF
 /*
  * change directory for gprof to plop the gmon.out file
@@ -232,7 +234,7 @@
     if (pchild) {
 	apr_pool_destroy(pchild);
     }
-    ap_scoreboard_image->parent[my_child_num].process_status = SB_WORKING;
+    ap_mpm_pod_close();
     chdir_for_gprof();
     exit(code);
 }
@@ -378,15 +380,7 @@
 static void please_die_gracefully(int sig)
 {
     /* clean_child_exit(0); */
-    ap_scoreboard_image->parent[my_child_num].process_status = SB_IDLE_DIE;
-    if (sig == SIGHUP) {
-        (void) ap_update_child_status(AP_CHILD_THREAD_FROM_ID(my_child_num),
-                                      SERVER_GRACEFUL, (request_rec *) NULL);
-    }
-    else {
-        (void) ap_update_child_status(AP_CHILD_THREAD_FROM_ID(my_child_num),
-                                      SERVER_IDLE_KILL, (request_rec *) NULL);
-    }
+    die_now = 1;
 }

 /* volatile just in case */
@@ -414,9 +408,11 @@
 	return;
     }
     restart_pending = 1;
+#if 0
     if ((is_graceful = (sig == SIGWINCH))) {
         apr_pool_cleanup_kill(pconf, NULL, ap_cleanup_scoreboard);
     }
+#endif
 }

 static void set_signals(void)
@@ -479,12 +475,9 @@

     /* we want to ignore HUPs and WINCH while we're busy processing one */
     sigaddset(&sa.sa_mask, SIGHUP);
-    sigaddset(&sa.sa_mask, SIGWINCH);
     sa.sa_handler = restart;
     if (sigaction(SIGHUP, &sa, NULL) < 0)
 	ap_log_error(APLOG_MARK, APLOG_WARNING, errno, ap_server_conf, "sigaction(SIGHUP)");
-    if (sigaction(SIGWINCH, &sa, NULL) < 0)
-	ap_log_error(APLOG_MARK, APLOG_WARNING, errno, ap_server_conf, "sigaction(SIGWINCH)");
 #else
     if (!one_process) {
 	apr_signal(SIGSEGV, sig_coredump);
@@ -513,7 +506,7 @@
     apr_signal(SIGHUP, restart);
 #endif /* SIGHUP */
 #ifdef SIGWINCH
-    apr_signal(SIGWINCH, restart);
+    apr_signal(SIGWINCH, SIG_IGN);
 #endif /* SIGWINCH */
 #ifdef SIGPIPE
     apr_signal(SIGPIPE, SIG_IGN);
@@ -533,9 +526,6 @@
 static int requests_this_child;
 static fd_set main_fds;

-#define I_AM_TO_SHUTDOWN()                                                   \
-(ap_scoreboard_image->parent[my_child_num].process_status != SB_WORKING)
-
 int ap_graceful_stop_signalled(void)
 {
     /* not ever called anymore... */
@@ -581,12 +571,11 @@
     apr_signal(SIGHUP, please_die_gracefully);

     ap_sync_scoreboard_image();
-    while (!I_AM_TO_SHUTDOWN()) {
+    while (!die_now) {

 	/* Prepare to receive a SIGWINCH due to graceful restart so that
 	 * we can exit cleanly.
 	 */
-	apr_signal(SIGWINCH, please_die_gracefully);
         apr_signal(SIGTERM, just_die);

 	/*
@@ -667,16 +656,22 @@
 	    /* if we accept() something we don't want to die, so we have to
 	     * defer the exit
 	     */
-            apr_signal(SIGTERM, please_die_gracefully);
 	    for (;;) {
                 ap_sync_scoreboard_image();
-		if (I_AM_TO_SHUTDOWN()) {
+		if (die_now) {
 		    /* we didn't get a socket, and we were told to die */
 		    clean_child_exit(0);
 		}
 		stat = apr_accept(&csd, sd, ptrans);
 		if (stat == APR_SUCCESS || !APR_STATUS_IS_EINTR(stat))
 		    break;
+                /* In reality, this could be done later, but to keep it
+                 * consistent with MPMs that have a thread race-condition,
+                 * we will do it here.
+                 */
+                if (ap_mpm_pod_check()) {
+                    die_now = 1;
+                }
 	    }

 	    if (stat == APR_SUCCESS)
@@ -778,19 +773,13 @@
 	    }

             ap_sync_scoreboard_image();
-	    if (I_AM_TO_SHUTDOWN()) {
+	    if (die_now) {
 		clean_child_exit(0);
 	    }
 	}

 	SAFE_ACCEPT(accept_mutex_off());	/* unlock after "accept" */

-	/* We've got a socket, let's at least process one request off the
-	 * socket before we accept a graceful restart request.  We set
-	 * the signal to ignore because we don't want to disturb any
-	 * third party code.
-	 */
-	apr_signal(SIGWINCH, SIG_IGN);
 	/*
 	 * We now have a connection, so set it up with the appropriate
 	 * socket options, file descriptors, and read/write buffers.
@@ -895,7 +884,6 @@
 	 * requested there's no race condition here.
 	 */
 	apr_signal(SIGHUP, please_die_gracefully);
-	apr_signal(SIGWINCH, please_die_gracefully);
 	apr_signal(SIGTERM, just_die);
         ap_scoreboard_image->parent[slot].process_status = SB_WORKING;
 	child_main(slot);
@@ -941,7 +929,7 @@
 #endif
 static int hold_off_on_exponential_spawning;

-static void perform_idle_server_maintenance(void)
+static void perform_idle_server_maintenance(apr_pool_t *p)
 {
     int i;
     int to_kill;
@@ -1003,7 +991,7 @@
 	 * shut down gracefully, in case it happened to pick up a request
 	 * while we were counting
 	 */
-	kill(ap_scoreboard_image->parent[to_kill].pid, SIGWINCH);
+	ap_mpm_pod_signal(p);
 	idle_spawn_rate = 1;
     }
     else if (idle_count < ap_daemons_min_free) {
@@ -1100,6 +1088,7 @@
 	/* XXX: hey, what's the right way for the mpm to indicate a fatal error? */
 	return 1;
     }
+    ap_mpm_pod_open(pconf);

     SAFE_ACCEPT(accept_mutex_init(pconf));
     if (!is_graceful) {
@@ -1210,7 +1199,7 @@
 	    continue;
 	}

-	perform_idle_server_maintenance();
+	perform_idle_server_maintenance(pconf);
 #ifdef TPF
     shutdown_pending = os_check_server(tpf_server_name);
     ap_check_signals();
@@ -1245,7 +1234,6 @@

     /* we've been told to restart */
     apr_signal(SIGHUP, SIG_IGN);
-    apr_signal(SIGWINCH, SIG_IGN);
     if (one_process) {
 	/* not worth thinking about */
 	return 1;
@@ -1265,12 +1253,11 @@

     if (is_graceful) {
 	ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_NOTICE, 0, ap_server_conf,
-		    "SIGWINCH received.  Doing graceful restart");
+		    "Graceful restart requested, doing restart");

 	/* kill off the idle ones */
-	if (unixd_killpg(getpgrp(), SIGWINCH) < 0) {
-	    ap_log_error(APLOG_MARK, APLOG_WARNING, errno, ap_server_conf, "killpg SIGWINCH");
-	}
+        ap_mpm_pod_killpg(pconf, ap_daemons_limit);
+
 #ifndef SCOREBOARD_FILE
 	/* This is mostly for debugging... so that we know what is still
 	    * gracefully dealing with existing request.  But we can't really

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: New graceful restart mechanism.

Posted by rb...@covalent.net.

Actually, I was about to commit everything earlier today, and I got
side-tracked.  I'll commit this afternoon.

Ryan

On 6 Jun 2001, Jeff Trawick wrote:

> <rb...@covalent.net> writes:
>
> > The latest version seems to have a small leak that I need to track down.
> > I am unlikely to do it right away, but a new patch will come sometime this
> > week.
>
> what the heck... commit the pod stuff (unused by anything) and post
> your latest prefork diff and maybe somebod else can help
>
> --
> Jeff Trawick | trawickj@bellsouth.net | PGP public key at web site:
>        http://www.geocities.com/SiliconValley/Park/9289/
>              Born in Roswell... married an alien...
>
>


_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: New graceful restart mechanism.

Posted by Jeff Trawick <tr...@bellsouth.net>.

<rb...@covalent.net> writes:

> The latest version seems to have a small leak that I need to track down.
> I am unlikely to do it right away, but a new patch will come sometime this
> week.

what the heck... commit the pod stuff (unused by anything) and post
your latest prefork diff and maybe somebod else can help

-- 
Jeff Trawick | trawickj@bellsouth.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: New graceful restart mechanism.

Posted by rb...@covalent.net.

On Tue, 5 Jun 2001, Greg Stein wrote:

> On Tue, Jun 05, 2001 at 08:06:03AM -0700, rbb@covalent.net wrote:
> >...
> > Change made, although I still disagree with it.  The problem is that
> > instead of having the pod global "local" to the pod code, now the pod
> > global is sitting in the MPM.
>
> Yes, but now the MPM can choose what to do with it. I assume that the
> complexity you were seeing was due to trying to store and pass around the
> pod_ctx. We can choose to do it that way, or choose to use a global. Using> globals within the pod code removes the MPM's choices.
>
> Thanks for making the change, though. I was going to volunteer to make it
> after you checked in your patch.
>
> >...
> > > But it *is* weird. It isn't a match of "catching" ... any compile should
> > > outright barf on an unknown variable. Maybe you were accidentally using a
> > > different MPM, thus not compilig prefork with the "uses POD" macro?
> >
> > I thought of the MPM thing, but that wasn't it.  I don't know what it was,
> > but my compile is catching the problems now.  It looks like I needed to do
> > a distclean before trying to build for some reason.   :-(
>
> Whacky. But no matter... we've got reviewers and your compiler is working
> again :-)
>
> Thanks for the patch... I hope it will increase the MPM's reliability.

The latest version seems to have a small leak that I need to track down.
I am unlikely to do it right away, but a new patch will come sometime this
week.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: New graceful restart mechanism.

Posted by Greg Stein <gs...@lyra.org>.

On Tue, Jun 05, 2001 at 08:06:03AM -0700, rbb@covalent.net wrote:
>...
> Change made, although I still disagree with it.  The problem is that
> instead of having the pod global "local" to the pod code, now the pod
> global is sitting in the MPM.

Yes, but now the MPM can choose what to do with it. I assume that the
complexity you were seeing was due to trying to store and pass around the
pod_ctx. We can choose to do it that way, or choose to use a global. Using
globals within the pod code removes the MPM's choices.

Thanks for making the change, though. I was going to volunteer to make it
after you checked in your patch.

>...
> > But it *is* weird. It isn't a match of "catching" ... any compile should
> > outright barf on an unknown variable. Maybe you were accidentally using a
> > different MPM, thus not compilig prefork with the "uses POD" macro?
> 
> I thought of the MPM thing, but that wasn't it.  I don't know what it was,
> but my compile is catching the problems now.  It looks like I needed to do
> a distclean before trying to build for some reason.   :-(

Whacky. But no matter... we've got reviewers and your compiler is working
again :-)

Thanks for the patch... I hope it will increase the MPM's reliability.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: New graceful restart mechanism.

Posted by rb...@covalent.net.

> Huh? How hard is it for the MPM to do:
>
>   static ap_mpm_pod_ctx *pod;
>
> ...
>
>   rv = ap_mpm_pod_open(&pod, p);
>
> ...
>
>   rv = ap_mpm_pod_check(pod);
>
>
> That doesn't look complicated.
>
> Bundling up the thing into a structure, rather than Yet More Globals is
> going to be much more helpful for using the POD in different ways.
>
> Globals are inflexible.

Change made, although I still disagree with it.  The problem is that
instead of having the pod global "local" to the pod code, now the pod
global is sitting in the MPM.

> >...
> > > "pipe_of_death_out" "char_of_death" "one"  What are these variables?
> > >
> > > Are you really sure you tried this? :-)
> >
> > I'm telling you, this all worked, I haven't got a clue why the damn build
> > isn't catching these, especially since I did a make clean;make.  This is
> > however beginning to bother me.
>
> Note the smiley :-)
>
> But it *is* weird. It isn't a match of "catching" ... any compile should
> outright barf on an unknown variable. Maybe you were accidentally using a
> different MPM, thus not compilig prefork with the "uses POD" macro?

I thought of the MPM thing, but that wasn't it.  I don't know what it was,
but my compile is catching the problems now.  It looks like I needed to do
a distclean before trying to build for some reason.   :-(

Ryan
_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: New graceful restart mechanism.

Posted by Greg Stein <gs...@lyra.org>.

On Mon, Jun 04, 2001 at 03:11:55PM -0700, rbb@covalent.net wrote:
>...
> > Please use a context structure rather than globals. That will allow us to
> > create a POD on a per-thread basis. Then you'll have your desired
> > functionality of a parent telling a child, "hey *you*. please stop."
>...
> I disagree.  By using a simple global variable, we make it much easier to
> use this in an MPM.  I had originally done exactly what you suggested, and
> I found that the code that needed to be added to the MPM was more complex
> than I wanted it to be.

Huh? How hard is it for the MPM to do:

  static ap_mpm_pod_ctx *pod;

...

  rv = ap_mpm_pod_open(&pod, p);

...

  rv = ap_mpm_pod_check(pod);

That doesn't look complicated.

Bundling up the thing into a structure, rather than Yet More Globals is
going to be much more helpful for using the POD in different ways.

Globals are inflexible.

>...
> > "pipe_of_death_out" "char_of_death" "one"  What are these variables?
> >
> > Are you really sure you tried this? :-)
> 
> I'm telling you, this all worked, I haven't got a clue why the damn build
> isn't catching these, especially since I did a make clean;make.  This is
> however beginning to bother me.

Note the smiley :-)

But it *is* weird. It isn't a match of "catching" ... any compile should
outright barf on an unknown variable. Maybe you were accidentally using a
different MPM, thus not compilig prefork with the "uses POD" macro?

> > >...
> > > +    apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, ap_listeners->sd->port, 0, p);
> >
> > Urk. This magic of looking into ap_listeners doesn't feel right. I don't
> > have an immediate solution, but there must be some kind of abstraction for
> > fetching the listening port.
> 
> Why?  We have the listening port in the ap_listeners list, so we should
> use it.

It feels like it is breaking encapsulation. The POD stuff is "reaching out"
of its box and inspecting arbitrary variables. It is ties between code like
this which makes software brittle.

Like I said: I don't have a recommended solution handy. Simply raising a
yellow flag that some lines are being crossed.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: New graceful restart mechanism.

Posted by rb...@covalent.net.

> > +#ifdef AP_MPM_USES_POD
> > +static apr_file_t *pod_in = NULL, *pod_out = NULL;
>
> Please use a context structure rather than globals. That will allow us to
> create a POD on a per-thread basis. Then you'll have your desired
> functionality of a parent telling a child, "hey *you*. please stop."
>
> An MPM can also choose to use just one POD for all of its children, and
> signal it N times (like you did in this patch).
>
> You could also use one POD for graceful restart and one for shutdown (let's
> say that an MPM is built to reset its state, rather than expect to die/spawn
> a thread).

I disagree.  By using a simple global variable, we make it much easier to
use this in an MPM.  I had originally done exactly what you suggested, and
I found that the code that needed to be added to the MPM was more complex
than I wanted it to be.  This change makes the pod completely opaque to
MPMs.  If they want to use it some other way, then they will need to
re-write the pod stuff anyway, and they can change the prototype, or just
keep it internal to the MPM.

> > +AP_DECLARE(apr_status_t) ap_mpm_pod_open(apr_pool_t *p)
>
> AP_DECLARE(apr_status_t) ap_mpm_pod_open(ap_pod_ctx **ctx, apr_pool_t *p)
>
> Take a pool. Return an ap_pod_ctx *.
>
> Note that the ap_pod_ctx can also keep a pool so the other functions won't> need to accept one.

As I said above, I disagree.

> > +{
> > +    apr_file_t *pod;
> >
> > +    pod = apr_file_pipe_create(&pod_in, &pod_out, p);
> > +    apr_file_pipe_timeout_set(out, 0);
>
> "out" ... never heard of that variable. You sure you tested this? :-)

Compiled and tested.  This function as it happens had all sorts of
problems that the compiler missed.  I have fixed them all now.

> > +{
> > +    char c;
> > +    apr_size_t len = 1;
> > +    apr_status_t rv;
> > +
> > +    rv = apr_file_read(pod_in, &c, &len);
> > +
> > +    if ((rv == APR_SUCCESS) && (len == 1)) {
> > +        return APR_SUCCESS;
> > +    }
> > +    return 1;
>
> "1" is not a legal apr_status_t value. I would recommend one of two things
> for this function:
>
> 1) handle all errors internally (logging and whatnot) and return a boolean
>    on whether the caller should continue/die.

It is not easy to log in this function, because I don't really have enough
information.

> 2) return APR_SUCCESS, APR_ESHOULDDIE (or whatever), or an error and make
>    all callers deal with processing errors.
>
> I prefer option (1). A caller isn't going to know what to do with an error
> besides log it, so why return it to them?
>
> [ since the internals of ap_mpm_pod_check() are opaque, then when APR_EINTR
>   is returned, how can the MPM know what to do? ]

I have already fixed this, by creating a new apr_status_t for Apache to
use for this case.

> >...
> > +AP_DECLARE(void) ap_mpm_pod_signal(apr_pool_t *p)
> > +{
> > +    apr_socket_t *sock = NULL;
> > +    apr_sockaddr_t *sa = NULL;
> > +
> > +    if ((rv = apr_file_write(pipe_of_death_out, &char_of_death, &one))
>
> "pipe_of_death_out" "char_of_death" "one"  What are these variables?
>
> Are you really sure you tried this? :-)

I'm telling you, this all worked, I haven't got a clue why the damn build
isn't catching these, especially since I did a make clean;make.  This is
however beginning to bother me.

> >...
> > +    apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, ap_listeners->sd->port, 0, p);
>
> Urk. This magic of looking into ap_listeners doesn't feel right. I don't
> have an immediate solution, but there must be some kind of abstraction for
> fetching the listening port.

Why?  We have the listening port in the ap_listeners list, so we should
use it.

> > +    apr_socket_create(&sock, sa->family, SOCK_STREAM, p);
> > +    apr_connect(sock, sa);
> > +}
>
> Besides the leak in file descriptors (as Jeff mentioned), we should also be
> careful of the pool. Is this pool going to disappear soon? Or if we're doing
> a restart, will it continue to be used?

This is pconf.  Therefore the pool will live until we restart, and then it
is cleared.

> Note that you also need to close the socket so the MPM will see that a
> request does not exist, thus allowing it to process the POD-signal right
> away. Otherwise, it will block on a read, waiting for input.

Already done.  Thanks.

Ryan
_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: New graceful restart mechanism.

Posted by Greg Stein <gs...@lyra.org>.

On Mon, Jun 04, 2001 at 09:20:46AM -0700, rbb@covalent.net wrote:
>...
> --- server/mpm_common.c	2001/05/16 20:51:28	1.47
> +++ server/mpm_common.c	2001/06/04 16:00:37
> @@ -338,4 +338,61 @@
>  }
>  #endif /* def NEED_INITGROUPS */
> 
> +#ifdef AP_MPM_USES_POD
> +static apr_file_t *pod_in = NULL, *pod_out = NULL;

Please use a context structure rather than globals. That will allow us to
create a POD on a per-thread basis. Then you'll have your desired
functionality of a parent telling a child, "hey *you*. please stop."

An MPM can also choose to use just one POD for all of its children, and
signal it N times (like you did in this patch).

You could also use one POD for graceful restart and one for shutdown (let's
say that an MPM is built to reset its state, rather than expect to die/spawn
a thread).

> +AP_DECLARE(apr_status_t) ap_mpm_pod_open(apr_pool_t *p)

AP_DECLARE(apr_status_t) ap_mpm_pod_open(ap_pod_ctx **ctx, apr_pool_t *p)

Take a pool. Return an ap_pod_ctx *.

Note that the ap_pod_ctx can also keep a pool so the other functions won't
need to accept one.

> +{
> +    apr_file_t *pod;
> 
> +    pod = apr_file_pipe_create(&pod_in, &pod_out, p);
> +    apr_file_pipe_timeout_set(out, 0);

"out" ... never heard of that variable. You sure you tested this? :-)

>...
> +AP_DECLARE(apr_status_t) ap_mpm_pod_check(void)

Take a context.  (same for other functions)

> +{
> +    char c;
> +    apr_size_t len = 1;
> +    apr_status_t rv;
> +
> +    rv = apr_file_read(pod_in, &c, &len);
> +
> +    if ((rv == APR_SUCCESS) && (len == 1)) {
> +        return APR_SUCCESS;
> +    }
> +    return 1;

"1" is not a legal apr_status_t value. I would recommend one of two things
for this function:

1) handle all errors internally (logging and whatnot) and return a boolean
   on whether the caller should continue/die.

2) return APR_SUCCESS, APR_ESHOULDDIE (or whatever), or an error and make
   all callers deal with processing errors.

I prefer option (1). A caller isn't going to know what to do with an error
besides log it, so why return it to them?

[ since the internals of ap_mpm_pod_check() are opaque, then when APR_EINTR
  is returned, how can the MPM know what to do? ]

>...
> +AP_DECLARE(void) ap_mpm_pod_signal(apr_pool_t *p)
> +{
> +    apr_socket_t *sock = NULL;
> +    apr_sockaddr_t *sa = NULL;
> +
> +    if ((rv = apr_file_write(pipe_of_death_out, &char_of_death, &one))

"pipe_of_death_out" "char_of_death" "one"  What are these variables?

Are you really sure you tried this? :-)

>...
> +    apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, ap_listeners->sd->port, 0, p);

Urk. This magic of looking into ap_listeners doesn't feel right. I don't
have an immediate solution, but there must be some kind of abstraction for
fetching the listening port.

> +    apr_socket_create(&sock, sa->family, SOCK_STREAM, p);
> +    apr_connect(sock, sa);
> +}

Besides the leak in file descriptors (as Jeff mentioned), we should also be
careful of the pool. Is this pool going to disappear soon? Or if we're doing
a restart, will it continue to be used?

Note that you also need to close the socket so the MPM will see that a
request does not exist, thus allowing it to process the POD-signal right
away. Otherwise, it will block on a read, waiting for input.

The MPM changes seem fine, but I didn't review as closely.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: New graceful restart mechanism.

Posted by rb...@covalent.net.

On Mon, 4 Jun 2001, Harrie Hazewinkel wrote:

> rbb@covalent.net wrote:
> >
> > This patch implements the pipe-of-death mechanism that was discussed last
> > week.  This only modifies the prefork MPM.  If this patch is accepted,
> > then the other MPMs are trivial to modify.  The one thing I didn't do,
> > which could easily be done now, is to remove the process_status field from
> > the scoreboard.  That field is not used once this patch is applied, and
> > can safely be removed.
>
> I have some concerns whether we should not keep the 'status' field.
> The reason I have is that management modules, like SNMP, can use
> this to determine the state of the processes/threads the Apache
> server has.
> Or is there another easy way on how to determine whether servers/workers
> are busy??

Harrie, this is not the status field.  The field I am talking about
getting rid of is completely different than the field that informs a
management module what any given thread/process is actually doing at this
moment in time.  That field is the status field, and is located in the
worker_score.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: New graceful restart mechanism.

Posted by Harrie Hazewinkel <ha...@covalent.net>.

rbb@covalent.net wrote:
> 
> This patch implements the pipe-of-death mechanism that was discussed last
> week.  This only modifies the prefork MPM.  If this patch is accepted,
> then the other MPMs are trivial to modify.  The one thing I didn't do,
> which could easily be done now, is to remove the process_status field from
> the scoreboard.  That field is not used once this patch is applied, and
> can safely be removed.

I have some concerns whether we should not keep the 'status' field.
The reason I have is that management modules, like SNMP, can use
this to determine the state of the processes/threads the Apache
server has.
Or is there another easy way on how to determine whether servers/workers
are busy??

Harrie
-- 
address: Covalent Technologies, 645 Howard St, San Francisco, CA - 94105
phone: +1-415-536-5221                               fax:+1-415-536-5210
personal website: http://www.lisanza.net/

Re: New graceful restart mechanism.

Posted by rb...@covalent.net.

> > > >                                    This could be improved a bit, by using
> > > > OOB data to wake up the thread, and we would only check the pod if we
> > > > received OOB data.  This is an optimization however, and I am not sure how
> > > > that would work in real-life.
> > >
> > > Err, I'm not sure either :)  On which connection would you send OOB
> > > data?  How would we detect that there was OOB data?  If select(),  I
> > > thought we were removing the need for select() (think
> > > "SINGLE_LISTEN_UNSERIALIZED_ACCEPT")...   If SIGURG, I thought we were
> > > avoiding signal-based processing (not sure of SIGURG portability
> > > anyway)...  I guess I'm missing something real basic here...
> >
> > I was thinking of doing the connect, and actually sending OOB data over
> > the wire.  I seem to recall some way to determine when OOB data was read
> > from the network, but I can't find it right now.
>
> You can have SIGURG sent when OOB data is received.  You can do
> select()/poll() to see if OOB data has been received.  You can call
> recv() or recvmsg() or recvfrom() asking for MSG_OOB.  I dunno what
> else there is.

I was really hoping I saw something during the accept phase, but looking
through it, you are correct, what I was remembering was select/poll.  :-(

> > > a) weird firewall-like rules on some possibly-weird systems may keep
> > >    the connect() from working; admin needs to be aware
> >
> > We are connecting from localhost to localhost.  While weird firewall rules
> > may stop this, remember that we are connecting on a port that should be
> > accessible, because it is meant to accept connections.  :-)
>
> The rules of firewall-like constructs often distinguish between
> accepting connections vs. requesting new connections.  They can also
> be user id or group-specific ("nobody").
>
> This is just something to be aware of when people have shutdown
> problems.  The traces after connect() fails will be helpful.

Good point.  Something to document and just be aware of.

> > > b) TCP/IP may be dead/dying and socket()+connect() won't work
> > >
> > >    I'm not sure how to kill apache on OS/390 when the stack goes away.
> > >
> > >    I was told previously by you and possibly others that I couldn't
> > >    teach the child process to tell the parent to get the @#$% out
> > >    because the stack is dead.  If the stack is dead, the user can tell
> > >    apache to go away but nothing will happen and the child processes
> > >    will (probably) continue looping and getting weird errors on
> > >    select() or accept().
> > >
> > >    This needs to be solved.  For me, simply recognizing the magic
> > >    error feedback and using the magic error status (CHILD_EXITFATAL?)
> > >    is sufficient.
> >
> > Greg Ames stated quite emphatically, that if the stack dies on a 390
> > machine, the child process will wake up from select.  Once that happens,
> > the process will die, if for no other reason than that Apache dies when
> > select returns an error other than EINTR.
>
> We're in select()/poll() on OS/390 only if there is more than one
> listening socket. In general I would expect to be in accept().

Looking at the accept code, assuming accept returns ENETDOWN, we will die
correctly in this case.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: New graceful restart mechanism.

Posted by Jeff Trawick <tr...@bellsouth.net>.

<rb...@covalent.net> writes:

> > >                                    This could be improved a bit, by using
> > > OOB data to wake up the thread, and we would only check the pod if we
> > > received OOB data.  This is an optimization however, and I am not sure how
> > > that would work in real-life.
> >
> > Err, I'm not sure either :)  On which connection would you send OOB
> > data?  How would we detect that there was OOB data?  If select(),  I
> > thought we were removing the need for select() (think
> > "SINGLE_LISTEN_UNSERIALIZED_ACCEPT")...   If SIGURG, I thought we were
> > avoiding signal-based processing (not sure of SIGURG portability
> > anyway)...  I guess I'm missing something real basic here...
> 
> I was thinking of doing the connect, and actually sending OOB data over
> the wire.  I seem to recall some way to determine when OOB data was read
> from the network, but I can't find it right now.

You can have SIGURG sent when OOB data is received.  You can do
select()/poll() to see if OOB data has been received.  You can call
recv() or recvmsg() or recvfrom() asking for MSG_OOB.  I dunno what
else there is.

> > ---------/----------
> >
> > This feels like a reasonable the way to go but when this has been
> > brought up before I've had some general worries about it that perhaps
> > folks can dispel.  (Maybe I'm just mad because when I got the great
> > idea to use connect() it wouldn't do what I wanted it to do :) )
> >
> > I remain at least a little concerned about using connect() because
> >
> > a) weird firewall-like rules on some possibly-weird systems may keep
> >    the connect() from working; admin needs to be aware
> 
> We are connecting from localhost to localhost.  While weird firewall rules
> may stop this, remember that we are connecting on a port that should be
> accessible, because it is meant to accept connections.  :-)

The rules of firewall-like constructs often distinguish between
accepting connections vs. requesting new connections.  They can also
be user id or group-specific ("nobody").

This is just something to be aware of when people have shutdown
problems.  The traces after connect() fails will be helpful.

> > b) TCP/IP may be dead/dying and socket()+connect() won't work
> >
> >    I'm not sure how to kill apache on OS/390 when the stack goes away.
> >
> >    I was told previously by you and possibly others that I couldn't
> >    teach the child process to tell the parent to get the @#$% out
> >    because the stack is dead.  If the stack is dead, the user can tell
> >    apache to go away but nothing will happen and the child processes
> >    will (probably) continue looping and getting weird errors on
> >    select() or accept().
> >
> >    This needs to be solved.  For me, simply recognizing the magic
> >    error feedback and using the magic error status (CHILD_EXITFATAL?)
> >    is sufficient.
> 
> Greg Ames stated quite emphatically, that if the stack dies on a 390
> machine, the child process will wake up from select.  Once that happens,
> the process will die, if for no other reason than that Apache dies when
> select returns an error other than EINTR.

We're in select()/poll() on OS/390 only if there is more than one
listening socket. In general I would expect to be in accept().

-- 
Jeff Trawick | trawickj@bellsouth.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: New graceful restart mechanism.

Posted by rb...@covalent.net.

> >                                    This could be improved a bit, by using
> > OOB data to wake up the thread, and we would only check the pod if we
> > received OOB data.  This is an optimization however, and I am not sure how
> > that would work in real-life.
>
> Err, I'm not sure either :)  On which connection would you send OOB
> data?  How would we detect that there was OOB data?  If select(),  I
> thought we were removing the need for select() (think
> "SINGLE_LISTEN_UNSERIALIZED_ACCEPT")...   If SIGURG, I thought we were
> avoiding signal-based processing (not sure of SIGURG portability
> anyway)...  I guess I'm missing something real basic here...

I was thinking of doing the connect, and actually sending OOB data over
the wire.  I seem to recall some way to determine when OOB data was read
from the network, but I can't find it right now.

> >                               Assuming nobody complains, I will commit it
> > tomorrow or the next day.
>
> no real complaints, but no reason not to address the minor issues
> below before committing

Some I'll fix, some I'll just comment on.  :-)

> > Index: server/mpm_common.c
> > ===================================================================
> > RCS file: /home/cvs/httpd-2.0/server/mpm_common.c,v
> > retrieving revision 1.47
> > diff -u -d -b -w -u -r1.47 mpm_common.c
> > --- server/mpm_common.c	2001/05/16 20:51:28	1.47
> > +++ server/mpm_common.c	2001/06/04 16:00:37
> > @@ -338,4 +338,61 @@
> >  }> >  #endif /* def NEED_INITGROUPS */
> >
> > +#ifdef AP_MPM_USES_POD
> > +static apr_file_t *pod_in = NULL, *pod_out = NULL;
> > +
> > +AP_DECLARE(apr_status_t) ap_mpm_pod_open(apr_pool_t *p)
> > +{
> > +    apr_file_t *pod;
> >
> > +    pod = apr_file_pipe_create(&pod_in, &pod_out, p);
> > +    apr_file_pipe_timeout_set(out, 0);
> > +    return APR_SUCCESS;
> > +}
> > +
> > +AP_DECLARE(apr_status_t) ap_mpm_pod_check(void)
> > +{
> > +    char c;
> > +    apr_size_t len = 1;
> > +    apr_status_t rv;
> > +
> > +    rv = apr_file_read(pod_in, &c, &len);
> > +
> > +    if ((rv == APR_SUCCESS) && (len == 1)) {
> > +        return APR_SUCCESS;
> > +    }
> > +    return 1;
> > +}
>
> This feels like a boolean routine.  Why not int/0/non-zero instead of
> apr_status_t/APR_SUCCESS/1?

My original thought had been to return the error code, but I was sick this
weekend, an just wanted to get something that works.  :-)  I am VERY
likely to add in the status code return.  That way, it is easier to log
errors.

> > +AP_DECLARE(void) ap_mpm_pod_close(void)
> > +{
> > +    apr_file_close(pod_out);
> > +    apr_file_close(pod_in);
> > +}
> > +
> > +AP_DECLARE(void) ap_mpm_pod_signal(apr_pool_t *p)
> > +{
> > +    apr_socket_t *sock = NULL;
> > +    apr_sockaddr_t *sa = NULL;
>
> init to NULL not necessary... this may make some folks think it is
> needed (as it is/was for some APR functions)

Force of habit, that will be fixed.

> > +
> > +    if ((rv = apr_file_write(pipe_of_death_out, &char_of_death, &one))
> > +                                 != APR_SUCCESS) {
> > +        if (APR_STATUS_IS_EINTR(rv)) continue;
> > +            ap_log_error(APLOG_MARK, APLOG_WARNING, rv, ap_server_conf,
> > +                         "write pipe_of_death");
> > +        }
> > +    }
> > +
> > +    apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, ap_listeners->sd->port, 0, p);
> > +    apr_socket_create(&sock, sa->family, SOCK_STREAM, p);
> > +    apr_connect(sock, sa);
>
> very helpful to log errors from the last two calls, if only to know
> without a shadow of a doubt when debugging a shutdown problem that
> they worked 'cause otherwise you'd have seen the message in error_log
>
> crucial to close the socket here so you don't run out of file
> descriptors while executing the loop in ap_mpm_pod_killpg()...

Both good points.  As I said, this was minimal, because I did it in
between sleeping for hours at a time.  :-)

>
> > Index: server/mpm/prefork/prefork.c
>
> I didn't look your prefork changes in detail...
>
> SIGHUP processing in child processes should be SIG_IGN and SA_RESTART,
> right?

It is SIG_IGN, and I removed the the call to sigaddset for SIGWINCH.  I
THINK I got all the calls to SIGWINCH, but again, review is always useful.

> ---------/----------
>
> This feels like a reasonable the way to go but when this has been
> brought up before I've had some general worries about it that perhaps
> folks can dispel.  (Maybe I'm just mad because when I got the great
> idea to use connect() it wouldn't do what I wanted it to do :) )
>
> I remain at least a little concerned about using connect() because
>
> a) weird firewall-like rules on some possibly-weird systems may keep
>    the connect() from working; admin needs to be aware

We are connecting from localhost to localhost.  While weird firewall rules
may stop this, remember that we are connecting on a port that should be
accessible, because it is meant to accept connections.  :-)

> b) TCP/IP may be dead/dying and socket()+connect() won't work
>
>    I'm not sure how to kill apache on OS/390 when the stack goes away.
>
>    I was told previously by you and possibly others that I couldn't
>    teach the child process to tell the parent to get the @#$% out
>    because the stack is dead.  If the stack is dead, the user can tell
>    apache to go away but nothing will happen and the child processes
>    will (probably) continue looping and getting weird errors on
>    select() or accept().
>
>    This needs to be solved.  For me, simply recognizing the magic
>    error feedback and using the magic error status (CHILD_EXITFATAL?)
>    is sufficient.

Greg Ames stated quite emphatically, that if the stack dies on a 390
machine, the child process will wake up from select.  Once that happens,
the process will die, if for no other reason than that Apache dies when
select returns an error other than EINTR.

> c) We've invented a nice utility for MPMs but it won't work for MPMs
>    which want to parent process to decide which child should go away.
>    (I realize that isn't a current requirement.)

Yeah, I thought about this.  The only solution I have, is to have the
parent process write the pid of the process to die.  That would work, but
it can cause race-conditions that I don't want to think about today.  :-)

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: New graceful restart mechanism.

Posted by Jeff Trawick <tr...@bellsouth.net>.

<rb...@covalent.net> writes:

>                                    This could be improved a bit, by using
> OOB data to wake up the thread, and we would only check the pod if we
> received OOB data.  This is an optimization however, and I am not sure how
> that would work in real-life.

Err, I'm not sure either :)  On which connection would you send OOB
data?  How would we detect that there was OOB data?  If select(),  I
thought we were removing the need for select() (think
"SINGLE_LISTEN_UNSERIALIZED_ACCEPT")...   If SIGURG, I thought we were
avoiding signal-based processing (not sure of SIGURG portability
anyway)...  I guess I'm missing something real basic here...

>                               Assuming nobody complains, I will commit it
> tomorrow or the next day.

no real complaints, but no reason not to address the minor issues
below before committing

> Index: server/mpm_common.c
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/server/mpm_common.c,v
> retrieving revision 1.47
> diff -u -d -b -w -u -r1.47 mpm_common.c
> --- server/mpm_common.c	2001/05/16 20:51:28	1.47
> +++ server/mpm_common.c	2001/06/04 16:00:37
> @@ -338,4 +338,61 @@
>  }
>  #endif /* def NEED_INITGROUPS */
> 
> +#ifdef AP_MPM_USES_POD
> +static apr_file_t *pod_in = NULL, *pod_out = NULL;
> +
> +AP_DECLARE(apr_status_t) ap_mpm_pod_open(apr_pool_t *p)
> +{
> +    apr_file_t *pod;
> 
> +    pod = apr_file_pipe_create(&pod_in, &pod_out, p);
> +    apr_file_pipe_timeout_set(out, 0);
> +    return APR_SUCCESS;
> +}
> +
> +AP_DECLARE(apr_status_t) ap_mpm_pod_check(void)
> +{
> +    char c;
> +    apr_size_t len = 1;
> +    apr_status_t rv;
> +
> +    rv = apr_file_read(pod_in, &c, &len);
> +
> +    if ((rv == APR_SUCCESS) && (len == 1)) {
> +        return APR_SUCCESS;
> +    }
> +    return 1;
> +}

This feels like a boolean routine.  Why not int/0/non-zero instead of
apr_status_t/APR_SUCCESS/1?

> +AP_DECLARE(void) ap_mpm_pod_close(void)
> +{
> +    apr_file_close(pod_out);
> +    apr_file_close(pod_in);
> +}
> +
> +AP_DECLARE(void) ap_mpm_pod_signal(apr_pool_t *p)
> +{
> +    apr_socket_t *sock = NULL;
> +    apr_sockaddr_t *sa = NULL;

init to NULL not necessary... this may make some folks think it is
needed (as it is/was for some APR functions)

> +
> +    if ((rv = apr_file_write(pipe_of_death_out, &char_of_death, &one))
> +                                 != APR_SUCCESS) {
> +        if (APR_STATUS_IS_EINTR(rv)) continue;
> +            ap_log_error(APLOG_MARK, APLOG_WARNING, rv, ap_server_conf,
> +                         "write pipe_of_death");
> +        }
> +    }
> +
> +    apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, ap_listeners->sd->port, 0, p);
> +    apr_socket_create(&sock, sa->family, SOCK_STREAM, p);
> +    apr_connect(sock, sa);

very helpful to log errors from the last two calls, if only to know
without a shadow of a doubt when debugging a shutdown problem that
they worked 'cause otherwise you'd have seen the message in error_log

crucial to close the socket here so you don't run out of file
descriptors while executing the loop in ap_mpm_pod_killpg()... 

> Index: server/mpm/prefork/prefork.c

I didn't look your prefork changes in detail...

SIGHUP processing in child processes should be SIG_IGN and SA_RESTART,
right?

---------/----------

This feels like a reasonable the way to go but when this has been
brought up before I've had some general worries about it that perhaps
folks can dispel.  (Maybe I'm just mad because when I got the great
idea to use connect() it wouldn't do what I wanted it to do :) )

I remain at least a little concerned about using connect() because 

a) weird firewall-like rules on some possibly-weird systems may keep
   the connect() from working; admin needs to be aware

b) TCP/IP may be dead/dying and socket()+connect() won't work

   I'm not sure how to kill apache on OS/390 when the stack goes away.

   I was told previously by you and possibly others that I couldn't
   teach the child process to tell the parent to get the @#$% out
   because the stack is dead.  If the stack is dead, the user can tell
   apache to go away but nothing will happen and the child processes
   will (probably) continue looping and getting weird errors on
   select() or accept(). 

   This needs to be solved.  For me, simply recognizing the magic
   error feedback and using the magic error status (CHILD_EXITFATAL?)
   is sufficient.

c) We've invented a nice utility for MPMs but it won't work for MPMs
   which want to parent process to decide which child should go away.
   (I realize that isn't a current requirement.)

Thanks!
-- 
Jeff Trawick | trawickj@bellsouth.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...