Posted to dev@httpd.apache.org by Dean Gaudet <dg...@arctic.org> on 1997/11/29 08:50:14 UTC
2.0: process model design, rev 1.4
It's at <http://www.arctic.org/~dgaudet/apache/2.0/process-model>. And
here's a diff from the last time I posted a note about it.
Dean
--- process-model.html 1997/11/23 02:19:54 1.2
+++ process-model.html 1997/11/29 07:48:15 1.4
@@ -4,7 +4,7 @@
</head>
<body bgcolor="#ffffff" text="#000000" link="#0000ff" vlink="#000080" alink="#ff0000">
-<pre>$Id: process-model.html,v 1.2 1997/11/23 02:19:54 dgaudet Exp $</pre>
+<pre>$Id: process-model.html,v 1.4 1997/11/29 07:48:15 dgaudet Exp $</pre>
<h1>Process Model Design</h1>
@@ -59,11 +59,14 @@
<dt>fiber
<dd>A fiber is a user-level "thread". Fibers are <b>co-operatively
multitasked</b>, with context switching occurring only at I/O points
-or at explicit yield points. A fiber can be scheduled to run in
-any thread in the same process.
+or at explicit yield points. A fiber can be scheduled to run in any
+thread in the same process. Typically fibers are implemented entirely
+by user-level libraries; under Unix the I/O would be handled by using
+select() multiplexing. The term is borrowed from WIN32, which actually
+supplies a fiber interface in the API. Fibers are very similar to
+"co-routines".
</dl>
-
<h3>What process models are interesting?</h3>
<p>The models vary in three dimensions: number of processes, number
@@ -85,9 +88,9 @@
<li>Multiple process, single thread, multiple fiber (MSM). In each
process a user-level threads package handles context switches according
to the completion of I/O. This is typical "select-event threading"
-under Unix. This is how Zeus works, and this should be portable to
-essentially every Unix and provide performance advantage over MSS.
-In this model there should be enough processes to exploit the available
+under Unix. This is how Zeus and Squid work, and it should be portable to
+essentially every Unix, with a performance advantage over MSS.
+In the MSM model there should be enough processes to exploit the available
parallelism in the hardware.
<li>Single process, multiple thread, single fiber (SMS). This is probably
@@ -131,6 +134,15 @@
things that need work:
<ul>
+<li><b>Stack Usage:</b> This is perhaps the worst issue of all.
+Many routines assume they can allocate large amounts of stack space
+as temporary storage. A fiber library allows systems to be built with
+thousands of fibers (contrast with threads, for which the OS starts to
+chunk after a certain number of threads because the context is still
+too heavy). Fibers (and kernel threads for that matter) typically have
+a <i>static</i> sized stack. One solution is to abstract string and
+URL operations a bit more and use pools more liberally.
+
<li>FILE *, we can't use it. BUFF is probably the best replacement.
While things like sfio provide a useable replacement, we've discussed
them in the past and decided that there are copyright and portability
@@ -174,7 +186,7 @@
<h4>Example: MSS, and MMS</h4>
<p>The MSS and MMS models have the easiest fiber implementation:
-each thread is a fiber. In both of these models, the entire apio
+a fiber is the same as a thread. In both of these models, the entire apio
abstraction can consist of #define wrappers for the real POSIX
functions.
@@ -285,11 +297,85 @@
/* do something specific to multi process models */
#endif
-#if (APPM_MODEL & (APPM_MULTI_PROCESS|APPM_MULTI_FIBER)) == (APPM_MULTI_PROCESS|APPM_MULTI_FIBER)
+#if APPM_MODEL == (APPM_MULTI_PROCESS|APPM_MULTI_FIBER)
/* do something specific to MSM */
#endif
</pre></blockquote>
+<h3>The Main Loop</h3>
+
+<p>Hidden somewhere in the process model is the method by which a
+request is received and then dispatched into a fiber running inside
+some thread inside some process. There is currently no API to
+tap into this, and because Apache is only an HTTP server it hasn't
+been an issue. The closest thing to an API to tap into it is the
+<code>STANDALONE_MAIN</code> definition. But to use that the user
+must supply a complete <code>standalone_main()</code> replacement, and
+use a different <code>child_main</code> (i.e. must supply a complete
+process model replacement). This won't be feasible after the process
+model has been abstracted because there will be too many models to try
+to re-implement.
+
+<p>Other than the process model itself, standalone_main/child_main
+provide the following services:
+
+<ul>
+<li>open a network socket for listening, accept/dispatch requests
+<li>the scoreboard
+<li>monitor an <code>other_child</code> (the
+ <code>register_other_child</code> API)
+</ul>
+
+<h4>monitoring sockets</h4>
+
+<p>It should be easy to abstract the network functions. In the Unix
+models it's sufficient to supply a few <code>fd_sets</code> for
+<code>select</code>, a <code>void *</code>, and a callback. TODO:
+need to figure out the cleanest way to do this so that the WIN32 and
+OS2 models can implement it well. Oh yeah, consider <code>poll()</code>
+while you're at it, because poll works better on sparse fd_sets.
+
+<h4>The Scoreboard</h4>
+
+<p>The scoreboard is very intimately tied to the process model.
+In theory some models may be able to run without a scoreboard at all.
+For example, an all-static-content site running in MSM or SMM models
+should never need to spawn or destroy any threads/processes. Such a
+server would spawn enough threads/processes initially to exploit the
+parallelism in the hardware, and no more. In general there is
+likely to always be a scoreboard, but its use and contents may
+vary widely.
+
+<p>The scoreboard currently provides the following:
+
+<ul>
+<li>status of the server at a glance
+<li>an optimized alarm() implementation
+<li>data for a customized logging module, to log info about how long
+ a request takes for example
+</ul>
+
+<p>I propose that the first function will have to be supplied by a
+module which is process model dependent, and which uses unexported
+interfaces to the scoreboard. The second function is part of
+the process model itself, and is hidden behind the timeout API
+already.
+
+<p>The third function can be provided by an API like this:
+
+<blockquote><pre>
+typedef enum {
+ REQSTAT_MILLISECONDS /* #msecs request took, unsigned long */
+ /* uhhh what other fields are of use to a module?? */
+} reqstat_t;
+
+/* returns 0, and stores result in *result if successful
+ * returns -1 and sets errno in the event of an error.
+ * possible errno values: EINVAL -- not supported in this process model
+ */
+extern int get_reqstat(request_rec *r, reqstat_t index, void *result);
+</pre></blockquote>
+
<h3>More Thoughts</h3>
<p>I think the above is general enough to implement the interesting
@@ -307,6 +393,10 @@
duplication, so the os/osname/ directories are probably not where the
models should be implemented. We should probably have pm/unix-mss,
pm/unix-msm, pm/win32-sms, etc.
+
+<p>Ben Hyde adds: "I always end up adding a priority scheme of some sort,
+and I find it is best if the priority is on the transaction and not on
+the thread. I don't know how many systems I've had to rework that in."
<p>Note that it's possible to implement an MSM model in which fibers
can migrate from process to process by using a lot of shared memory,