Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2000/06/29 16:06:41 UTC

code chunks for filter challenge

Okee dokee... so I got a second wind and tackled this. I don't have a sample
module because the usage of this stuff is pretty easy. See below for the
actual code chunks, but here is what a module would do:

typedef struct {
    ap_setaside_t sa;
} my_ctx;

void my_callback(ap_filter_t *filter, ap_bucket_t *bucket)
{
    my_ctx *ctx = filter->ctx;

    ... process bucket ...
    if (dont_send_yet)
        ap_setaside_bucket(&ctx->sa, bucket, filter->r->pool);
    else {
        ap_table_set(filter->r->headers_out, "foo", "bar");
        ap_lputsetaside(filter, &ctx->sa);
        ap_clear_setaside(&ctx->sa);
    }
}

Note that the filter performs no direct allocations. ap_setaside_bucket()
certainly can, but only when data lifetimes need to change.

The zero-alloc behavior comes from the ap_selfdef_t type ("self-defined
storage"). This bucket type determines its own lifetime by virtue of
carrying around a (sub)pool of its own. ap_setaside_bucket() can then string
these babies together, until ap_lputsetaside() is called.

If other bucket types are passed to ap_setaside_bucket(), then it will make
the necessary copy to ensure that it will live past the function-return.

If too much data is set aside in memory, then it is spilled to a temp file.
File buckets do not count toward the memory threshold.

ap_setaside_bucket() allocates memory only when necessary. If data gets
spilled, then the setaside structure will have a sub-pool to hold ap_selfdef_t
structures and ap_file_t structures. Note that the number of these structures
is no more than 2n+1, where n is the number of files placed into the content
stream. (Reducing this count would imply copying file contents, whatever
algorithm may be chosen.)

The ap_lputsetaside() function will pass the elements of a setaside
structure down to the next filter. Where possible, it will pass a SELFDEF so
the next filter can grab the sucker intact without copying.


To help understand this stuff, I'll detail a bit about the ap_selfdef_t
structure... it has three types:

1) AP_SELFDEF_PTRLEN: ->pool will be set and contains the ap_selfdef_t
   itself and the data referenced by ->buf (and ->len); a small construction
   sketch follows this list

2) AP_SELFDEF_FILE: ->pool will be NULL, the ap_selfdef_t is allocated in an
   unknown pool. ->file refers to the file in question (starting at the
   current file position), ->flen says how many bytes from there

3) AP_SELFDEF_SPILL: all fields empty except for ->len. This length says how
   many bytes to consume from the spill file. If you scan from the start of
   an ap_setaside_t structure, the _SPILL elements are laid out
   sequentially in the spill file. This ap_selfdef_t is allocated in the
   ap_setaside_t.pool
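
To make type (1) concrete, here is a minimal construction sketch. It is not
part of the patch; the helper and its name are hypothetical, and it only uses
the pool calls that appear in the code further below:

/* Hypothetical helper: build an AP_SELFDEF_PTRLEN selfdef that carries its
   own lifetime in a private subpool. */
static ap_selfdef_t *make_selfdef(ap_pool_t *parent, const char *data,
                                  ap_size_t len)
{
    ap_pool_t *subpool;
    ap_selfdef_t *sd;

    (void) ap_create_pool(&subpool, parent);

    /* the selfdef and its data both live in the subpool it points to */
    sd = ap_pcalloc(subpool, sizeof(*sd));
    sd->pool = subpool;
    sd->type = AP_SELFDEF_PTRLEN;
    sd->buf  = ap_pstrndup(subpool, data, len);
    sd->len  = len;
    return sd;
}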

The ap_setaside_t simply contains a head/tail pointer for the selfdef
structures, along with (possibly) a private pool and (possibly) a spill
file. Once ->spill is set, then all memory-based content is dropped into the
spill file.

Note that the chain referenced by setaside.head will contain _PTRLEN *or*
_SPILL elements, never both: the former appear before spilling occurs, the
latter after. _FILE elements can occur in either state.
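
Schematically (my notation, not the patch's):

    before spilling:   _PTRLEN -> _FILE -> _PTRLEN -> _PTRLEN -> ...
    after spilling:    _SPILL  -> _FILE -> _SPILL  -> ...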

When an ap_selfdef_t is placed into an ap_setaside_t, and it is a _FILE,
then the file will be dup'd into the sa->pool. A new ap_selfdef_t is
allocated in the sa->pool (because we don't know where the original was
allocated, so we can't determine its lifetime).

Note: that is a policy decision: a _FILE selfdef does not create a whole
pool just to hold the darn file. It certainly could, but I punted that.

_SPILL selfdefs only occur in a setaside structure and are allocated from
the sa->pool.
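
To restate the lifetime rules in one place (my summary of the above, not
text from the patch):

    _PTRLEN: the data and the selfdef itself live in the selfdef's own
             subpool; ap_clear_setaside() destroys that subpool unless a
             downstream filter has taken ownership.
    _FILE:   ->pool is NULL; once set aside, the dup'd ap_file_t lives in
             sa->pool and goes away when sa->pool is destroyed.
    _SPILL:  ->pool is NULL; the bytes live in the sa->spill file, which is
             opened in sa->pool (with APR_DELONCLOSE) and disappears with it.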

-----------

whew. It does sound messy, but that is simply because we are dealing with
data that can have unknown lifetimes. We need to properly manage each of
those items. If the data *does* have a known lifetime (AP_BUCKET_SELFDEF
with an AP_SELFDEF_PTRLEN), then we just drop the thing into the setaside
buffer. Piece of cake, no copying. When that thing is pulled out to be sent
to the next filter, it is sent in its original form which allows the next
filter to set it aside, too.

If you ignore this prose, and just look at the new data structures
(ap_selfdef_t and ap_setaside_t), then you can build up the logic from there.

All the guts of the mechanism are in ap_setaside_bucket().

-----------

Zero-copy path, assuming no spill:

(sa refers to an ap_setaside_t, sd refers to an ap_selfdef_t)

(1) my_callback is called with an AP_BUCKET_SELFDEF (sd) referring to an
AP_SELFDEF_PTRLEN. <sd> is directly chained into <sa>. Repeat. [ follow the
code path in ap_setaside_bucket() to verify no allocs ]

ap_lputsetaside() is called. It loops over the selfdef structures. If the
next filter uses bucket callbacks, then an AP_BUCKET_SELFDEF is built which
refers to this sd. The callback is called. This is the same state as point
(1), so induction shows we can call N filters without copying. At the
bottom, (internally) ap_lwrite() is called with the data which is passed to
ap_bwrite() with no copies.
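
To make the induction step concrete, a do-nothing bucket filter in this
scheme would look something like the following sketch (the field names come
from the code below; the filter itself is hypothetical):

/* Hypothetical pass-through bucket filter: it neither modifies nor sets
   aside the data, so a SELFDEF bucket travels through untouched and the
   zero-copy chain is preserved. */
static void passthru_callback(ap_filter_t *filter, ap_bucket_t *bucket)
{
    ap_filter_t *f_next = filter->next;

    if (f_next != NULL && f_next->bucket_cb != NULL) {
        /* hand the bucket on intact: no copy, no allocation */
        (*f_next->bucket_cb)(f_next, bucket);
    }
    else if (bucket->type == AP_BUCKET_SELFDEF
             && bucket->selfdef->type == AP_SELFDEF_PTRLEN) {
        /* next filter is not bucket-aware: fall back to plain bytes,
           just as ap_lputsetaside() does */
        ap_lwrite(filter, bucket->selfdef->buf, bucket->selfdef->len);
    }
    /* other bucket types omitted from this sketch */
}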

-----------

If a spill occurs, and *only* content data is present (no files in the
content stream), then all the data will be placed into the sa->spill file
and there will be a single _SPILL selfdef (no matter how many invocations
of the callback occurred; the writes are coalesced).

ap_lputsetaside() will call ap_lsendfile() for the whole spill file. Of
course, this can map into the platform's sendfile().

-----------

When files are present in the content stream, things get a bit more fun, but
it is pretty straightforward. ap_selfdef_t (_FILE) elements get linked. At
output time, each of the files is delivered via ap_lsendfile().
[ interleaved with regular content data ]
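
For example (an illustration, not output from the code), a spilled response
shaped like

    content, file1, content, file2, content

ends up as the chain

    _SPILL -> _FILE(file1) -> _SPILL -> _FILE(file2) -> _SPILL

and ap_lputsetaside() walks it, alternating ap_lsendfile() on the spill file
and on each content file. That is presumably also where the 2n+1 bound above
comes from: n _FILE selfdefs plus at most n+1 _SPILL selfdefs.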


Okay... enough discussion for now. Below, I've appended the new functions
and types. This stuff compiles, but I haven't explicitly tested it (since
we're only going for theory here).

Here is a quick summary of the posted requirements, and a quick answer:

1) entire response must be able to be held by one module (with spill)
   => MET. ap_setaside_t and friends manage this fine

2) no memory allocated out of r->pool
   => MET, with caveats.

      ap_file_t structures are allocated. some ap_selfdef_t are allocated
      to manage lifetimes of data.
      
      this requirement is a bit too absolute given the variety of inputs to
      the filter system (e.g., ap_rwrite, ap_send_fd, ap_rprintf). for the
      case in question: where a SELFDEF is passed, no allocations occur. a
      bit happens when spill occurs. etc.
      
      I expect discussion to focus here.

3) the position in the filter chain should not matter. the algorithm should
   not disable downstream filters from doing the same thing.
   => MET. a SELFDEF will be passed to the next filter if it uses buckets.

4) write as a bucket and as a char *
   => N/A per earlier discussion

5) example filter doesn't have to modify anything; no allocation except for
   structures to refer to the content.
   => MET.
   
      note this req't allows some types of allocation. no biggy, as I avoid
      allocations in the code. but interesting to point out in light of (2)

6) do not copy data that is not modified.
   => MET, with caveats.
      
      data must be copied to account for changing lifetimes. If a SELFDEF is
      fed into the chain (or pops up inside the chain), then copies will no
      longer occur.


Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


typedef enum {
    AP_SELFDEF_PTRLEN,
    AP_SELFDEF_FILE,
    AP_SELFDEF_SPILL
} ap_selfdef_type;

struct ap_selfdef_t {
    int filter_owns;

    ap_pool_t *pool;

    ap_selfdef_type type;

    const char *buf;            /* AP_SELFDEF_PTRLEN */
    ap_size_t len;              /* AP_SELFDEF_PTRLEN, _SPILL */

    ap_file_t *file;            /* AP_SELFDEF_FILE */
    ap_ssize_t flen;            /* AP_SELFDEF_FILE */

    struct ap_selfdef_t *next;
};

struct ap_setaside_t {
    ap_size_t total_mem;
    ap_selfdef_t *head;
    ap_selfdef_t *tail;

    ap_pool_t *pool;
    ap_file_t *spill;
};

API_EXPORT(void) ap_setaside_bucket(ap_setaside_t *sa,
                                    const ap_bucket_t *bucket,
                                    ap_pool_t *pool);
API_EXPORT(void) ap_clear_setaside(ap_setaside_t *sa);

-------------------------------------------------------------------

API_EXPORT(void) ap_lputsetaside(ap_filter_t *filter, const ap_setaside_t *sa)
{
    ap_selfdef_t *scan;
    ap_selfdef_t *next;

    /* rewind the spill file, if present */
    if (sa->spill != NULL) {
        (void) ap_seek(sa->spill, APR_SET, 0);
    }

    for (scan = sa->head; scan != NULL; scan = next) {
        /* the <sa> no longer owns this if we are sending it */
        scan->filter_owns = 0;

        /* grab the next pointer. if somebody takes this item, they will
           probably change the value. */
        next = scan->next;

        switch (scan->type) {
        case AP_SELFDEF_PTRLEN:
        {
            ap_filter_t *f_next = filter->next;

            if (f_next != NULL && f_next->bucket_cb != NULL) {
                ap_bucket_t bucket = {
                    AP_BUCKET_SELFDEF, NULL, 0, NULL, NULL, NULL, 0, scan
                };
                (*f_next->bucket_cb)(f_next, &bucket);
            }
            else {
                /* send some bytes; not as optimal */
                ap_lwrite(filter, scan->buf, scan->len);
            }
            break;
        }
        case AP_SELFDEF_FILE:
            /* send the file */
            ap_lsendfile(filter, scan->file, scan->flen);
            break;
        case AP_SELFDEF_SPILL:
            /* send (len) bytes out of the spill file */
            ap_lsendfile(filter, sa->spill, scan->len);
            break;
        default:
            /* ### error... */
            ap_assert(0);
            return;
        }
    }
}

-------------------------------------------------------------------

API_EXPORT(void) ap_setaside_bucket(ap_setaside_t *sa,
                                    const ap_bucket_t *bucket,
                                    ap_pool_t *request_pool)
{
    ap_ssize_t needed;
    ap_pool_t *subpool = NULL;
    ap_pool_t *fmtpool = NULL;
    ap_array_header_t *strs;
    const char *s;
    ap_selfdef_t *sd;

    /* a selfdef has its own pool; for the others, we will need one */
    if (bucket->type != AP_BUCKET_SELFDEF && sa->pool == NULL) {
        (void) ap_create_pool(&sa->pool, request_pool);
    }

    /* default: format directly into the spill pool */
    fmtpool = sa->pool;

    if (sa->spill == NULL
        && (bucket->type == AP_BUCKET_STRINGS
            || bucket->type == AP_BUCKET_PRINTF)) {
        /* format into a new subpool; this data may be dumped to a spill
           file and the subpool will be destroyed */
        (void) ap_create_pool(&subpool, request_pool);
        fmtpool = subpool;
    }

    switch (bucket->type) {
    case AP_BUCKET_PTRLEN:
        needed = bucket->len;
        break;
    case AP_BUCKET_STRINGS:
    {
        va_list va = bucket->va;
        int i;
        char *s2;

        strs = ap_make_array(fmtpool, 5, sizeof(const char *));

        /* compute the total size */
        needed = 0;
        while (1) {
            s = va_arg(va, const char *);
            if (s == NULL)
                break;
            *(const char **)ap_push_array(strs) = s;
            needed += strlen(s);
        }

        /* concatenate them all together */
        s = s2 = ap_palloc(fmtpool, needed);
        for (i = 0; i < strs->nelts; ++i) {
            const char *s3 = ((const char **)strs->elts)[i];
            ap_size_t len = strlen(s3);
            memcpy(s2, s3, len);
            s2 += len;
        }
        break;
    }
    case AP_BUCKET_PRINTF:
        s = ap_pvsprintf(fmtpool, bucket->fmt, bucket->va);
        needed = strlen(s);
        break;
    case AP_BUCKET_FILE:
        needed = 0;
        break;
    case AP_BUCKET_SELFDEF:
        switch (bucket->selfdef->type) {
        case AP_SELFDEF_PTRLEN:
            needed = bucket->selfdef->len;
            break;
        case AP_SELFDEF_FILE:
            needed = 0;
            break;
        case AP_SELFDEF_SPILL:
        default:
            /* ### error... */
            ap_assert(0);
            return;
        }
        break;
    default:
        /* ### error... */
        ap_assert(0);
        return;
    }

    if (sa->spill == NULL) {
        sa->total_mem += needed;
        if (sa->total_mem > SPILL_THRESHOLD) {
            const char *fname;

            if (sa->pool == NULL) {
                (void) ap_create_pool(&sa->pool, request_pool);
            }

            fname = "/tmp/spill";       /* ### yah. right */
            (void) ap_open(&sa->spill, fname,
                           APR_WRITE | APR_CREATE | APR_EXCL | APR_DELONCLOSE,
                           APR_UREAD, sa->pool);

            /* ### walk sa, spilling data. must coalesce blocks. */

            /* ### fall thru to place data into spill areas */
        }
        else {
            if (bucket->type == AP_BUCKET_SELFDEF) {
                sd = bucket->selfdef;
                sd->filter_owns = 1;
            }
            else {
                /* PTRLEN buckets need a pool to copy the data into */
                if (bucket->type == AP_BUCKET_PTRLEN) {
                    (void) ap_create_pool(&fmtpool, request_pool);
                }

                /* for _FILE, fmtpool == sa->pool */
                sd = ap_pcalloc(fmtpool, sizeof(*sd));

                switch (bucket->type) {
                case AP_BUCKET_PTRLEN:
                    sd->pool = fmtpool;
                    sd->type = AP_SELFDEF_PTRLEN;
                    sd->buf = ap_pstrndup(sd->pool, bucket->buf, needed);
                    sd->len = needed;
                    break;
                case AP_BUCKET_STRINGS:
                case AP_BUCKET_PRINTF:
                    sd->pool = fmtpool;
                    sd->type = AP_SELFDEF_PTRLEN;
                    sd->buf = s;
                    sd->len = needed;
                    break;
                case AP_BUCKET_FILE:
                    sd->type = AP_SELFDEF_FILE;
                    /* dup the file into our pool */
                    (void) ap_dupfile(&sd->file, bucket->file, sa->pool);
                    sd->flen = bucket->flen;
                    break;
                }
            }

            if (sa->head == NULL) {
                sa->head = sa->tail = sd;
            }
            else {
                sa->tail->next = sd;
                sa->tail = sd;
            }

            /* done */
            return;
        }
    }

    /* data must be placed into the spill */

    /* append to the previous spill datum? */
    if (sa->tail != NULL && sa->tail->type == AP_SELFDEF_SPILL
        && (bucket->type == AP_BUCKET_PTRLEN
            || bucket->type == AP_BUCKET_STRINGS
            || bucket->type == AP_BUCKET_PRINTF
            || (bucket->type == AP_BUCKET_SELFDEF &&
                bucket->selfdef->type == AP_SELFDEF_PTRLEN))) {

        sa->tail->len += needed;
        switch (bucket->type) {
        case AP_BUCKET_PTRLEN:
            (void) ap_write(sa->spill, bucket->buf, &needed);
            break;
        case AP_BUCKET_STRINGS:
        case AP_BUCKET_PRINTF:
            (void) ap_write(sa->spill, s, &needed);
            break;
        case AP_BUCKET_SELFDEF:
            (void) ap_write(sa->spill, bucket->selfdef->buf, &needed);
            break;
        }

        /* trash the formatting pool (its contents were spilled) */
        if (fmtpool != sa->pool)
            ap_destroy_pool(fmtpool);
        return;
    }

    /* spill to a new datum */
    sd = ap_pcalloc(sa->pool, sizeof(*sd));

    switch (bucket->type) {
    case AP_BUCKET_PTRLEN:
        sd->type = AP_SELFDEF_SPILL;
        sd->len = needed;
        (void) ap_write(sa->spill, bucket->buf, &needed);
        break;
    case AP_BUCKET_STRINGS:
    case AP_BUCKET_PRINTF:
        sd->type = AP_SELFDEF_SPILL;
        sd->len = needed;
        (void) ap_write(sa->spill, s, &needed);
        break;
    case AP_BUCKET_FILE:
        sd->type = AP_SELFDEF_FILE;
        /* dup the file into our pool */
        (void) ap_dupfile(&sd->file, bucket->file, sa->pool);
        sd->flen = bucket->flen;
        break;
    case AP_BUCKET_SELFDEF:
        switch (bucket->selfdef->type) {
        case AP_SELFDEF_PTRLEN:
            sd->type = AP_SELFDEF_SPILL;
            sd->len = needed;
            (void) ap_write(sa->spill, bucket->selfdef->buf, &needed);
            break;
        case AP_SELFDEF_FILE:
            /*
            ** ### can we just take this selfdef (and set filter_owns)?
            ** ### specifically: is selfdef->pool set, holding this file?
            */
            sd->type = AP_SELFDEF_FILE;
            /* dup the file into our pool */
            (void) ap_dupfile(&sd->file, bucket->selfdef->file, sa->pool);
            sd->flen = bucket->selfdef->flen;
            break;
        }
        break;
    }

    if (sa->head == NULL) {
        sa->head = sa->tail = sd;
    }
    else {
        sa->tail->next = sd;
        sa->tail = sd;
    }

    /* trash the formatting pool (its contents were spilled) */
    if (fmtpool != sa->pool)
        ap_destroy_pool(fmtpool);
}

API_EXPORT(void) ap_clear_setaside(ap_setaside_t *sa)
{
    ap_selfdef_t *scan;
    ap_selfdef_t *next;

    for (scan = sa->head; scan != NULL; scan = next) {
        /* grab ->next before we blast the pool (which might hold *scan) */
        next = scan->next;

        /* do not blast the pool if somebody has taken it */
        if (!scan->filter_owns && scan->pool != NULL)
            ap_destroy_pool(scan->pool);
    }
    ap_destroy_pool(sa->pool);

    memset(sa, 0, sizeof(*sa));
}

Re: another response to Roy :-)

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Jul 02, 2000 at 10:55:42PM -0700, Dirk-Willem van Gulik wrote:
> On Fri, 30 Jun 2000, Greg Stein wrote:
> 
> >> ..Roy wrote..
> 
> > I don't understand this. What are the "problems of buff" that my patch
> > introduces? And I really don't understand the fragility point. It started
> 
> Interesting reply; as _You_ did not introduce them; we've been struggling
> with them from the days of shambala. 

Gotcha. Well, the BUFF is effectively a layer/filter in my updated patch. I
didn't register a filter for it, nor add it to the response (to minimize the
number of files affected, and to avoid thinking about the proper point in
the control flow to do so). However, inside of http_protocol.c, it
dispatches to BUFF_filter_callback() exactly as if it were a true filter.

Point being, the filter chain doesn't know about BUFF and just tosses the
data at it as if it were another filter. This independence means that we can
alter and tweak BUFF at will (and BUFF_filter_callback's interaction with it)
without affecting the rest of the filter chain.
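
As a rough illustration only (the callback shape matches the bucket callbacks
in the patch, but the BUFF access via r->connection->client and ap_bwrite()
is my assumption, not something shown in this message):

/* Hypothetical sketch: the bottom of the chain hands data to BUFF through
   the same callback shape a registered filter would use, so the chain
   itself never has to know BUFF exists. */
static void BUFF_filter_callback(ap_filter_t *filter, ap_bucket_t *bucket)
{
    BUFF *fb = filter->r->connection->client;   /* assumed field name */

    if (bucket->type == AP_BUCKET_PTRLEN) {
        (void) ap_bwrite(fb, bucket->buf, bucket->len);
    }
    /* the other bucket types would be flattened into ap_bwrite() calls in
       the same spirit; omitted from this sketch */
}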

I did not want to tackle fixing/changing BUFF at *this* point in time.
Creating a clean separation is the first step. There are going to be other
tweakies such as the various points in Apache where we modify the
translations in BUFF, modify the chunking flags, alter timeouts, etc.
Gathering those up is another hunk of work.

> Perhaps this is the core of the issue at hand; some of us want to solve
> some long standing problems whilst adding a layer whereas others just want
> a layer which does what it should do.

I couldn't parse this sentence :-)

Yes, I think cleaning up BUFF would be a Great Thing(tm), too. But it isn't
easy, and a good framework is going to be the first step. I'm quite wary of
it, as the filtering patches' first foray into this area introduced BUFF
bugs. I'd like to see the filters added, then we can start to take pieces
out of BUFF and up into the filter chain; but carefully and individually
reviewable. I've started a mod_xlate to take over the xlate features of
BUFF. I'll post it as an example once it is done.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: another response to Roy :-)

Posted by Dirk-Willem van Gulik <di...@covalent.net>.

On Fri, 30 Jun 2000, Greg Stein wrote:

>> ..Roy wrote..

> I don't understand this. What are the "problems of buff" that my patch
> introduces? And I really don't understand the fragility point. It started

Interesting reply; as _You_ did not introduce them; we've been struggling
with them from the days of shambala. 

Perhaps this is the core of the issue at hand; some of us want to solve
some long standing problems whilst adding a layer whereas others just want
a layer which does what it should do.

Dw


another response to Roy :-)

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Jun 29, 2000 at 10:16:31PM -0700, Roy T. Fielding wrote:
> >Your veto is real bullshit. It is supported only by manufactured excuses,
> >illogical assumptions, and impossible preconditions.
> 
> I don't think this is fair.  We don't need an interface that has all of
> the problems of buff and is just as fragile as existing module writes
> (more if you consider that the layers introduce configurable interaction
> errors that simply didn't exist before). If there is a design on the table
> that can improve the situation, then not desiring an inferior design
> is sufficient justification for a veto.  More to the point, if your
> design doesn't significantly improve the state of our implementation,
> then it isn't worth adding to the cruft to the server.

I don't understand this. What are the "problems of buff" that my patch
introduces? And I really don't understand the fragility point. It started
with a simple, easy-to-use char* callback handler. People wanted more, so I
filled in the slots I had (but didn't code first-time around) for the bucket
stuff. Ryan asked for an example of something else, and I demonstrated that
it was possible (but it isn't part of the patch). I believe the ability to
quickly add each feature demonstrates a sound framework rather than any
fragility.

Presuming the design you're referring to is the bucket-brigades... yes,
there is a design, but no code that implements it. I believe the only
missing, basic requirement is adding a "next" pointer to the buckets in my
patch. There may be some other details, but I think my patch can implement
what you are looking for. I'd love to learn where it specifically falls down.

To your last point: I would not have placed my patch into the STATUS file
for consideration unless I truly believed that it improved the situation.
I'm not about to suggest adding "cruft" to the server.

> The char * interface cannot support sendfile or caching or writing a proxy
> in filters.  It is useless to me, at the same cost as a design that would
> actually work.  I'm not saying it isn't useful for someone -- I am saying
> the design won't accomplish my needs, whereas the bucket brigades does.
> I would veto it myself if I had time to back it up with a fully implemented
> alternative.  Even so, there is no doubt in my mind that adding a dumb
> filtering design to Apache at this point is far far worse than doing
> nothing about filtering in 2.0.

Roy. Please hold for a moment here.

I *do* have buckets. The char* interface is simply an *option* for that
"someone" you refer to. If you want buckets, then apply my patch. If you
want a list-of-buckets, then I'll add a next pointer.

Specifically, take a look at ap_bucket_t in the ap_filter.h file that is
part of my patch. A filter then implements a function that accepts one of
those buckets.

This isn't a dumb filtering interface. It is a bucket-based system with an
option for a simplified interface.

>...
> If you guys spent half as much time commenting your code as you did
> arguing about the edge cases, maybe we wouldn't have to spend so much
> time trying to communicate?  Maybe it would help clarify the intention
> behind some of the interface issues.

This is a good suggestion, but we only have so much time. Both of us hoped
that the code was enough, and the comments and doc would come in a future
pass.

> Like, for example, Greg's apparent
> desire to stick to only what is needed by the ap_rwrite interface, which
> I didn't see from reading the code.  I still don't care about that as
> a design requirement, but at least now I understand a little more about
> why the patch does what it does.

Yes. I want to work within the existing framework and allow all existing
modules to send their content (via ap_rwrite and friends) into the filter
chain. I fully expect to later provide an ap_rbucket() interface so that
content generators can optimize their output.

> Here is my problem.  The early patches that were proposed were capable
> of implementing 25% of the stream filter goals, with the ability to
> be extended to 75%.  The latest ones implement maybe 70% with an
> extensibility to 80%.  You seem to be suggesting that it is the first
> number that matters in order to justify applying the patch.  My problem
> is that I don't care about the first number -- I don't want a filter
> interface in Apache that isn't capable of eventually handling 100%
> of the problem space.  That's because I don't want to have to redesign
> the interface after 2.0.0 goes out and everyone discovers that what
> we have sucks.

All right. This is a very reasonable point of view.

But, please... detail the features/qualities of that missing 20% or 30%. If
you want to be completely happy with my patch, then I'll gladly code those
bits for you. But I just don't know what your "100% goals" are... It seems
somewhat unfair to state that one of the posted patches does not meet the
100% goals, yet not explain what those are. How could we ever meet that?

>...
> I can understand why we might want to start with a simple implementation
> under the covers of an ADT.  What I don't understand is the theory that
> we should start with an interface that we know isn't sufficient for the
> task at hand and replace it later on.  We already know the right interface.
> If you want a simple layer on top of that, no problem, but the most
> efficient native interface must be the baseline.

Please educate me, then. Suggest a way to improve on the ap_bucket_t in my
patch. I simply do not understand why it doesn't match the "right interface."

> Greg, I know this is more vague rambling on my part.  Let me make a
> specific example.  Somewhere in your patch (as noted while catching
> up on my e-mail today) is a filter function that is making a test
> on filter->r->connection->aborted.  That is horribly wrong.

Are you referring to the ap_l* functions? The filter callbacks never do.

(or are you thinking of Ryan's patches? in one of his macros, which is used
in the filter callbacks, they refer to the aborted flag)

I'm going to assume you *are* talking about the ap_l* functions... below...

> Consider the case where you have an active proxy implemented in the form
> of a filter where you have one input filter chain reading the inbound
> response and passing it downstream toward the client.  What r?  Which
> connection?  Why the hell does the filter need to know anything about
> the endpoints, let alone assume they are sockets?

The filter doesn't need to know (and doesn't today). The framework doesn't
need to know (but does today). I can remove those tests quite easily. But in
the *current* Apache, it makes a lot of sense. It allows the filter chain to
short-circuit the processing when it finds out that it is no longer needed.
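
To make the test in question concrete (the surrounding ap_l* bodies are not
part of this message, so this is just the check itself, with the field path
taken from Roy's example above):

/* The sort of short-circuit that currently lives in the ap_l* helpers,
   never in the filter callbacks themselves. */
static int client_has_gone_away(ap_filter_t *filter)
{
    return filter->r->connection->aborted;
}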

Is that too much knowledge of the endpoint? Sure. But it isn't permanent,
and it certainly isn't a requirement of the design. If the whole chain just
kept going and shoved everything down to BUFF where it finally got
discarded, it would still work fine.

> The patch may work fine, but the design doesn't.  Data flow networks are
> a very well-defined and understood software architecture.  They have a
> single, very important constraint: no filter is allowed to know anything
> about the nature of its upstream or downstream neighbors beyond what
> is defined by the filter's own interface.  That constraint is what makes
> data flow networks highly configurable and reusable.  Those are properties
> that we want from our filters.

Sigh. Roy: none of my example filters have used any knowledge of the end
points. Never have the callbacks ever checked ->aborted. I believe my filter
design matches your requirements.

> The second problem here is that this discussion contains too many
> messages.  Both of you seem to have a need to respond to every message
> on a point-by-point basis,

I know there are too many messages. But look at it from my point of view.
You come along and post a relatively singular message. People that have
tuned out of the discussion stop to read it ("ooh. roy posted. maybe it is
time to see what is happening."). Then they read about how you don't like my
patch because it only provides the char* interface and does not implement
the bucket brigades.

What am I to do? Just let that impression go and sink in? Or do I try to
educate the readers of these emails that, *yes* it *does* have buckets?

When Ryan says the recursive-call design will suck performance-wise, yet
that whole "register spill" thing truly doesn't exist in these scenarios?
Let that pass? Give people the wrong impression?

I feel it is important that people understand that I've posted a patch that
meets (IMO) our needs. If they know that, then they may stop and actually
look at the thing and provide one of the needed +1 votes. But if every
message has some comment about how it is broken here or there, or doesn't
meet this need or that, then they will keep skipping over it. We'll just
remain in limbo...

>...
> What you don't seem
> to be realizing is that you have generated so much text that the only people
> who have enough time to both read the discussion and propose alternatives
> are the two of you.  The rest of us just can't keep up.

Oh, I totally understand. That is why I've been posting things with titles
such as "what are the issues?" or "recap" or "summary". To boil all this
stuff down so that (hopefully) people can jump in at that point.

> I'd love to spend
> some time flushing out the bucket brigades design, but instead I am spending
> twelve hours a day reading the e-mail that gets stacked-up behind days
> when I'm travelling, teaching classes, or otherwise doing what my real
> job requires.  I don't want you to slow down, but for crissakes please
> boil down the discussion to the meat and ignore the side-comments if
> they are already covered by a prior message.

I get the feeling that you didn't see the ap_bucket_t stuff in my patch.
Maybe you looked at the very first one? Please look at the latest, available
in the archive with MsgID <20...@lyra.org>. There are some
examples of the filtering scheme in <20...@lyra.org>.

If you could spend just 30 minutes looking over that, it would help. Again:
it is only 850 lines, most of it quite straightforward. But sorry: no
comments yet.

If that patch does not meet your bucket-brigades thoughts, or you can see
where it can't *ever* meet them, then please explain to me. I will
definitely listen and fix the code.

> And I don't see any reason why we should hurry to get this in 2.0.
> Not even my own personal favorite design is worth rushing into the
> code base -- I'd rather make sure that the existing server works.

And we all do things that are interesting and important to each one of us.
Both Ryan and myself think this would be a really cool, kickass feature for
2.0, and we are both willing to spend a lot of time to see it happen.

I'm guessing that *something* is going to go into 2.0. I hope that you can
at least spend a small amount of time to review the posted patch, and
provide your thoughts. If this is dear to you, and something is going in,
then you'll want to have a look :-)

thx,
-g

-- 
Greg Stein, http://www.lyra.org/

sick of this shit (was: Re: code chunks for filter challenge)

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Jun 29, 2000 at 03:56:18PM -0700, rbb@covalent.net wrote:
> 
> > > But isn't that information lost when we go from bucket to char * filters?
> > 
> > Yes, it is lost.
> > 
> > There is no sense in continuing this email response if you focus on the
> > char* filters.
> 
> I've been focusing on the char * filters since the beginning.  This is
> where my issue is with your design.

Damn it. You're really starting to get me pissed off, and I'm quite unhappy
about this whole thing.

Look. If you want zero-copy when you set aside a whole buffer, it is
possible *if* the content generator and upstream filters provide the right
thing. After that filter delivers the content downstream, whether it makes
it to the network or not is entirely up to those filters.

You are suggesting that we have to have a scheme that is zero-copy across
the entire chain. That is just fucked.

Consider a content recoder that maps UTF-16 content into UTF-8 content. The
length is going to change, so you cannot operate in place. What comes into
the filter is TOTALLY LOST. Something else entirely goes out the other side
of that filter.

Well, guess what? That damn zero-copy that you keep insisting on just flew
right out the window.

Want to know more? That recoder is going to use a char* handler. Why?
Because it is so damned easy, and it exactly matches the semantics that a
recoder is going to need.
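
For what it's worth, such a recoder would look roughly like this (the char*
callback signature is my guess at the patch's simple interface, and the
conversion itself is elided; the only point is that a new, differently-sized
buffer comes out the other side):

static void recode_utf16_to_utf8(ap_filter_t *filter,
                                 const char *buf, ap_size_t len)
{
    char outbuf[8192];
    ap_size_t outlen = 0;

    /* ... convert the UTF-16 bytes in buf into UTF-8 in outbuf and set
       outlen; a partial trailing character would be carried over in
       filter->ctx between calls ... */

    ap_lwrite(filter, outbuf, outlen);
}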

I'm sick of your bullshit about this filtering stuff.

"char* handlers are bad." My ass. For particular filtering applications,
they are totally applicable, and the appropriate design point.

"oh. there is a memcpy() in your al_setaside_bucket()". No shit. The caller
did an ap_rvputs(r, buf1, buf2, "some string", another_buf, NULL). We sure
as hell HAVE TO COPY that data into a long-lived pool.

You are so focused on the little fucking details and how you can say "oh, it
is bad here. it is bad there." that you aren't even looking at WHY certain
things exist. That memcpy() exists because it HAS TO BE THERE. There is
nothing you are ever going to be able to design and build to avoid a
memcpy() for a set-aside when the content generator calls ap_rvputs(). It
must be done. Wake the hell up.


Your veto is real bullshit. It is supported only by manufactured excuses,
illogical assumptions, and impossible preconditions.

Regards,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: code chunks for filter challenge

Posted by rb...@covalent.net.
> > But isn't that information lost when we go from bucket to char * filters?
> 
> Yes, it is lost.
> 
> There is no sense in continuing this email response if you focus on the
> char* filters.

I've been focusing on the char * filters since the beginning.  This is
where my issue is with your design.

> You asked for a mechanism to allow zero-copy from the content generator down
> to the network, as it passed through the filters, and where one or more of
> those filters must set ALL the data aside. Great. I coded it.

Yes, you did.  I have no problem with what you added last
night.  Unfortunately, either you haven't been listening, or I haven't
been clear enough.  The char * filters mean that EVERY filter must copy
the data.  You said this wasn't the case, so I asked you to prove
it.  What you proved was that the char * filters defeat the purpose of
the bucket filters.

> My mechanism has another useful feature which allows a person to write a
> char* filter. This is INDEPENDENT of the requirement at hand. If people want
> to insert a char* filter in there, then fine. We also allow people to write
> infinite loops in their modules. We cannot be the police for everything.

This isn't independent of the requirements at hand.  I tried to highlight
this yesterday, when I was talking about the subsequent filters on the
stack.  If the design forces a memcpy (as the char * filters do), then
the design is broken.

I don't think I can express this any clearer.  I said this yesterday
morning in my discussion with Jon Winstead, and I _THINK_ he understands
what I am saying now.

> You wanted proof that my filtering mechanism can support a zero-copy with a
> complete set-aside of the response. I provided that. I do acknowledge that
> the filter writers must use buckets throughout to see this happen.

More than that.  EVERY filter writer must use buckets to see that this
happens.  This means that if ANY filter in the filter chain uses char *,
then the benefits of the buckets disappear, and in fact we take a hit by
using them.

> Look at it this way: assume that the char* filter does not exist. Review the
> set-aside code in that light.

You can't do that.  The char * filters are the basis for the design.  Next
thing you'll be telling me is that we can just remove the char * filters
later.

> [ take the position the char* feature is added later ]

I am against the char * altogether.

> I spent a long night meeting your challenge. The code *can* do what you want.

Yes, it can.  If we remove the char * filters.  With the char * filters,
it cannot.  This is what I have said over and over again.
 
> Please review the code as if a char* handler does not exist. Lift your veto
> if you find no problems in that light; explain your issues otherwise. If you
> would rather shortcut an additional explanation, and can agree that char*
> handlers are not an issue, then I'll review your challenge response in
> detail, explaining why certain allocations and copies exist, how the
> lifetimes work, etc. Those items are all to be expected given the current
> Apache APIs for content generators to write data.

The char * filters are the ONLY issue.  They have been the issue since
almost day one.  They have especially been the issue since early this week
when Roy's bucket brigades finally made sense.

The current API's for content generators do not put these limits on us.  I
do have a design that I will implement soon which does not have ANY of the
limitations of this patch.  The design is what I believe Roy is talking
about with his bucket brigades.

Everything that you admitted to above is what I dislike about the design,
and it is why I have the veto in place.  In previous e-mails you have
certainly made it sound like none of this was true:


>> >               For the simple callback, the pool handling occurs
>*outside*
>> >           the callback. It remains just as simple as above. The point
>> >           here is that we have optimization choices that will not
>affect
>> >           the simple_cb style of filter.
>> 
>> The point is that this looks like an optimization, but it is really not
>> one.  In reality, this design (as well as any other that requires
>> pools) requires more memory and pool memcpy's than are necessary.
>
>False. See above.
>
>The framework retains no additional memory to deal with the char*
>callback.
>As always, the filter is free to consume memory, but there is nothing in
>the
>design that encourages unlimited consumption.
>
>A memcpy() is only needed if data must be set aside. This occurs in all
>cases.

A memcpy is required as soon as one filter uses buckets and the next
filter uses a char *.  That's what the code you posted yesterday did.

> 4) the char* handler is bad for various reasons
>
>    => I showed that each of these "reasons" are issues for filters in
>       general and not characteristic of the char* handler. regardless,
>       the
>       "reasons" do not preclude the patch, the design, or filters in
>       general.

You admit that this is so at the top of this e-mail.

> 8) need to track pools and their lifetimes
>
>    => false. demonstrated otherwise

Doesn't the code you designed last night track pool lifetimes to make sure
we aren't trying to keep memory around too long?  I could have sworn
that's what ap_lsetaside did.

> 9) the framework demands extra memory and memcpys
>
>    => false. demonstrated otherwise

Anytime char * filters are used, we have extra memory usage and extra
memcpy's; you admit this above.  This is why the char * filters are bad.

> 10) the framework precludes certain optimizations
>
>    => possibly (as with any code), but there has not been a credible
>       example of an optimization that is: important/relevant, and can't
>       be done later within this framework

We just proved that wrong.  The optimization that cannot be done in the
current design, is zero-copy.  As long as the char * filters are present
in the patch, this optimization is impossible.



My veto is remaining, because this design is flawed.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------



Re: code chunks for filter challenge

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Jun 29, 2000 at 08:46:04AM -0700, rbb@covalent.net wrote:
> 
> I do have issues with this design.  I NEED to see a full module before I
> will remove the veto.  The module doesn't even have to compile, but it
> should be complete.
> 
> > Note that the filter performs no direct allocations. ap_setaside_bucket()
> > certainly can, but only when data lifetimes need to change.
> > 
> > The zero-alloc behavior comes from the ap_selfdef_t type ("self-defined
> > storage"). This bucket type determines its own lifetime by virtue of
> > carrying around a (sub)pool of its own. ap_setaside_bucket() can then string
> > these babies together, until ap_lputsetaside() is called.
> 
> But isn't that information lost when we go from bucket to char * filters?

Yes, it is lost.

There is no sense in continuing this email response if you focus on the
char* filters.

You asked for a mechanism to allow zero-copy from the content generator down
to the network, as it passed through the filters, and where one or more of
those filters must set ALL the data aside. Great. I coded it.

My mechanism has another useful feature which allows a person to write a
char* filter. This is INDEPENDENT of the requirement at hand. If people want
to insert a char* filter in there, then fine. We also allow people to write
infinite loops in their modules. We cannot be the police for everything.

You wanted proof that my filtering mechanism can support a zero-copy with a
complete set-aside of the response. I provided that. I do acknowledge that
the filter writers must use buckets throughout to see this happen.

Look at it this way: assume that the char* filter does not exist. Review the
set-aside code in that light.

[ take the position the char* feature is added later ]

I spent a long night meeting your challenge. The code *can* do what you want.

Please review the code as if a char* handler does not exist. Lift your veto
if you find no problems in that light; explain your issues otherwise. If you
would rather shortcut an additional explanation, and can agree that char*
handlers are not an issue, then I'll review your challenge response in
detail, explaining why certain allocations and copies exist, how the
lifetimes work, etc. Those items are all to be expected given the current
Apache APIs for content generators to write data.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: code chunks for filter challenge

Posted by rb...@covalent.net.
I do have issues with this design.  I NEED to see a full module before I
will remove the veto.  The module doesn't even have to compile, but it
should be complete.

> Note that the filter performs no direct allocations. ap_setaside_bucket()
> certainly can, but only when data lifetimes need to change.
> 
> The zero-alloc behavior comes from the ap_selfdef_t type ("self-defined
> storage"). This bucket type determines its own lifetime by virtue of
> carrying around a (sub)pool of its own. ap_setaside_bucket() can then string
> these babies together, until ap_lputsetaside() is called.

But isn't that information lost when we go from bucket to char * filters?

> 
> If other bucket types are passed to ap_setaside_bucket(), then it will make
> the necessary copy to ensure that it will live past the function-return.

Here is where this design fails the challenge as I see it, although it may
be that I am not seeing the design.

The fact that the bucket carries around its own (sub)pool is BAD!  Who
frees the pool?  When?  Does the pool get freed when the memory is written
to disk?  Who writes the function that puts the memory to disk?  If the
pool is freed when the memory is written to disk, then how does the module
that originally created the pool know this?

Another problem that I think can best be asked this way is:

IF this module is the first in the filter list, and the second module uses
a char * type filter, and then there is another bucket_filter that has to
cache the whole request, does the second filter have to re-allocate the
memory?  This is the problem I see with passing around buckets and then
char *'s and then buckets.

> ap_setaside_bucket() allocates memory only when necessary. If data gets
> spilled, then it will have a sub-pool to hold ap_selfdef_t structures and
> ap_file_t structures. Note that the number of these structures is no more
> than 2n+1, where n is the number of files placed into the content stream.
> (reducing this count would imply copying file contents, whatever algorithm
> may be chosen)
>
> The ap_lputsetaside() function will pass the elements of a setaside
> structure down to the next filter. Where possible, it will pass a SELFDEF so
> the next filter can grab the sucker intact without copying.

This is what I am talking about above.  This is the failure point, right
here.  Because it can't always pass around the SELFDEF, we lose the most
important information about the data, namely that it has already been
copied once.  This nullifies the zero-copy path when you have two bucket
filters with one char * filter in-between.

The only way around this is to remove the char * filters, which basically
removes all filtering from your patch.

Here is another question:

If we go from bucket to char * filter, don't we lose the sub-pool in the
lower filters?  Doesn't that mean that the memory can't be freed down
there?  So, basically:

1) mod-filter (the module I asked for) caches all the data, and allocates
   a sub-pool.  The whole request is 100 bytes, so the data isn't spilled
   to the disk.
2) the data is handed to the next filter as a char *, so the sub-pool
   reference is lost
3) mod-insert-header-footer (module that inserts a header and footer) is
   written using a char *, because it doesn't need to cache, it looks for
   the tag ?HEAD? and ?FOOT? to know where to insert the data.  In this
   case, the header and foot add 50 MB, but they are written to the next
   layer in two big chunks (one each)
4) the char * output is wrapped back up as a bucket for the next filter
5) mod-gzip (module to gzip data) needs to cache the whole request so
   that it can change the content-length, so it uses buckets.

At stage 1, we create a sub-pool.  At stage 2, we lose the sub-pool.  At
stage 3, we have no sub-pool.  At stage 4, we throw it back into a bucket,
   and at stage 5, don't we create a new sub-pool and re-copy?  When stage 5
writes to the disk, how do we free the sub-pool created in step 1?  It
doesn't really matter in this example because that first sub-pool only has
100 bytes.  What if that sub-pool has more memory in it?  Not enough to
write to disk, but say 1 byte less than that limit?

> To help understand this stuff, I'll detail a bit about the ap_selfdef_t
> structure... it has three types:
> 
> 1) AP_SELFDEF_PTRLEN: ->pool will be set and contains the ap_selfdef_t
>    itself and the data referenced by ->buf (and ->len)
> 
> 2) AP_SELFDEF_FILE: ->pool will be NULL, the ap_selfdef_t is allocated in an
>    unknown pool. ->file refers to the file in question (starting at the
>    current file position), ->flen says how many bytes from there
> 
> 3) AP_SELFDEF_SPILL: all fields empty except for ->len. This length says how
>    many bytes to consume from the spill file. If you scan from the start of
>    an ap_setaside_t structure, the _SPILL elements are layed out
>    sequentially in the spill file. This ap_selfdef_t is allocated in the
>    ap_setaside_t.pool

I understand all of this; it is buckets.  It doesn't, however, fix the
problems I outlined above.

> When an ap_selfdef_t is placed into an ap_setaside_t, and it is a _FILE,
> then the file will be dup'd into the sa->pool. A new ap_selfdef_t is
> allocated in the sa->pool (because we don't know where the original was
> allocated, so we can't determine its lifetime).

This is bad.  If we use malloc/free, we explicitly know its lifetime, so
we can avoid this dup.  We know the bucket's lifetime, because as long as
we have the bucket, the memory is still alive.  If we don't have the
bucket, the memory has already been freed.

> whew. It does sound messy, but that is simply because we are dealing with
> data that can have unknown lifetimes. We need to properly manage each of
> those items. If the data *does* have a known lifetime (AP_BUCKET_SELFDEF
> with an AP_SELFDEF_PTRLEN), then we just drop the thing into the setaside
> buffer. Piece of cake, no copying. When that thing is pulled out to be sent
> to the next filter, it is sent in its original form which allows the next
> filter to set it aside, too.

This is why pools don't work for this situation IMHO.

> If you ignore this prose, and just look at the new data structures
> (ap_selfdef_t and ap_setaside_t), then you can build up the logic from there.
> 
> All the guts of the mechanism is in ap_setaside_bucket().

OK.  I think I see the design, but I still can't get past the problems
I outlined above.  Having actually reviewed ap_setaside_bucket, I see you
are memcpy'ing to move from strings to buckets.  This is the un-allowed
memory allocation that is causing all of the problems.

> (1) my_callback is called with an AP_BUCKET_SELFDEF (sd) referring to an
> AP_SELFDEF_PTRLEN. <sd> is directly chained into <sa>. Repeat. [ follow the
> code path in ap_setaside_bucket() to verify no allocs ]
> 
> ap_lputsetaside() is called. It loops over the selfdef structures. If the
> next filter uses bucket callbacks, then an AP_BUCKET_SELFDEF is built which
> refers to this sd. The callback is called. This is the same state as point

Huh?  We have a selfdef already, but if we are passing to a bucket filter
we have to create a new selfdef?  Why?  Does this leave us open to
clear'ing a sub-pool that is referenced in one of the selfdef's?

> (1), so induction shows we can call N filters without copying. At the
> bottom, (internally) ap_lwrite() is called with the data which is passed to
> ap_bwrite() with no copies.

I have reviewed the code at the bottom, but we are creating empty buckets
to pass other selfdef buckets??????  To be very honest, it doesn't help
that the code for ap_setaside_bucket is 11 screens long.  This function
will be unmaintainable for anybody but you.  I assume it would be broken
into more manageable chunks before the commit.

> If a spill occurs, and *only* content data is present (no files in the
> content stream), then all the data will be placed into the sa->spill file
> and there will be a single _SPILL selfdata (no matter how many invocations
> to the callback occurred; the writes are coalesced)
> 
> ap_lputsetaside() will call ap_lsendfile() for the whole spill file. Of
> course, this can map into the platform's sendfile().

This makes sense.

> Okay... enough discussion for now. Below, I've appended the new functions
> and types. This stuff compiles, but I haven't explicitly tested it (since
> we're only going for theory here).
> 
> Here is a quick summary of the posted requireements, and a quick answer:
> 
> 1) entire response must be able to be held by one module (with spill)
>    => MET. ap_setaside_t and friends manage this fine

Agreed, this one goes AWAY!!!!!  :-)

> 2) no memory allocated out of r->pool
>    => MET, with caveats.
> 
>       ap_file_t structures are allocated. some ap_selfdef_t are allocated
>       to manage lifetimes of data.
>       
>       this requirement is a bit too absolute given the variety of inputs to
>       the filter system (e.g., ap_rwrite, ap_send_fd, ap_rprintf). for the
>       case in question: where a SELFDEF is passed, no allocations occur. a
>       bit happens when spill occurs. etc.
>       
>       I expect discussion to focus here.

I expected structures to be allocated out of the pool.  This was not met
for a different reason.  The problem is actually in the next requirement
though.

> 
> 3) the position in the filter chain should not matter. the algorithm should
>    not disable downstream filters from doing the same thing.
>    => MET. a SELFDEF will be passed to the next filter if it uses buckets.

BUT if it or any downstream filter doesn't use buckets, then we are back
to copying.  See the portion of the code marked with !!!rbb!!!.  If we go
from bucket -> char * -> bucket, ap_setaside_bucket creates a sub-pool,
allocates memory out of it, and then does a memcpy.  The challenge was
not to create a design that allows for zero-copy if all filters are bucket
filters.  It was to create a design that allows for zero-copy in all cases
where zero-copy is possible.  This does not meet that challenge.

> 4) write as a bucket and as a char *
>    => N/A per earlier discussion


Agreed, this one goes AWAY!!!!!  :-)

> 
> 5) example filter doesn't have to modify anything; no allocation except for
>    structures to refer to the content.
>    => MET.
>    
>       note this req't allows some types of allocation. no biggy, as I avoid
>       allocations in the code. but interesting to point out in light of (2)

Yes, because allocating structures that just point to memory is VERY
different than allocating memory for the response.  This is what I was
trying to get across.  This one was met.

> 6) do not copy data that is not modified.
>    => MET, with caveats.
>       
>       data must be copied to account for changing lifetimes. If a SELFDEF is
>       fed into the chain (or pops up inside the chain), then copies will no
>       longer occur.

I disagree.  If a SELFDEF is introduced into the chain, the copy just
moves from the module to the core.  This is a failure condition.

In general, this is a recursive version of my original patch if we remove
the char *'s.  The char *'s remove all of the benefits of the selfdef
buckets IMO.  I have tried very hard to be very clear about my problems
with the code.  Please tell me if I have not been clear enough.

My veto remains in place because we are still allocating out of pools
(which never shrink), I am unclear as to where the pools are
cleared/destroyed, and we are copying data whenever we go from bucket ->
char * -> bucket filters.

Ryan


> typedef enum {
>     AP_SELFDEF_PTRLEN,
>     AP_SELFDEF_FILE,
>     AP_SELFDEF_SPILL
> } ap_selfdef_type;
> 
> struct ap_selfdef_t {
>     int filter_owns;
> 
>     ap_pool_t *pool;
> 
>     ap_selfdef_type type;
> 
>     const char *buf;            /* AP_SELFDEF_PTRLEN */
>     ap_size_t len;              /* AP_SELFDEF_PTRLEN, _SPILL */
> 
>     ap_file_t *file;            /* AP_SELFDEF_FILE */
>     ap_ssize_t flen;            /* AP_SELFDEF_FILE */
> 
>     struct ap_selfdef_t *next;
> };
> 
> struct ap_setaside_t {
>     ap_size_t total_mem;
>     ap_selfdef_t *head;
>     ap_selfdef_t *tail;
> 
>     ap_pool_t *pool;
>     ap_file_t *spill;
> };
> 
> API_EXPORT(void) ap_setaside_bucket(ap_setaside_t *sa,
>                                     const ap_bucket_t *bucket,
>                                     ap_pool_t *pool);
> API_EXPORT(void) ap_clear_setaside(ap_setaside_t *sa);
> 
> -------------------------------------------------------------------
> 
> API_EXPORT(void) ap_lputsetaside(ap_filter_t *filter, const ap_setaside_t *sa)
> {
>     ap_selfdef_t *scan;
>     ap_selfdef_t *next;
> 
>     /* rewind the spill file, if present */
>     if (sa->spill != NULL) {
>         (void) ap_seek(sa->spill, APR_SET, 0);
>     }
> 
>     for (scan = sa->head; scan != NULL; scan = next) {
>         /* the <sa> no longer owns this if we are sending it */
>         scan->filter_owns = 0;
> 
>         /* grab the next pointer. if somebody takes this item, they will
>            probably change the value. */
>         next = scan->next;
> 
>         switch (scan->type) {
>         case AP_SELFDEF_PTRLEN:
>         {
>             ap_filter_t *f_next = filter->next;
> 
>             if (f_next != NULL && f_next->bucket_cb != NULL) {
>                 ap_bucket_t bucket = {
>                     AP_BUCKET_SELFDEF, NULL, 0, NULL, NULL, NULL, 0, scan
>                 };
>                 (*f_next->bucket_cb)(f_next, &bucket);
>             }
>             else {
>                 /* send some bytes; not as optimal */
>                 ap_lwrite(filter, scan->buf, scan->len);
>             }
>             break;
>         }
>         case AP_SELFDEF_FILE:
>             /* send the file */
>             ap_lsendfile(filter, scan->file, scan->flen);
>             break;
>         case AP_SELFDEF_SPILL:
>             /* send (len) bytes out of the spill file */
>             ap_lsendfile(filter, sa->spill, scan->len);
>             break;
>         default:
>             /* ### error... */
>             ap_assert(0);
>             return;
>         }
>     }
> }
> 
> -------------------------------------------------------------------
> 
> API_EXPORT(void) ap_setaside_bucket(ap_setaside_t *sa,
>                                     const ap_bucket_t *bucket,
>                                     ap_pool_t *request_pool)
> {
>     ap_ssize_t needed;
>     ap_pool_t *subpool = NULL;
>     ap_pool_t *fmtpool = NULL;
>     ap_array_header_t *strs;
>     const char *s;
>     ap_selfdef_t *sd;
> 
>     /* a selfdef has its own pool; for the others, we will need one */
>     if (bucket->type != AP_BUCKET_SELFDEF && sa->pool == NULL) {
>         (void) ap_create_pool(&sa->pool, request_pool);
>     }
> 
>     /* default: format directly into the spill pool */
>     fmtpool = sa->pool;
> 
>     if (sa->spill == NULL
>         && (bucket->type == AP_BUCKET_STRINGS
>             || bucket->type == AP_BUCKET_PRINTF)) {
>         /* format into a new subpool; this data may be dumped to a spill
>            file and the subpool will be destroyed */
>         (void) ap_create_pool(&subpool, request_pool);
>         fmtpool = subpool;
>     }
> 
>     switch (bucket->type) {
>     case AP_BUCKET_PTRLEN:
>         needed = bucket->len;
>         break;
>     case AP_BUCKET_STRINGS:
>     {
>         va_list va = bucket->va;
>         int i;
>         char *s2;
> 
>         strs = ap_make_array(fmtpool, 5, sizeof(const char *));
> 
>         /* compute the total size */
>         needed = 0;
>         while (1) {
>             s = va_arg(va, const char *);
>             if (s == NULL)
>                 break;
>             *(const char **)ap_push_array(strs) = s;
>             needed += strlen(s);
>         }
> 
>         /* concatenate them all together */
>         s = s2 = ap_palloc(fmtpool, needed);
>         for (i = 0; i < strs->nelts; ++i) {
>             const char *s3 = ((const char **)strs->elts)[i];
>             ap_size_t len = strlen(s3);
>             memcpy(s2, s3, len);    /* !!!rbb!!! */
>             s2 += len;
>         }
>         break;
>     }
>     case AP_BUCKET_PRINTF:
>         s = ap_pvsprintf(fmtpool, bucket->fmt, bucket->va);
>         needed = strlen(s);
>         break;
>     case AP_BUCKET_FILE:
>         needed = 0;
>         break;
>     case AP_BUCKET_SELFDEF:
>         switch (bucket->selfdef->type) {
>         case AP_SELFDEF_PTRLEN:
>             needed = bucket->selfdef->len;
>             break;
>         case AP_SELFDEF_FILE:
>             needed = 0;
>             break;
>         case AP_SELFDEF_SPILL:
>         default:
>             /* ### error... */
>             ap_assert(0);
>             return;
>         }
>         break;
>     default:
>         /* ### error... */
>         ap_assert(0);
>         return;
>     }
> 
>     if (sa->spill == NULL) {
>         sa->total_mem += needed;
>         if (sa->total_mem > SPILL_THRESHOLD) {
>             const char *fname;
> 
>             if (sa->pool == NULL) {
>                 (void) ap_create_pool(&sa->pool, request_pool);
>             }
> 
>             fname = "/tmp/spill";       /* ### yah. right */
>             (void) ap_open(&sa->spill, fname,
>                            APR_WRITE | APR_CREATE | APR_EXCL | APR_DELONCLOSE,
>                            APR_UREAD, sa->pool);
> 
>             /* ### walk sa, spilling data. must coalesce blocks. */
> 
>             /* ### fall thru to place data into spill areas */
>         }
>         else {
>             if (bucket->type == AP_BUCKET_SELFDEF) {
>                 sd = bucket->selfdef;
>                 sd->filter_owns = 1;
>             }
>             else {
>                 /* PTRLEN buckets need a pool to copy the data into */
>                 if (bucket->type == AP_BUCKET_PTRLEN) {
>                     (void) ap_create_pool(&fmtpool, request_pool);
>                 }
> 
>                 /* for _FILE, fmtpool == sa->pool */
>                 sd = ap_pcalloc(fmtpool, sizeof(*sd));
> 
>                 switch (bucket->type) {
>                 case AP_BUCKET_PTRLEN:
>                     sd->pool = fmtpool;
>                     sd->type = AP_SELFDEF_PTRLEN;
>                     sd->buf = ap_pstrndup(sd->pool, bucket->buf, needed);
>                     sd->len = needed;
>                     break;
>                 case AP_BUCKET_STRINGS:
>                 case AP_BUCKET_PRINTF:
>                     sd->pool = fmtpool;
>                     sd->type = AP_SELFDEF_PTRLEN;
>                     sd->buf = s;
>                     sd->len = needed;
>                     break;
>                 case AP_BUCKET_FILE:
>                     sd->type = AP_SELFDEF_FILE;
>                     /* dup the file into our pool */
>                     (void) ap_dupfile(&sd->file, bucket->file, sa->pool);
>                     sd->flen = bucket->flen;
>                     break;
>                 }
>             }
> 
>             if (sa->head == NULL) {
>                 sa->head = sa->tail = sd;
>             }
>             else {
>                 sa->tail->next = sd;
>                 sa->tail = sd;
>             }
> 
>             /* done */
>             return;
>         }
>     }
> 
>     /* data must be placed into the spill */
> 
>     /* append to the previous spill datum? */
>     if (sa->tail != NULL && sa->tail->type == AP_SELFDEF_SPILL
>         && (bucket->type == AP_BUCKET_PTRLEN
>             || bucket->type == AP_BUCKET_STRINGS
>             || bucket->type == AP_BUCKET_PRINTF
>             || (bucket->type == AP_BUCKET_SELFDEF &&
>                 bucket->selfdef->type == AP_SELFDEF_PTRLEN))) {
> 
>         sa->tail->len += needed;
>         switch (bucket->type) {
>         case AP_BUCKET_PTRLEN:
>             (void) ap_write(sa->spill, bucket->buf, &needed);
>             break;
>         case AP_BUCKET_STRINGS:
>         case AP_BUCKET_PRINTF:
>             (void) ap_write(sa->spill, s, &needed);
>             break;
>         case AP_BUCKET_SELFDEF:
>             (void) ap_write(sa->spill, bucket->selfdef->buf, &needed);
>             break;
>         }
> 
>         /* trash the formatting pool (its contents were spilled) */
>         if (fmtpool != sa->pool)
>             ap_destroy_pool(fmtpool);
>         return;
>     }
> 
>     /* spill to a new datum */
>     sd = ap_pcalloc(sa->pool, sizeof(*sd));
> 
>     switch (bucket->type) {
>     case AP_BUCKET_PTRLEN:
>         sd->type = AP_SELFDEF_SPILL;
>         sd->len = needed;
>         (void) ap_write(sa->spill, bucket->buf, &needed);
>         break;
>     case AP_BUCKET_STRINGS:
>     case AP_BUCKET_PRINTF:
>         sd->type = AP_SELFDEF_SPILL;
>         sd->len = needed;
>         (void) ap_write(sa->spill, s, &needed);
>         break;
>     case AP_BUCKET_FILE:
>         sd->type = AP_SELFDEF_FILE;
>         /* dup the file into our pool */
>         (void) ap_dupfile(&sd->file, bucket->file, sa->pool);
>         sd->flen = bucket->flen;
>         break;
>     case AP_BUCKET_SELFDEF:
>         switch (bucket->selfdef->type) {
>         case AP_SELFDEF_PTRLEN:
>             sd->type = AP_SELFDEF_SPILL;
>             sd->len = needed;
>             (void) ap_write(sa->spill, bucket->selfdef->buf, &needed);
>             break;
>         case AP_SELFDEF_FILE:
>             /*
>             ** ### can we just take this selfdef (and set filter_owns)?
>             ** ### specifically: is selfdef->pool set, holding this file?
>             */
>             sd->type = AP_SELFDEF_FILE;
>             /* dup the file into our pool */
>             (void) ap_dupfile(&sd->file, bucket->selfdef->file, sa->pool);
>             sd->flen = bucket->selfdef->flen;
>             break;
>         }
>         break;
>     }
> 
>     if (sa->head == NULL) {
>         sa->head = sa->tail = sd;
>     }
>     else {
>         sa->tail->next = sd;
>         sa->tail = sd;
>     }
> 
>     /* trash the formatting pool (its contents were spilled) */
>     if (fmtpool != sa->pool)
>         ap_destroy_pool(fmtpool);
> }
> 
> API_EXPORT(void) ap_clear_setaside(ap_setaside_t *sa)
> {
>     ap_selfdef_t *scan;
>     ap_selfdef_t *next;
> 
>     for (scan = sa->head; scan != NULL; scan = next) {
>         /* grab ->next before we blast the pool (which might hold *scan) */
>         next = scan->next;
> 
>         /* do not blast the pool if somebody has taken it */
>         if (!scan->filter_owns && scan->pool != NULL)
>             ap_destroy_pool(scan->pool);
>     }
>     if (sa->pool != NULL)
>         ap_destroy_pool(sa->pool);
> 
>     memset(sa, 0, sizeof(*sa));
> }
> 


_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------