Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2000/06/27 12:14:43 UTC

Re: Roy's filtering thoughts (was: Patch review: ...)

On Mon, Jun 26, 2000 at 09:30:14PM -0700, Roy T. Fielding wrote:
> I don't think either of you will appreciate me saying so, but the basic
> problem I have with both proposals is that they cannot be understood by
> anyone but the author.  Using macros to define an interface is lame.
> Use an ADT!  I honestly care a lot less about how the core implements
> the sucker than I do about how a module writer is supposed to design,
> debug, and deploy an application that uses these filters.  This code
> should be *easier* to understand than the current buff.c.

Given the macro comment, I presume you have reviewed Ryan's patch. Have you
reviewed mine?

Have you looked at my mod_gargle to see how a couple of filters can be
implemented? I created one to process the content (the byte-swap thing Ryan
provided). There is a second which inserts a header and footer. Tonight, I'll
resend the one that implements a simple token-parser along with a subrequest.

IMO, I've provided a very simple callback interface for filter writers. I'd
be interested in hearing what makes it hard to understand.

Also: it is a bit unfair to both of us to claim "hard to understand" when
neither of us has concentrated on inserting comments in the headers or
code, or on writing API documentation. I know that I can explain my
"simple_cb" in a short paragraph. The bucket_cb is more complex, but quite
doable.
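
For reference, the simple form is roughly this shape. This is a sketch, not
the patch verbatim; the struct layout and names are approximations (and the
bucket_cb form lives alongside this, omitted here):

    #include <ctype.h>
    #include <stddef.h>

    /* sketch only: approximate shape of the simple filter interface */
    typedef struct ap_filter ap_filter;
    struct ap_filter {
        void (*simple_cb)(ap_filter *self, const char *buf, size_t len);
        ap_filter *next;    /* next filter in the r->filters chain */
        void *ctx;          /* per-filter state */
    };

    /* a trivial filter body: transform bytes, pass them along;
       buf == NULL is the end-of-stream marker */
    static void upcase_cb(ap_filter *self, const char *buf, size_t len)
    {
        char local[4096];

        if (buf == NULL) {
            self->next->simple_cb(self->next, NULL, 0);
            return;
        }
        while (len > 0) {
            size_t chunk = len < sizeof(local) ? len : sizeof(local);
            size_t i;
            for (i = 0; i < chunk; ++i)
                local[i] = toupper((unsigned char)buf[i]);
            self->next->simple_cb(self->next, local, chunk);
            buf += chunk;
            len -= chunk;
        }
    }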

But really... are they hard to understand? And if they were doc'd?

> However, that doesn't mean I think the core implementation isn't
> important.  For example, I don't think it is safe to implement filters
> in Apache without either a smart allocation system or a strict limiting
> mechanism that prevents filters from buffering more than 8KB of
> memory at a time (for the entire non-flushed stream).  It isn't possible
> to create a robust server implementation using filters that allocate
> memory from a pool (or the heap, or a stack, or whatever) without somehow
> reclaiming and reusing the memory that gets written out to the network.
> There is a certain level of "optimization" that must be present before
> any filtering mechanism can be in Apache, and that means meeting the
> requirement that the server not keel over and die the first time
> a user requests a large filtered file.  XML tree manipulation is an
> example where that can happen.  I don't think either patch has
> addressed that problem yet, and I can't really evaluate them until
> they do.

I have a single allocation in the output/filter processing in my patch. It
is located in ap_lprintf(). Is that the allocation you are referring to?

If that is a problem, then it would be very easy to wrap that up in a pool
create/destroy pair. That would return the memory to the request pool for
reuse when the function returns. I avoided doing that because I perceived
there would be a performance issue. Hmm. It actually looks reasonably fast.
I'll update the code.
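
Concretely, the wrap would look something like this (a sketch; the function
name here is just illustrative, and ap_lprintf's real body differs, but the
pool calls are the stock 1.3-style API):

    #include <stdarg.h>
    #include <string.h>
    #include "httpd.h"
    #include "http_protocol.h"   /* ap_rwrite */

    /* sketch: confine the scratch buffer to a subpool so the memory
       returns to the request pool the moment we're done with it */
    static int lprintf_guts(request_rec *r, const char *fmt, va_list va)
    {
        pool *sub = ap_make_sub_pool(r->pool);
        char *buf = ap_pvsprintf(sub, fmt, va);
        int   n   = ap_rwrite(buf, (int)strlen(buf), r);

        ap_destroy_pool(sub);    /* reclaimed for reuse, not leaked */
        return n;
    }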

> I also don't believe it is simpler to implement these "easy" forms of
> filters than it is to implement the full-blown bucket brigades. 

Easier to implement the infrastructure, or a filter?

I've got the infrastructure already for both simple and bucket-based. For
the filter writer, the simple form is definitely easier since the core takes
care of converting the buckets into a pointer for you.
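
That conversion is just a small adapter in the core. Roughly, using the
ap_filter shape sketched above and a deliberately minimal stand-in for the
bucket type:

    /* sketch: a deliberately minimal bucket, just a run of memory */
    typedef struct {
        const char *data;
        size_t len;
    } ap_bucket_min;

    /* the core flattens each bucket before invoking a simple filter,
       so the filter writer never sees bucket types at all */
    static void run_simple(ap_filter *f, const ap_bucket_min *b)
    {
        if (b == NULL)
            f->simple_cb(f, NULL, 0);         /* propagate end-of-stream */
        else
            f->simple_cb(f, b->data, b->len); /* plain pointer + length */
    }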

> The
> basic brigade interface is very simple -- a pointer to the stream head 
> (the top filter on an output filter stack) and a pointer to the
> bucket brigade (a linked list of typed memory buckets).  You can add
> helper functions on top of that which take string/int/printf arguments
> and translate those to a brigade pointer.  Memory allocation can be
> specific to each bucket, so the application can decide for itself whether
> it wants to allocate from the heap and donate the memory to the stream
> or tell the stream to make a copy if it can't process it all in one go.

We both have something quite similar. Ryan's ioblock and my bucket
correspond to what you're describing. Neither has the memory allocation
machinery you mention, though.

I'm not sure how our current patches fail to allow for this kind of memory
handling.
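
For instance, nothing in either patch precludes a bucket that carries its
own disposal policy, along the lines Roy describes. A sketch (this free
field is illustrative and exists in neither patch today):

    /* sketch: a bucket that records who owns its storage */
    struct owned_bucket {
        const char *data;
        size_t      len;
        /* non-NULL: heap memory donated to the stream; call to release.
           NULL: the caller retains ownership, so copy the data if you
           can't consume it in one go. */
        void (*free)(struct owned_bucket *b);
    };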

Note that r->filters (in my patch) is the stream head you're talking about.
Ryan's is something like &((LINK_filter*)r->_hook.link_filter->elts)[0].

> I don't understand how either scheme is going to work in terms of
> placing filters on the output.  I expected to see multiple points
> where filters may be added both on top of and within the filter stack.

This is a first-round patch to introduce the basic mechanism. You're asking
for too much. If we were to include all that stuff in the initial patch, then
it would become unreviewable.

I actually get a bit tweaked at this. I omitted the bucket_cb stuff from my
first patch so that people could see the basic structure and concepts. But
no... that wasn't good enough. So now it has the bucket stuff, and the patch
is longer and less reviewable.

[ even now, Ryan is saying that I must also include the delayed http header
  stuff in my patch. The position is rather silly, however, since it is a
  straightforward change and does not change the fundamental design of
  either patch set. But no... it has to be in, otherwise it is an "issue" ]

And now? Add "multiple points" around at key spots in Apache to deal with
other places where filter insertion might be nice? Too much for a first
pass. While I'm not a big fan of Ryan's code comments which say "I'll fix
this soon after checkin", I laud him on the effort to show an *outline* of
how things will work, to be followed up later with additional specifics.
*That* is how we should be doing things. Demonstrate the plan, get buy-in on
the design/architecture, and then let people deal with the details. Forcing
the details up front serves only to delay and to decrease reviewability.

> For example, consider the HTTP processing code to be a filter.  I would
> expect it to process the outgoing stream looking for indicators in
> the metadata for such things as transfer encodings and simply insert
> the encoding filters on its own output when needed.

Dynamically inserting filters is quite easy. I've got a simple linked list,
and Ryan's hooks have an ap_array in there. Both can do this, and both can
get the filter inserted at the appropriate place.

But do we have an "HTTP processing filter"? Of course not. We're trying to
get the architecture to support that notion first. THEN we can add it.

That said, I *do* like the concept of that "filter". Very cool :-) I had
assumed that it would be easier to hook ap_set_transfer_encoding() or some
such, but it is probably easier and more robust to do it in the filter
processing, since you *know* the data has to flow through there (i.e. you are
guaranteed a chance to insert the appropriate hooks).

As part of the architecture for allowing filters to be inserted, I've
provided a facility for naming them. This allows code outside of the filter
implementation to add the right filter at the appropriate time. (i.e. the
outside code doesn't have to know any ugly details beyond the name)
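
Usage ends up looking like this. ap_add_filter() is in the patch; the
registration call's exact shape below is approximate, so treat it as a
sketch:

    /* at module-init time: publish the filter under a name
       (registration signature approximate -- sketch only) */
    ap_register_filter("GARGLE", gargle_simple_cb);

    /* later, from code that knows nothing of mod_gargle's internals:
       insert the filter into this request's chain by name */
    ap_add_filter("GARGLE", r);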

> I guess what
> this is assuming is that there exists an r->outgoing pointer on the
> request_rec that points to the current head of the stream and

This is called r->filters in my patch. The exact same semantic.

Did you read my patch? :-)

> the content generators simply call r->outgoing->lwrite_brigade(),

Content generators call ap_rwrite() (and friends) just like they do today.
Underneath, however, ap_rwrite *does* call
r->filters->bucket_cb(r->filters, &bucket). In other words: just like you're
requesting.
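
Simplified (error handling and the filterless fast path omitted, and the
bucket field names assumed), the dispatch is:

    /* minimal stand-in for the patch's bucket type */
    typedef struct { const char *data; size_t len; } ap_bucket;

    /* sketch of the dispatch underneath ap_rwrite() */
    API_EXPORT(int) ap_rwrite(const void *buf, int nbyte, request_rec *r)
    {
        ap_bucket bucket;

        bucket.data = (const char *) buf;
        bucket.len  = (size_t) nbyte;
        return r->filters->bucket_cb(r->filters, &bucket);
    }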

> and further that any filter can place additional filters on (or pop
> layers off) the stream.

I've got ap_add_filter() that can be called at any time. Ryan's patch has
ap_rhook_filter() followed by ap_sort_hooks().

> Disabling content-length just because there are filters in the stream
> is a blatant cop-out.  If you have to do that then the design is wrong.

Agreed that it's a cop-out. Disagree that the design is wrong.

It is the simplest solution to a difficult problem. How can a filter truly
know it has *all* the data? There must be a marker which says so. In my
patch, this is invoking the callbacks with NULL pointers. But it does mean
that you have at least two calls: one for data, one for NULL.
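
The header/footer filter from mod_gargle shows why the marker matters: the
footer can only go out once the filter knows the body is complete. Roughly,
using the same sketched ap_filter shape as above:

    /* sketch: the NULL call is what finally releases the footer */
    static void footer_cb(ap_filter *self, const char *buf, size_t len)
    {
        static const char footer[] = "\n<!-- inserted footer -->\n";

        if (buf != NULL) {
            self->next->simple_cb(self->next, buf, len);  /* pass through */
            return;
        }
        self->next->simple_cb(self->next, footer, sizeof(footer) - 1);
        self->next->simple_cb(self->next, NULL, 0);  /* propagate marker */
    }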

Ryan's patches don't have a flush or end-of-output marking call.

That said, I still do not believe this design is wrong. The content
generator and any given filter cannot know the Content-Length. Any
subsequent filter could change the length.

The only code point where the "real" length is known is at the bottom.
There is simply nothing that can be done until then. In our current
codebase, the bottom is BUFF. But BUFF doesn't have the necessary support
right now, so we *temporarily* punt.

> At the very least the HTTP filter/buff should be capable of discovering
> whether it knows the content length by examining whether it has the whole
> response in buffer (or fd) before it sends out the headers.

Bing. Exactly. But you're talking about a lot of additional work, which can
be completed in a second pass. It also requires a bit of extra work in BUFF,
so that it can keep the headers separate and automatically call out to a
function that says "I'm about to ship out the first block." That function
can muck with the headers at will; when it returns, BUFF will send the
headers followed by the first block.
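
Something like this, say (pure sketch; the two helpers on BUFF are
hypothetical and don't exist today, though ap_table_setn and ap_psprintf
are the stock calls):

    /* blue-sky sketch: BUFF holds the headers aside and calls out once,
       just before the first block would hit the wire */
    typedef void (*ap_headers_fixup)(BUFF *fb, request_rec *r);

    static void fix_clength(BUFF *fb, request_rec *r)
    {
        /* both BUFF helpers here are hypothetical */
        if (ap_buff_holds_whole_response(fb))
            ap_table_setn(r->headers_out, "Content-Length",
                          ap_psprintf(r->pool, "%ld",
                                      ap_buff_buffered_len(fb)));
    }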

And that short blue-sky thought could possibly have any number of flaws.

The point here is that *for now*, we have to punt the Content-Length when a
filter is present. We just don't have the necessary support yet. But! That
support can be added in a second round. It does not necessarily have to
occur *before* we insert any filtering code. Again, that would simply be a
delay and a decrease in reviewability.

> Of course, it is also possible that the three of us are all thinking
> of wildly different architectures from parallel universes and all the
> argument is simply because we have no shared whiteboard to reach a
> common understanding of the general model we are trying to implement.
> But I still think we are making more progress than last week.

Not having the whiteboard is a definite hindrance, but a whiteboard isn't a
panacea either. Ryan posted a set of filtering patches way back when, and I
pointed out a bunch of issues. About a week later, I was at Covalent talking
about config stuff with Ryan, et al., and we hit on the filtering when we
were done. I had thought we left with a common understanding. Hehehe...

We are making progress, but I'm also getting a bit tired. I've coded a set
of patches which I believe are entirely valid, yet I'm still helping to code
Ryan's by providing a long series of detailed reviews and seriously friggin
long writeups. You can understand my frustration when I've got code that
works, yet I have to continue spending time on a whole different patch set.

Sigh.

All that said... what is your actual position on this stuff, Roy? I see
where you'd like to see it *head*, but I think the real question is whether
we have a valid *step* in that direction. IMO, my patch fits that, and
Ryan's could with some more fixes from my last review.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/