Posted to dev@httpd.apache.org by Ben Hyde <bh...@pobox.com> on 1998/09/22 19:04:12 UTC

I/O Layering in next version of Apache.

This thread is crossing over into new-httpd...

Ben Laurie writes:
>I think we need to define how I/O layering is going to work in 2.0 - it's
>the only major thing we need to do, it seems to me, to get 2.0 rolling
>properly.
>
>All this talk of caching is cute, but it worries me, for various reasons
>- the most obvious being that it delays layering I/O, and for many
>purposes I suspect will not add any real benefit. I also suspect that
>for caching to not be a burden on general layering, it needs to be
>possible to ignore it completely. Which means we should be able to at
>least implement the non-cached version of layered I/O. What do people
>think?
>
>I suppose this discussion should be on new-httpd, really.
>
>Does anyone have a starting point for layered I/O? I know we kicked it
>about a bit recently - did that lead to anything concrete?
>
>Cheers,
>
>Ben.

Guess I should say something at the point of cross over...

My concern is how outrageous some of the backend designs I hear people
talk about are becoming.  For example MS's office suite generating any
and all documents in various flavors of XML and vast transformation
engines to convert that into formats that leverage the bleeding edge
of the client's browser.  

 - ben hyde

Re: I/O Layering in next version of Apache.

Posted by Manoj Kasichainula <ma...@io.com>.

On Wed, Sep 23, 1998 at 10:46:47AM -0700, Dean Gaudet wrote:
> I think it was Cliff who said it this way:  in a multiple layer setup he
> wants to be able to partition the layers across servers in an arbitrary
> manner.  For example, a proxy cache on one box which the world talks to,
> and which backends to various other boxes for dynamic and static content.
> Or maybe the static content is on the same server as the proxy. If this is
> something we want to support then talking (a restricted form of) HTTP
> between layers is interesting. 
> 
> Now we can all start worrying about performance ;) 

I'm worried.

If we're being infinitely layerable here, then couldn't talking across a
network between layers just be another layer? We could have the
equivalent of netcat in the form of an Apache module which would know
everything about how to let two machines talk to each other.

This lets us delay discussion about partitioning between machines
until we can get things straight on a single machine.

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"Show me an Ethernet collision and I'll show you a network that could
do with one user fewer." -- BOFH

Stacking up Response Handling

Posted by Ben Hyde <bh...@pobox.com>.

Alexei Kosut writes:
>The problem, as I see it, is this: Often, I suspect it will be the case
>that the module does not know what metadata it will be altering (and how)
>until after it has processed the request. i.e., a PHP script may not
>discover what dimensions it uses (as we discussed earlier) until after it
>has parsed the entire script. But if the module is functioning as an
>in-place filter, that can cause massive headaches if we need the metadata
>in a complete form *before* we send the entity, as we do for HTTP.
>
>I'm not quite sure how to solve that problem. Anyone have any brilliant
>ideas?

This is the same as building a layout engine that does incremental layout,
but simpler, since I doubt we'd want to allow for reflow.

Sometimes you can send output right along, sometimes you have to wait.
I visualize the output as a tree/outline and as it is swept out a
stack holds the path to the leaf.  Handlers for the individual nodes
wait or proceed depending on if they can.

It's a pretty design, with the pipeline consisting of this stack of
output transformers/generators.  Each pipeline stage accepts a stream
of output_chunks.  I think of these output_chunks as coming in plenty
of flavors, for example transmit_file, transmit_memory, etc.  Some
pipeline stages might handle very symbolic chunks.  For example
transmit_xml_tree might be handed to transform_xml_to_html stage in
the pipeline.

I'm assuming the core server would have only a few kinds of pipeline
nodes, generate_response, generate_content_from_url_via_file_system,
generate_via_classic_module_api.  Things like convert_char_set or
do_cool_transfer_encoding, could easily be loaded at runtime and
authored outside the core.  That would be nice.

For typical fast responses we wouldn't push much on this stack at
all.  It might go something like this: Push generate_response node, 
it selects an appropriate content generator by consulting the
module community and pushes that.  Often this is 
generate_content_from_url_via_file_system which in turn does
all that ugly mapping to a file name and then passes 
transmit_file down the pipeline and pops itself off the stack.
generate_response once back on top again does the transmit and
pops off.
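
The fast path just described can be sketched roughly as follows. This is
only an illustration in Python, not Apache code: the Pipeline class, the
step/accept methods, and the "/htdocs" prefix standing in for the URL to
file name mapping are all assumptions made up for the example. Only the
node and chunk names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str      # e.g. "transmit_file" or "transmit_memory"
    payload: str   # for transmit_file, the file name to send


class Pipeline:
    """Hypothetical stack of output generators/transformers."""

    def __init__(self):
        self.stack = []
        self.sent = []  # chunks that were finally transmitted

    def push(self, node):
        self.stack.append(node)

    def run(self, url):
        # Always let the node on top of the stack make progress.
        while self.stack:
            self.stack[-1].step(self, url)


class GenerateContentFromUrlViaFileSystem:
    def step(self, pipeline, url):
        path = "/htdocs" + url  # stand-in for the real (ugly) mapping
        pipeline.stack.pop()    # pop itself off the stack
        # Pass a transmit_file chunk down to the node now on top.
        pipeline.stack[-1].accept(Chunk("transmit_file", path))


class GenerateResponse:
    def __init__(self):
        self.pending = []
        self.started = False

    def accept(self, chunk):
        self.pending.append(chunk)

    def step(self, pipeline, url):
        if not self.started:
            # First time on top: select a content generator and push it.
            self.started = True
            pipeline.push(GenerateContentFromUrlViaFileSystem())
            return
        # Back on top again: do the transmit and pop off.
        pipeline.sent.extend(self.pending)
        pipeline.stack.pop()


def serve(url):
    pipeline = Pipeline()
    pipeline.push(GenerateResponse())
    pipeline.run(url)
    return pipeline.sent
```

A richer response would simply push more nodes (a charset converter,
a transfer encoder, and so on) between these two.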

For rich complex output generation we might push all kinds of things
(charset converters, transfer encoders, XML -> HTML rewriters, cache
builders, old style apache module API simulators, whatever).

The intra-stack element protocol gets interesting around issues
like error handling, blocking, etc.  

I particularly like how this allows simulation of the old module API,
as well as the API of other servers, and experimenting with other
module APIs which cross process or machine boundaries.

In many ways this isn't that much different from what was proposed
a year ago.  

 - ben

Re: I/O Layering in next version of Apache.

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 23 Sep 1998, Ben Laurie wrote:

> Dean Gaudet wrote:
> > 
> > On Wed, 23 Sep 1998, Ben Laurie wrote:
> > 
> > > Is the simplest model that accommodates this actually just a stack
> > > (tree?) of webservers? Naturally, we wouldn't talk HTTP between the
> > > layers, but pass (header,content) pairs around (effectively).
> > > Interesting.
> > 
> > We could just talk "compiled" HTTP -- using a parsed representation of
> > everything essentially.
> 
> That's pretty much what I had in mind - but does it make sense? I have
> to admit, it makes a certain amount of sense to me, but I still have
> this nagging suspicion that there's a catch.

We talked about this during the developers meeting earlier this summer... 
while we were hiking, so I don't think there were any notes.

I think it'd be a useful exercise to specify a few example applications we
want to be able to support, and then consider methods of implementing
those applications.  Make the set as diverse and small as possible.  I'll
take the easiest one :)

- serve static content from arbitrary backing store (e.g. file, database) 

Once we flesh such a list out it may be easier to consider implementation
variations... 

I think it was Cliff who said it this way:  in a multiple layer setup he
wants to be able to partition the layers across servers in an arbitrary
manner.  For example, a proxy cache on one box which the world talks to,
and which backends to various other boxes for dynamic and static content.
Or maybe the static content is on the same server as the proxy. If this is
something we want to support then talking (a restricted form of) HTTP
between layers is interesting. 

Now we can all start worrying about performance ;) 

Dean

Re: I/O Layering in next version of Apache.

Posted by Alexei Kosut <ak...@leland.Stanford.EDU>.

On Wed, 23 Sep 1998, Ben Laurie wrote:

> > We could just talk "compiled" HTTP -- using a parsed representation of
> > everything essentially.
> 
> That's pretty much what I had in mind - but does it make sense? I have
> to admit, it makes a certain amount of sense to me, but I still have
> this nagging suspicion that there's a catch.

One important thing to note is that we want this server to be able to
handle non-HTTP requests. So using HTTP as the internal language (as we do
now) is not the way to go. What we talked about in SF was using a basic
set of key/value pairs to represent the metadata of the response. Which
would of course bear an uncanny resemblance to HTTP-style MIME headers...

Certainly, and this is the point I think the originator of this thread
raised, each module layer (see the emails I sent a few weeks ago for more
details on how I see *that*) needs to provide both a content filter and a
metadata filter. Certainly a module that does encoding has to be able to
alter the headers to add a Content-Encoding, Transfer-Encoding, TE, or
what have you. Many modules that do anything to the content will
want to add headers, and many others will need to alter the dimensions on
which the request is served, or what the parameters to those dimensions
are for the current request. The latter is absolutely vital for caching.

The problem, as I see it, is this: Often, I suspect it will be the case
that the module does not know what metadata it will be altering (and how)
until after it has processed the request. i.e., a PHP script may not
discover what dimensions it uses (as we discussed earlier) until after it
has parsed the entire script. But if the module is functioning as an
in-place filter, that can cause massive headaches if we need the metadata
in a complete form *before* we send the entity, as we do for HTTP.

I'm not quite sure how to solve that problem. Anyone have any brilliant
ideas?

(Note that for internal caching, we don't actually need the dimension data
until after the request, because we can alter the state of the cache at
any time, but if we want to play nice with HTTP and send Vary: headers
and such, we do need that information. I guess we could send Vary:
footers...)
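
To make the trade-off concrete, here is a minimal Python sketch of the two
obvious ways out: buffer the whole entity so the metadata is final before
anything is sent, or stream the entity and send the late metadata as
footers. Everything in it is an assumption for illustration, including the
filter itself, the "%LANG%" token, and the function names; it is not a
proposed API.

```python
def late_metadata_filter(headers, chunks):
    """A combined content/metadata filter that only learns which
    dimension it depends on after it has seen the whole entity."""
    varies = False
    for chunk in chunks:
        if "%LANG%" in chunk:
            varies = True
            chunk = chunk.replace("%LANG%", "en")
        yield chunk
    if varies:
        # Known only once the last chunk has been processed.
        headers["Vary"] = "Accept-Language"


def send_buffered(headers, chunks, filt):
    """Option 1: drain the filter first, so the headers are complete
    before the entity goes out. Costs memory and latency."""
    final = dict(headers)
    body = list(filt(final, chunks))
    return final, "".join(body)


def send_with_footers(headers, chunks, filt):
    """Option 2: stream the entity and report late metadata afterwards."""
    early = dict(headers)
    late = dict(headers)
    out = [("headers", early)]
    for chunk in filt(late, chunks):
        out.append(("body", chunk))
    footers = {k: v for k, v in late.items() if early.get(k) != v}
    out.append(("footers", footers))
    return out
```

The buffered variant is always correct but gives up the in-place
streaming; the footer variant keeps streaming but depends on the peer
being willing to accept metadata after the entity.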

-- Alexei Kosut <ak...@stanford.edu> <http://www.stanford.edu/~akosut/>
   Stanford University, Class of 2001 * Apache <http://www.apache.org> *

Re: I/O Layering in next version of Apache.

Posted by Ben Laurie <be...@algroup.co.uk>.

Dean Gaudet wrote:
> 
> On Wed, 23 Sep 1998, Ben Laurie wrote:
> 
> > Is the simplest model that accommodates this actually just a stack
> > (tree?) of webservers? Naturally, we wouldn't talk HTTP between the
> > layers, but pass (header,content) pairs around (effectively).
> > Interesting.
> 
> We could just talk "compiled" HTTP -- using a parsed representation of
> everything essentially.

That's pretty much what I had in mind - but does it make sense? I have
to admit, it makes a certain amount of sense to me, but I still have
this nagging suspicion that there's a catch.

Cheers,

Ben.

-- 
Ben Laurie            |Phone: +44 (181) 735 0686| Apache Group member
Freelance Consultant  |Fax:   +44 (181) 735 0689|http://www.apache.org/
and Technical Director|Email: ben@algroup.co.uk |
A.L. Digital Ltd,     |Apache-SSL author     http://www.apache-ssl.org/
London, England.      |"Apache: TDG" http://www.ora.com/catalog/apache/

WE'RE RECRUITING! http://www.aldigital.co.uk/

Re: I/O Layering in next version of Apache.

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 23 Sep 1998, Ben Laurie wrote:

> Is the simplest model that accommodates this actually just a stack
> (tree?) of webservers? Naturally, we wouldn't talk HTTP between the
> layers, but pass (header,content) pairs around (effectively).
> Interesting.

We could just talk "compiled" HTTP -- using a parsed representation of
everything essentially.
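
Roughly, the idea could look like the Python sketch below: parse the
header block once, then hand (header, content) pairs from layer to layer
without ever re-serializing. The Message type, the layer functions, and
the decision to lowercase header names are assumptions for the example,
not a description of any existing Apache interface.

```python
import gzip
from dataclasses import dataclass


@dataclass
class Message:
    headers: dict   # parsed key/value metadata
    content: bytes  # the entity itself


def parse_once(raw):
    """Turn a raw header block plus body into the parsed form."""
    head, _, body = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.decode("latin-1").split("\r\n"):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return Message(headers, body)


def gzip_layer(msg):
    # A layer that changes the content must also change the headers.
    headers = {**msg.headers, "content-encoding": "gzip"}
    return Message(headers, gzip.compress(msg.content))


def length_layer(msg):
    headers = {**msg.headers, "content-length": str(len(msg.content))}
    return Message(headers, msg.content)


def run_stack(layers, msg):
    """Pass the parsed message through each layer in turn."""
    for layer in layers:
        msg = layer(msg)
    return msg
```

Each layer sees and returns the same parsed structure, so none of them
needs to know whether its neighbour is the client, a cache, or another
module.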

Dean

Re: I/O Layering in next version of Apache.

Posted by Ben Laurie <be...@algroup.co.uk>.

Honza Pazdziora wrote:
> Also, as Apache::GzipChain module shows, once you process the output,
> you may need to modify the headers as well. I was hit by this when I
> tried to convert between charsets, to send out those that the browsers
> would understand. The Apache::Mason module shows that you can build
> a page from pieces. Each of the pieces might have different
> characteristics (charset, for example), so with each piece of code we
> might need to have its own headers that describe it, or at least the
> difference between the final (global) header-outs and its local.

Interesting points - and not ones we've had to consider up to now. The
GzipChain example is certainly applicable. I'm less sure about Mason: do
we want to support modules that pull in parts from various places within
Apache? Hmmm. I suppose so.

I can't remember whether it was discussed here, or on the wrong list,
but it had already been suggested that we need to pass headers downwards
(that is, away from the client). I suppose this shows we have to pass
them upwards, too.

Is the simplest model that accommodates this actually just a stack
(tree?) of webservers? Naturally, we wouldn't talk HTTP between the
layers, but pass (header,content) pairs around (effectively).
Interesting.

Cheers,

Ben.


Re: I/O Layering in next version of Apache.

Posted by Honza Pazdziora <ad...@informatics.muni.cz>.

> >
> >Does anyone have a starting point for layered I/O? I know we kicked it

Hello,

there has been a thread on the modperl mailing list recently about
problems we have with the current architecture. Some of the points
were: what requirements will be put on modules to be new I/O
compliant. I believe it's the Apache::SSI vs. Apache::SSIChain
difference between 1.3.* and 2.*. The first fetches the file _and_
does the SSI, the second takes input from a different module that
either gets the HTML or runs the CGI or so, and processes its output.
Should all modules be capable of working on some other module's
output? Probably except those that actually go to disk or database for
the primary data.

Randal's point was that output of any module could be processed, so
that no module should make any assumption whether it's sending data
directly to the browser or to some other module. This can be used for
caching, but it is also one of the things needed to make the filtering
transparent.

Also, as Apache::GzipChain module shows, once you process the output,
you may need to modify the headers as well. I was hit by this when I
tried to convert between charsets, to send out those that the browsers
would understand. The Apache::Mason module shows that you can build
a page from pieces. Each of the pieces might have different
characteristics (charset, for example), so with each piece of code we
might need to have its own headers that describe it, or at least the
difference between the final (global) header-outs and its local.
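
As a small illustration of pieces carrying only their difference from the
global headers, here is a Python sketch. It is not how Apache::Mason
works; the Piece type, the assemble function, and the use of a plain
"charset" key are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    headers: dict  # only what differs from the global header-outs
    data: bytes


def assemble(pieces, global_headers):
    """Build one page from pieces, converting each piece from its
    local charset to the charset announced in the global headers."""
    target = global_headers.get("charset", "utf-8")
    out = []
    for piece in pieces:
        # Effective headers: the global ones overlaid with local ones.
        local = {**global_headers, **piece.headers}
        text = piece.data.decode(local.get("charset", target))
        out.append(text.encode(target))
    return global_headers, b"".join(out)
```

A piece with an empty headers dict inherits everything from the global
header-outs, so only the pieces that really differ need to say so.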

Sorry for bringing so many Perl module names in, but modperl is
currently a way to get some layered I/O done in 1.3.*, so I only have
practical experience with it.

Yours,

------------------------------------------------------------------------
 Honza Pazdziora | adelton@fi.muni.cz | http://www.fi.muni.cz/~adelton/
                   I can take or leave it if I please
------------------------------------------------------------------------