Posted to dev@cocoon.apache.org by Nicolás Lichtmaier <ni...@debian.org> on 2000/01/23 19:24:51 UTC

Content-length

 I like Cocoon. I think it might be the way all things should happen on the
web. But for that, Cocoon must be as `web friendly' as possible, as static
pages are; e.g. it must send the proper HTTP headers in order to cooperate
with caches and other HTTP software. From reading the sources it seems very
easy to add the Content-length header. The whole content is first stored in
a String (in Engine.handle()). It would be a matter of sending
string.length(). This could be done now... is there any reason this is not
being done?
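
 A minimal sketch of the idea, in Java since Cocoon is a Java servlet. One
caveat: Content-Length counts bytes, not characters, so string.length() is
only right when every character encodes to exactly one byte. The class and
method names below are hypothetical and are not taken from the Cocoon sources.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cocoon: computes the value a
// Content-Length header would need for an already buffered page.
final class ContentLengthSketch {

    // Content-Length is the number of bytes on the wire, so the page is
    // encoded first and the encoded size is what gets reported.
    static int contentLength(String page, Charset encoding) {
        return page.getBytes(encoding).length;
    }

    public static void main(String[] args) {
        String page = "<p>Nicol\u00e1s</p>";
        // 14 characters in the String
        System.out.println(page.length());
        // 15 bytes once encoded as UTF-8 (the accented letter takes two)
        System.out.println(contentLength(page, StandardCharsets.UTF_8));
    }
}
```

 In a servlet the result would presumably be handed to
response.setContentLength(int) before the body is written.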

 Last-modified/Expires is a little more complicated, but I have a proposal:

 You can always add these headers by hand, but this approach is wrong, as
these headers could be easily handled automatically. This is how:

 The `Last-modified' date is the latest `last-modified' date of all
components of the producer|processor path. The `Expires' header is the
earliest date. This is obvious if you think a bit about it. A simple
producer that reads a file would give the file's date as the last-modified
time; later, a producer that adds some information from some files would
report those files' dates. The latest date would become the `last-modified'
header.

 This could be implemented with a PI like this:

<?last-modified value="<a time>"?>

 Each component in the path would be able to create this PI. The engine
*will remove this PI* before sending the document to the next component. The
engine will keep the latest last-modified time. If a component does not
provide this information, last-modified generation would be cancelled. The
same would be done with expires (except that it would keep the earliest
time).
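
 The merging rule could be sketched like this. It is only an illustration:
the class, the method names and the use of millisecond timestamps are
assumptions, and nothing here exists in Cocoon. An empty OptionalLong stands
for a component that did not emit the PI.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch of the proposal; times are milliseconds since the epoch.
final class HeaderMergeSketch {

    // Latest reported time wins for Last-Modified.
    static OptionalLong lastModified(List<OptionalLong> reported) {
        return merge(reported, true);
    }

    // Earliest reported time wins for Expires.
    static OptionalLong expires(List<OptionalLong> reported) {
        return merge(reported, false);
    }

    private static OptionalLong merge(List<OptionalLong> reported, boolean keepLatest) {
        OptionalLong result = OptionalLong.empty();
        for (OptionalLong time : reported) {
            if (!time.isPresent()) {
                // One component without the PI cancels the header entirely.
                return OptionalLong.empty();
            }
            long t = time.getAsLong();
            if (!result.isPresent()) {
                result = time;
            } else if (keepLatest) {
                result = OptionalLong.of(Math.max(result.getAsLong(), t));
            } else {
                result = OptionalLong.of(Math.min(result.getAsLong(), t));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<OptionalLong> path = Arrays.asList(OptionalLong.of(1000L), OptionalLong.of(3000L));
        System.out.println(lastModified(path).getAsLong());
        System.out.println(expires(path).getAsLong());
    }
}
```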

 What do you think?

Re: Content-length

Posted by Gabe Beged-Dov <be...@jfinity.com>.

Nicolás Lichtmaier wrote:

> > You could have your buffer and eat it too if you were talking to
> > a HTTP/1.1 client since I assume you could use chunked transfer
> > coding. You could have your buffer set to 20K (or whatever your
> > adaptive algorithm determines) and send it out either as normal
> > or chunked depending on whether the content was done.
> 
>  I didn't get what you said, but anyway, you don't speak HTTP, the webserver
> does. You can't "send a chunk".

The "have your buffer and eat it too" phrase was a play on "have
your cake and eat it too", i.e. the best of both worlds, full and
incremental. 

The "you" I was referring to is the Cocoon engine that is
managing the response object. I was thinking that in the
streaming scenario, there would be logic that would be buffering
the initial content up to a certain threshold. When this
threshold is passed, it would add a Transfer-Encoding header with
a value of "chunked" to the response. It would then wrap the
initial buffer and all subsequently filled buffers as chunks
which would be written to the output stream. If the threshold is
not hit, then a Content-Length header would be added and the
"raw" content would be written to the stream. 

Cordially from Corvallis,

Gabe Beged-Dov

-- 
--------------------------- 
http://www.jfinity.com/gabe

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> >  The first time can be buffered, and the content-length printed as well.
> > Buffer has other benefits. With buffering, if a problem arises the user
> > might be shown another page, because nothing has been sent.
> 
> You could have your buffer and eat it too if you were talking to
> a HTTP/1.1 client since I assume you could use chunked transfer
> coding. You could have your buffer set to 20K (or whatever your
> adaptive algorithm determines) and send it out either as normal
> or chunked depending on whether the content was done. 

 I didn't get what you said, but anyway, you don't speak HTTP, the webserver
does. You can't "send a chunk".

Re: Content-length

Posted by Gabe Beged-Dov <be...@jfinity.com>.

Nicolás Lichtmaier wrote:

>  The first time can be buffered, and the content-length printed as well.
> Buffer has other benefits. With buffering, if a problem arises the user
> might be shown another page, because nothing has been sent.

You could have your buffer and eat it too if you were talking to
a HTTP/1.1 client since I assume you could use chunked transfer
coding. You could have your buffer set to 20K (or whatever your
adaptive algorithm determines) and send it out either as normal
or chunked depending on whether the content was done. 
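
A rough Java sketch of that decision, to make the framing concrete. It
assumes a hypothetical class that buffers up to a threshold and switches to
HTTP/1.1 chunked framing once the threshold is passed; an in-memory stream
stands in for the socket and the chosen header is only recorded as a string.
In a real servlet the container typically owns this framing, so treat this as
an illustration of the wire format rather than something Cocoon does.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; content is treated as ISO-8859-1 for simplicity.
final class ThresholdResponseSketch {
    private final int threshold;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
    private boolean chunked = false;
    private String framingHeader = null;

    ThresholdResponseSketch(int threshold) {
        this.threshold = threshold;
    }

    void write(String content) {
        put(pending, content);
        if (pending.size() > threshold) {
            // Threshold passed: commit to chunked and send what we have.
            chunked = true;
            framingHeader = "Transfer-Encoding: chunked";
            flushChunk();
        }
    }

    void finish() {
        if (chunked) {
            flushChunk();
            put(wire, "0\r\n\r\n"); // zero-size chunk ends the body
        } else {
            // Everything fit in the buffer, so the length is known.
            framingHeader = "Content-Length: " + pending.size();
            byte[] body = pending.toByteArray();
            wire.write(body, 0, body.length);
        }
    }

    String framingHeader() {
        return framingHeader;
    }

    String wireBytes() {
        return new String(wire.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // One chunk is: size in hex, CRLF, the data, CRLF.
    private void flushChunk() {
        if (pending.size() == 0) {
            return;
        }
        put(wire, Integer.toHexString(pending.size()) + "\r\n");
        byte[] body = pending.toByteArray();
        wire.write(body, 0, body.length);
        put(wire, "\r\n");
        pending.reset();
    }

    private static void put(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(b, 0, b.length);
    }
}
```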

Cordially from Corvallis,

Gabe Beged-Dov

-- 
--------------------------- 
http://www.jfinity.com/gabe

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> > > > It's not a problem with the DOM/SAX model, IMVHO. In both
> > > > cases the format in wich the data is supplied to the
> > > > serializer does not contain information on the content length.
> > >
> > > This can be viewed as a make-it-up-in-volume type of tradeoff
> > > similar to the first time hit on compiling JSP/XSP. You blow off
> > > trying to prepend the content-length for the first request for
> > > the cacheable page and then make sure that the cache contains the
> > > content-length for follow-on requests.
> > 
> >  The first time can be buffered, and the content-length printed as well.
> > Buffer has other benefits. With buffering, if a problem arises the user
> > might be shown another page, because nothing has been sent.
> 
> Yep... But just imagine buffering 30/40 megs of generated PDF and then
> sending them all to the client. It's too big to handle. And it takes too
> much time, because first you have to compose the thing and save it
> locally, AFTER, you can send it to the client...
> The cache mechanism could buffer to a certain amount of data (let's say
> 30/40 kb?) and if we can do it, we stream it with the content length,
> otherwise, we just avoid shipping content, stream all the content, while
> duplicating it on the disk, and on next request, we transmit everything
> (content length and data) from the cache !

 I fully agree. You wrote what I was thinking. =)

Re: Content-length

Posted by Pierpaolo Fumagalli <pi...@apache.org>.

Nicolás Lichtmaier wrote:
> 
> > > It's not a problem with the DOM/SAX model, IMVHO. In both
> > > cases the format in wich the data is supplied to the
> > > serializer does not contain information on the content length.
> >
> > This can be viewed as a make-it-up-in-volume type of tradeoff
> > similar to the first time hit on compiling JSP/XSP. You blow off
> > trying to prepend the content-length for the first request for
> > the cacheable page and then make sure that the cache contains the
> > content-length for follow-on requests.
> 
>  The first time can be buffered, and the content-length printed as well.
> Buffer has other benefits. With buffering, if a problem arises the user
> might be shown another page, because nothing has been sent.

Yep... But just imagine buffering 30/40 megs of generated PDF and then
sending them all to the client. It's too big to handle. And it takes too
much time, because first you have to compose the thing and save it
locally, and only AFTER that can you send it to the client...
The cache mechanism could buffer up to a certain amount of data (let's say
30/40 KB?). If the page fits, we send it with the content length;
otherwise we skip the content length and stream all the content while
duplicating it on disk, and on the next request we transmit everything
(content length and data) from the cache!
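
The "stream now, know the length next time" part could look roughly like
the Java sketch below. Everything in it is hypothetical: a HashMap stands in
for the on-disk cache, an in-memory stream stands in for the client
connection, and the small-page threshold check is left out to keep it short.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical sketch, not Cocoon's cache.
final class TeeCacheSketch {
    private final Map<String, byte[]> cache = new HashMap<>(); // stands in for files on disk

    // Writes the page to the client and returns the Content-Length that
    // could be announced for this request, if it is known up front.
    OptionalInt serve(String uri, byte[][] generatedPieces, ByteArrayOutputStream client) {
        byte[] cached = cache.get(uri);
        if (cached != null) {
            // Cache hit: the size is known before the first byte is sent.
            client.write(cached, 0, cached.length);
            return OptionalInt.of(cached.length);
        }
        // Cache miss: stream each piece as it is produced and keep a copy.
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        for (byte[] piece : generatedPieces) {
            client.write(piece, 0, piece.length);
            copy.write(piece, 0, piece.length);
        }
        cache.put(uri, copy.toByteArray());
        return OptionalInt.empty(); // length was not known when headers went out
    }
}
```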

	Pier
-- 
--------------------------------------------------------------------
-          P              I              E              R          -
stable structure erected over water to allow the docking of seacraft
<ma...@betaversion.org>    <http://www.betaversion.org/~pier/>
--------------------------------------------------------------------
- ApacheCON Y2K: Come to the official Apache developers conference -
-------------------- <http://www.apachecon.com> --------------------

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> > It's not a problem with the DOM/SAX model, IMVHO. In both
> > cases the format in wich the data is supplied to the 
> > serializer does not contain information on the content length.
> 
> This can be viewed as a make-it-up-in-volume type of tradeoff
> similar to the first time hit on compiling JSP/XSP. You blow off
> trying to prepend the content-length for the first request for
> the cacheable page and then make sure that the cache contains the
> content-length for follow-on requests. 

 The first time can be buffered, and the content-length printed as well.
Buffer has other benefits. With buffering, if a problem arises the user
might be shown another page, because nothing has been sent.

Re: Content-length

Posted by brian moseley <ix...@maz.org>.

On Sun, 23 Jan 2000, Gabe Beged-Dov wrote:

> This can be viewed as a make-it-up-in-volume type of
> tradeoff similar to the first time hit on compiling
> JSP/XSP. You blow off trying to prepend the
> content-length for the first request for the cacheable
> page and then make sure that the cache contains the
> content-length for follow-on requests.

this is begging for a tool to cache dynamic content offline,
in preparation for being deployed on a production server.
for file-based caches, obviously.

Re: Content-length

Posted by Gabe Beged-Dov <be...@jfinity.com>.

Pierpaolo Fumagalli wrote:

> It's not a problem with the DOM/SAX model, IMVHO. In both
> cases the format in wich the data is supplied to the 
> serializer does not contain information on the content length.

This can be viewed as a make-it-up-in-volume type of tradeoff
similar to the first time hit on compiling JSP/XSP. You blow off
trying to prepend the content-length for the first request for
the cacheable page and then make sure that the cache contains the
content-length for follow-on requests. 

Cordially from Corvallis,

Gabe Beged-Dov

-- 
--------------------------- 
http://www.jfinity.com/gabe

Re: Content-length

Posted by Pierpaolo Fumagalli <pi...@apache.org>.

Ross Burton wrote:
> 
> >  I like Cocoon. I think it might be the way all things should happen in the
> > web. But to do that Cocoon must be as `web friendly' as posible, as static
> > pages are. e.g. it must send the proper HTTP headers in order to cooperate
> > with caches and other HTTP software. From reading the sources it seems very
> > easy to add the Content-length header. The whole content is first stored in
> > a String (in Engine.handle()). It would be a matter of sending the
> > string.length();. This could be done now... is there any reason this is not
> > being done?
> 
> This works now, but what about the mythical Cocoon 2, where the sending of
> content may happen before all of the data has been processed, because of the
> SAX event model?

It's not a problem with the DOM/SAX model, IMVHO. In both cases the
format in which the data is supplied to the serializer does not contain
information on the content length.
It's a problem related to the caching system. If the serialized data is
stored in a String and THEN sent to the client, we can add the
Content-Length information, while if it's streamed as the data arrives
(regardless of whether we're traversing a DOM tree or receiving SAX events),
we can't provide information on the length of the stream.
This, of course, applies only when the document is generated, and not
when it is served from the cache, because if it resides in the cache (and
so is already a stream, either a file on disk or a string in
memory), we can find out the length.
The whole problem, IMVHO, lies in the caching mechanism.

	Pier

-- 
--------------------------------------------------------------------
-          P              I              E              R          -
stable structure erected over water to allow the docking of seacraft
<ma...@betaversion.org>    <http://www.betaversion.org/~pier/>
--------------------------------------------------------------------
- ApacheCON Y2K: Come to the official Apache developers conference -
-------------------- <http://www.apachecon.com> --------------------

Re: Content-length

Posted by Eric SCHAEFFER <es...@posterconseil.com>.

----- Original Message -----
From: Nicolás Lichtmaier <ni...@debian.org>
To: <co...@xml.apache.org>
Sent: Monday, January 24, 2000 4:09 AM
Subject: Re: Content-length

> > dunno if that's possible. if there is a concept of blocking
> > and nonblocking nodes, then any methods on a node for
> > getting children or values or attributes may need to throw
> > some kind of acception until the node is unblocked. which
> > means an api change. furthermore you need to be able to
> > specify that a node blocks when you construct it.
>
>  If the node blocks for a while and then continues everything is OK, the
> contract is not broken. It's the normal consumer-producer relationship.
>  So the consumer accesses the DOM, without caring that the underlying data
> is still arriving via SAX events. When the node is blocking, this engine
> is waiting for the SAX events that conform the node to arrive...
>
>  Note that this component would be completely generic: A DOM view over a
> SAX stream. So it is not a Cocoon infrastructure decision: each component
> can do as it likes...
>
>  An issue: Would be ok to modify the partial tree?
>

....

I'm not a 'guru' in anything. But I'm creating a dynamic Web site (or rather,
an application) for professionals based on Cocoon. I really love this
application (and OSS in general). I make all the choices, and I also take
responsibility for them, and you must know that it isn't always easy to
propose an OSS solution when your boss only sees commercial ones around him...

I can't use the cache system (or only a little) because all my pages are
dynamic and depend on the logged-in client and on the posted forms.
Several pages can take some time to generate (database queries), and
because of the architecture (XML -DCP-> XML -XSLT-> HTML), I can't send the
beginning of the page before it is totally generated.
If I could send the beginning (and display a message like 'wait,
loading.../calculating...'), it would be a great feature for me...
Expiration time headers would be helpful also.

So, if using SAX can help, please, use SAX.
:-))

Thanks a lot,
Eric.

_______________________________________

Eric SCHAEFFER
eschaeffer@posterconseil.com

POSTER CONSEIL
118 rue de Tocqueville
75017 PARIS
FRANCE
Tel. : 33-140541058
Fax : 33-140541059
_______________________________________

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> dunno if that's possible. if there is a concept of blocking
> and nonblocking nodes, then any methods on a node for
> getting children or values or attributes may need to throw
> some kind of acception until the node is unblocked. which
> means an api change. furthermore you need to be able to
> specify that a node blocks when you construct it.

 If the node blocks for a while and then continues, everything is OK; the
contract is not broken. It's the normal consumer-producer relationship.
 So the consumer accesses the DOM without caring that the underlying data
is still arriving via SAX events. When a node blocks, the engine is
waiting for the SAX events that make up the node to arrive...

 Note that this component would be completely generic: a DOM view over a SAX
stream. So it is not a Cocoon infrastructure decision: each component can
do as it likes...

 An issue: would it be OK to modify the partial tree?
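
 To make the blocking behaviour concrete, here is a minimal Java sketch. It
is an assumption-laden toy: the class is hypothetical, it does not implement
the real org.w3c.dom.Node interface, and children are plain strings. The
parser thread plays the role of the SAX event source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical node whose children arrive later from a SAX-like producer.
final class BlockingNodeSketch {
    private final List<String> children = new ArrayList<>();
    private final CountDownLatch complete = new CountDownLatch(1);

    // Producer side: called as child events arrive.
    synchronized void addChild(String name) {
        children.add(name);
    }

    // Producer side: called when the element's end event is seen.
    void close() {
        complete.countDown();
    }

    // Consumer side: looks like a normal accessor, but blocks until the
    // element is complete. The API itself does not change.
    List<String> getChildren() {
        try {
            complete.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for SAX events", e);
        }
        synchronized (this) {
            return new ArrayList<>(children);
        }
    }

    public static void main(String[] args) {
        BlockingNodeSketch node = new BlockingNodeSketch();
        new Thread(() -> {
            node.addChild("title");
            node.addChild("body");
            node.close();
        }).start();
        System.out.println(node.getChildren()); // waits for close() first
    }
}
```

 Returning a copy sidesteps the question of modifying the partial tree; a
real design would still have to answer it.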

Re: Content-length

Posted by brian moseley <ix...@maz.org>.

On Sun, 23 Jan 2000, Nicolás Lichtmaier wrote:

>  I don't fully get the scheme you tell yet. Would what I
> propose serve your needs?

im not sure, since i didnt save the original messages in the
thread, and the list archive is really slow :(

but as i recall, my comment was "yes, the approach you
suggest has been implemented before and works well".

>  They would be extensions to DOM, and as such, they
> should eventually be in DOM itself. Hum... that if a
> change in the API is needed, perhaps it's just an
> implementation issue, everything abstracted under a
> normal DOM api...

dunno if that's possible. if there is a concept of blocking
and nonblocking nodes, then any methods on a node for
getting children or values or attributes may need to throw
some kind of exception until the node is unblocked. which
means an api change. furthermore you need to be able to
specify that a node blocks when you construct it.

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> >  Seems similar. What I've said it's maybe a little more
> > simple I think...
> 
> sure. the asynchronous request mechanism is not directly
> related to the dom processing mechanism. in the constructors
> of individual smartnode classes we instantiate request
> objects and place them in the request queue. there are no
> other dependencies. but we came up with the buffering policy
> mechanism for exactly the reasons that started this thread.

 I don't fully get the scheme you describe yet. Would what I propose
serve your needs?

> >  But anyway, it would be nice that it were part of some
> > standard. A new version of DOM probably. And.. what's
> > cp? =)
> i dunno. i guess if nodes had a way to signal that they were
> ready to be processed, than there could be blocking and non
> blocking document traversals. but do those concepts really
> belong in dom itself, or are they simple extensions?

 They would be extensions to DOM, and as such, they should eventually
be in DOM itself. Hmm... that is, if a change in the API is needed at all;
perhaps it's just an implementation issue, with everything abstracted
under a normal DOM API...

> cp is critical path, where i work. i forgot that i dont use
> my cp address for this list.

 Ah.. ok =)

Re: Content-length

Posted by brian moseley <ix...@maz.org>.

On Sun, 23 Jan 2000, Nicolás Lichtmaier wrote:

>  Seems similar. What I've said it's maybe a little more
> simple I think...

sure. the asynchronous request mechanism is not directly
related to the dom processing mechanism. in the constructors
of individual smartnode classes we instantiate request
objects and place them in the request queue. there are no
other dependencies. but we came up with the buffering policy
mechanism for exactly the reasons that started this thread.

>  But anyway, it would be nice that it were part of some
> standard. A new version of DOM probably. And.. what's
> cp? =)

i dunno. i guess if nodes had a way to signal that they were
ready to be processed, then there could be blocking and non
blocking document traversals. but do those concepts really
belong in dom itself, or are they simple extensions?

cp is critical path, where i work. i forgot that i dont use
my cp address for this list.

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> >  I'm no expert on XML, but I've always have this idea...
> > wouldn't it be posible to have an abstraction of the
> > issue? And that abstraction would be some kind of
> > partial DOM. The application would process the DOM tree.
> > If it touches something it hasn't arrived yet, it blocks
> > until the data is ready. Normal applications would be
> > able to ignore this, thus forcing the whole data to be
> > ready from the very first DOM manipulations. But an
> > application expecting huge amount of data would be
> > designed with some care, in order to use the tree
> > sequentially.
> 
> in fact this is almost exactly the approach we've taken at
> cp.
> 
> our templating system is html based. we have defined special
> tags that identify 'smart nodes'. each smartnode is
> associated with its own template that can contain smartnode
> tags. thus, when we process the template for the requested
> uri, we build a dom document for it, and a dom fragment for
> each of the referenced smartnodes' templates, and for each
> of those templates' smartnodes' templates, etc. at the end
> of course these pieces are all composed into a single dom
> document.
> 
> furthermore, when each smartnode is constructed, it creates
> a set of 'backend service requests' and adds them to a
> 'backend service request queue'. once the entire dom
> document is constructed, we begin asynchronously executing
> each backend service request, and then we tell the dom
> document to begin outputting itself. as each smartnode is
> reached, we pause until all of that smartnode's pending
> requests have completed and the results have been
> processed. at this point the smartnode can output its html
> representation and we continue through the dom document to
> the next smartnode.
> 
> the dom document outputs to a stream that can be buffered or
> not according to policies attached to the request uri and to
> other dimensions of the request. so for certain uris we dont
> buffer at all, but for most we do. so i cant say the
> application is totally unaware of the blocking behavior, but
> the behavior is managed with a policy abstraction.

 Seems similar. What I described is maybe a little simpler, I think...

 But anyway, it would be nice if it were part of some standard. A new
version of DOM, probably. And... what's cp? =)

Re: Content-length

Posted by brian moseley <ix...@maz.org>.

On Sun, 23 Jan 2000, Nicolás Lichtmaier wrote:

>  I'm no expert on XML, but I've always have this idea...
> wouldn't it be posible to have an abstraction of the
> issue? And that abstraction would be some kind of
> partial DOM. The application would process the DOM tree.
> If it touches something it hasn't arrived yet, it blocks
> until the data is ready. Normal applications would be
> able to ignore this, thus forcing the whole data to be
> ready from the very first DOM manipulations. But an
> application expecting huge amount of data would be
> designed with some care, in order to use the tree
> sequentially.

in fact this is almost exactly the approach we've taken at
cp.

our templating system is html based. we have defined special
tags that identify 'smart nodes'. each smartnode is
associated with its own template that can contain smartnode
tags. thus, when we process the template for the requested
uri, we build a dom document for it, and a dom fragment for
each of the referenced smartnodes' templates, and for each
of those templates' smartnodes' templates, etc. at the end
of course these pieces are all composed into a single dom
document.

furthermore, when each smartnode is constructed, it creates
a set of 'backend service requests' and adds them to a
'backend service request queue'. once the entire dom
document is constructed, we begin asynchronously executing
each backend service request, and then we tell the dom
document to begin outputting itself. as each smartnode is
reached, we pause until all of that smartnode's pending
requests have completed and the results have been
processed. at this point the smartnode can output its html
representation and we continue through the dom document to
the next smartnode.

the dom document outputs to a stream that can be buffered or
not according to policies attached to the request uri and to
other dimensions of the request. so for certain uris we dont
buffer at all, but for most we do. so i cant say the
application is totally unaware of the blocking behavior, but
the behavior is managed with a policy abstraction.
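
a rough java sketch of the ordering described above, not our actual code:
every backend request is started up front, then output proceeds in document
order and pauses at each node until its result is ready. the class name is
made up, and each smartnode is reduced to a single request that yields its
html.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch of "start everything, then output in order".
final class SmartNodeSketch {

    static String render(List<Supplier<String>> backendRequests) {
        // Kick off every backend request asynchronously before any output.
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (Supplier<String> request : backendRequests) {
            pending.add(CompletableFuture.supplyAsync(request));
        }
        // Walk the nodes in document order, waiting on each one in turn.
        StringBuilder out = new StringBuilder();
        for (CompletableFuture<String> result : pending) {
            out.append(result.join()); // pause until this node's data is ready
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Supplier<String>> requests = new ArrayList<>();
        requests.add(() -> "<h1>inbox</h1>");
        requests.add(() -> "<p>3 messages</p>");
        System.out.println(render(requests));
    }
}
```

a slow first node delays everything after it, which is where a buffering
policy per uri comes in.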

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> >  I think that sending data when the process hassn't finished is a
> > very good thing for certain pages: long pages, pages that come
> > with a slow backend.
> > I'd say that 90% time pages are small (less than 20k). And what do
> > you get by sending them to the client before (a couple of ms) the
> > page is done? So I'd said that buffering should be the default,
> > and it should be turned of by pages which need it.
> 
> I agree enthusiastically: most pages are small in size thus requiring
> a relatively modest amount of storage. In the Cocoon environment,
> though, small doesn't necessarily imply "simple": more often than not
> otherwise "small" web pages will contain (possibly costly) dynamically
> generated content and will be subject to complex transformations.
> 
> It's in this typical context that I feel there's a right place for DOM.
> I find DOM much better suited for some data-oriented, potentially
> complex transformations whose processing model results in final
> content being known only at the latest stage (thus enabling buffering).
> As an added bonus, yes, this includes the ability to properly set the
> content length whenever possible.
> 
> This reminds me of a [heretical? :-)] post from Clark Evans in regard
> to the DOM/SAX dichotomy. Phrased as "DOM is right, SAX is wrong"
> I'd have expected it to trigger a flame war, but there was little follow
> up (mea culpa, too). I wish to hear more from Clark about this...

 I'm no expert on XML, but I've always had this idea... wouldn't it be
possible to have an abstraction of the issue? That abstraction would be
some kind of partial DOM. The application would process the DOM tree. If it
touches something that hasn't arrived yet, it blocks until the data is ready.
Normal applications would be able to ignore this, thus forcing the whole
data to be ready from the very first DOM manipulations. But an application
expecting a huge amount of data would be designed with some care, in order to
use the tree sequentially.
 The SAX API is too low level for my taste...

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> >  I think that sending data when the process hassn't finished is a very good
> > thing for certain pages: long pages, pages that come with a slow backend.
> > I'd say that 90% time pages are small (less than 20k). And what do you get
> > by sending them to the client before (a couple of ms) the page is done? So
> > I'd said that buffering should be the default, and it should be turned of by
> > pages which need it.
> 
> Either that or we buffer it until it hits a certain size, and then we give
> up and start spooling to the client. Personally, I'd rather see this
> decision being made by the server, which knows what its load is, and knows
> how many connections it has to cope with right now.

 That would be good.  And note that cached pages will always have the proper
headers, no matter how large they are.

> >  I think that we must see a world where *every* site is using Cocoon. In
> > that world wouldn't we want last-modified stamps anymore?
> Indeed - I *firmly* believe that most of the commercial, and clued up sites
> in the world will be using something cocoon-like within 10 months.
> (Although thankfully I don't have a hat to eat if I'm wrong...)

 ... and my point is that Cocoon should be shaped the way we'd like the web
to be. Cocoon should play well with the protocols and implement the best
known practices, so as to build a cacheable, indexable, meaningful web.

Re: Content-length

Posted by Paul Russell <Pa...@uea.ac.uk>.

>  I think that sending data when the process hassn't finished is a very good
> thing for certain pages: long pages, pages that come with a slow backend.
> I'd say that 90% time pages are small (less than 20k). And what do you get
> by sending them to the client before (a couple of ms) the page is done? So
> I'd said that buffering should be the default, and it should be turned of by
> pages which need it.

Either that or we buffer it until it hits a certain size, and then we give
up and start spooling to the client. Personally, I'd rather see this
decision being made by the server, which knows what its load is, and knows
how many connections it has to cope with right now.

>  I think that we must see a world where *every* site is using Cocoon. In
> that world wouldn't we want last-modified stamps anymore?

Indeed - I *firmly* believe that most of the commercial, and clued up sites
in the world will be using something cocoon-like within 10 months.
(Although thankfully I don't have a hat to eat if I'm wrong...)


Paul

RE: Content-length

Posted by Ricardo Rocha <ri...@apache.org>.

Ross Burton wrote:
> This works now, but what about the mythical Cocoon 2, where the
> sending of content may happen before all of the data has been
> processed, because of the SAX event model?

Nicolás Lichtmaier wrote:
>  I think that sending data when the process hassn't finished is a
> very good thing for certain pages: long pages, pages that come
> with a slow backend.
> I'd say that 90% time pages are small (less than 20k). And what do
> you get by sending them to the client before (a couple of ms) the
> page is done? So I'd said that buffering should be the default,
> and it should be turned of by pages which need it.

I agree enthusiastically: most pages are small in size thus requiring
a relatively modest amount of storage. In the Cocoon environment,
though, small doesn't necessarily imply "simple": more often than not
otherwise "small" web pages will contain (possibly costly) dynamically
generated content and will be subject to complex transformations.

It's in this typical context that I feel there's a right place for DOM.
I find DOM much better suited for some data-oriented, potentially
complex transformations whose processing model results in final
content being known only at the latest stage (thus enabling buffering).
As an added bonus, yes, this includes the ability to properly set the
content length whenever possible.

This reminds me of a [heretical? :-)] post from Clark Evans in regard
to the DOM/SAX dichotomy. Phrased as "DOM is right, SAX is wrong"
I'd have expected it to trigger a flame war, but there was little follow
up (mea culpa, too). I wish to hear more from Clark about this...

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> >  I like Cocoon. I think it might be the way all things should happen in
> > the web. But to do that Cocoon must be as `web friendly' as posible, as static
> > pages are. e.g. it must send the proper HTTP headers in order to cooperate
> > with caches and other HTTP software. From reading the sources it seems
> > very easy to add the Content-length header. The whole content is first stored
> > in a String (in Engine.handle()). It would be a matter of sending the
> > string.length();. This could be done now... is there any reason this is
> > not being done?
> 
> This works now, but what about the mythical Cocoon 2, where the sending of
> content may happen before all of the data has been processed, because of the
> SAX event model?

 I think that sending data before the process has finished is a very good
thing for certain pages: long pages, pages that come from a slow backend.
I'd say that 90% of the time pages are small (less than 20k). And what do you
gain by sending them to the client a couple of ms before the page is done? So
I'd say that buffering should be the default, and it should be turned off by
pages which need that.

 I think that we must envision a world where *every* site is using Cocoon. In
that world, wouldn't we still want last-modified stamps?

Re: Content-length

Posted by Ross Burton <bu...@dcs.kcl.ac.uk>.

>  I like Cocoon. I think it might be the way all things should happen in
> the web. But to do that Cocoon must be as `web friendly' as posible, as static
> pages are. e.g. it must send the proper HTTP headers in order to cooperate
> with caches and other HTTP software. From reading the sources it seems
> very easy to add the Content-length header. The whole content is first stored
> in a String (in Engine.handle()). It would be a matter of sending the
> string.length();. This could be done now... is there any reason this is
> not being done?

This works now, but what about the mythical Cocoon 2, where the sending of
content may happen before all of the data has been processed, because of the
SAX event model?

Ross Burton

Re: Content-length

Posted by Nicolás Lichtmaier <ni...@debian.org>.

> >  I like Cocoon. I think it might be the way all things should happen in the
> > web. But to do that Cocoon must be as `web friendly' as posible, as static
> > pages are. e.g. it must send the proper HTTP headers in order to cooperate
> > with caches and other HTTP software. From reading the sources it seems very
> > easy to add the Content-length header. The whole content is first stored in
> > a String (in Engine.handle()). It would be a matter of sending the
> > string.length();. This could be done now... is there any reason this is not
> > being done?
> 
> no good reason that I'm aware of. i'll commit the patch if others agree.
> +1 from me.

 Note that you must do something about the <!-- served by Cocoon -->
message. I would move it up, and print it to the "out" stream too. You
should probably remove the "from cache" part.
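
 To make the point concrete, here is a minimal sketch in Java. The class
name, the exact footer text and the UTF-8 encoding are assumptions for
illustration, not the actual Engine code. Two things matter: the comment has
to be part of the body before the length is taken, and Content-Length counts
bytes, so string.length() is only right when every character encodes to a
single byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the real Cocoon footer text and encoding may differ.
class ContentLengthSketch {

    // Assumed footer, appended before the length is computed.
    static final String FOOTER = "\n<!-- served by Cocoon -->";

    /** Returns the exact bytes that would go on the wire. */
    static byte[] finalBody(String page) {
        return (page + FOOTER).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String page = "<p>caf\u00e9</p>";
        byte[] body = finalBody(page);
        // The character count and the byte count differ for non-ASCII content.
        System.out.println("characters:     " + (page + FOOTER).length());
        System.out.println("Content-Length: " + body.length);
    }
}
```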

> >  Last-modified/Expires is a little more complicated, but I have a proposal:
> > 
> >  You can always add these headers by hand, but this approach is wrong, as
> > these headers could be easily handled automatically. This is how:
> > 
> >  The `Last-modified' date is the greater `last-modified' date of all
> > components of the producer|processor path. Tha `Expires' header is the
> > lesser date. This is obvious if you think a bit about it. The simple
> > producer that reads a file would give the files' date as the last-modified
> > time, later, a producer that adds some information from some files would
> > report those files' date. The greater date would be the `last-modified'
> > header.
> > 
> >  This could be implemented with a PI like this:
> > 
> > <?last-modified value="<a time>"?>
> > 
> >  Each component in the path would be able to create this PI. The engine
> > *will remove this PI* before sending the document to the next component. The
> > engine will keep the bigger last-modified time. If a component does not
> > provide this information, last-modified generation would be cancelled. The
> > same would be done with the expires (only that it would keep the smaller
> > time).
> > 
> >  What do you think?
> 
> I think there are other issues at work here that we have to worry about.
> We need some way to ask the XML parser for the most recently modified date
> of any resource referenced by an XML file that it's parsing (so as to
> account for external entities). Similarly, we need some way to ask the XSL
> processor the same question (so as to account for xsl:import/include). I'm
> not certain that we have either of these facilities yet. Perhaps someone
> more conversant with the latest generation of X*L tools could chime in.

 Agreed. This would be the responsibility of each component in the pipeline,
so we could take care of this later. What do you think about the
"infrastructure" I propose?

 If a given component (say, an XSLT processor) doesn't provide the PI, it
doesn't matter. Cocoon will simply not send the `last-modified' (or `expires').
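
 The bookkeeping the engine would do could look roughly like this. It is a
sketch under stated assumptions: the CacheHeaderAggregator class is
hypothetical, times are taken as milliseconds since the epoch, and an empty
OptionalLong stands for a component that did not emit the PI. How the PI is
parsed and stripped is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch of the proposed rule; not Cocoon code.
class CacheHeaderAggregator {

    /** Greatest reported last-modified time, or empty if any component reported none. */
    static OptionalLong lastModified(List<OptionalLong> reported) {
        if (reported.isEmpty()) {
            return OptionalLong.empty();
        }
        long latest = Long.MIN_VALUE;
        for (OptionalLong time : reported) {
            if (!time.isPresent()) {
                return OptionalLong.empty(); // one silent component cancels the header
            }
            latest = Math.max(latest, time.getAsLong());
        }
        return OptionalLong.of(latest);
    }

    /** Smallest reported expiry time, or empty if any component reported none. */
    static OptionalLong expires(List<OptionalLong> reported) {
        if (reported.isEmpty()) {
            return OptionalLong.empty();
        }
        long earliest = Long.MAX_VALUE;
        for (OptionalLong time : reported) {
            if (!time.isPresent()) {
                return OptionalLong.empty();
            }
            earliest = Math.min(earliest, time.getAsLong());
        }
        return OptionalLong.of(earliest);
    }

    public static void main(String[] args) {
        List<OptionalLong> complete = Arrays.asList(
                OptionalLong.of(100L), OptionalLong.of(300L), OptionalLong.of(200L));
        List<OptionalLong> withGap = Arrays.asList(
                OptionalLong.of(100L), OptionalLong.empty());
        System.out.println(lastModified(complete));
        System.out.println(expires(complete));
        System.out.println(lastModified(withGap));
    }
}
```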

RE: Content-length

Posted by Jeff Sonstein <je...@tlg.net>.

> > [...] Cocoon must be as `web friendly' as posible, as static
> > pages are. e.g. it must send the proper HTTP headers in order to cooperate
> > with caches and other HTTP software. [...]
> 
> [...] i'll commit the patch if others agree.
> +1 from me.

makes sense to me

jeffs

--
Jeff Sonstein, M.A.     http://ariadne.iz.net/
        http://ariadne.iz.net/~jeffs/jeffs.asc
==============================================
there are no bugs
there are just undocumented features

Re: Content-length

Posted by Donald Ball <ba...@webslingerZ.com>.

On Sun, 23 Jan 2000, Nicolás Lichtmaier wrote:

>  I like Cocoon. I think it might be the way all things should happen in the
> web. But to do that Cocoon must be as `web friendly' as posible, as static
> pages are. e.g. it must send the proper HTTP headers in order to cooperate
> with caches and other HTTP software. From reading the sources it seems very
> easy to add the Content-length header. The whole content is first stored in
> a String (in Engine.handle()). It would be a matter of sending the
> string.length();. This could be done now... is there any reason this is not
> being done?

no good reason that I'm aware of. i'll commit the patch if others agree.
+1 from me.

>  Last-modified/Expires is a little more complicated, but I have a proposal:
> 
>  You can always add these headers by hand, but this approach is wrong, as
> these headers could be easily handled automatically. This is how:
> 
>  The `Last-modified' date is the greater `last-modified' date of all
> components of the producer|processor path. Tha `Expires' header is the
> lesser date. This is obvious if you think a bit about it. The simple
> producer that reads a file would give the files' date as the last-modified
> time, later, a producer that adds some information from some files would
> report those files' date. The greater date would be the `last-modified'
> header.
> 
>  This could be implemented with a PI like this:
> 
> <?last-modified value="<a time>"?>
> 
>  Each component in the path would be able to create this PI. The engine
> *will remove this PI* before sending the document to the next component. The
> engine will keep the bigger last-modified time. If a component does not
> provide this information, last-modified generation would be cancelled. The
> same would be done with the expires (only that it would keep the smaller
> time).
> 
>  What do you think?

I think there are other issues at work here that we have to worry about.
We need some way to ask the XML parser for the most recently modified date
of any resource referenced by an XML file that it's parsing (so as to
account for external entities). Similarly, we need some way to ask the XSL
processor the same question (so as to account for xsl:import/include). I'm
not certain that we have either of these facilities yet. Perhaps someone
more conversant with the latest generation of X*L tools could chime in.

- donald