Posted to modules-dev@httpd.apache.org by Joshua Marantz <jm...@google.com> on 2011/03/14 16:54:28 UTC

ordering output filters

Hello from mod_pagespeed again.

Our users have identified another incompatibility between standard filters
and mod_pagespeed; this time with mod_includes.   In general I think that
mod_pagespeed should run after mod_includes, for a few reasons.  But in
particular, mod_pagespeed, in its own html-centric filter architecture, has
a 'remove_comments' filter which strips out any server-side includes if
mod_pagespeed runs prior to mod_includes.   While 'remove_comments' is an
optional feature for mod_pagespeed (many web pages would break if it were
always enabled), many consider it a desirable feature they'd like to turn on.

This is all documented at length in
http://code.google.com/p/modpagespeed/issues/detail?id=182

Even in the absence of 'remove_comments', it would be preferable to have
mod_pagespeed run after mod_includes so that it has an opportunity to
optimize the included text.  The user can achieve this by putting this line
into his config file:

    AddOutputFilter INCLUDES;MOD_PAGESPEED_OUTPUT_FILTER html

But this is not desirable for a couple of reasons.  We'd like to force the
correct order automatically if possible.


The question for this mailing list is how best to achieve that.  Should we,
in mod_pagespeed's output filter, have logic that says:

  if (mod_includes was enabled in this config) {
    re-insert mod_pagespeed at the end of the AP_FTYPE_RESOURCE chain
    pass the buckets to mod_includes
  }

Or can we, at init time, call server APIs to tweak the filter order?  Is
there any filter that seeks to do that somehow?

We also have a constraint that mod_pagespeed must run before mod_deflate.
Actually mod_pagespeed already inserts mod_deflate in the filter-chain to
run downstream of it:

  ap_add_output_filter("DEFLATE", NULL, request, request->connection);


Another hack is to have mod_pagespeed introduce a new output filter,
MOD_PAGESPEED_REORDER which would:
   remove INCLUDES
   remove MOD_PAGESPEED_OUTPUT_FILTER
   add INCLUDES
   add MOD_PAGESPEED_OUTPUT_FILTER

This seems ugly though because (a) I don't know how to remove a filter by
name and (b) it would wind up adding INCLUDES even if it was not already
registered.

A third idea is to exploit the fact that INCLUDES adds itself to the output
chain via
   ap_hook_fixups(include_fixup, NULL, NULL, APR_HOOK_LAST);
where include_fixup() does ap_add_output_filter("INCLUDES", NULL, r,
r->connection);

I suppose mod_pagespeed could set up its own call to
   ap_hook_fixups(mod_pagespeed_fixup, NULL, NULL, APR_HOOK_LAST + 1);
and add itself as an output filter.
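A toy model (plain C, not httpd code) of why APR_HOOK_LAST + 1 would place such a fixup after include_fixup(): in this sketch, hooks simply run in ascending numeric order. The constants are local copies assumed to match APR's APR_HOOK_* values in apr_hooks.h, mod_pagespeed_fixup is a hypothetical hook name, and the real APR sort also honours predecessor/successor lists, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Local mirrors of APR's hook-order constants (APR_HOOK_REALLY_FIRST ..
 * APR_HOOK_REALLY_LAST); values assumed, check your apr_hooks.h. */
enum {
    HOOK_REALLY_FIRST = -10,
    HOOK_FIRST        = 0,
    HOOK_MIDDLE       = 10,
    HOOK_LAST         = 20,
    HOOK_REALLY_LAST  = 30
};

/* One registered hook function: its name and numeric order. */
struct hook {
    const char *name;
    int order;
};

/* qsort comparator: lower order values run earlier. */
static int compare_order(const void *a, const void *b)
{
    const struct hook *x = a;
    const struct hook *y = b;
    return (x->order > y->order) - (x->order < y->order);
}

/* Sort hooks into run order under this simplified model
 * (no predecessor/successor constraints, no tie handling). */
static void sort_hooks(struct hook *hooks, size_t n)
{
    assert(hooks != NULL);
    qsort(hooks, n, sizeof hooks[0], compare_order);
}

/* Position of a hook in the sorted array, or -1 if absent. */
static int run_position(const struct hook *hooks, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(hooks[i].name, name) == 0)
            return (int)i;
    return -1;
}
```

Registering { "mod_pagespeed_fixup", HOOK_LAST + 1 } alongside { "include_fixup", HOOK_LAST } and sorting puts include_fixup first, so in this model the INCLUDES filter is added before the hypothetical fixup adds its own.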



But all of these ideas seem like hacks.  Any hints on how to enforce the
output-filter ordering INCLUDES, MOD_PAGESPEED, DEFLATE in a robust and
clean way would be greatly appreciated.

-Josh

Re: ordering output filters

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Mon, Mar 14, 2011 at 18:54, Joshua Marantz <jm...@google.com> wrote:
> And in particular, adding an insert_filter hook sounds a little more complex
> than the AP_FTYPE_RESOURCE+1 idea.  Is there some advantage to using
> insert_filter hook?

(fashionably late to the party)

insert_filter lets you programmatically insert a filter right before
the handler runs. It's the way to go if you have a filter that is in
some way dependent on the side effects of one or more pre-handler
hooks, like a fixup from mod_headers.

Re: ordering output filters

Posted by Joshua Marantz <jm...@google.com>.
Thanks, Nick & Ben!

On Mon, Mar 14, 2011 at 1:31 PM, Nick Kew <ni...@apache.org> wrote:
>
> AP_FTYPE_RESOURCE+1.  That also leaves an admin the possibility of
> overriding it.


I didn't realize these +1/-1 hacks were available for this API.  This looks
really simple & is the direction I'm leaning.


> Why not an insert_filter hook?


This is a really good question: I didn't know about this hook.  Is there a
good place I should go to learn about these hooks myself?  Of course, asking
this mailing-list has worked really well so far thanks to Ben & yourself &
others.

And in particular, adding an insert_filter hook sounds a little more complex
than the AP_FTYPE_RESOURCE+1 idea.  Is there some advantage to using
insert_filter hook?

-Josh

Re: A few questions on Input Filters

Posted by Joe Lewis <jo...@joe-lewis.com>.
On 01/13/2012 09:24 AM, Martin Townsend wrote:
> Thanks Joe for the info, my input filter is now behaving itself again.
> I've not seen any FLUSH buckets yet so I doubt I will as we are 
> running the bare minimum of modules.  I'll let you know if I do though.
> One last question, when I don't see the EOS bucket should I return a 
> certain APR_ error code to say the POST request hasn't finished yet. 
> I'm currently returning APR_OK and this seems to work.
>
> Cheers,
> Martin.

Leave it as OK.  If you return a different APR_ error, it could cause 
the entire request to hang.

By returning OK, you are simply stating that your function has finished 
what it was passed, and all is well.

Joe
--
www.silverhawk.net

Re: A few questions on Input Filters

Posted by Martin Townsend <ma...@power-oasis.com>.
Thanks Joe for the info, my input filter is now behaving itself again.
I've not seen any FLUSH buckets yet so I doubt I will as we are running 
the bare minimum of modules.  I'll let you know if I do though.
One last question, when I don't see the EOS bucket should I return a 
certain APR_ error code to say the POST request hasn't finished yet. I'm 
currently returning APR_OK and this seems to work.

Cheers,
Martin.

On 11/01/2012 17:12, Joe Lewis wrote:
> On 01/11/2012 04:17 AM, Martin Townsend wrote:
>> The problem occurred when the POST request was split into two brigades
>> which are passed independently to my filter. So my first question is:
>> is this expected?
>
> You should definitely expect that.  Don't assume that the entire 
> content will always come in the same way.  In this kind of development 
> architecture (where anyone can build a module), we should expect the 
> unexpected.
>
>>   I assume it is so I have to alter my filter to handle partial 
>> bucket brigades.
>> If so, I take it I can infer a partial brigade by the fact that the 
>> EOS bucket is not present?
>> Whilst looking through other input filters I notice they handle FLUSH 
>> buckets, for my input filter I take it I can ignore these buckets as 
>> all I'm trying to do is extract the POST data to a buffer and then 
>> process it without altering it.
>
> If the brigade doesn't have that EOS, there is more to the stream to 
> be read.  When you see the FLUSH bucket, you should really be passing 
> the brigade on to the next chain (FLUSH buckets are created when the 
> brigade needs to be split).
>
> I had originally thought that FLUSH buckets were output buckets to 
> prevent the client from waiting too long.  Are you seeing these on an 
> input chain?  If so, what other modules are involved?  I'm curious for 
> my own understanding of how other modules might affect some of the
> stuff I have written.
>
>> I noticed that one module's input filter ignored sub requests, does 
>> anyone know when sub requests occur within the input filter phase and 
>> whether I can ignore these too.
>
> The input has already been read when a sub request is created.
> Usually, a sub request is happening when an output filter or a content
> generator is being called, so I'm not sure a sub-request will see the
> input from the parent filter.
>
>>
>> Many Thanks,
>> Martin.
>
> That is what the list is for.  Hope you can get things straightened out!
> Joe Lewis
> -- 
> www.silverhawk.net



Re: A few questions on Input Filters

Posted by Joe Lewis <jo...@joe-lewis.com>.
On 01/11/2012 04:17 AM, Martin Townsend wrote:
> The problem occurred when the POST request was split into two brigades
> which are passed independently to my filter. So my first question is:
> is this expected?

You should definitely expect that.  Don't assume that the entire content 
will always come in the same way.  In this kind of development 
architecture (where anyone can build a module), we should expect the 
unexpected.

>   I assume it is so I have to alter my filter to handle partial bucket 
> brigades.
> If so, I take it I can infer a partial brigade by the fact that the 
> EOS bucket is not present?
> Whilst looking through other input filters I notice they handle FLUSH 
> buckets, for my input filter I take it I can ignore these buckets as 
> all I'm trying to do is extract the POST data to a buffer and then 
> process it without altering it.

If the brigade doesn't have that EOS, there is more to the stream to be 
read.  When you see the FLUSH bucket, you should really be passing the 
brigade on to the next chain (FLUSH buckets are created when the brigade 
needs to be split).
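A self-contained model of the EOS check described above, using simplified stand-in types rather than APR's: each call handles one brigade, copies data buckets into a buffer, and reports whether EOS has arrived. FLUSH buckets carry no payload, so this sketch, which only buffers the body, just steps over them. In a real filter the same walk would use APR_BRIGADE_FIRST, APR_BUCKET_NEXT and APR_BRIGADE_SENTINEL, test APR_BUCKET_IS_EOS and APR_BUCKET_IS_FLUSH, and read data with apr_bucket_read(); the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for APR bucket types (not the real apr_bucket). */
enum bucket_type { BUCKET_DATA, BUCKET_FLUSH, BUCKET_EOS };

struct bucket {
    enum bucket_type type;
    const char *data;            /* payload for BUCKET_DATA, NULL otherwise */
    size_t len;
    const struct bucket *next;   /* next bucket in this brigade, or NULL */
};

/* Fixed-size accumulator for a small POST body. */
struct body_buf {
    char bytes[4096];
    size_t len;
};

/* Copy every data bucket of one brigade into buf.
 * Returns 1 if EOS was seen (body complete), 0 if more brigades are
 * still to come, or -1 if the body would overflow the buffer. */
static int collect_brigade(const struct bucket *b, struct body_buf *buf)
{
    assert(buf != NULL);
    for (; b != NULL; b = b->next) {
        switch (b->type) {
        case BUCKET_EOS:
            return 1;            /* end of the request body */
        case BUCKET_FLUSH:
            continue;            /* no payload to copy */
        case BUCKET_DATA:
            if (b->len > sizeof buf->bytes - buf->len)
                return -1;
            memcpy(buf->bytes + buf->len, b->data, b->len);
            buf->len += b->len;
            break;
        }
    }
    return 0;                    /* no EOS yet: expect another brigade */
}
```

Calling it once per brigade and stopping only when it returns 1 handles the split-POST case: the first brigade returns 0, the second returns 1, and the buffer holds the whole body.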

I had originally thought that FLUSH buckets were output buckets to 
prevent the client from waiting too long.  Are you seeing these on an 
input chain?  If so, what other modules are involved?  I'm curious for 
my own understanding of how other modules might affect some of the stuff
I have written.

> I noticed that one module's input filter ignored sub requests, does 
> anyone know when sub requests occur within the input filter phase and 
> whether I can ignore these too.

The input has already been read when a sub request is created.
Usually, a sub request is happening when an output filter or a content
generator is being called, so I'm not sure a sub-request will see the
input from the parent filter.

>
> Many Thanks,
> Martin.

That is what the list is for.  Hope you can get things straightened out!
Joe Lewis
--
www.silverhawk.net

Re: A few questions on Input Filters

Posted by Nick Kew <ni...@apache.org>.
On Wed, 11 Jan 2012 11:17:30 +0000
Martin Townsend <ma...@power-oasis.com> wrote:

> Hi,
> [chop]

One point that Joe omitted to mention is that since input
filtering is a PULL API, it's entirely up to your filter
how much data to pull before returning to its upstream.
If your filter can't deal with partial data, it can block for
as long as necessary to read more.  With just a couple of KB
to handle, it's not exactly going to be a performance hit!
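Nick's PULL point can be sketched with a toy upstream that hands out a POST body in pieces, much as successive reads may each return only part of it; the filter-side loop keeps pulling until end-of-stream, so its own caller only ever sees the complete body. All names here are hypothetical, and the 2000-byte chunk size in the usage is an arbitrary assumption. In httpd each pull would be a blocking call along the lines of ap_get_brigade(f->next, bb, AP_MODE_READBYTES, APR_BLOCK_READ, readbytes), with end-of-stream signalled by an EOS bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy upstream: serves a request body at most `chunk` bytes per pull. */
struct upstream {
    const char *body;
    size_t len;
    size_t pos;
    size_t chunk;
};

/* One pull: copy up to `chunk` bytes (and at most `room`) into out,
 * set *eos once the body is exhausted, return bytes copied. */
static size_t pull(struct upstream *u, char *out, size_t room, int *eos)
{
    size_t n = u->len - u->pos;
    if (n > u->chunk)
        n = u->chunk;
    if (n > room)
        n = room;
    memcpy(out, u->body + u->pos, n);
    u->pos += n;
    *eos = (u->pos == u->len);
    return n;
}

/* Filter side of the PULL model: keep pulling (as a blocking read would)
 * until end-of-stream. Returns the body length, or -1 if it does not
 * fit in buf. *pulls reports how many reads were needed. */
static long read_whole_body(struct upstream *u, char *buf, size_t size,
                            int *pulls)
{
    size_t len = 0;
    int eos = 0;

    assert(u != NULL && buf != NULL && pulls != NULL);
    *pulls = 0;
    while (!eos) {
        if (len == size && u->pos < u->len)
            return -1;           /* buffer full but data remains */
        len += pull(u, buf + len, size - len, &eos);
        (*pulls)++;
    }
    return (long)len;
}
```

With a 2680-byte body, like the one in the original question, and 2000-byte pulls, read_whole_body() needs two pulls and returns 2680; the split is invisible to whatever processes the buffer afterwards.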

Regarding the flush buckets, conceptually I'd expect they
might feature if you were handling streaming data, or
maybe multipart uploads, or some extension to the HTTP
protocol.  Maybe there's some such application- or protocol-
oriented filter in your chain?

-- 
Nick Kew

A few questions on Input Filters

Posted by Martin Townsend <ma...@power-oasis.com>.
Hi,

I have an input filter for processing POST requests and it was working 
well until recently.  After debugging I found that it was due to a large 
POST request (around 2680 bytes) which most of the time was handled as a 
single bucket brigade with one heap bucket.  The problem occurred when
the POST request was split into two brigades which are passed
independently to my filter.  So my first question is: is this expected?  I
assume it is, so I have to alter my filter to handle partial bucket
brigades.
If so, I take it I can infer a partial brigade by the fact that the EOS
bucket is not present?
Whilst looking through other input filters I noticed they handle FLUSH
buckets.  For my input filter, I take it I can ignore these buckets, as all
I'm trying to do is extract the POST data to a buffer and then process
it without altering it.
I noticed that one module's input filter ignored sub requests.  Does
anyone know when sub requests occur within the input filter phase, and
whether I can ignore these too?

Many Thanks,
Martin.




Re: ordering output filters

Posted by Nick Kew <ni...@apache.org>.
On 14 Mar 2011, at 15:54, Joshua Marantz wrote:


>  if (mod_includes was enabled in this config) {
>    re-insert mod_pagespeed at the end of the AP_FTYPE_RESOURCE chain
>    pass the buckets to mod_includes
>  }

Not good.  Modules are there to serve the server admin, not to enslave him.
In general they shouldn't touch each other (except through public APIs)
nor second-guess a server admin.

In practical terms, what about a third-party module that parses comments?
If includes get special treatment but others don't, you're making things horribly
confusing for your users.

A traditional but only slightly less ugly hack would be to declare your filter
AP_FTYPE_RESOURCE+1.  That also leaves an admin the possibility of
overriding it.

> Or can we, at init time, call server APIs to tweak the filter order?  Is
> there any filter that seeks to do that somehow?

You could take a look at how mod_proxy_html inserts mod_xml2enc
if available.

> A third idea is to exploit the fact that INCLUDES adds itself to the output
> chain via
>   ap_hook_fixups(include_fixup, NULL, NULL, APR_HOOK_LAST);
> where include_fixup() does ap_add_output_filter("INCLUDES", NULL, r,
> r->connection);

Why not an insert_filter hook?

That would be the right place to go, but then be sure to document exactly
how it works and what other modules will be auto-configured.

-- 
Nick Kew

Available for work, contract or permanent
http://www.webthing.com/~nick/cv.html


Re: ordering output filters

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Mon, Mar 14, 2011 at 16:54, Joshua Marantz <jm...@google.com> wrote:
> Even in the absence of 'remove_comments', it would be preferable to have
> mod_pagespeed run after mod_includes so that it has an opportunity to
> optimize the included text.  The user can achieve this by putting this line
> into his config file:
>
>    AddOutputFilter INCLUDES;MOD_PAGESPEED_OUTPUT_FILTER html
>
> But this is not desirable for a couple of reasons.  We'd like to force the
> correct order automatically if possible.

> We also have a constraint that mod_pagespeed must run before mod_deflate.
> Actually mod_pagespeed already inserts mod_deflate in the filter-chain to
> run downstream of it:
>
>  ap_add_output_filter("DEFLATE", NULL, request, request->connection);

mod_include runs at AP_FTYPE_RESOURCE, mod_deflate at AP_FTYPE_CONTENT_SET.

If you register your filter at AP_FTYPE_RESOURCE + 1 or
AP_FTYPE_CONTENT_SET - 1, it will run after mod_include but before
mod_deflate.
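A small model of the ordering Ben describes. It assumes (as a property of this sketch, not a statement about httpd internals) that a newly added filter goes before the first filter with a strictly greater type, i.e. after any filters of equal type already present. The type values are local copies assumed to match ap_filter_type in util_filter.h, where lower types sit nearer the handler and see the response first. In a module, the type would be the last argument to ap_register_output_filter(), e.g. AP_FTYPE_RESOURCE + 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local mirrors of ap_filter_type values (assumed; check util_filter.h).
 * Lower types run earlier in the output chain. */
enum {
    FTYPE_RESOURCE    = 10,
    FTYPE_CONTENT_SET = 20,
    FTYPE_PROTOCOL    = 30
};

#define MAX_FILTERS 8

/* Output chain, index 0 = first filter to see the response. */
struct chain {
    const char *name[MAX_FILTERS];
    int ftype[MAX_FILTERS];
    size_t n;
};

/* Insert before the first filter with a strictly greater type, so that
 * filters of equal type keep the order in which they were added. */
static void add_filter(struct chain *c, const char *name, int ftype)
{
    size_t i = 0;

    assert(c->n < MAX_FILTERS);
    while (i < c->n && c->ftype[i] <= ftype)
        i++;
    memmove(&c->name[i + 1], &c->name[i], (c->n - i) * sizeof c->name[0]);
    memmove(&c->ftype[i + 1], &c->ftype[i], (c->n - i) * sizeof c->ftype[0]);
    c->name[i] = name;
    c->ftype[i] = ftype;
    c->n++;
}
```

Because the three filters have distinct types, this model yields INCLUDES, MOD_PAGESPEED_OUTPUT_FILTER, DEFLATE whichever order they are added in, for both FTYPE_RESOURCE + 1 and FTYPE_CONTENT_SET - 1. That independence from call order is what makes the type-based approach less fragile than relying on hook order.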