Posted to dev@httpd.apache.org by Nick Kew <ni...@webthing.com> on 2009/03/27 00:18:15 UTC

Ideas for content-aware filter modules for 2.4

Following the BoF, I'll put down a brief marker on-list on the
theme of content-awareness.  More when I'm back home
and not totally knackered.

We have handling of certain important encodings:
SSL and compression (albeit not quite bug-free) as
standard in current versions.  I'd be interested to
expand that with some new filter modules.

1.  Character Encoding.  We have very limited capability
in mod_charset_lite.  We can expand that to support
automatic detection of charset, and either setting a request
field or transforming to a selected charset.

We can also provide an API for modules to configure this,
in cases where more than one transformation is wanted.
A real-life use case for this is where users of libxml2-based
modules such as mod_proxy_html need to use charsets
other than utf-8, and particularly charsets that are not
supported by libxml2.
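
To make the detect-then-transform idea concrete, here is a rough
sketch in Python rather than httpd C.  It is not mod_charset_lite
code.  The helper names (sniff_charset, to_utf8), the 1024-byte
sniffing window and the iso-8859-1 fallback are all assumptions
chosen for illustration.  It only looks at a byte-order mark, an
XML declaration or an HTML meta charset, so it is a cheap first
guess and not reliable detection in general.

```python
import codecs
import re

# Byte-order marks, UTF-32 first so it is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

_XML_DECL = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')
_META = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9._-]+)', re.I)


def sniff_charset(head, default="iso-8859-1"):
    """Guess a charset from the first bytes of a body (hypothetical helper).

    Returns (charset_name, number_of_BOM_bytes_to_skip).
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    match = _XML_DECL.search(head) or _META.search(head)
    if match:
        return match.group(1).decode("ascii").lower(), 0
    return default, 0


def to_utf8(body, default="iso-8859-1"):
    """Transcode body to UTF-8 using the sniffed charset (hypothetical helper)."""
    name, skip = sniff_charset(body[:1024], default)
    try:
        codecs.lookup(name)
    except LookupError:
        # Declared charset is unknown to this runtime: fall back.
        name = default
    return body[skip:].decode(name, errors="replace").encode("utf-8")
```

A filter built along these lines could either record the sniffed
name in a request field or hand UTF-8 to a downstream parser.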

2.  Generic XML support.

In mod_xmlns, a SAX2 parser parses XML to a stream of
SAX events.  Events are keyed on namespace, and
application modules can register handlers for a namespace.
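
The dispatch model can be sketched with Python's standard xml.sax
module (which sits on expat).  This is an illustration of the idea
only, not the mod_xmlns API: the NamespaceDispatcher class, its
register() method and the handler signature are invented for the
example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceDispatcher(ContentHandler):
    """Route SAX element events to handlers keyed on namespace URI."""

    def __init__(self):
        super().__init__()
        self._handlers = {}  # namespace URI -> callable(event, local, attrs)

    def register(self, uri, handler):
        self._handlers[uri] = handler

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        handler = self._handlers.get(uri)
        if handler:
            # Attribute keys are (uri, localname) pairs; keep the local part.
            handler("start", local, {key[1]: val for key, val in attrs.items()})

    def endElementNS(self, name, qname):
        uri, local = name
        handler = self._handlers.get(uri)
        if handler:
            handler("end", local, {})


def parse(data, dispatcher):
    """Parse bytes with namespace processing on and dispatch events."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(dispatcher)
    parser.parse(io.BytesIO(data))
```

An application would register one handler per namespace it cares
about (for example an ESI handler); elements in any other
namespace are simply ignored by the dispatcher.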

A good illustrative use case was my parser for the
ESI (Edge-side includes) namespace.  I've also used it
to generate HTML and RDF from a common source:
a task you might otherwise use XSLT for, at a much
higher performance cost.  I also hacked it to support
scripting and embedded SQL queries, but that's a
line I don't see as so interesting, because it gets us
into the territory of well-established alternatives
including PHP and JSP.

Joachim Zobel's mod_xml2 abstracts this further by
defining SAX event buckets (e.g. startElement bucket)
and passing them down the filter chain.  We could build
on the same approach to pass DOM or similar nodes as
buckets for applications like XSLT.
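
As a loose analogy for event buckets travelling down a filter
chain, the Python sketch below passes parsed events, not bytes,
from one stage to the next.  It does not reflect mod_xml2's actual
bucket types or API: the Event tuple, rename_filter and serialize
are hypothetical names, and Python generators stand in for httpd
filters.

```python
from collections import namedtuple

# One parsed event: kind is "start", "end" or "text".
Event = namedtuple("Event", "kind name data")


def rename_filter(events, old, new):
    """Filter stage: rename elements called `old` to `new`."""
    for ev in events:
        if ev.kind in ("start", "end") and ev.name == old:
            ev = ev._replace(name=new)
        yield ev


def serialize(events):
    """Final stage: turn the event stream back into markup text."""
    out = []
    for ev in events:
        if ev.kind == "start":
            out.append("<%s>" % ev.name)
        elif ev.kind == "end":
            out.append("</%s>" % ev.name)
        else:
            out.append(ev.data)
    return "".join(out)
```

The point of the design is that only the first stage parses and
only the last stage serializes; everything in between works on
structured events.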

If we use expat for this, we avoid introducing any new
dependencies.

3.  Data type library.

Our filter architecture works well for tasks such as (some)
image processing.  I don't think that's something we want
to do too much of in core, but it might add something if
we provided some basics, such as encoding/decoding
of the regular Web image formats (gif/jpeg/png, and svg
using xmlns dispatch).  A similar approach might apply
more widely to other media.


I can contribute some of this from my existing work,
including relicensing where necessary.  That is,
if there's interest in adding some of these things
as standard in 2.4.

-- 
Nick Kew

Re: Ideas for content-aware filter modules for 2.4

Posted by Dan Poirier <po...@pobox.com>.
Nick Kew <ni...@webthing.com> writes:

> We have handling of certain important encodings:
> SSL and compression (albeit not quite bug-free) as
> standard in current versions.  I'd be interested to
> expand that with some new filter modules.
>
> 1.  Character Encoding.  We have very limited capability
> in mod_charset_lite.  We can expand that to support
> automatic detection of charset, and either setting a request
> field or transforming to a selected charset.
>
> We can also provide an API for modules to configure this,
> in cases where more than one transformation is wanted.
> A real-life use case for this is where users of libxml2-based
> modules such as mod_proxy_html need to use charsets
> other than utf-8, and particularly charsets that are not
> supported by libxml2.

I'd definitely be in favor of some improvements in this area.

mod_charset_lite can translate between character encodings, but there's
no general way for it to know what encoding the content coming down the
chain is using.

Can automatic detection of charsets be done reliably and with low cost?
I'd have guessed not, but would love to be educated to the contrary.

Alternatively, maybe generators and filters could indicate the encoding
of the content they're inserting.

In any case, count me as an interested party willing to help out with
this.

-- 
Dan Poirier <po...@pobox.com>


Re (somewhat late): Ideas for content-aware filter modules for 2.4

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Thursday, 26.03.2009, at 23:18 +0000, Nick Kew wrote:
> 2.  Generic XML support.

[...]

> Joachim Zobel's mod_xml2 abstracts this further by
> defining SAX event buckets (e.g. startElement bucket)
> and passing them down the filter chain.  We could build
> on the same approach to pass DOM or similar nodes as
> buckets for applications like XSLT.
> 
> If we use expat for this, we avoid introducing any new
> dependencies.

If you want to add useful XML support, you'll pretty soon need DOM. For
this reason mod_xml2 uses libxml2. The SAX buckets generated with
the help of libxml2 can be used much more easily for DOM processing
with libxml2.

So IMHO, nontrivial XML support will require adding a new
dependency.

Sincerely,
Joachim