Posted to modules-dev@httpd.apache.org by Joachim Zobel <jz...@heute-morgen.de> on 2008/01/04 22:47:16 UTC

Re: Transcoding module for libxml2-based filters

On Tuesday, 25 December 2007, at 22:54 +0000, Nick Kew wrote:
> As developer or co-developer of several libxml2-based filter
> modules, ...

Hey, I thought you were on the expat side :) 

> The basic features are:
>   1. Sniff charset of incoming data, from (in order):
> 	(a) HTTP headers, if available
> 	(b) XML BOM / XML Declaration
> 	(c) HTML <meta> elements
> 	(d) Configuration default

A configuration like
XML2EncSniff HTTP XML META CONF
might be desirable for this in the long run, so that one can, for
example, ignore META.
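
A rough sketch of how such an ordered source list could be parsed, in
plain C without the Apache headers. The function and type names are
hypothetical, and the directive itself is only the suggestion made
above, not an existing option. Matching is case-sensitive here, which
is an assumption as well.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names: neither the directive nor these identifiers
 * are taken from any released module. */
typedef enum { SNIFF_HTTP, SNIFF_XML, SNIFF_META, SNIFF_CONF } sniff_src;

#define SNIFF_MAX 4

/* Parse a blank-separated list such as "HTTP XML META CONF" into out[].
 * Returns the number of sources stored, or -1 on an unknown or
 * duplicated token. An empty argument yields 0. */
int parse_sniff_order(const char *arg, sniff_src out[SNIFF_MAX])
{
    static const char *const names[SNIFF_MAX] = {
        "HTTP", "XML", "META", "CONF"   /* same order as the enum */
    };
    unsigned seen = 0;
    int n = 0;

    while (*arg) {
        size_t len;
        int i, found = -1;

        arg += strspn(arg, " \t");      /* skip leading blanks */
        if (!*arg)
            break;
        len = strcspn(arg, " \t");      /* length of this token */

        for (i = 0; i < SNIFF_MAX; i++) {
            if (strlen(names[i]) == len && strncmp(arg, names[i], len) == 0) {
                found = i;
                break;
            }
        }
        if (found < 0 || (seen & (1u << found)))
            return -1;                  /* unknown or repeated source */
        seen |= 1u << found;
        out[n++] = (sniff_src)found;
        arg += len;
    }
    return n;
}
```

Leaving a source out of the list, as in "HTTP XML CONF", is then all it
takes to ignore META.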

>   2. If the charset is not supported by libxml2,
>      convert it to UTF-8 using apr_xlate (if supported).
>   3. Remove <meta> elements that are invalidated by
>      any such conversion.
>   4. Perform other preprocessing fixups, and offer an
>      optional hook for preprocessing.

Does this mean, e.g., fixing the XML declaration if the header says
something different?

>   5. Support post-filtering from UTF-8 to a server admin's
>      choice of charset.

Good.

> The challenging aspect of this is to enable it to be inserted
> twice in a filter chain (before and after libxml2), and perform
> different transformations each time. 

This means two different filter functions, right?

> Currently it offers
> configuration options appropriate to a pre-filter, and will
> export a function for other filter modules to insert it with
> their own configuration options (f->ctx) for post-filtering.
> Unless anyone has a better suggestion.

Why do you think it is necessary to ask other filters for configuration
this way? What is the advantage of this over simply having
configuration options for the post filter?

Hey, you may want to interface with mod_negotiation :) Charsets are not
really negotiable now, but with your module they will be.

Sincerely,
Joachim



Re: Transcoding module for libxml2-based filters

Posted by Joachim Zobel <jz...@heute-morgen.de>.
On Friday, 4 January 2008, at 22:06 +0000, Nick Kew wrote:
> > This means two different filter functions, right?
> 
> No, one function, with its behaviour determined by its ctx.

Sure? IMHO two functions that call the same infrastructure function
might be clearer. But YMMV, I am an enemy of state.
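
To make the two designs concrete, here is a minimal sketch in plain C
without the Apache headers. All names are hypothetical and the core
only reports which step would run; a real filter would transcode
bucket brigades at that point, with the context living in f->ctx.

```c
/* Hypothetical names throughout; this is not the module's code. */
typedef enum { ENC_PRE, ENC_POST } enc_mode;

typedef struct {
    enc_mode mode;          /* which transformation this instance performs */
    const char *charset;    /* input default (pre) or output target (post) */
} enc_ctx;

/* Shared infrastructure: names the step that would be performed.
 * A real implementation would do the charset conversion here. */
static const char *enc_core(enc_mode mode)
{
    return mode == ENC_PRE ? "sniff and convert to UTF-8"
                           : "convert UTF-8 to target charset";
}

/* Design A: one filter function, behaviour selected by its context. */
const char *enc_filter(const enc_ctx *ctx)
{
    return enc_core(ctx->mode);
}

/* Design B: two entry points that call the same infrastructure. */
const char *enc_pre_filter(void)  { return enc_core(ENC_PRE); }
const char *enc_post_filter(void) { return enc_core(ENC_POST); }
```

Both variants end up in the same core; the difference is only whether
the mode is carried in per-instance state or fixed by the name under
which the filter is registered.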

[...]

> > Why do you think it is necessary to ask other filters for
> > configuration this way? What is the advantage of this over simply
> > having configuration options for the post filter?
> 
> That gets messy, with two filters both of AP_FTYPE_RESOURCE.
> If I hack it with offsets, that breaks interaction with other
> filters.

Hmmmm. Maybe this is because I would always configure my filter chain
explicitly. But everybody using your module will also have to configure
his filter chain explicitly, simply because he wants your pre filter to
run before his own AP_FTYPE_RESOURCE filter.

AddOutputFilter xml2enc-pre;user-filter;xml2enc-post

or

AddOutputFilter xml2enc-pre;sax;i18n;xml2enc-post

or

AddOutputFilter xml2enc-pre;sax;htmlplus;i18n;xml2enc-post

OK, "messy" is a fair point here.

Sincerely,
Joachim



Re: Transcoding module for libxml2-based filters

Posted by Nick Kew <ni...@webthing.com>.
On Fri, 04 Jan 2008 22:47:16 +0100
Joachim Zobel <jz...@heute-morgen.de> wrote:

> On Tuesday, 25 December 2007, at 22:54 +0000, Nick Kew wrote:
> > As developer or co-developer of several libxml2-based filter
> > modules, ...
> 
> Hey, I thought you were on the expat side :) 

Just mod_xmlns.  All my other SAX parsing modules are libxml2.

> > The basic features are:
> >   1. Sniff charset of incoming data, from (in order):
> > 	(a) HTTP headers, if available
> > 	(b) XML BOM / XML Declaration
> > 	(c) HTML <meta> elements
> > 	(d) Configuration default
> 
> A configuration like
> XML2EncSniff HTTP XML META CONF
> might be desirable for this in the long run, so that one can, for
> example, ignore META.

Indeed, that's a thought.  Not to mention sniffing according
to Content-Type, since one purpose of this is *also* to support
non-markup text.
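
As an illustration of sniffing step (b), the byte-order-mark check can
be sketched in plain C as below. The function name is hypothetical,
and this covers only the BOM, not the XML declaration. Note that the
bytes FF FE 00 00 are ambiguous (a UTF-32LE BOM, or a UTF-16LE BOM
followed by U+0000); the sketch assumes UTF-32LE.

```c
#include <stddef.h>

/* Return the charset implied by a byte-order mark at the start of buf,
 * or NULL if there is none. *skip receives the BOM length in bytes.
 * The UTF-32 marks are tested first because FF FE 00 00 begins with
 * the UTF-16LE mark. */
const char *sniff_bom(const unsigned char *buf, size_t len, size_t *skip)
{
    *skip = 0;
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00) {
        *skip = 4;
        return "UTF-32LE";
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF) {
        *skip = 4;
        return "UTF-32BE";
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *skip = 3;
        return "UTF-8";
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *skip = 2;
        return "UTF-16LE";
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *skip = 2;
        return "UTF-16BE";
    }
    return NULL;    /* no BOM: fall through to the next sniffing source */
}
```

A NULL result would send the caller on to the next source in the
sniffing order, such as the XML declaration or a <meta> element.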

> >   2. If the charset is not supported by libxml2,
> >      convert it to UTF-8 using apr_xlate (if supported).
> >   3. Remove <meta> elements that are invalidated by
> >      any such conversion.
> >   4. Perform other preprocessing fixups, and offer an
> >      optional hook for preprocessing.
> 
> Does this mean, e.g., fixing the XML declaration if the header says
> something different?

Yes, though that's a TBD.

> >   5. Support post-filtering from UTF-8 to a server admin's
> >      choice of charset.
> 
> Good.
> 
> > The challenging aspect of this is to enable it to be inserted
> > twice in a filter chain (before and after libxml2), and perform
> > different transformations each time. 
> 
> This means two different filter functions, right?

No, one function, with its behaviour determined by its ctx.

> > Currently it offers
> > configuration options appropriate to a pre-filter, and will
> > export a function for other filter modules to insert it with
> > their own configuration options (f->ctx) for post-filtering.
> > Unless anyone has a better suggestion.
> 
> Why do you think it is necessary to ask other filters for
> configuration this way? What is the advantage of this over simply
> having configuration options for the post filter?

That gets messy, with two filters both of AP_FTYPE_RESOURCE.
If I hack it with offsets, that breaks interaction with other
filters.

> Hey, you may want to interface with mod_negotiation :) Charsets are not
> really negotiable now, but with your module they will be.

Hehe.  Well, there's also mod_charset_lite:-)

Thanks for the comments.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/