Posted to modules-dev@httpd.apache.org by Nick Kew <ni...@webthing.com> on 2008/02/07 14:05:35 UTC

ANN: mod_xml2enc: improved i18n for markup filters.

I'm happy to announce that mod_xml2enc is now ready for use.

mod_xml2enc is designed to be used with libxml2-based filter
modules, such as:
	mod_accessibility
	mod_proxy_html
	mod_publisher
	mod_transform
	mod_xml2
	mod_xslt
and serves to improve their internationalisation support:

  (1) It sniffs the encoding of incoming documents, using
  HTTP headers where available, or XML or HTML rules where
  there is no HTTP information.
  (2) If a character set is not supported by libxml2, it
  converts to UTF-8 ahead of the markup filter.
  (3) It removes any encoding information that is invalidated
  by the processing, and substitutes a correct HTTP header.
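As a rough illustration of the lookup order in point (1), the C sketch below checks a Content-Type header first, then an XML declaration, then an HTML <meta> tag. The function names (sniff_charset and its helpers) are hypothetical, and the parsing is deliberately simplistic: it ignores byte-order marks, whitespace around '=' and UTF-16 input. It is not how mod_xml2enc itself is implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Case-insensitive "does s start with prefix?" */
static int prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix; ++s, ++prefix)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Case-insensitive substring search (strcasestr is not standard C). */
static const char *find_ci(const char *hay, const char *needle)
{
    for (; *hay; ++hay)
        if (prefix_ci(hay, needle))
            return hay;
    return NULL;
}

/* Copy a charset token into out, skipping one opening quote and stopping
 * at a quote, ';', '>', '?', whitespace or the end of the string. */
static void copy_token(const char *p, char *out, size_t outlen)
{
    size_t n = 0;
    if (*p == '"' || *p == '\'')
        ++p;
    while (*p && !strchr("\"';>?", *p) && !isspace((unsigned char)*p)
           && n + 1 < outlen)
        out[n++] = *p++;
    out[n] = '\0';
}

/* Hypothetical helper: find a charset name, HTTP first, then XML, then HTML.
 * content_type may be NULL; doc is the (NUL-terminated) start of the body.
 * Returns 1 and fills out if a charset was found, else 0. */
int sniff_charset(const char *content_type, const char *doc,
                  char *out, size_t outlen)
{
    const char *p, *end;

    assert(doc != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    /* 1. HTTP: Content-Type: text/html; charset=... */
    if (content_type && (p = find_ci(content_type, "charset=")) != NULL) {
        copy_token(p + 8, out, outlen);
        if (out[0])
            return 1;
    }
    /* 2. XML: <?xml version="1.0" encoding="..."?> */
    if (strncmp(doc, "<?xml", 5) == 0) {
        end = strstr(doc, "?>");
        p = strstr(doc, "encoding=");
        if (p && (!end || p < end)) {
            copy_token(p + 9, out, outlen);
            if (out[0])
                return 1;
        }
    }
    /* 3. HTML: charset= inside a <meta ...> tag */
    for (p = find_ci(doc, "<meta"); p != NULL; p = find_ci(p + 5, "<meta")) {
        const char *cs = find_ci(p, "charset=");
        end = strchr(p, '>');
        if (cs && (!end || cs < end)) {
            copy_token(cs + 8, out, outlen);
            if (out[0])
                return 1;
        }
    }
    return 0;
}
```

Here the HTTP header wins over any in-document declaration, matching the order in point (1); a fuller version would also look at byte-order marks and fall back to the XML default of UTF-8.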

To take advantage of this, filter modules should use the
xml2enc_charset optional function to retrieve the charset
argument to pass to the libxml2 parser.  Note that you may
have to handle APR_EAGAIN if your module sets up the parser
before mod_xml2enc has been able to sniff the first chunk of data.
I'll be updating published versions of my filter modules
to use it as round tuits permit.
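A filter that sets up its parser early can treat APR_EAGAIN as "not yet" and retry once data arrives. The self-contained C sketch below shows only that deferral pattern. Everything in it is a hypothetical stand-in: the status enum replaces apr_status_t, and charset_fn is not the real xml2enc_charset prototype (consult the mod_xml2enc source for that). The one real APR fact it relies on is that an optional function pointer is NULL when the module providing it is not loaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for apr_status_t, APR_SUCCESS and APR_EAGAIN. */
enum status { ST_SUCCESS, ST_EAGAIN };

/* Hypothetical lookup type standing in for the xml2enc_charset optional
 * function (NOT its real prototype): yields the charset to hand to the
 * parser, or ST_EAGAIN if the encoding has not been sniffed yet. */
typedef enum status (*charset_fn)(void *req, const char **charset);

struct filter_ctx {
    int parser_ready;      /* nonzero once the parser has been set up */
    const char *charset;   /* charset given to the parser (NULL: let it guess) */
    size_t parsed;         /* bytes fed to the parser so far */
    size_t deferred;       /* bytes held back while waiting for the charset */
};

/* Try to set up the parser; returns 1 if done, 0 if we have to wait. */
static int try_setup(struct filter_ctx *ctx, charset_fn lookup, void *req)
{
    const char *cs = NULL;

    assert(ctx != NULL);
    /* An optional function is NULL when its providing module is not loaded. */
    if (lookup != NULL && lookup(req, &cs) == ST_EAGAIN)
        return 0;
    ctx->charset = cs;
    ctx->parser_ready = 1;   /* a real module would create its parser here */
    return 1;
}

/* Early setup: may run before mod_xml2enc has seen any data. */
void filter_init(struct filter_ctx *ctx, charset_fn lookup, void *req)
{
    (void)try_setup(ctx, lookup, req);
}

/* Per-chunk processing: retries the setup until the charset is known. */
void filter_data(struct filter_ctx *ctx, charset_fn lookup, void *req,
                 size_t len)
{
    if (!ctx->parser_ready && !try_setup(ctx, lookup, req)) {
        ctx->deferred += len;            /* a real filter would buffer these */
        return;
    }
    ctx->parsed += ctx->deferred + len;  /* held-back bytes, then this chunk */
    ctx->deferred = 0;
}

/* Demo lookup: req points to an int that is nonzero once "sniffing"
 * has happened. */
enum status demo_lookup(void *req, const char **charset)
{
    if (!*(const int *)req)
        return ST_EAGAIN;
    *charset = "ISO-8859-1";
    return ST_SUCCESS;
}
```

The choice here is to treat the EAGAIN case as a signal to wait rather than as an error; whether a real filter buffers, passes data through, or gives up is left to the module.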

Filter modules can also postprocess to output a different
charset again, using the xml2enc_filter optional function.
Further capabilities include preprocessing of bad HTML
(a function introduced in mod_proxy_html 3, but also relevant
to other HTML modules) and an additional optional hook
for preprocessing.  These extra functions are untested.
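Both directions of transcoding mentioned above (converting to UTF-8 ahead of the parser, as in point (2), and converting to another charset afterwards) come down to an ordinary charset conversion. Purely as a standalone illustration, here is a C sketch that calls POSIX iconv(3) directly. The convert() function is hypothetical; this is neither mod_xml2enc's code nor the xml2enc_filter interface.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in[0..inlen) from charset `from` to
 * charset `to`, writing into out. Returns bytes written, or -1 on error. */
long convert(const char *to, const char *from,
             const char *in, size_t inlen, char *out, size_t outcap)
{
    iconv_t cd;
    char *inp = (char *)in;      /* iconv() takes char ** for historical reasons */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;
    size_t rc;

    assert(in != NULL && out != NULL);
    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;               /* this conversion pair is not supported */
    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)        /* flush shift state for stateful encodings */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;               /* bad input (EILSEQ/EINVAL) or out too small (E2BIG) */
    return (long)(outcap - outleft);
}
```

For example, the UTF-8 bytes "caf\xc3\xa9" (5 bytes) become the ISO-8859-1 bytes "caf\xe9" (4 bytes), and the reverse conversion restores them.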

Developers, feel free to explore and send feedback!

http://apache.webthing.com/mod_xml2enc/

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/