Posted to dev@httpd.apache.org by Benson Margulies <be...@basistech.com> on 2000/01/27 13:57:43 UTC

Proposals for Improvements in International Character Support

Dear Apache Development,

I would like to contribute some enhancements to Apache in the area of
international text support. Since I am a (proposed) new contributor, I
thought it would be polite to ask about the tastefulness of my ideas before
bothering to code and submit them.

My proposal is as follows: I want to enhance mod_mime to understand Unicode.


Unicode files, whether UCS-2 or UTF-8, begin with BOM characters. While it
is possible to teach the existing magic number parser to recognize these, it
is cumbersome, and the 'code' would have to be repeated for each MIME type
that can be implemented as a Unicode file (text/html, text/plain, XML,
etc.). I propose, instead, to make Unicode recognition a separate axis in
mod_mime. If the magic number parse yielded no other charset parameter, and
the file was recognizably Unicode, I propose to send out an appropriate
charset for Unicode.
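
As a rough illustration of the check described above (a sketch only, not code
from mod_mime or any other Apache module): the function name, the
existing_charset parameter, and the charset labels below are all assumptions
made for the example. It looks at the leading bytes of a file and suggests a
charset only when nothing else has already supplied one.

```python
import codecs

# Byte-order marks and the charset label each one would suggest.
# The labels are illustrative assumptions, not values taken from Apache.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),       # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16"),  # FF FE
)


def sniff_unicode_charset(head, existing_charset=None):
    """Suggest a charset from the leading bytes `head` of a file.

    A charset that was already determined some other way always wins.
    Returns None when no charset is known and no BOM is present.
    """
    if existing_charset is not None:
        return existing_charset
    for bom, charset in _BOMS:
        if head.startswith(bom):
            return charset
    return None
```

Only the first three bytes are needed, so a caller could read a small prefix
of the file rather than the whole thing. A file with no BOM yields None, so
this sketch says nothing about Unicode files that omit the mark.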

Thanks in advance for your consideration,

Benson Margulies
http://www.basistech.com

Re: Proposals for Improvements in International Character Support

Posted by Dean Gaudet <dg...@arctic.org>.
hi,

i think this would do better as a separate module perhaps.  mod_mime
should be viewed as static extension -> mime foo mappings.  so for
mod_mime to support different charsets on disk it'd require folks to name
things foo.html.utf-8 or some such.
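
a small sketch of what a purely static, extension-driven lookup might look
like, for comparison.  the two tables and the describe() function are
hypothetical and only stand in for the idea; they are not mod_mime's actual
data or code.  nothing here opens the file, only the name is examined.

```python
# Hypothetical static tables mapping filename extensions to metadata.
MIME_TYPES = {"html": "text/html", "txt": "text/plain"}
CHARSETS = {"utf-8": "utf-8", "latin1": "iso-8859-1"}


def describe(filename):
    """Return (mime_type, charset) using the filename's extensions alone."""
    mime_type = None
    charset = None
    # Skip the base name; every later dot-separated piece is an extension.
    for ext in filename.split(".")[1:]:
        ext = ext.lower()
        if ext in MIME_TYPES:
            mime_type = MIME_TYPES[ext]
        elif ext in CHARSETS:
            charset = CHARSETS[ext]
    return mime_type, charset
```

with these tables, describe("foo.html.utf-8") gives ("text/html", "utf-8"),
while a plain foo.html carries no charset information at all.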

what you're proposing would require mod_mime to read the first few bytes
of each file... making it more dynamic.

Dean

On Thu, 27 Jan 2000, Benson Margulies wrote:

> Dear Apache Development,
> 
> I would like to contribute some enhancements to Apache in the area of
> international text support. Since I am a (proposed) new contributor, I
> thought it would be polite to ask about the tastefulness of my ideas before
> bothering to code and submit them.
> 
> My proposal is as follows: I want to enhance mod_mime to understand Unicode.
> 
> 
> Unicode files, whether UCS-2 or UTF-8, begin with BOM characters. While it
> is possible to teach the existing magic number parser to recognize these, it
> is cumbersome, and the 'code' would have to be repeated for each MIME type
> that can be implemented as a Unicode file (text/html, text/plain, XML,
> etc.). I propose, instead, to make Unicode recognition a separate axis in
> mod_mime. If the magic number parse yielded no other charset parameter, and
> the file was recognizably Unicode, I propose to send out an appropriate
> charset for Unicode.
> 
> Thanks in advance for your consideration,
> 
> Benson Margulies
> http://www.basistech.com
>