Posted to dev@httpd.apache.org by Zvi Har'El <rl...@math.technion.ac.il> on 2003/12/14 14:13:34 UTC

AddCharset filename extensions

The default httpd.conf includes the lines

AddCharset ISO-8859-1  .iso8859-1  .latin1
AddCharset ISO-8859-2  .iso8859-2  .latin2 .cen
AddCharset ISO-8859-3  .iso8859-3  .latin3
AddCharset ISO-8859-4  .iso8859-4  .latin4
AddCharset ISO-8859-5  .iso8859-5  .latin5 .cyr .iso-ru
AddCharset ISO-8859-6  .iso8859-6  .latin6 .arb
AddCharset ISO-8859-7  .iso8859-7  .latin7 .grk
AddCharset ISO-8859-8  .iso8859-8  .latin8 .heb
AddCharset ISO-8859-9  .iso8859-9  .latin9 .trk

However, a quick look at http://www.iana.org/assignments/character-sets shows
that calling the non-Latin charsets ISO-8859-N by the name latinN is wrong.
For example, latin8 is ISO-8859-14, or iso-celtic, and certainly not
ISO-8859-8, which is just Hebrew! Similarly, latin6 is ISO-8859-10, and not
ISO-8859-6, which is Arabic! Finally, latin5 is ISO-8859-9, Turkish, and not
ISO-8859-5, which is Cyrillic. latin1 through latin4 are OK, and I didn't find
latin7 in this reference at all. I suggest that httpd.conf be fixed accordingly.
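
As a rough cross-check of the mismatches described above, here is a minimal
Python sketch. It assumes that Python's built-in codec alias table is an
acceptable stand-in for the IANA list (it is a separate registry, not IANA's
own data). The names SHIPPED, canonical and mismatches are hypothetical
helpers introduced only for this illustration.

```python
import codecs

# latinN extension -> charset, as paired in the default httpd.conf quoted above.
SHIPPED = {
    "latin1": "ISO-8859-1",
    "latin2": "ISO-8859-2",
    "latin3": "ISO-8859-3",
    "latin4": "ISO-8859-4",
    "latin5": "ISO-8859-5",
    "latin6": "ISO-8859-6",
    "latin7": "ISO-8859-7",
    "latin8": "ISO-8859-8",
    "latin9": "ISO-8859-9",
}


def canonical(label):
    """Return the canonical codec name Python resolves a charset label to."""
    return codecs.lookup(label).name


def mismatches(mapping):
    """Collect aliases whose codec differs from the charset they are paired with.

    Returns {alias: (claimed charset, codec Python resolves the alias to)}.
    """
    return {
        alias: (charset, canonical(alias))
        for alias, charset in mapping.items()
        if canonical(alias) != canonical(charset)
    }


if __name__ == "__main__":
    for alias, (claimed, actual) in sorted(mismatches(SHIPPED).items()):
        print(f"{alias}: httpd.conf pairs it with {claimed}, "
              f"Python resolves it to {actual}")
```

Under that assumption, only latin5 through latin9 should be reported, and
latin1 through latin4 should pass. Python's table does know latin7, so it can
only suggest, not confirm, what a registry-based mapping for it would be.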

-- 
Dr. Zvi Har'El     mailto:rl@math.technion.ac.il     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
                             Sunday, 19 Kislev 5764, 14 December 2003,  3:03PM

Re: AddCharset filename extensions

Posted by Zvi Har'El <rl...@math.technion.ac.il>.
To make my point clearer, here is the patch:


--- httpd-2.0.48/docs/conf/httpd-std.conf.in.~20031011014743~	2003-10-11 03:47:43.000000000 +0200
+++ httpd-2.0.48/docs/conf/httpd-std.conf.in	2003-12-15 18:47:07.000000000 +0200
@@ -797,11 +797,15 @@
 AddCharset ISO-8859-2  .iso8859-2  .latin2 .cen
 AddCharset ISO-8859-3  .iso8859-3  .latin3
 AddCharset ISO-8859-4  .iso8859-4  .latin4
-AddCharset ISO-8859-5  .iso8859-5  .latin5 .cyr .iso-ru
-AddCharset ISO-8859-6  .iso8859-6  .latin6 .arb
-AddCharset ISO-8859-7  .iso8859-7  .latin7 .grk
-AddCharset ISO-8859-8  .iso8859-8  .latin8 .heb
-AddCharset ISO-8859-9  .iso8859-9  .latin9 .trk
+AddCharset ISO-8859-5  .iso8859-5  .cyr .iso-ru
+AddCharset ISO-8859-6  .iso8859-6  .arb
+AddCharset ISO-8859-7  .iso8859-7  .grk
+AddCharset ISO-8859-8  .iso8859-8  .heb
+AddCharset ISO-8859-9  .iso8859-9  .latin5 .trk
+AddCharset ISO-8859-10  .iso8859-10  .latin6 
+AddCharset ISO-8859-13  .iso8859-13  .latin7 
+AddCharset ISO-8859-14  .iso8859-14  .latin8 
+AddCharset ISO-8859-15  .iso8859-15  .latin9 
 AddCharset ISO-2022-JP .iso2022-jp .jis
 AddCharset ISO-2022-KR .iso2022-kr .kis
 AddCharset ISO-2022-CN .iso2022-cn .cis




I have also included latin7 and latin9, which are for some reason absent from
the IANA list but appear as standard in the FSF's "free recode". BTW, instead
of inventing new charset abbreviations like .cyr, .arb, .grk, .heb, I would
personally prefer using the IANA (RFC 1345) aliases: .cyrillic, .arabic,
.greek, .hebrew, in the same way we use .latin1, .latin2, etc., but this is a
matter of opinion, not bug-fix patching.

-- 
Dr. Zvi Har'El     mailto:rl@math.technion.ac.il     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
                             Monday, 21 Kislev 5764, 15 December 2003,  6:58PM