Posted to modperl@perl.apache.org by "John N. Brahy" <jb...@ad2.com> on 2006/03/03 21:01:30 UTC
is there a way to force UTF-8 encoding
Is there a way to force UTF-8 encoding? I have tried
AddDefaultCharset utf-8 in the httpd.conf
OS: OpenBSD
Apache: Apache/1.3.29 (Unix) mod_perl/1.29 mod_ssl/2.8.16 OpenSSL/0.9.7g
But
1) wget -S says it's Content-Type: text/html; charset=ISO-8859-1
2) when I try the HTML validator on w3c.org it tells me that it's
ISO-8859-1
3) Internet Explorer and Firefox both have ISO-8859-1 selected
4) Firefox's Page Info shows it as ISO-8859-1
Anybody know a way to force it to utf-8?
::::: John Brahy
::::: CIO
::::: www.ad2.com
::::: jbrahy@ad2.com
::::: t: 310-356-7500
::::: f: 310-356-7520
::::: ad2, Inc.
::::: 1990 East Grand Ave, Suite 200
::::: El Segundo, CA 90245
Re: mp2: utf-8 and uc() under modperl2
Posted by Gunnar Koppel <gu...@raamatukoi.ee>.
Addition:
It seems that under mod_perl 2 four problematic characters
(õ [otilde] ä [auml] ö [ouml] ü [uuml]) are converted to an ISO-8859-x charset,
and I can't avoid it. Why are they converted, and how can I keep them as UTF-8?
--
Best regards,
Kõike hääd,
Gunnar Koppel
Re: mp2: utf-8 and uc() under modperl2 handler (still questions)
Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Gunnar Koppel wrote:
>
> Gunnar Koppel wrote:
>
>> Thank you and all others! That's it. BTW, i always use uppercase
>
> It seems my joy was premature. As I said, I always use uppercase
> filehandles, and I did so in every real situation where I had this
> problem with UTF-8 and mod_perl 2.
>
> As a script, the solution with STD* works fine, but as a handler it
> gives the same output as before.
I might be able to look at this later. In the meantime, if your end
goal is a mod_perl 2 handler, why not use libapreq2? It might solve
your problem.
--
------------------------------------------------------------------------
Philip M. Gollucci (pgollucci@p6m7g8.com) 323.219.4708
Consultant / http://p6m7g8.net/Resume/resume.shtml
Senior Software Engineer - TicketMaster - http://ticketmaster.com
1024D/A79997FA F357 0FDD 2301 6296 690F 6A47 D55A 7172 A799 97F
"It takes a minute to have a crush on someone, an hour to like someone,
and a day to love someone, but it takes a lifetime to forget someone..."
Re: mp2: utf-8 and uc() under modperl2 handler (still questions)
Posted by Jason Rhinelander <ja...@jagerman.com>.
Gunnar Koppel wrote:
> It seems my joy was premature. As I said, I always use uppercase
> filehandles, and I did so in every real situation where I had this
> problem with UTF-8 and mod_perl 2.
>
> As a script, the solution with STD* works fine, but as a handler it
> gives the same output as before.
>
> My little test set for handler:
> [...]
> But not as a handler, and I can't understand why. It seems that
> binmode STD*, ":utf8" has no effect here?
Move your binmode()s into the handler -- otherwise they are happening
only once, when the module is loaded (and before STDOUT, etc. are tied),
instead of each time the handler runs.
--
-- Jason Rhinelander
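A minimal sketch of the change suggested above, assuming the same test::utf package and a configuration in which response headers are parsed from printed output, as in the thread. The alphabet_line helper is a hypothetical name introduced here for illustration, and returning 0 relies on that being the numeric value of Apache2::Const::OK.

```perl
package test::utf;    # same package name as the test module in the thread

use strict;
use warnings;
use utf8;             # this source file is assumed to be saved as UTF-8

# Hypothetical helper: build the capitalised alphabet as a character string.
sub alphabet_line {
    my @alpha = qw(a b c d e f g h i j k l m n o p q r s š z ž t u v õ ä ö ü x y);
    return join(' ', map { ucfirst($_) } @alpha) . "\n";
}

sub handler {
    # Set the layer on every request, not once when the module is loaded.
    binmode STDOUT, ':utf8';

    # Assumes header parsing is enabled for this handler.
    print "Content-Type: text/plain; charset=UTF-8\n\n";
    print alphabet_line();

    return 0;    # numeric value of Apache2::Const::OK
}

1;
```

The only structural difference from the original test module is that binmode now runs inside handler, so it is repeated for each request instead of happening once at load time.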
mp2: utf-8 and uc() under modperl2 handler (still questions)
Posted by Gunnar Koppel <gu...@raamatukoi.ee>.
Gunnar Koppel wrote:
> Thank you and all others! That's it. BTW, i always use uppercase
It seems my joy was premature. As I said, I always use uppercase
filehandles, and I did so in every real situation where I had this problem
with UTF-8 and mod_perl 2.
As a script, the solution with STD* works fine, but as a handler it gives
the same output as before.
My little test case for the handler:
------
package test::utf;
use strict;
use Apache2::Const qw(:common);
use CGI;
use locale;
use utf8;
binmode STDIN, ":utf8";
binmode STDOUT, ":utf8";
our $q = ();
sub handler {
my @alpha = qw(a b c d e f g h i j k l m n o p q r s š z ž t u v õ ä ö
ü x y);
$q = new CGI;
print $q->header(-type=>"text/plain", -charset=>"UTF-8", -cookie=>'');
print "\u$_ " foreach @alpha;
print "\n";
return OK;
}
1;
------
apache2 conf:
------
<VirtualHost *>
DocumentRoot /home/www/test/pub
ServerName utf.test.com
PerlModule test::utf
SetHandler perl-script
PerlHandler test::utf
PerlSendHeader On
</VirtualHost>
------
If I run this little script (from the command line, under PerlRun,
PerlRegistry, or as CGI), I get the proper output:
------
#!/usr/bin/perl
use strict;
use warnings;
use locale;
use utf8;
binmode STDIN, ":utf8";
binmode STDOUT, ":utf8";
use test::utf;
&test::utf::handler();
------
But not as a handler, and I can't understand why. It seems that binmode STD*,
":utf8" has no effect here?
And the question still remains: if std* are intentionally not valid
filehandles, why are some (multibyte) UTF-8 characters handled correctly
under mod_perl 2 and others not?
--
TIA,
Gunnar Koppel
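One possible explanation for why only some characters break, offered as a sketch rather than a confirmed diagnosis: when a handle has no :utf8 layer, Perl's print writes a string as single bytes if every character fits in one byte, and otherwise falls back to UTF-8 with a "Wide character" warning. The characters õ ä ö ü all fit in one byte (they exist in ISO-8859-1), while š and ž do not. The printed_bytes helper below is a hypothetical name, and an in-memory filehandle stands in for the real output stream.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes print() emits for $text as
# space-separated hex. $layer is optional, for example ':utf8'.
sub printed_bytes {
    my ($text, $layer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    binmode($fh, $layer) if defined $layer;
    {
        no warnings 'utf8';    # silence "Wide character in print"
        print {$fh} $text;
    }
    close($fh);
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buffer;
}

my $latin1_range = printed_bytes("\x{F5}");             # õ, fits in one byte
my $wide         = printed_bytes("\x{161}");            # š, does not fit
my $with_layer   = printed_bytes("\x{F5}", ':utf8');    # õ through a :utf8 layer

print "no layer, o-tilde: $latin1_range\n";
print "no layer, s-caron: $wide\n";
print ":utf8,    o-tilde: $with_layer\n";
```

Without the layer, õ leaves as one ISO-8859-1 byte while š already leaves as two UTF-8 bytes, which would match the symptom of four broken characters next to two correct ones.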
Re: mp2: utf-8 and uc() under modperl2
Posted by Gunnar Koppel <gu...@raamatukoi.ee>.
Jason Rhinelander wrote:
> The script as above doesn't work out of the box -- the '$lc' variable
> isn't defined. Commenting out that line, I got the same results as you
Yes, I removed some lines from my test script that defined $lc, but
forgot to remove this line. Oops.
> and eventually figured it out to be a problem with using 'stdout'
> instead of 'STDOUT' in your binmode() calls. Changing the binmode from:
Thank you and all others! That's it. BTW, I always use uppercase
filehandles, but somehow I didn't this time with mod_perl 2, and here is the
result ;)
--
Best regards,
Kõike hääd,
Gunnar Koppel
Re: mp2: utf-8 and uc() under modperl2
Posted by to...@tuxteam.de.
On Fri, May 26, 2006 at 11:44:29PM -0700, Jason Rhinelander wrote:
> tomas@tuxteam.de wrote:
[...]
> It is, because in Perl stdin, stdout, and stderr are aliases for STDIN,
> STDOUT, and STDERR [...]
You're right. Sorry for the noise.
Regards
-- tomás
Re: mp2: utf-8 and uc() under modperl2
Posted by Jason Rhinelander <ja...@jagerman.com>.
Philip M. Gollucci wrote:
> Jason Rhinelander wrote:
>> Is it, then, intentional?
> You know, I'm not entirely sure, but I'm betting it's because
> STDIN, STDERR, STDOUT are re-tied to the streams in the request object
> automagically for you in Registry/PerlRun under the 'perl-script'
> handler. Under the mod_perl handler, these are not tied for you; thus,
> you must use $r->print() instead.
>
> The re-tying is likely goofing something up.
It isn't, actually: I was mistaken about this. The stdin, stdout, and
stderr aliases are only available in package main[1] and so, of course,
don't work under ::Registry and ::PerlRun. As this shows up with a
"binmode() on unopened filehandle" warning when warnings are enabled, I
don't think it warrants a documentation update.
[1] Documented in perldoc perlop (search for "stdin", case-sensitively).
--
-- Jason Rhinelander
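The package-main behaviour described above can be reproduced outside mod_perl. The sketch below is only an illustration: Hypothetical::Registry::Script is an invented package name standing in for whatever package a Registry or PerlRun script is compiled into, and the warnings are collected in an array so they can be inspected.

```perl
use strict;
use warnings;
no warnings 'once';    # the package-local "stdout" below is used only once

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

package main;
# In package main, the lowercase name refers to the same handle as STDOUT.
my $ok_in_main = binmode(stdout, ':utf8') ? 1 : 0;

package Hypothetical::Registry::Script;
# In any other package, "stdout" is an ordinary, unopened local filehandle.
my $ok_in_package = binmode(stdout, ':utf8') ? 1 : 0;

package main;
print "main: $ok_in_main, other package: $ok_in_package\n";
print "warning: $_" for @warnings;
```

With warnings enabled, the second call reports binmode() on an unopened filehandle, which is the diagnostic mentioned in the message above.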
Re: mp2: utf-8 and uc() under modperl2
Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Jason Rhinelander wrote:
> Is it, then, intentional?
You know, I'm not entirely sure, but I'm betting it's because
STDIN, STDERR, STDOUT are re-tied to the streams in the request object
automagically for you in Registry/PerlRun under the 'perl-script'
handler. Under the mod_perl handler, these are not tied for you; thus,
you must use $r->print() instead.
The re-tying is likely goofing something up.
Re: mp2: utf-8 and uc() under modperl2
Posted by Jason Rhinelander <ja...@jagerman.com>.
Philip M. Gollucci wrote:
>> It is, because in Perl stdin, stdout, and stderr are aliases for STDIN,
>> STDOUT, and STDERR -- but that aliasing doesn't appear to be present
>> under mod_perl. Now, perhaps it was intentional, but in that case it
>> should at least be documented somewhere.
> Feel free to supply a documentation patch and it will get added.
>
Is it, then, intentional?
--
-- Jason Rhinelander
Re: mp2: utf-8 and uc() under modperl2
Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
> It is, because in Perl stdin, stdout, and stderr are aliases for STDIN,
> STDOUT, and STDERR -- but that aliasing doesn't appear to be present
> under mod_perl. Now, perhaps it was intentional, but in that case it
> should at least be documented somewhere.
Feel free to supply a documentation patch and it will get added.
Re: mp2: utf-8 and uc() under modperl2
Posted by Jason Rhinelander <ja...@jagerman.com>.
tomas@tuxteam.de wrote:
>> makes it work properly. This seems to me like a bug, but perhaps
>> someone more familiar with mod_perl's STDOUT tying than I can explain
>> this (or confirm this as a bug).
>
> Duh. Sorry I didn't see that before. In Perl, the file handles for
> stdin, stdout and stderr are written in capital letters. So this is not
> a bug.
It is, because in Perl stdin, stdout, and stderr are aliases for STDIN,
STDOUT, and STDERR -- but that aliasing doesn't appear to be present
under mod_perl. Now, perhaps it was intentional, but in that case it
should at least be documented somewhere.
> It might be considered annoying that Perl doesn't complain if you pass a
> not-yet-defined file handle (stdin in this case). Even with 'use
> strict'.
>
> Regards
> -- tomás
--
-- Jason Rhinelander
Re: mp2: utf-8 and uc() under modperl2
Posted by to...@tuxteam.de.
On Fri, May 26, 2006 at 02:56:17PM -0700, Jason Rhinelander wrote:
> Gunnar Koppel wrote:
[...]
> > I have a small problem with UTF-8 under mod_perl 2. I wrote this little
> > script for testing:
[...]
> > binmode stdin, ":utf8";
> > binmode stdout, ":utf8";
[...]
> binmode STDOUT, ":utf8";
>
> makes it work properly. This seems to me like a bug, but perhaps
> someone more familiar with mod_perl's STDOUT tying than I can explain
> this (or confirm this as a bug).
Duh. Sorry I didn't see that before. In Perl, the file handles for
stdin, stdout and stderr are written in capital letters. So this is not
a bug.
It might be considered annoying that Perl doesn't complain if you pass a
not-yet-defined file handle (stdin in this case). Even with 'use
strict'.
Regards
-- tomás
Re: mp2: utf-8 and uc() under modperl2
Posted by Jason Rhinelander <ja...@jagerman.com>.
Gunnar Koppel wrote:
> Hi!
>
> I have a small problem with UTF-8 under mod_perl 2. I wrote this little
> script for testing:
>
> -------
> #!/usr/bin/perl
>
> use strict;
> use locale;
> use utf8;
> binmode stdin, ":utf8";
> binmode stdout, ":utf8";
>
> my @alpha = qw(a b c d e f g h i j k l m n o p q r s š z ž t u v õ ä ö ü
> x y);
> print "Content-Type: text/plain; charset=UTF-8\n\n";
> print "LC_CTYPE: $lc\n";
> print "\u$_ " foreach @alpha;
> print "\n";
> -------
The script as above doesn't work out of the box -- the '$lc' variable
isn't defined. Commenting out that line, I got the same results as you
and eventually figured it out to be a problem with using 'stdout'
instead of 'STDOUT' in your binmode() calls. Changing the binmode from:
binmode stdout, ":utf8";
to:
binmode STDOUT, ":utf8";
makes it work properly. This seems to me like a bug, but perhaps
someone more familiar with mod_perl's STDOUT tying than I can explain
this (or confirm this as a bug).
--
-- Jason Rhinelander
mp2: utf-8 and uc() under modperl2
Posted by Gunnar Koppel <gu...@raamatukoi.ee>.
Hi!
I have a small problem with UTF-8 under mod_perl 2. I wrote this little
script for testing:
-------
#!/usr/bin/perl
use strict;
use locale;
use utf8;
binmode stdin, ":utf8";
binmode stdout, ":utf8";
my @alpha = qw(a b c d e f g h i j k l m n o p q r s š z ž t u v õ ä ö ü
x y);
print "Content-Type: text/plain; charset=UTF-8\n\n";
print "LC_CTYPE: $lc\n";
print "\u$_ " foreach @alpha;
print "\n";
-------
The purpose of this script is to print the capitalised Estonian alphabet (and to
test locale/UTF-8 handling). It works fine on the command line and under mod_cgi, and
similar code (with minor differences) works under mod_perl 1 (under PerlRun,
PerlRegistry, and as a handler), but it does not give the proper output under
mod_perl 2. The four diacritic characters after 'v' come out garbled, and I can't find a
solution. Any ideas?
Background:
SERVER_SOFTWARE: Apache/2.0.55 (Debian) mod_apreq2-20051231/2.5.7
mod_perl/2.0.2 Perl/v5.8.8
In case the UTF-8 characters get garbled here too, you can see the sample code and
outputs here:
http://wanradt.msn.ee/code.html
--
TIA,
Gunnar Koppel
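To separate the capitalisation step from the output step, here is a small sketch, assuming no "use locale" is in effect (the script above does use it, so results there may depend on the server's locale). It shows that ucfirst, like the \u escape, only capitalises õ when it operates on a decoded character string rather than on raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "\xC3\xB5";    # UTF-8 encoding of õ (U+00F5)

# On undecoded bytes, ucfirst leaves the two bytes unchanged.
my $upper_bytes = ucfirst($bytes);

# On the decoded character, ucfirst maps õ to Õ (U+00D5).
my $upper_chars = ucfirst(decode('UTF-8', $bytes));

# Encoding the result again gives the UTF-8 bytes for Õ.
my $encoded = encode('UTF-8', $upper_chars);

printf "bytes in:  %vX\n", $bytes;
printf "bytes out: %vX\n", $encoded;
```

If the capitalised characters are correct at this stage, any remaining damage is introduced when the string is written to the output handle.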
Re: is there a way to force UTF-8 encoding
Posted by "Christopher H. Laco" <cl...@chrislaco.com>.
John N. Brahy wrote:
> Is there a way to force UTF-8 encoding? I have tried
>
> AddDefaultCharset utf-8 in the httpd.conf
>
> OS: OpenBSD
> Apache: Apache/1.3.29 (Unix) mod_perl/1.29 mod_ssl/2.8.16 OpenSSL/0.9.7g
>
> But
> 1) wget -S says it's Content-Type: text/html; charset=ISO-8859-1
> 2) when I try the HTML validator on w3c.org it tells me that it's
> ISO-8859-1
> 3) Internet Explorer and Firefox both have ISO-8859-1 selected
> 4) Firefox's Page Info shows it as ISO-8859-1
>
> Anybody know a way to force it to utf-8?
Are there actually any UTF-8 encoded characters in the output?
If there aren't any, then the document can really be both encodings at
the same time, unless of course the document also includes a BOM (Byte
Order Mark).
-=Chris
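The point about ASCII-only documents can be checked with the core Encode module. This is a small sketch with made-up sample strings: a pure ASCII string produces identical bytes in both encodings, a string with diacritics does not, and the UTF-8 form of the byte order mark is three bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $ascii_only      = "Hello, world";
my $with_diacritics = "K\x{F5}ike h\x{E4}\x{E4}d";    # "Kõike hääd"

# Compare the encoded bytes of each string under both charsets.
my $ascii_identical =
    encode('UTF-8', $ascii_only) eq encode('ISO-8859-1', $ascii_only) ? 1 : 0;
my $diacritics_identical =
    encode('UTF-8', $with_diacritics) eq encode('ISO-8859-1', $with_diacritics) ? 1 : 0;

# The byte order mark U+FEFF encoded as UTF-8.
my $bom = encode('UTF-8', "\x{FEFF}");

print "ASCII identical in both encodings:      $ascii_identical\n";
print "diacritics identical in both encodings: $diacritics_identical\n";
printf "UTF-8 BOM bytes: %vX\n", $bom;
```

So for an ASCII-only page, only the declared charset in the Content-Type header distinguishes the two encodings.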