Posted to modperl@perl.apache.org by "John N. Brahy" <jb...@ad2.com> on 2006/03/03 21:01:30 UTC

is there a way to force UTF-8 encoding

Is there a way to force UTF-8 encoding? I have tried

AddDefaultCharset utf-8 in the httpd.conf

OS: OpenBSD 
Apache: Apache/1.3.29 (Unix) mod_perl/1.29 mod_ssl/2.8.16 OpenSSL/0.9.7g

But 
1) wget -S says it's Content-Type: text/html; charset=ISO-8859-1
2) when I try the HTML validator on w3c.org it tells me that it's
ISO-8859-1
3) Internet Explorer and Firefox both have ISO-8859-1 selected
4) Firefox's Page Info shows it as ISO-8859-1

Anybody know a way to force it to utf-8?


::::: John Brahy
::::: CIO
::::: www.ad2.com

::::: jbrahy@ad2.com
::::: t: 310-356-7500
::::: f: 310-356-7520

::::: ad2, Inc.
::::: 1990 East Grand Ave, Suite 200
::::: El Segundo, CA 90245


Re: mp2: utf-8 and uc() under modperl2

Posted by Gunnar Koppel <gu...@raamatukoi.ee>.
Addition:

It seems that under modperl2 four problematic characters
(õ [otilde] ä [auml] ö [ouml] ü [uuml]) are converted to an ISO-8859-x
charset, and I can't avoid it. Why are they converted, and how can I
keep them as UTF-8?

-- 

Best regards,
Kõike hääd,

Gunnar Koppel

Re: mp2: utf-8 and uc() under modperl2 handler (still questions)

Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Gunnar Koppel wrote:
> 
> Gunnar Koppel wrote:
> 
>> Thank you and all others! That's it. BTW, i always use uppercase 
> 
> It seems my joy was premature. As I said, I always use uppercase
> filehandles, and that was the case in every real situation where I
> had this problem with UTF<->modperl2.
> 
> As a script, the solution with STD* works fine, but as a handler it
> gives the same output as before.
I might be able to look at this laterish... In the meantime, if your end 
goal is a mod_perl2 handler .... why not use libapreq2... might solve 
your problem.


-- 
------------------------------------------------------------------------
Philip M. Gollucci (pgollucci@p6m7g8.com) 323.219.4708
Consultant / http://p6m7g8.net/Resume/resume.shtml
Senior Software Engineer - TicketMaster - http://ticketmaster.com
1024D/A79997FA F357 0FDD 2301 6296 690F  6A47 D55A 7172 A799 97F

"It takes a minute to have a crush on someone, an hour to like someone,
and a day to love someone, but it takes a lifetime to forget someone..."

Re: mp2: utf-8 and uc() under modperl2 handler (still questions)

Posted by Jason Rhinelander <ja...@jagerman.com>.
Gunnar Koppel wrote:
> It seems my joy was premature. As I said, I always use uppercase
> filehandles, and that was the case in every real situation where I
> had this problem with UTF<->modperl2.
> 
> As a script, the solution with STD* works fine, but as a handler it
> gives the same output as before.
> 
> My little test set for handler:
> [...]
> But not as a handler. Can't understand, why? Seems that binmode STD*,
> ":utf8" has no power here?

Move your binmode()s into the handler -- otherwise they are happening
only once, when the module is loaded (and before STDOUT, etc. are tied),
instead of each time the handler runs.


-- 
-- Jason Rhinelander
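
A minimal sketch of the suggestion above, assuming a response handler run
under SetHandler perl-script. The package name My::UTF8Handler and the
sample characters are illustrative and not taken from the thread. The
binmode() call sits inside handler(), so it is applied on every request
rather than once at module load time. The ":utf8" layer is the one used
in the original test; it only sets a flag on the handle, so repeating the
call is harmless.

```perl
package My::UTF8Handler;   # hypothetical package name

use strict;
use warnings;

sub handler {
    my $r = shift;   # Apache2::RequestRec object when run under mod_perl2

    # Apply the layer per request: a binmode() done at load time happens
    # before STDOUT is set up for the request and does not carry over.
    binmode(STDOUT, ':utf8');

    # Under mod_perl2, declare the charset on the response itself.
    # The check lets the sub also be called from a plain script.
    $r->content_type('text/plain; charset=UTF-8') if $r;

    # Code points U+00D5, U+00C4, U+00D6, U+00DC (uppercase õ ä ö ü).
    print "\x{D5} \x{C4} \x{D6} \x{DC}\n";

    return 0;   # same numeric value as Apache2::Const::OK
}

1;
```

Under mod_perl2 this would be attached with PerlResponseHandler
My::UTF8Handler; from a plain script, My::UTF8Handler::handler() can be
called directly to check the output bytes.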

mp2: utf-8 and uc() under modperl2 handler (still questions)

Posted by Gunnar Koppel <gu...@raamatukoi.ee>.
Gunnar Koppel wrote:

> Thank you and all others! That's it. BTW, i always use uppercase 

It seems my joy was premature. As I said, I always use uppercase
filehandles, and that was the case in every real situation where I
had this problem with UTF<->modperl2.

As a script, the solution with STD* works fine, but as a handler it
gives the same output as before.

My little test set for handler:
------
package test::utf;

use strict;
use Apache2::Const qw(:common);
use CGI;
use locale;
use utf8;
binmode STDIN, ":utf8";
binmode STDOUT, ":utf8";
our $q = ();

sub handler {
	my @alpha = qw(a b c d e f g h i j k l m n o p q r s š z ž t u v õ ä ö 
ü x y);
	$q = new CGI;
	print $q->header(-type=>"text/plain", -charset=>"UTF-8", -cookie=>'');
	print "\u$_ " foreach @alpha;
	print "\n";
	return OK;
}
1;
------

apache2 conf:
------
<VirtualHost *>
DocumentRoot /home/www/test/pub
ServerName utf.test.com
PerlModule test::utf
SetHandler perl-script
PerlHandler test::utf
PerlSendHeader On
</VirtualHost>
------

If I run this little script (from the command line, under PerlRun,
PerlRegistry or as CGI), I get proper output:
------
#!/usr/bin/perl

use strict;
use warnings;
use locale;
use utf8;
binmode STDIN, ":utf8";
binmode STDOUT, ":utf8";
use test::utf;

&test::utf::handler();
------

But not as a handler. I can't understand why. It seems that binmode STD*,
":utf8" has no effect here?

And the question remains: if std* are intentionally not valid
filehandles, why are some (multibyte) UTF characters handled correctly
under modperl2 and others not?

-- 

TIA,

Gunnar Koppel
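
One likely explanation for the mixed output, offered as an assumption
since the thread does not confirm it: when a handle has no encoding
layer, Perl writes a string whose characters all fit in one byte (code
points up to 0xFF, which covers õ ä ö ü and their uppercase forms) as
single Latin-1 bytes, while a string containing a higher code point (such
as š or ž) cannot be written that way and goes out as UTF-8, with a
"Wide character" warning. The sketch below shows this behaviour with
in-memory filehandles; the helper name bytes_written is made up for
illustration.

```perl
use strict;
use warnings;

# Print one string to a fresh in-memory handle, optionally with a layer,
# and return the bytes that were written as space-separated hex.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "open failed: $!";
    binmode($fh, $layer) if defined $layer;
    {
        no warnings 'utf8';   # silence "Wide character in print"
        print {$fh} $text;
    }
    close($fh);
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buf;
}

# U+00D5 is uppercase õ, U+0160 is uppercase š.
printf "U+00D5, no layer:    %s\n", bytes_written("\x{D5}");            # D5
printf "U+0160, no layer:    %s\n", bytes_written("\x{160}");           # C5 A0
printf "U+00D5, :utf8 layer: %s\n", bytes_written("\x{D5}", ':utf8');   # C3 95
```

Without a layer, only the second string comes out as valid UTF-8; with
the ":utf8" layer in place, the first one does too.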

Re: mp2: utf-8 and uc() under modperl2

Posted by Gunnar Koppel <gu...@raamatukoi.ee>.
Jason Rhinelander wrote:

> The script as above doesn't work out of the box -- the '$lc' variable
> isn't defined.  Commenting out that line, I got the same results as you

Yes, I took out some lines from my test script which defined $lc, but
forgot to remove this line. Oops.

> and eventually figured it out to be a problem with using 'stdout'
> instead of 'STDOUT' in your binmode() calls.  Changing the binmode from:

Thank you and all others! That's it. BTW, I always use uppercase
filehandles, but somehow I didn't this time with modperl2, and here is
the result ;)

-- 

Best regards,
Kõike hääd,

Gunnar Koppel

Re: mp2: utf-8 and uc() under modperl2

Posted by to...@tuxteam.de.
On Fri, May 26, 2006 at 11:44:29PM -0700, Jason Rhinelander wrote:
> tomas@tuxteam.de wrote:
[...]
> It is, because in Perl stdin, stdout, and stderr are aliases for STDIN,
> STDOUT, and STDERR [...]

You're right. Sorry for the noise.

Regards

-- tomás

Re: mp2: utf-8 and uc() under modperl2

Posted by Jason Rhinelander <ja...@jagerman.com>.
Philip M. Gollucci wrote:
> Jason Rhinelander wrote:
>> Is it, then, intentional?
> You know, I'm not entirely sure, but I'm betting it's because
> STDIN, STDERR, STDOUT are re-tied to the streams in the request object
> automagically for you in Registry/PerlRun under the 'perl-script'
> Handler.  Under the mod_perl handler, these are not tied for you; thus,
> you must use $r->print() instead.
> 
> The re-tying is likely goofing something.

It isn't, actually: I was mistaken on this -- the stdin, stdout, and
stderr aliases are only available in package main[1] and so, of course,
don't work under ::Registry and ::PerlRun.  As this shows up with a
"binmode() on unopened filehandle" warning when warnings are enabled, I
don't think it warrants a documentation update.

1 - documented in perldoc perlop (search for "stdin", case-sensitively).


-- 
-- Jason Rhinelander
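
The package-main restriction described above can be reproduced in a plain
script. A small sketch, where Some::Other::Package is a made-up name
standing in for the package that a Registry or PerlRun script ends up
compiled into:

```perl
use strict;
use warnings;

my ($in_main, $in_other);

{
    package main;
    # In package main, lowercase stdout is an alias for STDOUT.
    $in_main = binmode(stdout, ':utf8') ? 1 : 0;
}

{
    package Some::Other::Package;   # hypothetical package name
    # Here stdout is just an unopened handle local to this package;
    # with warnings on, Perl reports "binmode() on unopened filehandle".
    no warnings 'unopened';
    $in_other = binmode(stdout, ':utf8') ? 1 : 0;
}

print "main: $in_main, other package: $in_other\n";
# prints: main: 1, other package: 0
```

Uppercase STDOUT is always resolved in package main, so it behaves the
same in both blocks.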

Re: mp2: utf-8 and uc() under modperl2

Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Jason Rhinelander wrote:
> Is it, then, intentional?
You know, I'm not entirely sure, but I'm betting it's because
STDIN, STDERR, STDOUT are re-tied to the streams in the request object
automagically for you in Registry/PerlRun under the 'perl-script'
Handler.  Under the mod_perl handler, these are not tied for you; thus,
you must use $r->print() instead.

The re-tying is likely goofing something.





Re: mp2: utf-8 and uc() under modperl2

Posted by Jason Rhinelander <ja...@jagerman.com>.
Philip M. Gollucci wrote:
>> It is, because in Perl stdin, stdout, and stderr are aliases for STDIN,
>> STDOUT, and STDERR -- but that aliasing doesn't appear to be present
>> under mod_perl.  Now, perhaps it was intentional, but in that case it
>> should at least be documented somewhere.
> Feel free to supply a documentation patch and it will get added.
> 

Is it, then, intentional?

-- 
-- Jason Rhinelander

Re: mp2: utf-8 and uc() under modperl2

Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
> It is, because in Perl stdin, stdout, and stderr are aliases for STDIN,
> STDOUT, and STDERR -- but that aliasing doesn't appear to be present
> under mod_perl.  Now, perhaps it was intentional, but in that case it
> should at least be documented somewhere.
Feel free to supply a documentation patch and it will get added.


Re: mp2: utf-8 and uc() under modperl2

Posted by Jason Rhinelander <ja...@jagerman.com>.
tomas@tuxteam.de wrote:
>> makes it work properly.  This seems to me like a bug, but perhaps
>> someone more familiar with mod_perl's STDOUT tying than I can explain
>> this (or confirm this as a bug).
> 
> Duh. Sorry I didn't see that before. In Perl, the file handles for
> stdin, stdout and stderr are written in capital letters. So this is not
> a bug.

It is, because in Perl stdin, stdout, and stderr are aliases for STDIN,
STDOUT, and STDERR -- but that aliasing doesn't appear to be present
under mod_perl.  Now, perhaps it was intentional, but in that case it
should at least be documented somewhere.

> It might be considered annoying that Perl doesn't complain if you pass a
> not-yet-defined file handle (stdin in this case). Even with 'use
> strict'.
>
> Regards
> -- tomás

-- 
-- Jason Rhinelander


Re: mp2: utf-8 and uc() under modperl2

Posted by to...@tuxteam.de.
On Fri, May 26, 2006 at 02:56:17PM -0700, Jason Rhinelander wrote:
> Gunnar Koppel wrote:
[...]
> > I have little problem with UTF-8 under modperl2. I made such little
> > script for testing:
[...]
> > binmode stdin, ":utf8";
> > binmode stdout, ":utf8";
[...]
> binmode STDOUT, ":utf8";
> 
> makes it work properly.  This seems to me like a bug, but perhaps
> someone more familiar with mod_perl's STDOUT tying than I can explain
> this (or confirm this as a bug).

Duh. Sorry I didn't see that before. In Perl, the file handles for
stdin, stdout and stderr are written in capital letters. So this is not
a bug.

It might be considered annoying that Perl doesn't complain if you pass a
not-yet-defined file handle (stdin in this case). Even with 'use
strict'.

Regards
-- tomás

Re: mp2: utf-8 and uc() under modperl2

Posted by Jason Rhinelander <ja...@jagerman.com>.
Gunnar Koppel wrote:
> Terr!
> 
> I have little problem with UTF-8 under modperl2. I made such little
> script for testing:
> 
> -------
> #!/usr/bin/perl
> 
> use strict;
> use locale;
> use utf8;
> binmode stdin, ":utf8";
> binmode stdout, ":utf8";
> 
> my @alpha = qw(a b c d e f g h i j k l m n o p q r s š z ž t u v õ ä ö ü
> x y);
> print "Content-Type: text/plain; charset=UTF-8\n\n";
> print "LC_CTYPE: $lc\n";
> print "\u$_ " foreach @alpha;
> print "\n";
> -------

The script as above doesn't work out of the box -- the '$lc' variable
isn't defined.  Commenting out that line, I got the same results as you
and eventually figured it out to be a problem with using 'stdout'
instead of 'STDOUT' in your binmode() calls.  Changing the binmode from:

binmode stdout, ":utf8";

to:

binmode STDOUT, ":utf8";

makes it work properly.  This seems to me like a bug, but perhaps
someone more familiar with mod_perl's STDOUT tying than I can explain
this (or confirm this as a bug).


-- 
-- Jason Rhinelander


mp2: utf-8 and uc() under modperl2

Posted by Gunnar Koppel <gu...@raamatukoi.ee>.
Terr!

I have a little problem with UTF-8 under modperl2. I made this little
script for testing:

-------
#!/usr/bin/perl

use strict;
use locale;
use utf8;
binmode stdin, ":utf8";
binmode stdout, ":utf8";

my @alpha = qw(a b c d e f g h i j k l m n o p q r s š z ž t u v õ ä ö ü
x y);
print "Content-Type: text/plain; charset=UTF-8\n\n";
print "LC_CTYPE: $lc\n";
print "\u$_ " foreach @alpha;
print "\n";
-------

The purpose of this script is to print the capitalised Estonian alphabet
(and to test locale/UTF-8). It works fine on the command line and under
mod_cgi, and similar code (with minor differences) works under mod_perl 1
(under PerlRun, PerlRegistry and as a handler), but it does not give
proper output under mod_perl2. The 4 diacritic characters after 'v' come
out garbled and I can't find a solution. Any ideas?

Background:
SERVER_SOFTWARE: Apache/2.0.55 (Debian) mod_apreq2-20051231/2.5.7 
mod_perl/2.0.2 Perl/v5.8.8

In case the UTF-8 characters get garbled here too, you can see the sample
code and outputs here:
http://wanradt.msn.ee/code.html

-- 

TIA,

Gunnar Koppel


Re: is there a way to force UTF-8 encoding

Posted by "Christopher H. Laco" <cl...@chrislaco.com>.
John N. Brahy wrote:
> Is there a way to force UTF-8 encoding? I have tried
> 
> AddDefaultCharset utf-8 in the httpd.conf
> 
> OS: OpenBSD 
> Apache: Apache/1.3.29 (Unix) mod_perl/1.29 mod_ssl/2.8.16 OpenSSL/0.9.7g
> 
> But 
> 1) wget -S says it's Content-Type: text/html; charset=ISO-8859-1
> 2) when I try the HTML validator on w3c.org it tells me that it's
> ISO-8859-1
> 3) Internet Explorer and Firefox both have ISO-8859-1 selected
> 4) Firefox's Page Info shows it as ISO-8859-1
> 
> Anybody know a way to force it to utf-8?

Are there actually any UTF-8 encoded characters in the output?
If there aren't any, then the document can really be both encodings at
the same time, unless of course the document also includes a BOM (Byte
Order Mark).

-=Chris
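
To illustrate the point above: text made only of ASCII characters encodes
to the same bytes in UTF-8 and in ISO-8859-1, so the two labels cannot be
told apart from the content alone. A short sketch using the core Encode
module; the helper name same_bytes_in_both and the sample strings are
chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# True when the text encodes to identical bytes in both charsets.
sub same_bytes_in_both {
    my ($text) = @_;
    return encode('UTF-8', $text) eq encode('ISO-8859-1', $text);
}

my $ascii_only = "Best regards";
my $accented   = "K\x{F5}ike h\x{E4}\x{E4}d";   # "Kõike hääd"

for my $text ($ascii_only, $accented) {
    printf "%s\n", same_bytes_in_both($text)
        ? 'identical bytes in UTF-8 and ISO-8859-1'
        : 'different bytes in UTF-8 and ISO-8859-1';
}
```

The first string gives identical bytes and the second does not, so only a
page that contains non-ASCII characters shows which encoding was actually
used.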