Posted to embperl@perl.apache.org by Torsten Lüttgert <t....@combox.de> on 2005/05/31 16:42:24 UTC
More UTF8 woes
Hi,
I posted some tips for UTF8-based web applications with DB support
to this list some time ago.
Now it looks like I was wrong on quite a few points:
* use encoding 'utf8';
which I used in Embperl_Top_Include, turned out to be a very bad
idea, since it breaks regular expressions containing non-ASCII
characters. It also doesn't work for all I/O; sometimes Perl
decides to use Latin-1 anyway. Don't ask me why.
* Embperl_Top_Include doesn't work for 'use utf8;'. I think
Embperl_Top_Include "use utf8;"
is equivalent to
[- use utf8; -]
at the top of the page, and that does nothing at all, because
Perl code in [- ... -] has its own scope, and "use utf8;" only
applies to the current lexical scope.
So, one must place
[* use utf8; *]
at the top of every page if literal strings are to be flagged as
utf8.
One thing puzzles me here: If I place "use utf8;" in my
startup.pl (i.e., I tell mod_perl that everything under the
main scope is utf8), it still doesn't work! Is it switched off
by Embperl again or something?
* [! ... !] blocks need their own "use utf8;". The [* use utf8; *]
at the top doesn't apply to them. Argh.
Why is this so? The documentation states that [! ... !] blocks
are executed only once, but don't they run in the same scope!?
Documentation of all the scopes and namespaces in which
startup.pl, the httpd configuration file directives, my .ep pages,
the various [ ] blocks and pages started by Execute() are executed
would be a big help.
* File I/O: I thought I could make utf8 I/O the default by placing
"use open IO => ':encoding(utf8)';" into Embperl_Top_Include
(this was implied by the use encoding 'utf8'; I used before,
and now I needed a replacement).
It doesn't work. Neither does [* use open IO => ':encoding(utf8)'; *],
nor any other form I tried.
The only solution is to specify the encoding in every
open statement, like
open(IN, '<:encoding(utf8)', '/tmp/some/file');
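The scoping behaviour described in the points above can be reproduced outside Embperl. What follows is a minimal plain-Perl sketch, not Embperl code, and it makes no claim about what Embperl does internally: it only illustrates that both "use utf8" and "use open" are lexically scoped, so each affects only code compiled inside its own block, while an explicit layer on open() works wherever it appears. The temporary file (created with File::Temp) and the sample string are assumptions of the sketch; the script itself must be saved as UTF-8.

```perl
#!/usr/bin/perl
# Plain-Perl sketch (not Embperl): "use utf8" and "use open" are both
# lexically scoped. Save this file as UTF-8.
use strict;
use warnings;
use File::Temp qw(tempfile);

# 1. "use utf8" only affects string literals compiled in its own block.
my $outside = "ü";    # read as two raw bytes -> length 2
my $inside;
{
    use utf8;
    $inside = "ü";    # decoded to one character -> length 1
}

# 2. An explicit layer on the handle works wherever it appears.
my ($tmp, $path) = tempfile(UNLINK => 1);    # temp file, removed at exit
close($tmp) or die "close: $!";
open(my $out, '>:encoding(utf8)', $path) or die "Cannot write $path: $!";
print {$out} "Gr\x{fc}\x{df}e";               # 5 characters, 7 bytes in UTF-8
close($out) or die "close: $!";

# 3. "use open" sets default layers only for open() calls in its scope.
my ($decoded, $undecoded);
{
    use open IO => ':encoding(utf8)';
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    $decoded = do { local $/; <$in> };        # decoded by the default layer
    close($in) or die "close: $!";
}
open(my $in, '<', $path) or die "Cannot read $path: $!";    # no default layer here
$undecoded = do { local $/; <$in> };          # raw bytes
close($in) or die "close: $!";

printf "literal: %d vs %d, file: %d vs %d characters\n",
    length($outside), length($inside), length($decoded), length($undecoded);
```

Each pair of numbers should differ, showing that each pragma stopped applying at the closing brace of its block, whereas the explicit '>:encoding(utf8)' layer on the write handle needed no pragma at all.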
All in all, I can only say that using utf8 in Embperl is very,
very frustrating, because so many things break in non-obvious ways.
Could we at some point have a general "use-utf8-for-everything"
switch?
Regards,
Torsten