Posted to embperl@perl.apache.org by Torsten Lüttgert <t....@combox.de> on 2005/05/31 16:42:24 UTC

More UTF8 woes

Hi,

I posted some tips for UTF8-based web applications with DB support
to this list some time ago.

Now it looks like I was wrong on quite a few points:

* use encoding 'utf8';
  which I used in Embperl_Top_Include turned out to be a very bad
  idea, since it breaks regular expressions with non-ASCII chars.
  It doesn't apply to all I/O either; sometimes perl thinks it has
  to use Latin1 anyway. Don't ask me why.

* Embperl_Top_Include doesn't work for 'use utf8;'. I think

	Embperl_Top_Include "use utf8;"

  is equivalent to

	[- use utf8; -]

  at the top of the page, and this does exactly nothing, because
  perl stuff in [- .. -] has its own scope, and "use utf8;" only
  applies to the current scope.

  So, one must place

	[* use utf8; *]

  at the top of every page if literal strings should be flagged as
  utf8.

  One thing puzzles me here: If I place "use utf8;" in my
  startup.pl (i.e., I tell mod_perl that everything under the
  main scope is utf8), it still doesn't work! Is it switched off
  by Embperl again or something?

* [! ... !] blocks need their own "use utf8;". The [* use utf8; *]
  at the top doesn't apply to them. Argh.
  Why is this so? The documentation states that [! !] blocks
  are only executed once, but don't they run in the same scope!?

  Documentation of all the scopes and namespaces in which
  startup.pl, the httpd configuration file directives, my .ep pages,
  the various [ ] expressions and pages started by Execute() run
  would be a big help.

* File I/O: I thought I could make utf8 I/O the default by placing
  "use open IO => ':encoding(utf8)';" into Embperl_Top_Include.
  (This was implied by "use encoding 'utf8';", which I used before,
  so I needed a replacement.)
  It doesn't work. [* use open IO => ':encoding(utf8)'; *] doesn't
  work either, nor does any other form I tried.
  The only solution is to specify the character set in every
  open statement, like

  	open(my $in, '<:encoding(utf8)', '/tmp/some/file')
  	    or die "Cannot open /tmp/some/file: $!";
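
To avoid repeating the layer in every open statement, the per-open
workaround from the last point could be wrapped in a small helper. This
is only a sketch in plain Perl, not tested under Embperl; open_utf8()
and the temporary file are made-up names for illustration, and the
sample string is written with escapes so the script does not depend on
its own source encoding.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open a file with an explicit utf8 layer, so the
# result does not depend on any "use open" pragma being in scope.
sub open_utf8 {
    my ($mode, $path) = @_;            # $mode is '<', '>' or '>>'
    open(my $fh, "$mode:encoding(utf8)", $path)
        or die "Cannot open $path: $!";
    return $fh;
}

# Scratch file for the demonstration; removed when the script exits.
my (undef, $path) = tempfile(UNLINK => 1);

# 7 characters, 3 of them non-ASCII.
my $text = "Gr\x{fc}\x{df}e \x{263a}";

my $out = open_utf8('>', $path);
print {$out} $text;
close($out) or die "Cannot close $path: $!";

my $in = open_utf8('<', $path);
my $read = <$in>;
close($in);

# The string comes back as characters, not as raw bytes.
print $read eq $text ? "round trip ok\n" : "mismatch\n";
```

Because the layer is passed to open() directly, the helper should behave
the same regardless of which block or scope it is called from.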


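The scoping behaviour of "use utf8;" described in the second and third
points can be reproduced in plain Perl, outside Embperl. The sketch
below assumes the script itself is saved as UTF-8; the variable names
are illustrative. It shows only that the pragma ends with its enclosing
block. Whether Embperl's [- -], [* *] and [! !] blocks map onto scopes
in exactly this way is the assumption made above, not something this
script proves.

```perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8.
my $outer = "ü";            # pragma not in effect: literal is two bytes

my $inner;
{
    use utf8;               # in effect only until the closing brace
    $inner = "ü";           # literal is decoded to one character
}

my $after = "ü";            # outside the block again: two bytes

printf "outer=%d inner=%d after=%d\n",
    length($outer), length($inner), length($after);
# prints "outer=2 inner=1 after=2"
```

Since the pragma is lexical, it also ends at the end of the file that
contains it. That may be why "use utf8;" in startup.pl has no effect on
the pages, though this is an inference and not confirmed Embperl
behaviour.
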
All in all, I can only say that using utf8 in Embperl is very,
very frustrating, because so many things break in non-obvious ways.
Could we at some point have a general "use-utf8-for-everything"
switch?

Regards,
Torsten


