Posted to asp@perl.apache.org by k_berov <k_...@yahoo.com> on 2004/12/10 19:38:01 UTC

UTF8 - Who is wrong? Apache::ASP or mod_perl or Apache or Perl?

Hi Boys. I think it's Perl's fault, but if any of you has had such
problems and can tell me what is wrong, I will be very grateful.
So, I manage a multilingual site and it is all UTF-8. There is an
application where the users enter some text. The languages used are
mainly Bulgarian and English, with German eventually. There are
situations in which the user enters Latin and Cyrillic characters in
different text boxes.
And here my nightmare begins.
The application decides that the input is Latin-1 (or something else, I do not know)
and breaks all Cyrillic characters into separate bytes,
which is simply frightening!

You will see what I mean below when you reload the page in UTF-8:
#################
Тестов текст
######becomes
ТеÃ`Ã`‚ов Ã`‚екÃ`Ã`‚

It is interesting that Perl 5.6.1 does not cause such problems.
The current configuration is Apache 1.3.31, mod_perl 1.29, Apache::ASP
2.57n on Mandrake 10.0.
I have tried everything I could to resolve the problem.
Here is how the beginning of my global.asa looks:

use utf8;
binmode(STDOUT, ':utf8');
binmode(STDIN, ':utf8');
use DBI;
#use Data::Dumper;
require "../SomeModule.pm";
##AND more
sub Script_OnStart{
       $Response->{Charset}="utf-8";
#....
}
I also placed this in httpd.conf:
<Perl>
use utf8;
#...
</Perl>
My pages are all written in UTF-8.
I do not have variables written with Cyrillic characters,
only literal text in the HTML.
There should be no reason for this disaster to happen to me.
It breaks months of work.

Thank you




---------------------------------------------------------------------
To unsubscribe, e-mail: asp-unsubscribe@perl.apache.org
For additional commands, e-mail: asp-help@perl.apache.org


Re: UTF8 - Who is wrong? Apache::ASP or mod_perl or Apache or Perl?

Posted by Warren Young <wa...@etr-usa.com>.
k_berov wrote:

> Hi Boys. I think it's Perl's fault, but if any of you has had such
> problems and can tell me what is wrong, I will be very grateful.

If you search the archives for a post I made less than a year back, you
will find that I documented some interesting conversion chains in my 
application.  You might find it helpful to read through it.

The core of the problem is that Perl does not run in UTF-8 mode by 
default.  It either guesses that it should use UTF-8 (e.g. from the 
LANG environment variable) or it is told to use UTF-8 by
directive.  So for instance, the Apache::ASP Perl interpreter could be 
seeing a different environment than other Perl code on your system 
because of the way mod_perl works, and so it may treat incoming UTF-8 
as Latin-1.  Then if your httpd is configured to use UTF-8, it may try 
to convert that Latin-1 back to UTF-8.
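
As a rough illustration of that round trip (a sketch only, not a trace of what Apache::ASP or mod_perl actually does internally), the core Encode module can reproduce the effect: UTF-8 bytes that are mistaken for Latin-1 and then encoded to UTF-8 again come out twice as long. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The Cyrillic word "Тест" as the raw UTF-8 bytes a browser would submit (8 bytes).
my $bytes = "\xD0\xA2\xD0\xB5\xD1\x81\xD1\x82";

# Wrong path: every byte is taken for one Latin-1 character,
# then the result is encoded as UTF-8 a second time.
my $mangled = encode('UTF-8', decode('ISO-8859-1', $bytes));

# Intended path: decode the bytes as UTF-8 once, encode once on output.
my $clean = encode('UTF-8', decode('UTF-8', $bytes));

printf "input:   %d bytes\n", length $bytes;
printf "mangled: %d bytes\n", length $mangled;
printf "clean:   %d bytes\n", length $clean;
```

Each Cyrillic letter is two bytes in UTF-8, and each of those bytes becomes two bytes again on the wrong path, which matches the "separate bytes" symptom in the original post.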

The thing to do is to carefully find all of the stages in your system by 
tracing data through the system.  Once you find all the transition 
points, you will know which code needs to be changed to enforce a pure 
UTF-8 data path.
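
One way to do that tracing, sketched here under the assumption that you can add a few debugging lines at each stage (form input, database read and write, output), is to log the string's length, its code points in hex, and whether Perl has flagged it internally as UTF-8. The helper name describe_string is hypothetical and not part of Apache::ASP.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a string without printing the characters,
# so the log itself cannot be mangled by another conversion.
sub describe_string {
    my ($label, $str) = @_;
    my $flag = utf8::is_utf8($str) ? 'utf8-flagged' : 'not flagged';
    my $hex  = join ' ', map { sprintf('%02X', ord($_)) } split //, $str;
    return sprintf '%s: %s, length %d: %s', $label, $flag, length($str), $hex;
}

my $raw     = "\xD0\xA2";    # UTF-8 bytes for U+0422, as they arrive from outside
my $decoded = $raw;
utf8::decode($decoded);      # in-place decode: now a single character U+0422

print STDERR describe_string('raw', $raw), "\n";
print STDERR describe_string('decoded', $decoded), "\n";
```

If the same text shows up as "D0 A2" at one stage and as "C3 90 C2 A2" at the next, the conversion happened between those two points.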

> It is interesting that Perl 5.6.1 does not cause such problems

That's because Perl 5.6 made fewer attempts to convert data.  5.8 is 
more "clever", which can be a problem as well as a benefit.

> There should be no reason for this disaster to happen to me.

Sure there's a reason.  It's called Murphy's Law.  Cope with it.
