Posted to modperl@perl.apache.org by Rob French <dr...@gmail.com> on 2008/03/19 08:32:40 UTC

[mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

I have recently started converting one of our webapps to make it fully
UTF-8 compliant. All input/output from the webapp will be encoded as
UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
enable UTF-8 flagging on all input/output streams. This works with
standalone Perl scripts like the one below (the /tmp/utf8.txt file
contains a single character, U+00E6 LATIN SMALL LETTER AE):

#!/usr/bin/perl -w

use strict;
use Encode;

print "PERL_UNICODE Value: ${^UNICODE}\n";
open(FH, "</tmp/utf8.txt");
undef $/;
my $var = <FH>;
close(FH);

print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
exit;

The resulting output after setting my PERL_UNICODE env var to SDA is:

PERL_UNICODE Value: 63
Flagged as UTF8? 1

Which is correct. Perl processed the input stream (open) as UTF-8 and
flagged it accordingly.
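
For reference, the value 63 is the sum that perlrun gives for the letters S, D and A taken together. A small sketch of that arithmetic follows; the hash name %flag is only for illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Bit values of the -C / PERL_UNICODE letters, as listed in perlrun.
my %flag = (
    I => 1,     # STDIN is assumed to be UTF-8
    O => 2,     # STDOUT will be UTF-8
    E => 4,     # STDERR will be UTF-8
    i => 8,     # UTF-8 is the default layer for input streams
    o => 16,    # UTF-8 is the default layer for output streams
    A => 32,    # @ARGV elements are expected to be UTF-8
);
$flag{S} = $flag{I} | $flag{O} | $flag{E};    # S is shorthand for I + O + E
$flag{D} = $flag{i} | $flag{o};               # D is shorthand for i + o

my $sda = $flag{S} | $flag{D} | $flag{A};
print "SDA = $sda\n";
```

So a ${^UNICODE} of 63 only shows that the interpreter picked up all three letters; it does not by itself show which open() calls the defaults were applied to.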

Unfortunately, if I put the exact same open call in my mod_perl
TransHandler, $var is not flagged as UTF-8. The resulting output when
run in the TransHandler is:

PERL_UNICODE Value: 63
Flagged as UTF8?

The input stream is not processed as UTF-8 and not flagged internally
as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl
then everything works as expected. It appears as if mod_perl is
ignoring the PERL_UNICODE env variable and not processing my input
streams as UTF-8.
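
The explicit-decode workaround mentioned above might look roughly like this. It is a minimal sketch, not the actual handler code: a temporary file stands in for /tmp/utf8.txt so that the example runs on its own, and the ':raw' layer is an assumption added here so the read returns bytes regardless of any defaults.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Stand-in for /tmp/utf8.txt: the two bytes that encode U+00E6 in UTF-8.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xC3\xA6";
close($tmp_fh) or die "close failed: $!";

# Read raw bytes, independent of PERL_UNICODE or any default layers.
open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
my $bytes = do { local $/; <$fh> };
close($fh);

# Decode explicitly; the result is a one-character string.
my $var = Encode::decode_utf8($bytes);
printf "bytes: %d, characters: %d, flagged: %s\n",
    length($bytes), length($var), Encode::is_utf8($var) ? 'yes' : 'no';
```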

Thanks in advance.

Cheers




Environment details below:

Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
  Platform:
    osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
archname=i386-linux-thread-multi
    uname='linux hs20-bc1-4.build.redhat.com
2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
i686 i386 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
-mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
-Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc.
-Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
-Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
-Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
-Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
-Dinstallusrbinperl -Ubincompat5005 -Uversiononly
-Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
5.8.0'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.3.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
-Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'


Characteristics of this binary (from libperl):
  Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Built under linux
  Compiled at Jul 24 2006 18:28:10
  @INC:
    /usr/lib/perl5/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/5.8.5
    /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl/5.8.4
    /usr/lib/perl5/site_perl/5.8.3
    /usr/lib/perl5/site_perl/5.8.2
    /usr/lib/perl5/site_perl/5.8.1
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.5
    /usr/lib/perl5/vendor_perl/5.8.4
    /usr/lib/perl5/vendor_perl/5.8.3
    /usr/lib/perl5/vendor_perl/5.8.2
    /usr/lib/perl5/vendor_perl/5.8.1
    /usr/lib/perl5/vendor_perl/5.8.0
    /usr/lib/perl5/vendor_perl
    .
mod_perl version: 1.30

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by Rob French <dr...@gmail.com>.
Good suggestion. It looks like that works for my simple open() example
but unfortunately it doesn't work when reading from sockets. What I am
trying to do is tell Perl that all incoming POST data is UTF-8 encoded
and flag it as such. The $r->read() call unfortunately doesn't abide
by the open pragma.
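
Since $r->read() hands back raw bytes, one option is to decode them explicitly. The sketch below assumes that approach; decode_post_bytes is a hypothetical helper name, and the commented-out $r->read() line is only an assumption about how the body would be fetched under mod_perl 1. For a urlencoded form body, the decoding would have to happen after the values are unescaped.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn raw request bytes into a character string,
# dying on malformed UTF-8 instead of silently substituting characters.
sub decode_post_bytes {
    my ($bytes) = @_;    # work on a copy; decode() may modify its input when CHECK is set
    return Encode::decode('UTF-8', $bytes, Encode::FB_CROAK());
}

# Under mod_perl 1 the bytes would presumably come from something like:
#   $r->read(my $buf, $r->header_in('Content-Length'));
# Here a literal stands in for the request body.
my $buf  = "caf\xC3\xA9";
my $text = decode_post_bytes($buf);
printf "%d bytes became %d characters\n", length($buf), length($text);
```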

Looks like I might have to go dig through source :-)

Thanks again for the help.

On Wed, Mar 19, 2008 at 12:45 PM, André Warnier <aw...@ice-sa.com> wrote:
>
>
>  André Warnier wrote:
>  > One more thing to try : doing a
>  > use open ':utf8';
>  > in the global mod_perl startup script.
>  >
>
>  well, that works.
>  Rob, that should probably help you.
>  The difference with PERL_UNICODE "SAD" seems to be that it will not
>  automatically consider @ARGV as utf-8.

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by André Warnier <aw...@ice-sa.com>.

André Warnier wrote:
> One more thing to try : doing a
> use open ':utf8';
> in the global mod_perl startup script.
> 

well, that works.
Rob, that should probably help you.
The difference from PERL_UNICODE "SDA" seems to be that use open will
not automatically treat @ARGV as UTF-8.
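
A minimal sketch of the pragma in action follows. The open pragma is documented as lexically scoped, so the sketch puts it in the same block as the open() call. A temporary file stands in for /tmp/utf8.txt, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Stand-in for /tmp/utf8.txt: the two bytes that encode U+00E6 in UTF-8.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xC3\xA6";
close($tmp_fh) or die "close failed: $!";

my $with_pragma;
{
    # Default layer for every open() compiled inside this block.
    use open ':utf8';

    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    $with_pragma = do { local $/; <$fh> };
    close($fh);
}

printf "characters: %d, flagged: %s\n",
    length($with_pragma), Encode::is_utf8($with_pragma) ? 'yes' : 'no';
```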




Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by André Warnier <aw...@ice-sa.com>.
And I think PerlSetEnv would not work anyway.
It will set PERL_UNICODE in time for the handler/script to print it, but
probably too late for Perl to take it into account: by that time the
Perl interpreter is already up and running, so the internal ${^UNICODE}
variable was set long ago.
That's why I was asking when it was being set.

By the way, in my case (apache2/mp2 and virtual servers), the Apache
SetEnv sets $ENV{PERL_UNICODE} for the handler, but ${^UNICODE} remains 0.
One more thing to try: doing a
use open ':utf8';
in the global mod_perl startup script.


Rob French wrote:
> Setting the environment variable has always worked. mod_perl can "see"
> the PERL_UNICODE variable is set based on the fact that the
> ${^UNICODE} variable is returning 63 (SDA). The problem is that it
> seems to ignore it.
> 
> On Wed, Mar 19, 2008 at 12:01 PM, Dondi Stroma <ds...@verizon.net> wrote:
>> Maybe you need to use PerlSetEnv ?

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by Rob French <dr...@gmail.com>.
Setting the environment variable has always worked. mod_perl can "see"
the PERL_UNICODE variable is set based on the fact that the
${^UNICODE} variable is returning 63 (SDA). The problem is that it
seems to ignore it.

On Wed, Mar 19, 2008 at 12:01 PM, Dondi Stroma <ds...@verizon.net> wrote:
> Maybe you need to use PerlSetEnv ?

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by Dondi Stroma <ds...@verizon.net>.
Maybe you need to use PerlSetEnv ?



Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by Rob French <dr...@gmail.com>.
I have tried setting it via the Apache SetEnv directive as well as in my
environment as root when starting Apache. In both cases the variable
is correctly set in mod_perl; it is just ignored.

As another test I tried the same code as a plain ol' CGI script and it
works in that case. So the issue is definitely with mod_perl and its
interaction with the PERL_UNICODE env variable.

Thanks for your help investigating. I was worried that it might be a
mod_perl 1.x thing or a Perl version thing. Good to know it isn't just
my setup :)

Rgrds,
Rob

On Wed, Mar 19, 2008 at 11:35 AM, André Warnier <aw...@ice-sa.com> wrote:
> Hi.
>
>  I cannot really think of a reason why Perl itself would do something
>  different in either case.  And in your tests, it was verified that
>  PERL_UNICODE itself is still set right under mod_perl.  So it must be
>  that mod_perl somehow overrides the basic Perl setting.  Maybe mod_perl
>  needs to do something re the filehandles, because some of them might be
>  connected to Apache ?
>
>  Anyhow, out of my depth now, so let's call on a real mod_perl guru if
>  any of them is around ?
>
>  By the way :
>  I have tried the same thing in the meantime under Apache 2.x/mod_perl
>  2.x, and I seem to have the same problem.
>
>  I have one more question : where exactly do you set PERL_UNICODE ?

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by André Warnier <aw...@ice-sa.com>.
Hi.

I cannot really think of a reason why Perl itself would do something
different in either case.  And in your tests, it was verified that
PERL_UNICODE itself is still set right under mod_perl.  So it must be
that mod_perl somehow overrides the basic Perl setting.  Maybe mod_perl
needs to do something re the filehandles, because some of them might be
connected to Apache ?

Anyhow, out of my depth now, so let's call on a real mod_perl guru if
any of them is around ?

By the way :
I have tried the same thing in the meantime under Apache 2.x/mod_perl 
2.x, and I seem to have the same problem.

I have one more question : where exactly do you set PERL_UNICODE ?



Rob French wrote:
> Hi André,
> 
> Yes, I tried that as well and it worked as expected (UTF-8 flag is
> set). Explicit PerlIO layer decoding works in both the non-mod_perl
> and mod_perl tests. It seems only the default PERL_UNICODE setting is
> ignored in mod_perl even though it is set.
> 
> Rgrds,
> Rob
> 
> On Wed, Mar 19, 2008 at 3:01 AM, André Warnier <aw...@ice-sa.com> wrote:
>> Hi.
>>
>>  Perl's handling of Unicode (and of character sets in general) is
>>  extremely clever and powerful.
>>  But it can sometimes be a bit counter-intuitive.
>>
>>  In any case, it seems to me that the evaluation of the PERL_UNICODE
>>  environment variable is a "Perl thing" rather than a "mod_perl thing",
>>  and that mod_perl per se should not interfere with it.  But maybe
>>  mod_perl does some magic on filehandles in general which interferes, who
>>  knows ?
>>
>>  Maybe the first thing to do is to ascertain that the problem is really
>>  due to a mishandling of the PERL_UNICODE environment variable, or
>>  something else.  I propose a simple test :
>>  Instead of relying on the PERL_UNICODE variable, what happens when you
>>  change the open() statement as follows :
>>
>>   > open(FH, '<:utf8',"/tmp/utf8.txt");
>>
>>  thus explicitly setting a UTF-8 decoding layer for the stream FH,
>>  instead of relying on PERL_UNICODE.
>>  Does your follow-up test then indicate that the utf8 flag for $var is  set ?
>>
>>  Note : even with the decoding layer set, that does not necessarily mean
>>  that all data you read will end up with the utf8 flag set.  It depends
>>  on the data.  But in your case, if you are really using the same file
>>  data in both tests you show below, then it seems a valid test.
>>
>>  André


Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by Rob French <dr...@gmail.com>.
Hi André,

Yes, I tried that as well and it worked as expected (the UTF-8 flag is
set). Explicit PerlIO layer decoding works in both the non-mod_perl
and mod_perl tests. It seems that only the default requested through
PERL_UNICODE is ignored under mod_perl, even though ${^UNICODE}
reports the same value (63) there.
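
For reference, a minimal sketch of reading a file through an explicit
layer, so that the result does not depend on PERL_UNICODE being
honoured. The helper name read_utf8_file and the temporary file are
illustrative only (they are not code from the webapp), and a
PerlIO-enabled Perl 5.8 or later is assumed:

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a file through an explicit :utf8 layer,
# so decoding does not rely on PERL_UNICODE / ${^UNICODE}.
sub read_utf8_file {
    my ($path) = @_;
    open(my $fh, '<:utf8', $path) or die "Cannot open $path: $!";
    local $/;                 # slurp mode, restored when the sub returns
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Demo data: U+00E6 written as its two UTF-8 bytes, C3 A6.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);             # raw bytes, whatever the default layers are
print {$tmp_fh} "\xC3\xA6";
close($tmp_fh);

my $var = read_utf8_file($tmp_path);
print "Flagged as UTF8? " . (Encode::is_utf8($var) ? 1 : 0) . "\n";
print "Length in characters: " . length($var) . "\n";
```

Using "local $/" inside the helper, rather than "undef $/" at file
scope, keeps slurp mode from leaking into other code running in the
same long-lived interpreter, which matters more under mod_perl than in
a one-shot script.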

Rgrds,
Rob

On Wed, Mar 19, 2008 at 3:01 AM, André Warnier <aw...@ice-sa.com> wrote:
> Hi.
>
>  Perl's handling of Unicode (and of character sets in general) is
>  extremely clever and powerful, but it can sometimes be a bit
>  counter-intuitive.
>
>  In any case, it seems to me that the evaluation of the PERL_UNICODE
>  environment variable is a "Perl thing" rather than a "mod_perl thing",
>  and that mod_perl per se should not interfere with it.  But maybe
>  mod_perl does some magic on filehandles in general which interferes,
>  who knows?
>
>  Maybe the first thing to do is to ascertain whether the problem is
>  really due to a mishandling of the PERL_UNICODE environment variable,
>  or to something else.  I propose a simple test: instead of relying on
>  the PERL_UNICODE variable, what happens when you change the open()
>  statement as follows:
>
>      open(FH, '<:utf8', "/tmp/utf8.txt");
>
>  This explicitly sets a UTF-8 decoding layer for the stream FH,
>  instead of relying on PERL_UNICODE.
>  Does your follow-up test then indicate that the utf8 flag for $var is
>  set?
>
>  Note: even with the decoding layer set, that does not necessarily mean
>  that all data you read will end up with the utf8 flag set.  It depends
>  on the data.  But in your case, if you are really using the same file
>  data in both tests you show below, then it seems a valid test.
>
>  André
>
>
>
>
>  Rob French wrote:
>  > I have recently started converting one of our webapps to make it fully
>  > UTF-8 compliant. All input/output from the webapp will be encoded as
>  > UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
>  > enable UTF-8 flagging on all input/output streams. This works with
>  > standalone Perl scripts like the one below (the /tmp/utf8.txt file
>  > contains a single character (U+00E6 - LATIN SMALL LETTER Ae) :
>  >
>  > #!/usr/bin/perl -w
>  >
>  > use strict;
>  > use Encode;
>  >
>  > print "PERL_UNICODE Value: ${^UNICODE}\n";
>  > open(FH, "</tmp/utf8.txt");
>  > undef $/;
>  > my $var = <FH>;
>  > close(FH);
>  >
>  > print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
>  > exit;
>  >
>  > The resulting output after setting my PERL_UNICODE env var to SDA is:
>  >
>  > PERL_UNICODE Value: 63
>  > Flagged as UTF8? 1
>  >
>  > Which is correct. Perl processed the input stream (open) as UTF-8 and
>  > flagged it accordingly.
>  >
>  > Unfortunately if I put the exact same open call in my mod_perl
>  > TransHandler $var is not flagged as UTF-8. The resulting output when
>  > run in the TransHandler is:
>  >
>  > PERL_UNICODE Value: 63
>  > Flagged as UTF8?
>  >
>  > The input stream is not processed as UTF-8 and not flagged internally
>  > as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl
>  > then everything works as expected. It appears as if mod_perl is
>  > ignoring the PERL_UNICODE env variable and not processing my input
>  > streams as UTF-8.
>  >
>  > Thanks in advance.
>  >
>  > Cheers
>

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

Posted by André Warnier <aw...@ice-sa.com>.
Hi.

Perl's handling of Unicode (and of character sets in general) is
extremely clever and powerful, but it can sometimes be a bit
counter-intuitive.

In any case, it seems to me that the evaluation of the PERL_UNICODE
environment variable is a "Perl thing" rather than a "mod_perl thing",
and that mod_perl per se should not interfere with it.  But maybe
mod_perl does some magic on filehandles in general which interferes,
who knows?

Maybe the first thing to do is to ascertain whether the problem is
really due to a mishandling of the PERL_UNICODE environment variable,
or to something else.  I propose a simple test: instead of relying on
the PERL_UNICODE variable, what happens when you change the open()
statement as follows:

    open(FH, '<:utf8', "/tmp/utf8.txt");

This explicitly sets a UTF-8 decoding layer for the stream FH,
instead of relying on PERL_UNICODE.
Does your follow-up test then indicate that the utf8 flag for $var is
set?

Note: even with the decoding layer set, that does not necessarily mean
that all data you read will end up with the utf8 flag set.  It depends
on the data.  But in your case, if you are really using the same file
data in both tests you show below, then it seems a valid test.
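
A related check, sketched here under assumptions, is to ask Perl which
layers a handle actually ends up with, using PerlIO::get_layers
(documented in the PerlIO manual page; as far as I know it has been
available since Perl 5.8.1). The names layers_for and the temporary
file are illustrative only and are not part of the original test:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the PerlIO layer names found on a freshly opened read handle.
# $mode is either '<' (default layers) or e.g. '<:utf8' (explicit layer).
sub layers_for {
    my ($mode, $path) = @_;
    open(my $fh, $mode, $path) or die "Cannot open $path: $!";
    my @layers = PerlIO::get_layers($fh);
    close($fh);
    return @layers;
}

# An empty scratch file is enough: only the handle's layers are inspected.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);

my @default  = layers_for('<',      $tmp_path);
my @explicit = layers_for('<:utf8', $tmp_path);

print "\${^UNICODE}: ${^UNICODE}\n";
print "Default layers:  @default\n";
print "Explicit layers: @explicit\n";
```

Run once as a standalone script and once from the handler: if "utf8"
shows up among the default layers in one environment but not in the
other, the difference lies in how the default layers are set up rather
than in the data being read.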

André


Rob French wrote:
> I have recently started converting one of our webapps to make it fully
> UTF-8 compliant. All input/output from the webapp will be encoded as
> UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
> enable UTF-8 flagging on all input/output streams. This works with
> standalone Perl scripts like the one below (the /tmp/utf8.txt file
> contains a single character (U+00E6 - LATIN SMALL LETTER Ae) :
> 
> #!/usr/bin/perl -w
> 
> use strict;
> use Encode;
> 
> print "PERL_UNICODE Value: ${^UNICODE}\n";
> open(FH, "</tmp/utf8.txt");
> undef $/;
> my $var = <FH>;
> close(FH);
> 
> print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
> exit;
> 
> The resulting output after setting my PERL_UNICODE env var to SDA is:
> 
> PERL_UNICODE Value: 63
> Flagged as UTF8? 1
> 
> Which is correct. Perl processed the input stream (open) as UTF-8 and
> flagged it accordingly.
> 
> Unfortunately if I put the exact same open call in my mod_perl
> TransHandler $var is not flagged as UTF-8. The resulting output when
> run in the TransHandler is:
> 
> PERL_UNICODE Value: 63
> Flagged as UTF8?
> 
> The input stream is not processed as UTF-8 and not flagged internally
> as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl
> then everything works as expected. It appears as if mod_perl is
> ignoring the PERL_UNICODE env variable and not processing my input
> streams as UTF-8.
> 
> Thanks in advance.
> 
> Cheers
>