Posted to modperl@perl.apache.org by Bart Terryn <ba...@grafikon.com> on 2003/09/06 01:14:54 UTC

RE: porting from mod_perl1 to mod_perl2

Hi,

I have an application running under apache
1.37(win32)/mod_perl1.27_01-dev/perl5.6 build 633

I am trying to move this application to apache
2.0.47(win32)/mod_perl1.99_10-dev/perl 5.8

However I run into a problem with character encoding.
Somewhere in this app I put up a form that contains text.
The encoding of the html page that contains this form is set to 'utf-8' by
the following:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
That form displays OK in both mod_perl1.0 and mod_perl2.0

When I read the form back under the apache1, everything is OK.
When I do the same using the apache 2 combination I run into trouble with
the char ref entities which are high in the unicode set like:
&#8212; or &#8211;. These characters are returned as unicode characters hex
97 and hex 96.

Other character ref entities like the one for e (e umlaut = &#235;) are
returned correctly.

So I assume that only characters above U+07FF are returned wrong.

Anybody any idea?

Thanks in advance

Bart

PS: some might say that this has nothing to do with mod_perl.
And maybe you are right, but I have no clue which part might be causing
this.
I am fairly sure it is not perl5.8.
Although in order to make the apache2/mod_perl2 combination work I had to
upgrade CGI.pm to version 3.0



-- 
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html


Re: porting from mod_perl1 to mod_perl2

Posted by Perrin Harkins <pe...@elem.com>.
On Fri, 2003-09-05 at 21:36, Stas Bekman wrote:
> Bart is on win32, AS Perl 5.8.

Oops, sorry Bart, I missed that.  Even so, I'm suspicious that 5.8 and
all of its unicode changes are involved somehow.

- Perrin





RE: porting from mod_perl1 to mod_perl2

Posted by Bart Terryn <ba...@grafikon.com>.
I had version CGI 3.00 installed.
Downgraded it to CGI 2.93, but I still have the same result.

The problem as I see it is that I have a form with character &#8212; in it.
But it is returned as character &#151; from the Windows-1252 character set.
Does everybody agree that it should be returned as &#8212; (the utf-8
representation I mean)?

See my previous mail for the test I used.

Bart

-----Original Message-----
From: Stas Bekman [mailto:stas@stason.org]
Sent: Saturday, September 06, 2003 8:35 AM
To: Philip M. Gollucci
Cc: Perrin Harkins; bart@grafikon.com; modperl@perl.apache.org
Subject: Re: porting from mod_perl1 to mod_perl2


Philip M. Gollucci wrote:
> If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8
> was added via a patch by someone else between 2.99 and 3.00; likely this
> is the cause.

Bart, can you try then with an earlier version? e.g. 2.93 was good for me.
You can get it from here: http://www.cpan.org/authors/id/L/LD/LDS/

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com





Re: porting from mod_perl1 to mod_perl2

Posted by Stas Bekman <st...@stason.org>.
Philip M. Gollucci wrote:
> If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8
> was added via a patch by someone else between 2.99 and 3.00; likely this
> is the cause.

Bart, can you try then with an earlier version? e.g. 2.93 was good for me.
You can get it from here: http://www.cpan.org/authors/id/L/LD/LDS/



Re: porting from mod_perl1 to mod_perl2

Posted by "Philip M. Gollucci" <ph...@p6m7g8.com>.
If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8
was added via a patch by someone else between 2.99 and 3.00; likely this
is the cause.

Stas Bekman wrote:

> Perrin Harkins wrote:
>
>>> I am fairly sure it is not perl5.8.
>>
>>
>>
>> I'm fairly sure it is.  What is your locale set to?  Are you on Red
>> Hat?  See previous discussions of locale issues on Red Hat 8 and 9 in
>> the list archives.
>
>
> Bart is on win32, AS Perl 5.8. I doubt it's a locale issue, since it's
> the client who decides what encoding the data is in. It's either
> CGI.pm (guessing that's what he was using to parse the forms) or a more
> low-level (io) issue.
>
> Bart, can you test whether you have the same problem when you run the
> same code under mod_cgi in Apache2 (with perl5.8 of course)? If not,
> that will point the blaming finger towards mod_perl 2.0. Would someone
> volunteer to add a new test? See
>
> t/modperl/print_utf8.t
> t/response/TestModperl/print_utf8.pm
>
> for an example of testing a response with utf8 data. You can
> probably adapt one of these pairs for testing the posting of utf8 data:
>
> t/apache/cgihandler.t
> t/response/TestApache/cgihandler.pm
>
> t/modules/cgi.t
> t/response/TestModules/cgi.pm
>
> t/modules/cgiupload.t
> t/response/TestModules/cgiupload.pm
>
> of course you will want to create a new pair of files for this test.


RE: porting from mod_perl1 to mod_perl2

Posted by Randy Kobes <ra...@theoryx5.uwinnipeg.ca>.
On Tue, 9 Sep 2003, Bart Terryn wrote:

> Stas,
>
> Sorry to insist.
> But here I am again...
>
> Stas wrote:
> >Actually I haven't looked, I have tested with your code.
> Thanks a lot for going through the effort...
>
> >Before setting the header I wasn't getting the unicode
> >chars you put in the form back in the dump. After setting
> >the header it did print out exactly the same unicode
> >character.
>
> Well that is strange. I just changed my code and still am
> getting the em dash back as code 151 and not as the 8212
> code (the way it went in).

If you're using ppm to install mod_perl, could you try the
latest version at http://theoryx5.uwinnipeg.ca/ppms/? There
were some changes made recently that may affect the above
problem. Note that the version in the mod_perl.ppd hasn't
changed, so you may have to uninstall mod_perl and then
install it to force ppm to upgrade.

-- 
best regards,
randy kobes

RE: porting from mod_perl1 to mod_perl2

Posted by Bart Terryn <ba...@grafikon.com>.
Stas,

Sorry to insist.
But here I am again...

Stas wrote:
>Actually I haven't looked, I have tested with your code.
Thanks a lot for going through the effort...

>Before setting the header I wasn't getting the unicode chars you put in the
>form back in the dump. After setting the header it did print out exactly the
>same unicode character.

Well that is strange. I just changed my code and still am getting the em dash
back as code 151 and not as the 8212 code (the way it went in).

Are you sure that you have the 2 lines in the test program that change the
multibyte utf-8 encoded characters into their values?
(the famous lines 11 and 12)

Because if not, then I can understand that you have to put the changed
header in as you would be sending utf-8 encoded data to the client.
And it would also explain why you would 'see' the same character after
putting the utf-8 header in.

>I didn't have a chance to mess with the hex representations yet.

That makes me wonder even more about the thing above.

[...]

>I think this is where the weak point is. You need to compare characters on
>the server side, not trying to rely on the browser, which as you have seen
>will render them improperly if you didn't set the right header.

Again that is the purpose of the dreaded lines 11 and 12 of my test program.
I don't want to render the character, I just want to display the actual
(utf-8 encoded) code that I read back from the form.

>You have two things happening: read input, send output. The problem can be
>in any of the two and worse, it can be in both and the error can fix itself
>when doubled. You need to verify first that the input is read properly, then
>the same for the output.

Believe me.
I also ran tests that write out the data to disk and then used a hex dump of
that file to actually verify what is in there. I got the same results. But
that got a bit tedious, hence my little test program that does more or less
the same thing.

For your convenience here is the test program again
You will note that I change the $q->header print statement, but as said
before the outcome is still wrong.

Could you confirm that you indeed used this script unmodified and still are
receiving correct output?

As said the important part is in line 11 and 12.
You will need perl 5.8 in order to make those 2 lines work properly
(5.6 does not understand unicode correctly)

#!/perl/bin/perl.exe
use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use CGI::Cookie;

my $q = CGI->new;
my $content = $q->param("utf8-test");
$content .= "verify with \x{2014}";
my @content = unpack('U*', $content);
$content =~ s/([\x{0800}-\x{FFFF}])/sprintf('+entity:%d+',ord($1))/ge;
$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
print $q->header("text/html; charset=utf-8");
print $q->p($content);
print $q->p('hex');
foreach (@content) {printf "%x ", $_}


>I have started writing the test for mp2 to verify utf8 input, hopefully
>I'll finish it soon.

Thanks a lot for your support...

Bart


Re: porting from mod_perl1 to mod_perl2

Posted by Stas Bekman <st...@stason.org>.
Bart Terryn wrote:
> Stas and all of the others,
> 
> Stas said:
> 
>>I think I got your problem solved, you need to:
> 
> 
>>- print $q->header();
>>+ print $q->header("text/html; charset=utf-8");
> 
> 
> Well actually you did not.
> Probably you looked a bit too fast.
> (forgivable in view of the numbers of mails you reply to:-)

Actually I haven't looked, I have tested with your code. Before setting the 
header I wasn't getting the unicode chars you put in the form back in the 
dump. After setting the header it did print out exactly the same unicode character.

I didn't have a chance to mess with the hex representations yet.

[...]
> (Oh did I mention already that I have tested only against IE6, because the
> browser could be the cause as well of this odd(?) behaviour.)

I think this is where the weak point is. You need to compare characters on the 
server side, not trying to rely on the browser, which as you have seen will 
render them improperly if you didn't set the right header.

You have two things happening: read input, send output. The problem can be in 
any of the two and worse, it can be in both and the error can fix itself when 
doubled. You need to verify first that the input is read properly, then the 
same for the output.
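
The input half of that check can be done on raw bytes, before any browser rendering is involved. A minimal sketch, assuming the core Encode module that ships with perl 5.8; the helper name describe_input is made up for illustration and is not part of the mod_perl test suite:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: describe the raw bytes of a form value and say
# whether they form a valid UTF-8 sequence.
sub describe_input {
    my ($bytes) = @_;

    # Hex dump of every byte, so nothing depends on how a browser renders it.
    my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;

    # Strict decode: dies on malformed input. LEAVE_SRC keeps $bytes intact.
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };

    return defined $chars ? "valid UTF-8: $hex" : "not valid UTF-8: $hex";
}

print describe_input("\xe2\x80\x94"), "\n";   # em dash as UTF-8 bytes
print describe_input("\x97"), "\n";           # em dash as a Windows-1252 byte
```

Note that plain ASCII input also counts as valid UTF-8, so the check only tells the two encodings apart when the value contains characters above 7F.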

I have started writing the test for mp2 to verify utf8 input, hopefully I'll 
finish it soon.



RE: porting from mod_perl1 to mod_perl2

Posted by Bart Terryn <ba...@grafikon.com>.
Stas and all of the others,

Stas said:
>I think I got your problem solved, you need to:

>- print $q->header();
>+ print $q->header("text/html; charset=utf-8");

Well actually you did not.
Probably you looked a bit too fast.
(forgivable in view of the numbers of mails you reply to:-)

The utf8-test.pl code is reading what comes out of the form (which has a
charset=utf-8 meta tag, so that is OK, see my previous mail)
utf8-test.pl then replaces the characters higher than 7F with char. ref
entities, but with the string '+entity: ' in front of the value (see lines
10 and 11 of utf8-test.pl below).
And to double verify the information read back from the form is also
unpacked from unicode values into their hex counterparts.
And then both strings are printed out as normal low ascii characters (<7f),
so no need to set the utf-8 flag here.

From further testing I have seen that only unicode characters that actually
have a representation in the win1252 character set come back under their
corresponding win1252 position.
So the form would for example contain an ndash character (unicode position
dec 8211 or U+2013) .
But that is read back as character dec 150 or hex 96.
And if the form contains a right single quotation (unicode position dec 8217
or U+2019), it comes back under its win1252 position of dec 146 or hex 92.
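
The Windows-1252 positions quoted above can be reproduced on the server side. A minimal sketch, assuming the core Encode module that ships with perl 5.8; the helper name cp1252_position is invented for illustration and is not part of utf8-test.pl:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the Windows-1252 byte value for a Unicode
# code point, or undef if the character has no Windows-1252 position.
sub cp1252_position {
    my ($codepoint) = @_;
    my $char  = chr($codepoint);
    my $bytes = eval {
        encode('cp1252', $char, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return defined $bytes ? ord($bytes) : undef;
}

# En dash, em dash and right single quotation mark.
for my $cp (0x2013, 0x2014, 0x2019) {
    my $pos = cp1252_position($cp);
    printf "U+%04X (dec %d) -> win1252 dec %d, hex %x\n", $cp, $cp, $pos, $pos;
}
```

Run as written, this should report 150/96 for U+2013, 151/97 for U+2014 and 146/92 for U+2019, which are the values seen coming back from the form.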

I would have expected if I send something in under its unicode position, it
would come back to me under its unicode position.
But then again I may be wrong.
And the utf8 flag in the header only means that it will be utf8 encoded and
should not be confused with the character set used.

I am under the impression I am confusing myself more and more here.
So if somebody has been on this path before and knows the truth, let him
speak up!

(Oh did I mention already that I have tested only against IE6, because the
browser could be the cause as well of this odd(?) behaviour.)

Thanks all for your patience.
I would really like to get to the bottom of this.

Bart

Here is utf8-test.pl, again this time with line numbers
1:#!/perl/bin/perl.exe
2:use strict;
3:use CGI;
4:use CGI::Carp qw(fatalsToBrowser);
5:
6:my $q = CGI->new;
7:my $content = $q->param("utf8-test");
8:$content .= "verify with \x{2014}";
9:my @content = unpack('U*', $content);
10:$content =~ s/([\x{0800}-\x{FFFF}])/sprintf('+entity:%d+',ord($1))/ge;
11:$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
12:print $q->header();
13:print $q->p($content);
14:print $q->p('hex');
15:foreach (@content) {printf "%x ", $_}

and here is the html form that triggers the utf8-test.pl:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>

<body>
<form method="post" action="/mod_perl/utf8-test.pl"
enctype="multipart/form-data">
<textarea name ='utf8-test' cols='60'>test: &#235; &#8212;</textarea>
&nbsp;&nbsp;<input type="submit" value="publish new content"/>
</form>
</body></html>

and here is the result this all produces:
test: +entity: 235+ +entity: 151+verify with +entity:8212+

hex

74 65 73 74 3a 20 eb 20 97 76 65 72 69 66 79 20 77 69 74 68 20 2014


Re: porting from mod_perl1 to mod_perl2

Posted by Stas Bekman <st...@stason.org>.
I think I got your problem solved, you need to:

- print $q->header();
+ print $q->header("text/html; charset=utf-8");



RE: porting from mod_perl1 to mod_perl2

Posted by Bart Terryn <ba...@grafikon.com>.
Stas wrote:

>Bart, can you test whether you have the same problem when you run the same
>code under mod_cgi in Apache2 (with perl5.8 of course)? If not, that will
>point the blaming finger towards mod_perl 2.0.

Well I did that and guess what? mod_cgi fails as well.
So it is not a mod_perl problem
But for me it is still uncertain who to blame. (cgi.pm? apache2? perl5.8?)

I made a small test for this.
Just in case somebody wants to give it a try
Here is my sample page:
-------------
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>

<body>
<form method="post" action="/mod_perl/utf8-test.pl"
enctype="multipart/form-data">
<textarea name ='utf8-test' cols='60'>test: &#235; &#8212;</textarea>
&nbsp;&nbsp;<input type="submit" value="publish new content"/>
</form>
</body></html>
------------------
Here is the utf8-test.pl:
-------------------
#!/perl/bin/perl.exe
use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $q = CGI->new;
my $content = $q->param("utf8-test");
$content .= "verify with \x{2014}";
my @content = unpack('U*', $content);
$content =~ s/([\x{0800}-\x{FFFF}])/sprintf('+entity:%d+',ord($1))/ge;
$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
print $q->header();
print $q->p($content);
print $q->p('hex');
foreach (@content) {printf "%x ", $_}
-----------------------
and here is the output I get:
------------------------
test: +entity: 235+ +entity: 151+verify with +entity:8212+

hex

74 65 73 74 3a 20 eb 20 97 76 65 72 69 66 79 20 77 69 74 68 20 2014
--------------------------

From which I understand that the original character &#8212; is returned as
hex 97 or dec 151.
That would be correct if the character set were windows-1252, but that is
not what I expected.
I wanted utf-8 to be returned.
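
If the browser really is submitting Windows-1252 bytes, one possible stopgap is to decode the value explicitly instead of trusting it to be utf-8. A rough sketch, assuming the core Encode module; decode_form_value is a hypothetical helper, and the fall-back to cp1252 is a heuristic, not something CGI.pm does for you:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw form bytes into Perl characters.
# Try strict UTF-8 first; if the bytes are not valid UTF-8, assume
# (and this is only an assumption) that they are Windows-1252.
sub decode_form_value {
    my ($bytes) = @_;
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $chars if defined $chars;
    return decode('cp1252', $bytes);
}

# Both spellings of the em dash end up as the same code point.
printf "%x\n", ord(decode_form_value("\x97"));
printf "%x\n", ord(decode_form_value("\xe2\x80\x94"));
```

This does not explain why the data arrives in the wrong encoding; it only makes the script tolerant of both cases while the cause is being tracked down.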

If mod_perl is not the correct forum for this (which I agree it isn't) can
somebody point me in the direction of a correct forum? But as said before
the difficulty is that I don't know who to blame

Kind Regards,

Bart





Re: porting from mod_perl1 to mod_perl2

Posted by Stas Bekman <st...@stason.org>.
Perrin Harkins wrote:

>>I am fairly sure it is not perl5.8.
> 
> 
> I'm fairly sure it is.  What is your locale set to?  Are you on Red
> Hat?  See previous discussions of locale issues on Red Hat 8 and 9 in
> the list archives.

Bart is on win32, AS Perl 5.8. I doubt it's a locale issue, since it's the
client who decides what encoding the data is in. It's either CGI.pm (guessing
that's what he was using to parse the forms) or a more low-level (io) issue.

Bart, can you test whether you have the same problem when you run the same code
under mod_cgi in Apache2 (with perl5.8 of course)? If not, that will point the
blaming finger towards mod_perl 2.0. Would someone volunteer to add a new test? See

t/modperl/print_utf8.t
t/response/TestModperl/print_utf8.pm

for an example of testing a response with utf8 data. You can probably
adapt one of these pairs for testing the posting of utf8 data:

t/apache/cgihandler.t
t/response/TestApache/cgihandler.pm

t/modules/cgi.t
t/response/TestModules/cgi.pm

t/modules/cgiupload.t
t/response/TestModules/cgiupload.pm

of course you will want to create a new pair of files for this test.



RE: porting from mod_perl1 to mod_perl2

Posted by Perrin Harkins <pe...@elem.com>.
On Fri, 2003-09-05 at 19:14, Bart Terryn wrote:
> PS: some might say that this has nothing to do with mod_perl

I would say that, but it's okay, you didn't know.

> I am fairly sure it is not perl5.8.

I'm fairly sure it is.  What is your locale set to?  Are you on Red
Hat?  See previous discussions of locale issues on Red Hat 8 and 9 in
the list archives.

- Perrin





Re: porting from mod_perl1 to mod_perl2

Posted by Stas Bekman <st...@stason.org>.
Bart Terryn wrote:
> Hi,
> 
> I have an application running under apache
> 1.37(win32)/mod_perl1.27_01-dev/perl5.6 build 633
> 
> I am trying to move this application to apache
> 2.0.47(win32)/mod_perl1.99_10-dev/perl 5.8
> 
> However I run into a problem with character encoding.
> Somewhere in this app I put up a form that contains text.
> The encoding of the html page that contains this form is set to 'utf-8' by
> the following:
> <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
> That form displays OK in both mod_perl1.0 and mod_perl2.0
> 
> When I read the form back under the apache1, everything is OK.
> When I do the same using the apache 2 combination I run into trouble with
> the char ref entities which are high in the unicode set like:
> &#8212; or &#8211;. These characters are returned as unicode characters hex
> 97 and hex 96.

Returned from where? CGI.pm?

Does your 'perl -V:useperlio' report:

useperlio='define';

If so, can you give it a try with the latest mp2 cvs? However I think it won't
change anything, since the only change is that, now that perlio is used, you
can binmode it to 'utf8'.
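
What binmode with the 'utf8' layer does to output can be seen without Apache at all. A minimal sketch, assuming a perlio-enabled perl 5.8; an in-memory filehandle stands in for STDOUT, and the function name bytes_written_as_utf8 is made up for illustration:

```perl
use strict;
use warnings;

# Write text through a handle with the :utf8 layer and return the bytes
# that actually went out, as a hex string.
sub bytes_written_as_utf8 {
    my ($text) = @_;
    my $buffer = '';

    # In-memory handle used in place of STDOUT (an assumption for the demo).
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    binmode($fh, ':utf8');
    print {$fh} $text;
    close($fh);

    return join ' ', map { sprintf '%02x', ord($_) } split //, $buffer;
}

# One em dash (U+2014) goes in; its three UTF-8 bytes come out.
print bytes_written_as_utf8("\x{2014}"), "\n";
```

This only covers the sending side. It says nothing about how posted form data is read, which is the part in question here.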

I have just added tests for sending utf8 data, but we probably need to add
tests for receiving utf8 data as well.



RE: porting from mod_perl1 to mod_perl2

Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi there,

On Sat, 6 Sep 2003, Bart Terryn wrote:

> Hi,
> 
> I have an application running under apache
> 1.37(win32)/mod_perl1.27_01-dev/perl5.6 build 633
> 
> I am trying to move this application to apache
> 2.0.47(win32)/mod_perl1.99_10-dev/perl 5.8
> 
> However I run into a problem with character encoding.

Have you checked

perldoc perllocale

?

73,
Ged.
 


