Posted to users@httpd.apache.org by Carl Brewer <ca...@vivitec.com.au> on 2003/11/24 05:48:53 UTC

[users@httpd] apache2, CGI.pm, IE and smart quotes

Hello,

Since moving some websites from Apache 1.3.x to 2.0.48, I've been seeing
an interesting problem (for which I have not yet managed to produce a
simple test case) when users insert Microsoft "smart quotes" into forms
processed by CGI.pm.  I apologise for the vague nature of this help
request; I'm struggling to reproduce the problem outside of a fairly
complex set of Perl scripts that pull and push data from databases.

The symptom I'm seeing with a POST request is that the first parameter
defined in the form seems to go missing if there's a smartquote
backtick (I don't know how else to describe this entity!) in one
of the text fields.

This only happens with IE; Mozilla and other browsers don't cause the
problem (perhaps Mozilla converts the backtick to something legal?).  It
only seems to happen when a user has written some text in Word, Word has
put some funky quote in place of a normal ', and they've then cut and
pasted it into a textarea.  I'd expect to be able to reproduce it with
something like this:

#!/usr/local/bin/perl

use CGI;
$q = new CGI;
$script = "test.pl";

print "Content-type: text/html\n\n";

# List the names of all parameters CGI.pm parsed from the request.
print "Parameters: ".join("/",$q->param())."<BR>\n";

$foo = $q->param('foo');
$bar = $q->param('bar');
$baz = $q->param('baz');

# Redisplay the form with the submitted values filled in.
print qq{
  <FORM ACTION="$script" METHOD=POST ENCTYPE="multipart/form-data" NAME='test_form'>
                 <INPUT TYPE=TEXT NAME=foo VALUE='$foo'><BR>
                 <INPUT TYPE=TEXT NAME=bar VALUE="$bar"><BR>
                 <INPUT TYPE=TEXT NAME=baz VALUE='$baz'><BR>
                 <INPUT TYPE=submit><BR>
  </FORM>
};


But that's not reproducing the problem, so it's something subtle
somewhere that I've not yet identified.  The behaviour does not
happen under apache 1.3.x, just with 2.0 (.48).  I thought this
may have been a problem with an older version of CGI.pm, so I
upgraded to 3.00, but this did not help.

I have a workaround for our scripts, which is just to insert
a dummy parameter into the form that we don't need, and that
"solves" the problem, but we have a lot of scripts to change if
that's really necessary.  Looking through CGI.pm's documentation
on escaping doesn't point me at an obvious solution.
I'd *like* CGI.pm to just turn whatever those quotes are into
legal ASCII, but that may not be possible?  I think that's
maybe what happened under Apache 1.3.x?
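
For what it's worth, the "turn the quotes into legal ASCII" idea can be
sketched in a few lines of plain Perl. This is only an illustration: it
assumes the pasted text arrives as Windows-1252 bytes (where 0x91-0x94 are
the curly quotes), the helper name ascii_quotes is made up for this example,
and it does nothing about the missing first parameter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the Windows-1252 "smart quote" bytes with plain ASCII quotes.
# Assumption: the input is a byte string in Windows-1252, where
# 0x91/0x92 are curly single quotes and 0x93/0x94 are curly double quotes.
sub ascii_quotes {
    my ($text) = @_;
    $text =~ tr/\x91\x92\x93\x94/''""/;
    return $text;
}

# Sample input standing in for text pasted from Word.
my $pasted = "It\x92s a \x93test\x94";
print ascii_quotes($pasted), "\n";
```

In a real script the helper would be applied to each value returned by
$q->param() before the value is stored or redisplayed.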

Anyone got any suggestions?  Again, I know the above is way too
vague to allow any sort of real analysis, but until I narrow
it down more (assuming my masters give me the time to do it!)
that's all I have :(

We're running Perl 5.6.1 on Red Hat Linux 7.3.

thanks

Carl



-- 
=======================
Vivitec Pty. Ltd.
Suite 6, 51-55 City Rd.
Southbank, 3006.
Ph. +61 3 8626 5626
Fax +61 3 9682 1000
=======================


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] apache2, CGI.pm, IE and smart quotes

Posted by bvr <bv...@xs4all.nl>.
Yes, I have experienced this weird bug too.

This especially happens when you copy and paste text from Word, because it
changes regular quotes to special fancy quotes while you type.

When you dump MSIE's POST data to the screen you will notice that
the start of the MIME message is missing. While someone has already
posted a workaround, I'm not sure whether it will always work, or work
at all with some stricter form parsers.

While playing with this in order to fix the content management system I
was working on, I discovered what to do to prevent IE from corrupting its
POST data.

To fix your problem for good, at least the page that contains the form
should have a character set specified explicitly. The way I do this is
by specifying it in the Content-Type header, but it may also work using
a META tag. It is also recommended that you specify the same character
set in any page where you display the data again.

-- example header
Content-Type: text/html; charset=iso-8859-15
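
In a Perl CGI script, a header like that could be produced roughly as
follows. This is a sketch, not tested against IE: the helper name
html_header and its default are made up for illustration. If you use
CGI.pm, its header() method takes a -charset argument that should do the
same job.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a CGI response header that states the character set explicitly.
# The helper name and the default charset are illustrative assumptions.
sub html_header {
    my ($charset) = @_;
    $charset = 'iso-8859-15' unless defined $charset;
    return "Content-Type: text/html; charset=$charset\n\n";
}

# Send the header before any HTML, on the form page and on every page
# that displays the submitted data again.
print html_header();
print "<HTML><BODY>form goes here</BODY></HTML>\n";
```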

baz.


>>The symptom I'm seeing with a POST request is that the first
>>parameter defined in the form seems to go missing if there's a
>>smartquote backtick (I don't know how else to describe this
>>entity!) in one of the text fields.




Re: [users@httpd] apache2, CGI.pm, IE and smart quotes

Posted by Robert Andersson <ro...@profundis.nu>.
Carl Brewer wrote:
> Robert Andersson wrote:
> > Carl Brewer wrote:
> >>Is it actually IE that's doing the breaking?  We've found the
> >>broken encoding, but are confused, because as far as we've been able
> >>to test, it does work fine in apache 1.3.
> >
> > A question though. Does this only affect forms that use
> > multipart/form-data encoding? These were the only forms I had
> > problems with.
>
> yes.  It works ok with a normal form, I'm only able to
> see the fault when I use multipart/form-data encoding.

It has been a while...

> Here's my test page :
>
> ...snip...
>
> And if I feed it smartquotes stuff from winword via IE 5.5, it does
> exactly as you describe for apache 2, but is ok for apache 1.3.

I have not tested in 1.3, but if I find time I might try to investigate if
it does indeed manage to recover from the corruption.

> So in some way, apache 1.3 must be dealing with it, but 2.0(.48)
> is broken, or at least, not having whatever it takes to deal with the
> IE breakage.
>
> I can't sniff the traffic all that well, as I don't really know how
> to do it.

That is not very hard; use Ethereal: http://www.ethereal.com/

> but is this something that should/could go into an apache 2
> bug report?

As it is not a bug with Apache, you should not file it as a 'bug' bug
report. But, if it can be determined that Apache 1.3 is proof that this
corruption can be corrected, it might be worth considering filing an
'enhancement' bug report. We'll need to investigate more first.

Regards,
Robert Andersson




Re: [users@httpd] apache2, CGI.pm, IE and smart quotes

Posted by Carl Brewer <ca...@vivitec.com.au>.
G'day Robert,


Robert Andersson wrote:

> Carl Brewer wrote:
> 
>>Is it actually IE that's doing the breaking?  We've found the
>>broken encoding, but are confused, because as far as we've been able
>>to test, it does work fine in apache 1.3.
> 
> 
> A question though. Does this only affect forms that use multipart/form-data
> encoding? These were the only forms I had problems with.

yes.  It works ok with a normal form, I'm only able to
see the fault when I use multipart/form-data encoding.

Here's my test page :

#!/usr/bin/perl

use CGI;

my $q = new CGI;

#print $q->header('text/html');
print "Content-type: text/html\n\n";

my $hidden = "hidden";
my $text = "text";
my $checkbox;

if ($q->param('hidden')) {
     $hidden = $q->param('hidden');
     $text   = $q->param('text');
     $checkbox = $q->param('checkbox');
}

print qq {

<FORM METHOD=POST ENCTYPE="multipart/form-data" action="testform.pl">

<INPUT TYPE="text" NAME="text" VALUE="$text">
<INPUT TYPE="hidden" NAME="hidden" VALUE="$hidden">
<INPUT TYPE="checkbox" NAME="checkbox" VALUE="checkbox">

<INPUT TYPE="submit" VALUE="run">

</FORM>

};

my @names = $q->param;
foreach my $name (@names) {
     my $param = $q->param($name);
     print "$name : $param<BR>\n";
}



And if I feed it smartquotes stuff from winword via IE 5.5, it does
exactly as you describe for apache 2, but is ok for apache 1.3.

So in some way, apache 1.3 must be dealing with it, but 2.0(.48)
is broken, or at least, not having whatever it takes to deal with the
IE breakage.

I can't sniff the traffic all that well, as I don't really know how
to do it, but is this something that should/could go into an apache 2
bug report?  I don't think we'd be the only site that's running into
this problem, we're just pretty keen to get all our (complex!)
stuff onto apache2 :)

Carl






Re: [users@httpd] apache2, CGI.pm, IE and smart quotes

Posted by Robert Andersson <ro...@profundis.nu>.
Carl Brewer wrote:
> Is it actually IE that's doing the breaking?  We've found the
> broken encoding, but are confused, because as far as we've been able
> to test, it does work fine in apache 1.3.

A question though. Does this only affect forms that use multipart/form-data
encoding? These were the only forms I had problems with.

I made a test form with one hidden input field, one text field, and a
checkbox. If the text field contained any of a few special characters
(including "smart quotes", I think) and the checkbox was checked, the
request body looked like this:

    -----------------------------7d3da02018a
    Content-Disposition: form-data; name="module"

    module-value
    -----------------------------7d3da02018a
    Content-Disposition: form-data; name="Title"

    some-data
    -----------------------------7d3da02018a
    Content-Disposition: form-data; name="Options"

    visible
    -----------------------------7d3da02018a--

However, if the checkbox was unchecked, the request body looked like this:

    module"

    module-value
    -----------------------------7d32e22018a
    Content-Disposition: form-data; name="Title"

    some-data
    -----------------------------7d32e22018a--

As can be seen in the latter request, the first part is severely corrupted.
Apache 1.3, or PHP with Apache 1.3, *might* have been able to salvage the
value. However, when I just tested this by letting IE communicate with me
while I mimicked an HTTP server, I didn't see this behaviour, although my
test was just a quick one.

If you still have the ability, I suggest you do the following:
- Sniff a "good" POST request from IE to Apache 1.3
- Sniff a "bad" POST request from IE to Apache 1.3
- Sniff a "good" POST request from IE to Apache 2.0
- Sniff a "bad" POST request from IE to Apache 2.0

Also sniff the exact headers Apache is sending for the page the form is
located on. A "good" request is one that works with both 1.3 and 2.0; a
"bad" request is one that fails on Apache 2.0. Comparing these sniffs should
reveal what is going on, and perhaps whether Apache 2.0 does something to
cause this that 1.3 didn't.
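
Once the request bodies are captured, a rough way to tell "good" from "bad"
is to check whether the body begins with the opening delimiter, that is,
"--" followed by the boundary given in the request's Content-Type header.
A minimal sketch; the function name, the boundary and the sample bodies are
hypothetical stand-ins modelled on the two requests shown earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if a multipart/form-data body begins with the expected opening
# delimiter ("--" followed by the boundary from the Content-Type request
# header), 0 otherwise. A body that starts mid-part fails this check.
sub starts_with_boundary {
    my ($body, $boundary) = @_;
    return $body =~ /\A--\Q$boundary\E\r?\n/ ? 1 : 0;
}

# Hypothetical boundary and bodies, not real captures.
my $boundary = '-------7d3da02018a';
my $good = "--$boundary\r\n"
         . qq{Content-Disposition: form-data; name="module"\r\n\r\n}
         . "module-value\r\n--$boundary--\r\n";
my $bad  = qq{module"\r\n\r\nmodule-value\r\n--$boundary--\r\n};

for my $case ([good => $good], [bad => $bad]) {
    my ($label, $body) = @$case;
    printf "%s: %s\n", $label,
        starts_with_boundary($body, $boundary) ? 'intact' : 'corrupted';
}
```

Running the same check on the captures from 1.3 and 2.0 would show whether
IE sends the same damaged body to both servers.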

Regards,
Robert Andersson




Re: [users@httpd] apache2, CGI.pm, IE and smart quotes

Posted by Carl Brewer <ca...@vivitec.com.au>.
Robert Andersson wrote:

> Carl Brewer wrote:
> 
>>The symptom I'm seeing with a POST request is that the first
>>parameter defined in the form seems to go missing if there's a
>>smartquote backtick (I don't know how else to describe this
>>entity!) in one of the text fields.
> 
> 
> This is an Internet Explorer bug, which I have myself had much trouble with;
> very subtle indeed.
> 
> It is not possible to work around from Apache's point of view, as IE for very
> mysterious reasons screws up the encoding. I cannot really believe that it
> worked when sending to Apache 1.3, unless 1.3 was somehow able
> to salvage the damaged data.

Is it actually IE that's doing the breaking?  We've found the
broken encoding, but are confused, because as far as we've been able
to test, it does work fine in apache 1.3.

> My workaround is to insert a dummy (hidden) field first in the form, which
> will take the hit. This is a c&p from one of my applications:

I'm doing the same as a tactical fix, but it's far from ideal,
especially when we have hundreds of scripts to patch :)

cheers

Carl






Re: [users@httpd] apache2, CGI.pm, IE and smart quotes

Posted by Robert Andersson <ro...@profundis.nu>.
Carl Brewer wrote:
> The symptom I'm seeing with a POST request is that the first
> parameter defined in the form seems to go missing if there's a
> smartquote backtick (I don't know how else to describe this
> entity!) in one of the text fields.

This is an Internet Explorer bug, which I have myself had much trouble with;
very subtle indeed.

It is not possible to work around from Apache's point of view, as IE for very
mysterious reasons screws up the encoding. I cannot really believe that it
worked when sending to Apache 1.3, unless 1.3 was somehow able
to salvage the damaged data.

My workaround is to insert a dummy (hidden) field first in the form, which
will take the hit. This is a c&p from one of my applications:

<form method='post' action='?' enctype='multipart/form-data'>
<input type='hidden' name='dummy-ie-workaround' value='ie_is_buggy'>

When the bug kicks in, this field will be damaged in the request body, but
as it isn't used, no harm is done.
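
For a Perl script, the same idea might look something like the sketch
below. The helper names form_start and real_params are made up for
illustration; the field name and value are the ones from the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Name of the sacrificial field, taken from the form snippet above.
my $DUMMY = 'dummy-ie-workaround';

# Emit the opening of a multipart form with the dummy field first,
# so that it is the part that gets damaged when the bug kicks in.
sub form_start {
    my ($action) = @_;
    return qq{<form method='post' action='$action' enctype='multipart/form-data'>\n}
         . qq{<input type='hidden' name='$DUMMY' value='ie_is_buggy'>\n};
}

# Drop the dummy field from a list of submitted parameter names,
# so the rest of the script never sees it.
sub real_params {
    my @names = @_;
    return grep { $_ ne $DUMMY } @names;
}

print form_start('?');
print join(', ', real_params($DUMMY, 'Title', 'Options')), "\n";
```

With CGI.pm, the list passed to real_params would be the names returned by
$q->param.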

Regards,
Robert Andersson

