Posted to modperl@perl.apache.org by Robin Berjon <ro...@knowscape.com> on 2001/06/13 16:17:14 UTC

Charset woes

Hi,

I'm running into trouble with browsers submitting data using various charsets 
and not telling me which charset they're using. This results in all sorts of 
breakages and unusable text. I can't be the only one dealing with this 
problem (if I am, then I'm really out of luck) so I was wondering if anyone 
here knows of a good way to reliably detect the charset that the browser is 
using to post its data.

Thanks,

-- 
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
Always remember you're unique just like everyone else. 


Re: Charset woes

Posted by Robin Berjon <ro...@knowscape.com>.
On Wednesday 13 June 2001 20:15, Ričardas Čepas wrote:
> On Wed Jun 13 16:17:14 2001 +0200 Robin Berjon wrote:
> > Hi,
> >
> > I'm running into trouble with browsers submitting data using various
> > charsets and not telling me which charset they're using. This results in
> > all sorts of breakages and unusable text. I can't be the only one dealing
> > with this problem (if I am, then I'm really out of luck) so I was
> > wondering if anyone here knows of a good way to reliably detect the
> > charset that the browser is using to post its data.
>
>         Make sure your page or HTTP header has a charset declared, and add a
> hidden input field with a known string that you can examine when it is
> submitted back.

Hmm, that's an interesting approach. Have you used it before? Do you know of 
a string that could potentially detect any encoding? I'm really facing 
pretty much anything depending on the browser's whim.

-- 
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
Lavish spending can be disastrous. Don't buy lavishes for a while.
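
A minimal sketch of the hidden-field idea, in Python rather than anything
posted in the thread. It assumes the form carries a hidden field whose value
is the euro sign, and that the server can get at the percent-encoded value
exactly as the browser sent it. The names SENTINEL, CANDIDATES and
detect_charset are hypothetical. A single string can only tell apart charsets
that encode it differently, so the candidate list is illustrative and not
exhaustive.

```python
from urllib.parse import unquote_to_bytes

# Assumption: the hidden field's value is the euro sign, whose byte
# encoding differs in each of the candidate charsets listed below.
SENTINEL = "\u20ac"
CANDIDATES = ["utf-8", "cp1252", "iso-8859-15"]  # illustrative only


def detect_charset(percent_encoded, sentinel=SENTINEL, candidates=CANDIDATES):
    """Return the first candidate under which the posted bytes equal the sentinel."""
    raw = unquote_to_bytes(percent_encoded)  # undo %XX escapes, keep raw bytes
    for name in candidates:
        try:
            if raw.decode(name) == sentinel:
                return name
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this charset
    return None  # unknown charset, or the sentinel was altered on the way


print(detect_charset("%E2%82%AC"))
print(detect_charset("%80"))
```

If the function returns None, the submission used a charset outside the
candidate list or the browser substituted the character, and the data is
probably best treated as undecodable rather than guessed at.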


Re: Charset woes

Posted by Ričardas Čepas <rc...@richard.eu.org>.
On Wed Jun 13 16:17:14 2001 +0200 Robin Berjon wrote:

> Hi,
> 
> I'm running into trouble with browsers submitting data using various charsets 
> and not telling me which charset they're using. This results in all sorts of 
> breakages and unusable text. I can't be the only one dealing with this 
> problem (if I am, then I'm really out of luck) so I was wondering if anyone 
> here knows of a good way to reliably detect the charset that the browser is 
> using to post its data.
> 
        Make sure your page or HTTP header has a charset declared, and add a
hidden input field with a known string that you can examine when it is
submitted back.
-- 
      ☻ Ričardas Čepas ☺
~~
~

Re: Charset woes

Posted by Robin Berjon <ro...@knowscape.com>.
On Thursday 14 June 2001 23:40, Ged Haywood wrote:
> On Thu, 14 Jun 2001, Robin Berjon wrote:
> > The problem is simply that I need to mix that data with other data
> > in another encoding, which means I have to convert it.
>
> Do you send a charset specification to the client?  This was often
> overlooked until the cross-site scripting thing blew up early last
> year.  I wonder if browsers seeing that might be more forthcoming if
> they want to use something different.

Yes, I am. AxKit doesn't give you much of a choice there (rightfully so) :) 
After doing a number of tests, I've found that browsers (even totally 
non-compliant ones) tend to POST back in the same charset you used to send 
the page to them, unless the user types characters that don't fit into that 
charset (people usually post in the language in which the page is written, 
but sometimes their names will not fit into the charset). The spec says they 
_may_ do that if accept-charset is set to UNKNOWN (its default value), but 
then the spec is moot when it comes to browsers.

So now if I could find a way to send UTF-8 to Netscape 4 without it blowing 
up, I might have found a workable solution to this problem :-)

-- 
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
"Chance is irrelevant. We will succeed." 
-- 7o9
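
One way to act on that observation, sketched in Python under the assumption
that the page was served as UTF-8: decode the posted bytes strictly in the
page's charset and flag anything that does not fit instead of guessing. The
function name decode_submission and its return shape are hypothetical.

```python
def decode_submission(raw, page_charset="utf-8"):
    """Decode posted bytes on the assumption that they use the page's charset.

    Returns (text, True) on success, or (None, False) when the bytes are not
    valid in that charset, so the caller can reject or log the submission.
    """
    try:
        return raw.decode(page_charset, errors="strict"), True
    except UnicodeDecodeError:
        return None, False


# A name posted in UTF-8 decodes cleanly; the same name posted in
# ISO-8859-2 is caught because its bytes are not valid UTF-8.
print(decode_submission("Ričardas".encode("utf-8")))
print(decode_submission("Ričardas".encode("iso-8859-2")))
```

The strict check is mainly useful with a multibyte charset such as UTF-8.
Python's ISO-8859-1 codec accepts every byte value, so with that page charset
the check would never fail and a mismatch would go unnoticed.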


Re: Charset woes

Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi Robin,

On Thu, 14 Jun 2001, Robin Berjon wrote:

> The problem is simply that I need to mix that data with other data
> in another encoding, which means I have to convert it.

Do you send a charset specification to the client?  This was often
overlooked until the cross-site scripting thing blew up early last
year.  I wonder if browsers seeing that might be more forthcoming if
they want to use something different.

73,
Ged.


Re: Charset woes

Posted by Robin Berjon <ro...@knowscape.com>.
On Thursday 14 June 2001 13:18, Ged Haywood wrote:
> On Wed, 13 Jun 2001, Robin Berjon wrote:
> > I'm running into trouble with browsers submitting data using various
> > charsets and not telling me which charset they're using. This results in
> > all sorts of breakages and unusable text. I can't be the only one dealing
> > with this problem (if I am, then I'm really out of luck) so I was
> > wondering if anyone here knows of a good way to reliably detect the
> > charset that the browser is using to post its data.
>
> It will be very difficult to guess reliably what charset is in use from
> a random sample of characters taken from it.  I think you just have to
> be able to handle the data.  You need sixteen bits per character.

I'm able to handle the data :) The problem is simply that I need to mix that 
data with other data in another encoding, which means I have to convert it. 
And in order to convert it, I need to know the original encoding... otherwise 
either the converter will blow up, or I'll corrupt the content.

Thanks Ged :)

-- 
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
There are trivial truths and there are great Truths. The opposite of 
a trivial truth is obviously false. The opposite of a great Truth is 
also true.  
-- Niels Bohr 
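
Both failure modes mentioned above can be reproduced in a few lines of
Python. This is only an illustration: the convert helper is hypothetical, and
the sample text and charsets were picked to make the effect visible.

```python
def convert(raw, source, target="utf-8"):
    """Re-encode raw bytes from the source charset into the target charset."""
    return raw.decode(source).encode(target)


data = "Čepas".encode("iso-8859-2")  # the bytes as a Latin-2 browser would send them

# Correct source charset: the text survives the conversion.
right = convert(data, "iso-8859-2").decode("utf-8")

# Wrong single-byte source charset: no error, but the content is corrupted.
wrong = convert(data, "iso-8859-1").decode("utf-8")

# Wrong multibyte source charset: the converter raises an exception.
try:
    convert(data, "utf-8")
    blew_up = False
except UnicodeDecodeError:
    blew_up = True

print(right, wrong, blew_up)
```

The silent case is the more dangerous one, since nothing signals that the
stored text no longer matches what the user typed.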


Re: Charset woes

Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi Robin,

On Wed, 13 Jun 2001, Robin Berjon wrote:

> I'm running into trouble with browsers submitting data using various charsets 
> and not telling me which charset they're using. This results in all sorts of 
> breakages and unusable text. I can't be the only one dealing with this 
> problem (if I am, then I'm really out of luck) so I was wondering if anyone 
> here knows of a good way to reliably detect the charset that the browser is 
> using to post its data.

It will be very difficult to guess reliably what charset is in use from
a random sample of characters taken from it.  I think you just have to
be able to handle the data.  You need sixteen bits per character.

73,
Ged.
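
The difficulty of guessing can be shown with a short Python sketch. The
helper plausible_charsets and the candidate list are hypothetical; the point
is only that many single-byte charsets accept the same bytes, so a successful
decode proves very little.

```python
def plausible_charsets(raw, candidates):
    """Return every candidate charset that decodes the bytes without error."""
    accepted = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # ruled out: the bytes are invalid in this charset
        accepted.append(name)
    return accepted


sample = b"Ri\xe8ardas"  # "Ričardas" as encoded in ISO-8859-2
candidates = ["utf-8", "iso-8859-1", "iso-8859-2", "koi8-r"]
print(plausible_charsets(sample, candidates))
```

Only UTF-8 is ruled out here. The three single-byte charsets all decode the
sample, each to a different string, and nothing in the bytes says which one
the sender meant.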