Posted to modperl@perl.apache.org by Robin Berjon <ro...@knowscape.com> on 2001/06/13 16:17:14 UTC
Charset woes
Hi,
I'm running into trouble with browsers submitting data using various charsets
and not telling me which charset they're using. This results in all sorts of
breakages and unusable text. I can't be the only one dealing with this
problem (if I am, then I'm really out of luck) so I was wondering if anyone
here knows of a good way to reliably detect the charset that the browser is
using to post its data.
Thanks,
--
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
Always remember you're unique just like everyone else.
Re: Charset woes
Posted by Robin Berjon <ro...@knowscape.com>.
On Wednesday 13 June 2001 20:15, Ričardas Čepas wrote:
> On Wed Jun 13 16:17:14 2001 +0200 Robin Berjon wrote:
> > Hi,
> >
> > I'm running into trouble with browsers submitting data using various
> > charsets and not telling me which charset they're using. This results in
> > all sorts of breakages and unusable text. I can't be the only one dealing
> > with this problem (if I am, then I'm really out of luck) so I was
> > wondering if anyone here knows of a good way to reliably detect the
> > charset that the browser is using to post its data.
>
> Make sure your page or HTTP header declares a charset, and add a
> hidden input field with a known string that you can examine when it is
> submitted back.
Hmm, that's an interesting approach. Have you used it before? Do you know of
a string that could potentially detect any encoding? I'm really facing
pretty much anything, depending on the browser's whim.
--
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
Lavish spending can be disastrous. Don't buy lavishes for a while.
Re: Charset woes
Posted by Ričardas Čepas <rc...@richard.eu.org>.
On Wed Jun 13 16:17:14 2001 +0200 Robin Berjon wrote:
> Hi,
>
> I'm running into trouble with browsers submitting data using various charsets
> and not telling me which charset they're using. This results in all sorts of
> breakages and unusable text. I can't be the only one dealing with this
> problem (if I am, then I'm really out of luck) so I was wondering if anyone
> here knows of a good way to reliably detect the charset that the browser is
> using to post its data.
>
Make sure your page or HTTP header declares a charset, and add a hidden input field with a known string that you can examine when it is submitted back.
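
A minimal sketch of the hidden-field idea, in Python rather than Perl. The sentinel string, the candidate list, and the function name `guess_charsets` are assumptions made for illustration; none of them come from this thread. The sentinel is encoded under each candidate charset and compared with the bytes the browser sent back. Several candidates can match, so the result is a list.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical sentinel placed in a hidden input field: "é" plus the euro sign.
SENTINEL = "\u00e9\u20ac"

# Hypothetical shortlist of charsets the site expects to meet.
CANDIDATES = ["utf-8", "iso-8859-15", "cp1252"]


def guess_charsets(submitted, sentinel=SENTINEL, candidates=CANDIDATES):
    """Return every candidate charset whose encoding of the sentinel
    equals the percent-encoded value the browser submitted."""
    raw = unquote_to_bytes(submitted)
    matches = []
    for name in candidates:
        try:
            encoded = sentinel.encode(name)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the sentinel at all
        if encoded == raw:
            matches.append(name)
    return matches


print(guess_charsets("%C3%A9%E2%82%AC"))  # sentinel as UTF-8 bytes
print(guess_charsets("%E9%80"))           # sentinel as cp1252 bytes
```

A sentinel can only separate charsets that encode it differently, and a charset that cannot represent it drops out of the comparison, so a single string is unlikely to identify every encoding. An empty or multi-element result should be treated as "unknown".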
--
☻ Ričardas Čepas ☺
~~
~
Re: Charset woes
Posted by Robin Berjon <ro...@knowscape.com>.
On Thursday 14 June 2001 23:40, Ged Haywood wrote:
> On Thu, 14 Jun 2001, Robin Berjon wrote:
> > The problem is simply that I need to mix that data with other data
> > in another encoding, which means I have to convert it.
>
> Do you send a charset specification to the client? This was often
> overlooked until the cross-site scripting thing blew up early last
> year. I wonder if browsers seeing that might be more forthcoming if
> they want to use something different.
Yes I am, AxKit doesn't give you much of a choice there (rightfully so) :)
After doing a number of tests, I've found that browsers (even totally
non-compliant ones) tend to POST back in the same charset you used to send
the page to them, unless the user types characters that don't fit into that
charset (people usually post in the language in which the page is written,
but sometimes their names will not fit into the charset). The spec says they
_may_ do that if accept-charset is set to UNKNOWN (its default value), but
then the spec is moot when it comes to browsers.
So now if I could find a way to send UTF-8 to Netscape 4 without it blowing
up, I might have found a workable solution to this problem :-)
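
The observation above suggests one possible routine: serve the page as UTF-8, attempt a strict UTF-8 decode on whatever comes back, and fall back to a legacy charset only when that fails. The Python sketch below assumes exactly that. The name `decode_posted` and the `cp1252` fallback are illustrative choices, and nothing here has been tried against Netscape 4.

```python
def decode_posted(raw, page_charset="utf-8", fallback="cp1252"):
    """Decode posted bytes, assuming the browser echoed the page charset.

    Returns (text, charset_used). The fallback charset is a guess, so
    text decoded with it may still be wrong.
    """
    try:
        # Strict decoding: malformed byte sequences raise instead of
        # being silently accepted.
        return raw.decode(page_charset), page_charset
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD rather than raising again.
        return raw.decode(fallback, errors="replace"), fallback


text, used = decode_posted("Ričardas Čepas".encode("utf-8"))
print(used, text)

text, used = decode_posted("café".encode("cp1252"))
print(used, text)
```

Strict UTF-8 decoding is a useful first test because legacy single-byte text containing accented characters rarely happens to form valid UTF-8 sequences. Pure ASCII input decodes identically either way, so it never needs the fallback.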
--
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
"Chance is irrelevant. We will succeed."
-- 7o9
Re: Charset woes
Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi Robin,
On Thu, 14 Jun 2001, Robin Berjon wrote:
> The problem is simply that I need to mix that data with other data
> in another encoding, which means I have to convert it.
Do you send a charset specification to the client? This was often
overlooked until the cross-site scripting thing blew up early last
year. I wonder if browsers seeing that might be more forthcoming if
they want to use something different.
73,
Ged.
Re: Charset woes
Posted by Robin Berjon <ro...@knowscape.com>.
On Thursday 14 June 2001 13:18, Ged Haywood wrote:
> On Wed, 13 Jun 2001, Robin Berjon wrote:
> > I'm running into trouble with browsers submitting data using various
> > charsets and not telling me which charset they're using. This results in
> > all sorts of breakages and unusable text. I can't be the only one dealing
> > with this problem (if I am, then I'm really out of luck) so I was
> > wondering if anyone here knows of a good way to reliably detect the
> > charset that the browser is using to post its data.
>
> It will be very difficult to guess reliably what charset is in use from
> a random sample of characters taken from it. I think you just have to
> be able to handle the data. You need sixteen bits per character.
I'm able to handle the data :) The problem is simply that I need to mix that
data with other data in another encoding, which means I have to convert it.
And in order to convert it, I need to know the original encoding... otherwise
either the converter will blow up, or I'll corrupt the content.
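
Both failure modes can be reproduced in a few lines of Python. This is only an illustration: `convert` is a hypothetical helper, and ISO-8859-2 is picked arbitrarily as the "real" source encoding.

```python
def convert(raw, source, target="utf-8"):
    """Re-encode bytes from a declared source charset to the target charset."""
    return raw.decode(source).encode(target)


# Bytes as a browser might post them, here actually ISO-8859-2.
raw = "Ričardas".encode("iso-8859-2")

# Correct source charset: the text survives the conversion.
good = convert(raw, "iso-8859-2").decode("utf-8")

# Wrong single-byte guess: no error is raised, the content is silently corrupted.
corrupted = convert(raw, "iso-8859-1").decode("utf-8")

# Wrong multi-byte guess: the converter blows up.
try:
    convert(raw, "utf-8")
    blew_up = False
except UnicodeDecodeError:
    blew_up = True

print(good, corrupted, blew_up)
```

The silent case is the more dangerous one, since every byte sequence is valid ISO-8859-1 and the converter has no way to notice the mistake.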
Thanks Ged :)
--
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
There are trivial truths and there are great Truths. The opposite of
a trivial truth is obviously false. The opposite of a great Truth is
also true.
-- Niels Bohr
Re: Charset woes
Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi Robin,
On Wed, 13 Jun 2001, Robin Berjon wrote:
> I'm running into trouble with browsers submitting data using various charsets
> and not telling me which charset they're using. This results in all sorts of
> breakages and unusable text. I can't be the only one dealing with this
> problem (if I am, then I'm really out of luck) so I was wondering if anyone
> here knows of a good way to reliably detect the charset that the browser is
> using to post its data.
It will be very difficult to guess reliably what charset is in use from
a random sample of characters taken from it. I think you just have to
be able to handle the data. You need sixteen bits per character.
73,
Ged.