Posted to asp@perl.apache.org by karl <ka...@ethervizion.com> on 2004/07/19 18:35:23 UTC

Output formatting problem (text encoding?)

Hello,

I'm a total newbie to Perl/Apache::ASP, but I seem to have gotten 
things to work on a WinXP setup.

I'm currently having a problem with the formatting of output. I have 
text output coming from a database, and ' (apostrophes) are shown in 
the browser (IE6) as ? (question marks). The weird thing is that if I 
save the output as an HTML file and open it in the browser, 
everything looks fine. The only clue I can find is that the 
original output shows up in the browser as encoded Unicode (UTF-8); 
after I save and reopen it, when things look fine, it shows as being 
encoded as Western European (ISO).

Note that on an IIS/ASP setup, the equivalent output shows up correctly 
and with Western European (ISO) encoding. The only physical 
difference I can find between the output generated by Apache::ASP 
and IIS/ASP is that the Apache::ASP output has Unix-style LF line endings 
and the IIS/ASP output has DOS/Windows-style CRLF line endings. However, 
I'm pretty sure that this isn't the problem, because when I save the 
output from Apache::ASP and reopen it in the browser, things look 
fine, and it still has Unix-style LF line endings.

So, to make a long story short, I'm trying to figure out how to get 
Apache::ASP to produce output (with the correct encoding?) so that the 
text displays correctly.

Thanks for any help!

-Karl


---------------------------------------------------------------------
To unsubscribe, e-mail: asp-unsubscribe@perl.apache.org
For additional commands, e-mail: asp-help@perl.apache.org


Re: Output formatting problem (text encoding?)

Posted by karl <ka...@ethervizion.com>.
Thanks for your help, Warren. I wrote my last message before seeing 
yours. I can see now that it can be confusing to track all the text 
encoding changes, but that generally only the last one matters 
(assuming lossless conversion).

Before I discovered that the AddDefaultCharset Apache directive 
would solve my problem, I found a stopgap solution of setting 
$Response->{Charset} in my script.

Thanks again!





Re: Output formatting problem (text encoding?)

Posted by Warren Young <wa...@etr-usa.com>.
karl wrote:
> I have 
> text output coming from a database and ' (apostrophes) are shown in 
> the browser (IE6) as ? (question marks). 

There are apostrophes, and there are apostrophes.  There's ASCII code 39, 
there's Windows code page 1252 code 146, there's Unicode code 
<mumble>....  The question is, which of these codes are in your 
database?  You must know the answer to that question before you can 
decide how to proceed.
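To make the "which apostrophe?" question concrete, here is a small sketch in Python. Python is used only for illustration; the setup in this thread is Perl/Apache::ASP. The sketch shows ASCII code 39, the Windows-1252 byte 146 (0x92), the Unicode character that byte stands for, and what happens when that lone byte is decoded as UTF-8. The helper name describe is hypothetical.

```python
def describe(ch: str) -> str:
    """Return the Unicode code point of a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"

ascii_apostrophe = "'"                 # ASCII code 39
cp1252_byte = bytes([146])             # Windows code page 1252, code 146 (0x92)
curly = cp1252_byte.decode("cp1252")   # the typographic right single quote

print(describe(ascii_apostrophe))      # U+0027
print(describe(curly))                 # U+2019
print(curly.encode("utf-8").hex())     # e28099: three bytes once encoded as UTF-8

# A lone 0x92 byte is not valid UTF-8, so a decoder told "UTF-8"
# substitutes the replacement character (often drawn as '?').
as_utf8 = cp1252_byte.decode("utf-8", errors="replace")
print(describe(as_utf8))               # U+FFFD
```

Suppose the database holds the single Windows-1252 byte and the page is labeled (or guessed) as UTF-8. The browser then sees an invalid sequence and draws a substitute. Browsers generally treat a page labeled ISO-8859-1 as Windows-1252, which is probably why the saved copy displayed correctly as Western European (ISO).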

Character code handling in the database/Apache::ASP/Perl5/Apache/browser 
chain is stranger than you probably expect.  Here's a post I wrote a few 
months back detailing two chains I've personally observed:

	http://www.mail-archive.com/asp@perl.apache.org/msg01952.html

Notice that I saw two rather different translation chains on my two test 
systems!  Your particular configuration is quite different from either 
of mine, so it could give yet a third path.

> The only thing I can figure out is that 
> original output shows up as encoded Unicode (UTF-8) in the browser; 

Don't guess, find out.

To do the analysis for the post I linked to, I dumped the 
text in question to a file at several points along the I/O chain, then 
examined each file.  You should also use a network sniffer to see what 
the HTTP headers and HTML data are without the browser getting in the 
way.  There's a good list of sniffers in the Winsock Programmer's FAQ, 
if you don't have one already:

	http://tangentsoft.net/wskfaq/
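Below is a minimal sketch of the "dump and examine each file" approach, in Python for illustration only. It assumes you have already written the text to a file at each stage of the chain. The script name inspect_dump.py and the helpers non_ascii_report and decodes_as_utf8 are hypothetical. None of them is part of Apache::ASP.

```python
import sys
from typing import List

def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes form valid UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def non_ascii_report(data: bytes, context: int = 10) -> List[str]:
    """Describe each byte >= 0x80 with its offset and some surrounding bytes."""
    report = []
    for offset, byte in enumerate(data):
        if byte < 0x80:
            continue
        start = max(0, offset - context)
        snippet = data[start:offset + context + 1]
        report.append(f"offset {offset}: 0x{byte:02X} in {snippet!r}")
    return report

if __name__ == "__main__":
    # Usage: python inspect_dump.py stage1.txt stage2.txt ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        print(f"{path}: valid UTF-8 = {decodes_as_utf8(data)}")
        for line in non_ascii_report(data):
            print("  " + line)
```

Running it on each stage's dump shows where the bytes change. For example, a single 0x92 byte (Windows-1252) becomes the three-byte UTF-8 sequence E2 80 99.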

I think you'll find, as I did, that your characters are being translated 
back and forth between ISO 8859-x and Unicode multiple times, and that 
the last step isn't being done correctly.

That last step is the critical one, because the intermediate 
transformations are very probably all lossless in your situation.  All 
you have to do is communicate to the browser what the final character 
encoding is.  In my particular situation, I had to change an Apache 
setting to make it send a header informing the browser that the 
character encoding was UTF-8.  The browser was then able to display the 
web page correctly, never mind that the data was stored as ISO 8859-1 
(Latin-1) in the database, and translated back and forth several times 
along the path.
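Here is a rough Python illustration (not Apache::ASP code) of why only the last step matters when the intermediate hops are lossless. Latin-1 text goes through several decode/encode hops, and the bytes that finally go out are UTF-8, so the header has to say UTF-8. The helper content_type_for and the sample word are hypothetical.

```python
def content_type_for(charset: str) -> str:
    """Build a Content-Type value naming the charset of the bytes actually sent."""
    return f"text/html; charset={charset}"

# Hypothetical sample: text stored as ISO 8859-1 (Latin-1) in the database.
stored = "caf\u00e9".encode("iso-8859-1")

# Intermediate hops: Latin-1 -> Unicode -> UTF-8 -> Unicode -> UTF-8.
# Every hop can represent every character here, so nothing is lost.
text = stored.decode("iso-8859-1")
wire = text.encode("utf-8").decode("utf-8").encode("utf-8")

# The only thing the browser needs to know is the encoding of `wire`.
header = content_type_for("UTF-8")

# Labeling the same bytes with the wrong charset produces mojibake.
right = wire.decode("utf-8")        # 'café'
wrong = wire.decode("iso-8859-1")   # 'cafÃ©'
```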

> The only physical 
> difference I can find between the output generated by Apache::ASP 
> and IIS/ASP is that the Apache::ASP has Unix style LF line-endings 
> and the IIS/ASP has DOS/Windows style CRLF line-endings. 

I'll bet you didn't compare the HTTP headers.  Different web servers, 
hence different headers, hence different browser interpretation.
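One way to check this is to capture the response headers from both servers and diff them. The sketch below does that in Python, for illustration only. The helpers fetch_headers, declared_charset and header_differences are hypothetical, and you supply the URLs yourself.

```python
import urllib.request
from email.message import Message
from typing import Dict, List, Optional, Tuple

def fetch_headers(url: str) -> Dict[str, str]:
    """GET an http(s) URL and return its response headers with lowercased names."""
    with urllib.request.urlopen(url) as resp:
        return {name.lower(): value for name, value in resp.getheaders()}

def declared_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter (lowercased) from a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def header_differences(
    a: Dict[str, str], b: Dict[str, str]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """List (name, value_in_a, value_in_b) for every header that differs, ignoring name case."""
    a = {k.lower(): v for k, v in a.items()}
    b = {k.lower(): v for k, v in b.items()}
    return [
        (name, a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    ]
```

For example, pass fetch_headers(iis_url) and fetch_headers(apache_url) for the same page to header_differences. Then run declared_charset on each Content-Type value to see which encoding each server actually announces.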



Re: Output formatting problem (text encoding?)

Posted by karl <ka...@ethervizion.com>.
Never mind... I realized that this was an Apache issue. I fixed the 
problem by changing the AddDefaultCharset directive to ISO-8859-1.
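Roughly speaking, AddDefaultCharset makes Apache attach a charset to text/html and text/plain responses that don't already declare one. The sketch below imitates that behavior as Python WSGI middleware, purely as an illustration. It is not how Apache implements the directive, and add_default_charset is a hypothetical name.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Headers = List[Tuple[str, str]]

def add_default_charset(app: Callable, charset: str = "ISO-8859-1") -> Callable:
    """Wrap a WSGI app so text/html and text/plain responses without a charset get one."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def patched_start_response(status: str, headers: Headers,
                                   exc_info: Optional[tuple] = None):
            new_headers = []
            for name, value in headers:
                if name.lower() == "content-type":
                    media_type = value.split(";", 1)[0].strip().lower()
                    # Only add a charset when the response doesn't already name one.
                    if (media_type in ("text/html", "text/plain")
                            and "charset=" not in value.lower()):
                        value = f"{value}; charset={charset}"
                new_headers.append((name, value))
            return start_response(status, new_headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped
```

As with the Apache directive, the declared charset only helps if it matches the bytes the application actually sends.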

Thanks anyway!


