
Posted to general@jakarta.apache.org by Michal Mosiewicz <mi...@interdata.com.pl> on 1999/11/02 15:01:42 UTC

JASPER: page charset handling broken

I haven't pinned down exactly where it is broken, but the point is that the
JSP spec assumes that a page is created using the same charset as the one
defined in '<%@ page contentType=....'.

It doesn't work that way in Jasper.

-- Mike

JSP large file test

Posted by Arun Jamwal <Ar...@eng.sun.com>.

Hi Anil,
Just wanted to check with you if you wanted me to
add the test you mentioned in yesterday's meeting
- something related to large file....

Thanks,
Arun.

Re: JASPER: page charset handling broken

Posted by "Anil K. Vijendran" <An...@eng.sun.com>.


Michal Mosiewicz wrote:

> "Anil K. Vijendran" wrote:
> >
> > [Moving the discussion to tomcat-dev]
> >
> > I recently heard about this myself, from one of the users of this JSP
> > engine. I believe the way it is supposed to work is that you read until you
> > encounter contentType and then re-read the file using the encoding you saw
> > in contentType. Right now, the JSP engine always uses the encoding obtained
> > using System.getProperty("file.encoding", "8859_1").
>
> It seems that there is more than one bug...

Quite possible :-)

> I have done exactly what you're talking about, i.e. I changed
> createJspReader to pass an additional encoding parameter, and changed
> Compiler to read a file a second time if it turns out that the file was
> first read using a different encoding.

Let's investigate this a bit more and then I can commit your patch. I'm hoping to
hear from folks that implement XML parsers :-) since they have to deal with
similar issues.

> The result is somewhat strange... If I set 'charset=iso-8859-1', I can
> see that the content of the resulting page matches what I typed. However,
> if I try using iso-8859-2, I can see in the page source that it looks
> like it was interpreted as a Unicode string...
>
> For example, using the following characters (excuse these 8859-2 chars):
> "żźółążźążźźźółą", I get them exactly the same in the resulting
> page if I set charset=iso-8859-1. Of course it is improperly interpreted
> by the browser, because the charset is obviously wrong, but the codes
> match. However, if I set iso-8859-2, I get something like:
> '|zóB|z|zzzóB' as result, and
> "...|z\u00f3B\u0005|z\u0005|zzz\u00f3B\u0005..." in the page source.
>
> It seems like setting iso-8859-2 makes my JVM interpret the stream as
> Unicode???
>
> -- Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org

--
Peace, Anil +<:-)

Re: JASPER: page charset handling broken

Posted by Michal Mosiewicz <mi...@interdata.com.pl>.

"Anil K. Vijendran" wrote:
> 
> [Moving the discussion to tomcat-dev]
> 
> I recently heard about this myself, from one of the users of this JSP
> engine. I believe the way it is supposed to work is that you read until you
> encounter contentType and then re-read the file using the encoding you saw
> in contentType. Right now, the JSP engine always uses the encoding obtained
> using System.getProperty("file.encoding", "8859_1").

It seems that there is more than one bug...

I have done exactly what you're talking about, i.e. I changed
createJspReader to pass an additional encoding parameter, and changed
Compiler to read a file a second time if it turns out that the file was
first read using a different encoding.

The result is somewhat strange... If I set 'charset=iso-8859-1', I can
see that the content of the resulting page matches what I typed. However,
if I try using iso-8859-2, I can see in the page source that it looks
like it was interpreted as a Unicode string...

For example, using the following characters (excuse these 8859-2 chars):
"żźółążźążźźźółą", I get them exactly the same in the resulting
page if I set charset=iso-8859-1. Of course it is improperly interpreted
by the browser, because the charset is obviously wrong, but the codes
match. However, if I set iso-8859-2, I get something like:
'|zóB|z|zzzóB' as result, and
"...|z\u00f3B\u0005|z\u0005|zzz\u00f3B\u0005..." in the page source.

It seems like setting iso-8859-2 makes my JVM interpret the stream as
Unicode???
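
The output quoted above is consistent with one specific failure, offered
here as a reading of the symptoms and not as a confirmed diagnosis: the
bytes are decoded correctly as ISO-8859-2 into Unicode chars, and a later
stage writes each char out keeping only its low 8 bits. The sketch below
reproduces that effect for the five distinct characters in the sample. The
class and method names are hypothetical, and nothing here is taken from the
Jasper sources.

```java
import java.nio.charset.Charset;

class LowByteTruncationDemo {
    // Simulates an output stage that keeps only the low 8 bits of each char
    // (an assumed failure mode, not confirmed Jasper behaviour).
    static String truncateToLowByte(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            out.append((char) (s.charAt(i) & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The ISO-8859-2 bytes for the first five sample characters.
        byte[] raw = { (byte) 0xBF, (byte) 0xBC, (byte) 0xF3, (byte) 0xB3, (byte) 0xB1 };

        // Decoding as ISO-8859-2 yields U+017C U+017A U+00F3 U+0142 U+0105.
        String decoded = new String(raw, Charset.forName("ISO-8859-2"));

        // Dropping the high byte leaves 7C 7A F3 42 05.
        for (char c : truncateToLowByte(decoded).toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
        // prints: U+007C U+007A U+00F3 U+0042 U+0005
    }
}
```

Those five values are '|', 'z', 'ó', 'B' and the control character 0x05,
which is the pattern seen in the page source. With charset=iso-8859-1 every
decoded char is below U+0100, so the same truncation would change nothing,
which would explain why that case appears to round-trip.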

-- Mike

Re: JASPER: page charset handling broken

Posted by "Anil K. Vijendran" <An...@eng.sun.com>.

[Moving the discussion to tomcat-dev]

I recently heard about this myself, from one of the users of this JSP
engine. I believe the way it is supposed to work is that you read until you
encounter contentType and then re-read the file using the encoding you saw
in contentType. Right now, the JSP engine always uses the encoding obtained
using System.getProperty("file.encoding", "8859_1").
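
A minimal sketch of the read-then-re-read approach described above, assuming
the page directive itself is written in ASCII-compatible bytes. The class
name, the method names and the regular expression are all hypothetical and
are not Jasper's actual code; the caller supplies whatever fallback charset
it wants (for example the one named by the file.encoding property).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageEncodingSniffer {
    // Hypothetical pattern: finds charset=NAME inside a contentType attribute.
    private static final Pattern CHARSET = Pattern.compile(
        "contentType\\s*=\\s*[\"'][^\"']*charset\\s*=\\s*([A-Za-z0-9][\\w.:+-]*)",
        Pattern.CASE_INSENSITIVE);

    // First pass: decode as ISO-8859-1, which maps every byte to exactly one
    // char, so the directive can be located without losing any data.
    static Charset detect(byte[] pageBytes, Charset fallback) {
        String firstPass = new String(pageBytes, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(firstPass);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return fallback;
    }

    // Second pass: decode the same bytes again with the declared charset.
    static String read(byte[] pageBytes, Charset fallback) {
        return new String(pageBytes, detect(pageBytes, fallback));
    }
}
```

Keeping the raw bytes in memory and decoding them twice avoids reopening the
file; a real implementation would also have to decide what to do when the
declared charset is unknown, which this sketch handles by silently falling
back.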

I admit my understanding of this is limited. I would appreciate any input or
tested patches in this area.

Thanks.

Michal Mosiewicz wrote:

> I haven't pinned down exactly where it is broken, but the point is that the
> JSP spec assumes that a page is created using the same charset as the one
> defined in '<%@ page contentType=....'.
>
> It doesn't work that way in Jasper.
>
> -- Mike

--
Peace, Anil +<:-)