You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hc.apache.org by "Kalnichevski, Oleg" <ol...@bearingpoint.com> on 2003/01/08 16:35:09 UTC

[PATCH] code page problem fix (revision 1)

Ortwin, Neal

Here's my first shot at it. Feedback, critique welcome. I have done my best to ensure that appropriate charsets are used throughout the HttpClient code-base. I have not touched PutMethod, though, as I believe it needs to be completely rewritten.

Ortwin, once this issue is done with I'd like to rework PostMethod and PutMethod classes to make them consistent (at least) on the API level. Currently these two methods are totally out of sync despite massive commonalities. What do you think?

Oleg

-----Original Message-----
From: neal.johnston-ward@ubs.com [mailto:neal.johnston-ward@ubs.com]
Sent: Wednesday, January 08, 2003 2:17 PM
To: commons-httpclient-dev@jakarta.apache.org
Subject: RE: Possible code page problem


Hi Oleg and Odi,

Thanks for taking an interest and picking this one up quickly. If I can
help at all then just ask. 

I will certainly test the patch out as soon as it is ready. 

Neal

Re: [PATCH] code page problem fix (revision 1)

Posted by Ortwin Glück <or...@nose.ch>.

Had only time for a quick glance. Looks okay so far. Maybe we can find 
better names for HttpElement.encode methods and the class. Will test a 
little tomorrow.


Odi

Re: [PATCH] code page problem fix (Review)

Posted by Ortwin Glück <or...@nose.ch>.

Oleg Kalnichevski wrote:
> My understanding is that ISO-8859-1 is default content encoding, however
> protocol elements such as headers are supposed to use US-ASCII. Do you
> see this differently?

Yes you are right. I read RFC 2616 sections 2.2 and 3.7.1 now.

> I have little idea of what is going on in the URIUtil class. Maybe time
> has come for me to get my self introduced

Maybe Sung-Gu can shed some light onto it.

> What about better class and methods names. Any ideas?

What about HttpConstants as a class name that has the encoding 
identifier as a public constant (we could also stuff more constants in 
as needed e.g. status codes etc.) and the two methods called 
getBytes(String) and getString(byte[]).

Re: [PATCH] code page problem fix (Review)

Posted by Oleg Kalnichevski <o....@dplanet.ch>.

Ortwin



On Thu, 2003-01-09 at 09:27, Ortwin Glück wrote: 
> I reviewed your patch and noticed the following:
> 
> - The HTTP charset is ISO-8859-1 and *not* US-ASCII !
> 


My understanding is that ISO-8859-1 is default content encoding, however
protocol elements such as headers are supposed to use US-ASCII. Do you
see this differently?


> - Does not cover the NTLM class: the NTLM class makes excessive use of 
> String.getBytes(String) but all with hard-coded and repeated "ASCII" 
> encoding identifiers. These should define a class-local constant for it. 
> Some of the methods throw UnsupportedEncodingException. This exception 
> should be caught internally because ASCII has to be supported by *all* 
> JVM implementations.
> 

Got it. I'll clean it up


> - I noticed there are some really strange methods (toUsingCharset) in 
> URIUtil. What the heck are they for? They do basically nothing but 
> replace missing characters with question marks.... I don't think this is 
> really useful for anything.
> 


I have little idea of what is going on in the URIUtil class. Maybe time
has come for me to get my self introduced


> Following line numbers refer lines AFTER applying your patch.
> 
> there are still getBytes() with default encoding in:
> - PostMethod lines 361, 809
> - PutMethod line 189
> - Base64 line 136
> - SimpleHttpConnection line 139
> - TestAuthenticator: all lines like String expected = "Basic " + new 
> String(Base64.encode("username:password".getBytes()));
> - TestBase64
> - TestStreams
> - TestWebappMethods
> - TestWebappRedirect
> 
> there are still calls to new String(byte[]) with default encoding in:
> - PostXML sample
> - Authenticator lines 288, 833
> - HttpConnection line 666; line 635 has a static reference to "ISO-8859-1"
> - HttpMethodBase line 692
> - TestStreams lines 36, 51, 99, 119
> 
> - I did not run any tests for now
> 

Hmmm, I have overlooked quite a few. However, I felt that in some cases
the use of getBytes() was legitimate. I'll review all these cases one
more time

What about better class and methods names. Any ideas?

Cheers

Oleg

Re: [PATCH] code page problem fix (Review)

Posted by Ortwin Glück <or...@nose.ch>.

I reviewed your patch and noticed the following:

- The HTTP charset is ISO-8859-1 and *not* US-ASCII !

- Does not cover the NTLM class: the NTLM class makes excessive use of 
String.getBytes(String) but all with hard-coded and repeated "ASCII" 
encoding identifiers. These should define a class-local constant for it. 
Some of the methods throw UnsupportedEncodingException. This exception 
should be caught internally because ASCII has to be supported by *all* 
JVM implementations.

- I noticed there are some really strange methods (toUsingCharset) in 
URIUtil. What the heck are they for? They do basically nothing but 
replace missing characters with question marks.... I don't think this is 
really useful for anything.

Following line numbers refer lines AFTER applying your patch.

there are still getBytes() with default encoding in:
- PostMethod lines 361, 809
- PutMethod line 189
- Base64 line 136
- SimpleHttpConnection line 139
- TestAuthenticator: all lines like String expected = "Basic " + new 
String(Base64.encode("username:password".getBytes()));
- TestBase64
- TestStreams
- TestWebappMethods
- TestWebappRedirect

there are still calls to new String(byte[]) with default encoding in:
- PostXML sample
- Authenticator lines 288, 833
- HttpConnection line 666; line 635 has a static reference to "ISO-8859-1"
- HttpMethodBase line 692
- TestStreams lines 36, 51, 99, 119

- I did not run any tests for now

Cheers

Odi