Posted to httpclient-users@hc.apache.org by Scott Shen <ic...@gmail.com> on 2007/12/27 18:16:33 UTC

The character set about the PostMethod class

Dear friends:

    I am a developer in China and a user of the HttpClient component from
the Apache Software Foundation. I am very happy to use this component, which
makes my work much more convenient, so I would like to thank all of its
developers.

But I have also run into a problem. When PostMethod posts data to a URL, it
does not set the character set automatically according to the locale of the
computer system. For instance, in China the character set may be GBK, GB2312
or UTF-8, but PostMethod cannot configure the encoding automatically.

So I suggest that getRequestCharSet() in the HttpMethodBase class obtain the
character set from getContentCharSet(), and that, if the HTTP headers do not
specify a Content-Type header, the character set be taken from the locale of
the computer system. That way the encoding problem can be avoided. The default
(DEFAULT_CHARSET = "ISO-8859-1") may not suit every computer system.

   Best wishes to you.


Scott S.

Re: The character set about the PostMethod class

Posted by Roland Weber <os...@dubioso.net>.
Hello Scott,

>     I am a developer in Chinese and a user of HttpClient component of Apache
> Software Foundation. I am very happy to use this component which takes much
> convenience for my
> work, hence, I extend my thankfulness to all developers of this component.

Thanks, it's always good to learn that somebody finds our stuff useful.

> the http headers not specify a Content-Type header, you may
> get the character set from the Locale of the computer system.

I'm afraid that would be a very bad move. We do not have the resources
of a company like Sun, who can test their deliverables on a variety of
platforms with different characteristics like locale settings. We are
just a handful of core developers spending their spare time on this
project.

One of the most important characteristics of a library, from my point
of view, is predictable behaviour. If we allowed system properties to
affect the behaviour of HttpClient, we would be begging for trouble.

Let's take the locale example, since you mentioned it. Suddenly, people
would report that they are having problems on platforms with a specific
locale, which none of the developers here uses. So how are we supposed
to help them? I can't change my system to a Big5 or ShiftJIS locale
without risking making it unusable. And the same problem would be
introduced to _every_ application using HttpClient.

I hope you understand that this is absolutely impossible. We are
providing libraries that (we hope) behave identically on all systems,
regardless of the locale or other system settings that do not directly
affect IO. In fact, a lot of effort has been spent in the past to make
sure that HttpClient works correctly even on platforms like OS/390
(aka z/OS), which use EBCDIC instead of ASCII-based character sets.
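
The predictability argument can be illustrated with a small sketch that
uses only the Java standard library, not HttpClient. It compares encoding
a string with the platform default charset (which varies with system
settings) against encoding it with a fixed charset. The class and method
names are hypothetical and chosen only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetDemo {

    // Encodes with a fixed charset: the bytes are the same on every system.
    static byte[] encodeFixed(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Encodes with the platform default: the bytes depend on system settings.
    static byte[] encodePlatform(String text) {
        return text.getBytes(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("platform bytes:  " + Arrays.toString(encodePlatform(text)));
        System.out.println("fixed bytes:     " + Arrays.toString(encodeFixed(text)));
    }
}
```

The "fixed bytes" line is identical wherever the program runs, while the
"platform bytes" line may differ from one machine to the next. That
difference is the kind of hard-to-reproduce behaviour described above.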

> So that the problem of encoding can be avoided.

As explained above, this problem cannot be avoided: It must be
addressed. If we tried to do that within the library, we would
create a lot of trouble for ourselves and our users.
If you want your application to change its behavior based on
the system locale, your application has to take responsibility
for that.
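
A minimal sketch of what "taking responsibility" could look like in
application code: the application maps the system locale to a charset
name itself and falls back to UTF-8 if that charset is not available.
The class name, the method name and the locale-to-charset mapping are
assumptions made for this illustration. They are not part of HttpClient,
and a real application would choose its own mapping.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class LocaleCharsetChooser {

    // Hypothetical application policy: pick a charset name for a locale.
    static String charsetFor(Locale locale) {
        String language = locale.getLanguage();
        String country = locale.getCountry();
        String candidate;
        if ("zh".equals(language)) {
            // Assumed mapping: Big5 for Taiwan and Hong Kong, GBK otherwise.
            boolean traditional = "TW".equals(country) || "HK".equals(country);
            candidate = traditional ? "Big5" : "GBK";
        } else if ("ja".equals(language)) {
            candidate = "Shift_JIS";
        } else {
            candidate = "UTF-8";
        }
        // Fall back to UTF-8 if this JVM does not provide the candidate.
        return Charset.isSupported(candidate) ? candidate : "UTF-8";
    }

    public static void main(String[] args) {
        // The application, not the library, consults the system locale.
        System.out.println(charsetFor(Locale.getDefault()));
    }
}
```

The resulting name is an ordinary string, so the application can hand it
to whatever charset parameter the HTTP library exposes.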

> The  (DEFAULT_CHARSET = "ISO-8859-1") may not suit for any
> computer system.

As the name indicates, this is a _default_ that can be overridden
by your application. It is up to your application to decide which
character set should be used. HttpClient doesn't care whether you
obtain the character set from the system locale or elsewhere. You
can even specify different character sets for the header and body
of the messages:

http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpMethodParams.html#setHttpElementCharset(java.lang.String)
http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpMethodParams.html#setContentCharset(java.lang.String)
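
To show why the choice of content character set matters for a form POST,
here is a sketch that uses only the Java standard library. It does not
call HttpClient. It percent-encodes the same Chinese form value under
three charsets. The class and method names are hypothetical. In an
HttpClient 3.x application, the chosen charset name would instead be
passed to setContentCharset(String) as documented at the links above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormCharsetDemo {

    // Percent-encodes a form value using the named charset.
    static String encodeValue(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        for (String cs : new String[] {"ISO-8859-1", "GBK", "UTF-8"}) {
            System.out.println(cs + " -> name=" + encodeValue(value, cs));
        }
        // ISO-8859-1 cannot represent these characters, so each one is
        // replaced by '?' and the value is sent as %3F%3F.
    }
}
```

With ISO-8859-1 the original text is lost before it leaves the client,
which is the symptom described in the original question. With GBK or
UTF-8 the value survives, provided the server decodes with the same
charset.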

Parameter hierarchies are explained in the Preferences Guide:
http://hc.apache.org/httpclient-3.x/preference-api.html

The same basic parameterization technique is used in HttpClient 4.0,
although there are variations in the details of the implementation.
We will not make HttpClient depend on system settings. We make it
configurable, and applications can set parameters based on system
settings if that is desired.

hope this helps,
  Roland
