Posted to httpclient-users@hc.apache.org by Joan Balaguero <jo...@grupoventus.com> on 2012/01/12 14:15:00 UTC

Encoding issue

Hello Oleg,

I'm having a strange issue with encoding using HttpClient 4.0.1.

I'm sending a string to a servlet, which simply sends exactly the same
string back to the client.

This is the code that sends the string to the servlet and reads the
response:

( . . . )

String ENCODING = "ISO-8859-1";
String str      = "ÑÑÑ_ÁÁÁ";

objPost = new HttpPost(url);

StringEntity strEntity = new StringEntity(str);
strEntity.setContentType("text/plain; charset=" + ENCODING);
objPost.setEntity(strEntity);

HttpEntity entity = null;

try
{
  entity = objHttp.execute(objPost).getEntity();

  // Read the response.
  BufferedInputStream bis   = new BufferedInputStream(entity.getContent());
  ByteArrayOutputStream bos = new ByteArrayOutputStream();
  byte[] buffer = new byte[4098];
  int numBytes;

  while ((numBytes = bis.read(buffer)) >= 0) bos.write(buffer, 0, numBytes);

  System.out.println("RESPONSE = " + bos.toString(ENCODING));

( . . . )

And this is the servlet code that copies the request input stream
directly to the response output stream:

( . . . )

   BufferedOutputStream bos = null;

   try
   {
    response.setContentType(request.getContentType());

    bos              = new BufferedOutputStream(response.getOutputStream());
    InputStream in   = request.getInputStream();
    byte[] tmp       = new byte[4098];
    int numBytesRead = 0;

    // Copy the request body to the response unchanged.
    while ((numBytesRead = in.read(tmp, 0, tmp.length)) >= 0) bos.write(tmp, 0, numBytesRead);

( . . . )

If the ENCODING variable is "ISO-8859-1", the response is received correctly:

RESPONSE = ÑÑÑ_ÁÁÁ

But if the ENCODING variable is "UTF-8", the response comes back garbled:

RESPONSE = "???_???"

In this case, if I force ISO-8859-1 decoding in the line:

System.out.println("RESPONSE = " + bos.toString("ISO-8859-1"));

then it works: RESPONSE = ÑÑÑ_ÁÁÁ

Could this be an issue in StringEntity? Or am I missing something?

Thanks in advance,

Joan.


RE: Encoding issue

Posted by Joan Balaguero <jo...@grupoventus.com>.
Hi Oleg,

Thanks a lot. I hadn't seen this constructor. Problem solved.

Joan.

-----Original message-----
From: Oleg Kalnichevski [mailto:olegk@apache.org]
Sent: Thursday, 12 January 2012 21:04
To: HttpClient User Discussion
Subject: RE: Encoding issue


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org



RE: Encoding issue

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2012-01-12 at 20:57 +0100, Joan Balaguero wrote:
> Hi Oleg,
> 
> > StringEntity strEntity = new StringEntity(str);
> > 
> This constructor assumes default charset encoding for HTTP content which is ISO-8859-1.
> 
> > strEntity.setContentType("text/plain; charset=" + ENCODING);
> > 
> This basically causes the content to be decoded incorrectly as long as ENCODING is not ISO-8859-1.
> 
> 
> Then, what is my option? Something like:
> 
> ByteArrayEntity bae = new ByteArrayEntity(str.getBytes(ENCODING));
> objPost.setEntity(bae);
> 
> 

Why don't you simply use the StringEntity#StringEntity(String, String)
constructor?

> Another question: if the StringEntity constructor assumes ISO encoding for the String, what's the utility of 'entity.setContentType'?

To be able to specify a Content-Type with custom attributes besides
'charset'.

Oleg

> I expected to find an empty constructor to do something like:
> StringEntity strEntity = new StringEntity();
> strEntity.setContentType("text/plain; charset=" + ENCODING);
> strEntity.setContent(str);
> 
> 
> Thanks,
> Joan.
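
Both options discussed in this message (the two-argument constructor and the ByteArrayEntity idea) come down to the same rule: the charset used to produce the bytes has to be the one announced in Content-Type. The following is a minimal JDK-only sketch of that rule. The class ConsistentBody and its members are hypothetical names made up for illustration; they are not part of HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsistentBody {
    // Hypothetical holder (not an HttpClient class): the body bytes and the
    // Content-Type value are always derived from the same charset.
    public final byte[] bytes;
    public final String contentType;

    private ConsistentBody(byte[] bytes, String contentType) {
        this.bytes = bytes;
        this.contentType = contentType;
    }

    // Encode the text and build the matching header value in one step,
    // so the two cannot drift apart.
    public static ConsistentBody of(String text, Charset charset) {
        return new ConsistentBody(
                text.getBytes(charset),
                "text/plain; charset=" + charset.name());
    }

    // Decode the bytes with the given charset, as a receiver would.
    public String decode(Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The sample string from this thread, written with Unicode escapes
        // so the source file encoding does not matter.
        String original = "\u00D1\u00D1\u00D1_\u00C1\u00C1\u00C1";
        ConsistentBody body = ConsistentBody.of(original, StandardCharsets.UTF_8);
        System.out.println(body.contentType);
        System.out.println("round trip ok: "
                + body.decode(StandardCharsets.UTF_8).equals(original));
    }
}
```

With HttpClient, the same pairing is what the ByteArrayEntity approach quoted above would need: the bytes from str.getBytes(ENCODING) plus a Content-Type that names the same ENCODING.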





RE: Encoding issue

Posted by Joan Balaguero <jo...@grupoventus.com>.
Hi Oleg,

>> StringEntity strEntity = new StringEntity(str);
>
> This constructor assumes default charset encoding for HTTP content which is ISO-8859-1.

>> strEntity.setContentType("text/plain; charset=" + ENCODING);
>
> This basically causes the content to be decoded incorrectly as long as ENCODING is not ISO-8859-1.


Then what are my options? Something like:

ByteArrayEntity bae = new ByteArrayEntity(str.getBytes(ENCODING));
objPost.setEntity(bae);


Another question: if the StringEntity constructor assumes ISO-8859-1 encoding for the String, what is the purpose of 'entity.setContentType'?
I expected to find an empty constructor so I could do something like:
StringEntity strEntity = new StringEntity();
strEntity.setContentType("text/plain; charset=" + ENCODING);
strEntity.setContent(str);


Thanks,
Joan.



-----Original message-----
From: Oleg Kalnichevski [mailto:olegk@apache.org]
Sent: Thursday, 12 January 2012 16:52
To: HttpClient User Discussion
Subject: Re: Encoding issue



Re: Encoding issue

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2012-01-12 at 14:15 +0100, Joan Balaguero wrote:
> StringEntity strEntity = new StringEntity(str);
> 

This constructor assumes the default charset for HTTP content, which
is ISO-8859-1.

> strEntity.setContentType("text/plain; charset=" + ENCODING);
> 

This basically causes the content to be decoded incorrectly whenever
ENCODING is not ISO-8859-1.

Oleg
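
The mismatch described in this reply can be reproduced without HttpClient. Below is a small JDK-only sketch that encodes the sample string as ISO-8859-1 (the charset the one-argument constructor is said to assume) and then decodes the same bytes as UTF-8, which is what a reader does when it trusts a "charset=UTF-8" label. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encode the text as ISO-8859-1, standing in for what the
    // one-argument StringEntity constructor is described as doing.
    public static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decode as UTF-8, standing in for a reader that trusts a
    // Content-Type header claiming charset=UTF-8.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode with the charset that was actually used to encode.
    public static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The sample string from this thread, written with Unicode escapes
        // so the source file encoding does not matter.
        String original = "\u00D1\u00D1\u00D1_\u00C1\u00C1\u00C1";
        byte[] wire = encodeLatin1(original);

        System.out.println("decoded as ISO-8859-1 matches: "
                + decodeLatin1(wire).equals(original));
        System.out.println("decoded as UTF-8 matches: "
                + decodeUtf8(wire).equals(original));
    }
}
```

The bytes 0xD1 and 0xC1 are not valid UTF-8 on their own, so the UTF-8 decoder substitutes the replacement character U+FFFD, which many consoles print as "?". That is consistent with the "???_???" output reported in the original message, and with forced ISO-8859-1 decoding restoring the text.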
