Posted to issues@camel.apache.org by "saksham (Jira)" <ji...@apache.org> on 2020/05/17 07:50:00 UTC

[jira] [Comment Edited] (CAMEL-14959) Inconsistent behavior of default charset in StringEntity and IOHelper.getCharsetFromContentType

    [ https://issues.apache.org/jira/browse/CAMEL-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109378#comment-17109378 ] 

saksham edited comment on CAMEL-14959 at 5/17/20, 7:49 AM:
-----------------------------------------------------------

Hi [~davsclaus]

Please find the attached projects for reproducing the issue. Here is how to use them:

1. Echo server (echoServer.zip): contains a Node.js-based web server with a single endpoint, "/echo".
     This endpoint returns the body it receives and sets the response header "content-type" to "text/plain", with no charset.
     Note that this server starts on the default HTTP port, 80.

2. apache-camel-consume.zip: contains a Maven project that invokes the echo endpoint of the echo server.
     It sends a text/plain content-type with the body *HÃLLO* and does not set any charset in the content-type.
     But the body it receives is *H?LLO*.

Root cause:
 org.apache.http.entity.StringEntity uses a default charset if none is provided, and that default is ISO-8859-1.
 But while handling the response, the default charset used in IOHelper.getCharsetFromContentType is UTF-8.

Thus the body is encoded with one charset when sent and decoded with another when received, which is why the two differ.
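
To illustrate the mismatch without any HTTP involved, here is a minimal sketch using only the JDK. The class and method names (CharsetMismatchDemo, roundTrip) are made up for this example and are not part of Camel or HttpClient. It only assumes that the sender encodes with ISO-8859-1 and the receiver decodes with UTF-8, as described above:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encode the body with the sender's charset, then decode the raw bytes
    // with the receiver's charset (hypothetical helper, for illustration only).
    static String roundTrip(String body, Charset sendCharset, Charset receiveCharset) {
        byte[] wire = body.getBytes(sendCharset);
        return new String(wire, receiveCharset);
    }

    public static void main(String[] args) {
        String body = "H\u00C3LLO"; // the test body from the reproducer

        // Sender defaults to ISO-8859-1, receiver defaults to UTF-8.
        String mismatched = roundTrip(body, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Both sides agree on ISO-8859-1.
        String matched = roundTrip(body, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        System.out.println(mismatched.equals(body)); // prints "false"
        System.out.println(matched.equals(body));    // prints "true"
    }
}
```

In the mismatched case the single byte 0xC3 is not a valid UTF-8 sequence on its own, so the decoder substitutes the replacement character U+FFFD, which typically shows up as "?" in logs and consoles.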

We expect ISO-8859-1 to be used whenever the charset is not present, as it is the default charset as per the HTTP 1.1 standard.

Also note that when I sent the charset as utf-8 in the request, the sent and received bodies were the same, as expected.
 But we want the default charset to be uniform everywhere.
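
As a rough sketch of what a uniform default could look like, the helper below reads the charset from a Content-Type value and falls back to a single shared default when none is present. This is not Camel's IOHelper implementation. The names ContentTypeCharset, DEFAULT_CHARSET and charsetOrDefault are hypothetical, and the choice of ISO-8859-1 as the shared default is only the one requested in this comment:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    // Hypothetical shared default, used for both requests and responses.
    static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    // Returns the charset named in the Content-Type value, or the fallback
    // when the header is missing, has no charset parameter, or names an
    // unknown charset.
    static Charset charsetOrDefault(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                // Case-insensitive match on the "charset=" prefix.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        if (!name.isEmpty()) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use the fallback.
                    }
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOrDefault("text/plain", DEFAULT_CHARSET));
        System.out.println(charsetOrDefault("text/plain; charset=utf-8", DEFAULT_CHARSET));
    }
}
```

If the code that builds the request entity and the code that reads the response both went through one helper like this with the same fallback, a body sent without an explicit charset would be decoded with the charset it was encoded in.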

The behavior is the same with the latest 2.24.x release.

> Inconsistent behavior of default charset in StringEntity and IOHelper.getCharsetFromContentType
> -----------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-14959
>                 URL: https://issues.apache.org/jira/browse/CAMEL-14959
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-http, camel-http4
>    Affects Versions: 2.23.1
>            Reporter: saksham
>            Priority: Major
>         Attachments: apache-camel-consume.zip, echoServer.zip
>
>
> In our product, we invoke the OData endpoint of a different service, which uses a CSRF nonce.
> So in one POST request, two things happen: fetching the CSRF nonce and posting the actual request.
> While processing the response of the fetch, the charset is set from IOHelper.getCharsetFromContentType. If there is no charset in the response header Content-Type, UTF-8 is used as the default.
>  This charset is then put on the exchange object and used further in the POST request.
> But when the request entity is created in HttpProducer.createRequestEntity,
>  the default charset it uses is ISO-8859-1, which is correct as per the HTTP 1.1 standard.


