Posted to dev@sling.apache.org by "Stephen Sanchez (JIRA)" <ji...@apache.org> on 2011/07/14 18:59:00 UTC

[jira] [Created] (SLING-2143) SlingPostServlet ImportOperation :content parameter overrides _charset_ parameter with system default encoding

SlingPostServlet ImportOperation :content parameter overrides _charset_ parameter with system default encoding
--------------------------------------------------------------------------------------------------------------

                 Key: SLING-2143
                 URL: https://issues.apache.org/jira/browse/SLING-2143
             Project: Sling
          Issue Type: Bug
          Components: Servlets
    Affects Versions: Servlets Post 2.1.0
         Environment: OS: Windows 2k8, Windows 7
                      Web Servers: Tomcat 6
                      Sling 6
            Reporter: Stephen Sanchez


The ImportOperation on the SlingPostServlet (2.1.0 to Trunk) does not honor the encoding specified in the _charset_ form parameter of a POST request.

REPRODUCTION STEPS:
Create a POST request using the SlingPostServlet and ImportOperation:

curl -F":operation=import" -F"_charset_=UTF-8" -F":contentType=json" -F":replace=true" -F":replaceProperties=true" -F":content={'latin':'øµå', 'chinese':'玄牛'}" http://admin:admin@localhost:8080

PROPOSED SOLUTION:

In ImportOperation.java, line 137 (as of 2.1.0 tag):

contentStream = new ByteArrayInputStream(content.getBytes());

This line takes the :content parameter of the request, which has already been decoded correctly (e.g. from UTF-8), and converts it back to bytes using the system default encoding. On Windows, where the default encoding is typically not UTF-8, the non-ASCII characters in the content are therefore encoded with the wrong charset.
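
To illustrate the effect described above, here is a minimal standalone sketch. It is not code from Sling; the class and method names are hypothetical, ISO-8859-1 merely stands in for a non-UTF-8 platform default, and it assumes that whatever consumes the stream reads the bytes as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Sling.
final class DefaultCharsetDemo {

    // Encode the text with the given charset, then decode the bytes as UTF-8,
    // mimicking a consumer that expects UTF-8 input.
    static String encodeThenReadAsUtf8(String text, Charset encodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Latin sample characters from the reproduction steps, written as
        // Unicode escapes so the source file encoding does not matter.
        String latin = "\u00f8\u00b5\u00e5";

        // This is the charset that String.getBytes() without arguments uses.
        System.out.println("platform default: " + Charset.defaultCharset());

        // Encoding explicitly as UTF-8 survives the round trip.
        System.out.println(encodeThenReadAsUtf8(latin, StandardCharsets.UTF_8).equals(latin)); // true

        // Encoding with a single-byte charset does not.
        System.out.println(encodeThenReadAsUtf8(latin, StandardCharsets.ISO_8859_1).equals(latin)); // false
    }
}
```

On a machine whose default charset is already UTF-8 the original line happens to work, which may explain why the problem only shows up on some platforms.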

The simple fix, which resolves the issue and allows UTF-8 encoding across all operating systems:

contentStream = new ByteArrayInputStream(content.getBytes("UTF-8"));

My proposed fix is to support the _charset_ form parameter, since :content is a form parameter in the POST request:

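RequestParameter encodingParam = request.getRequestParameter("_charset_");
byte[] contentBytes;
if (encodingParam != null && encodingParam.getString() != null) {
    // Use the charset requested by the client; getBytes(String) throws
    // UnsupportedEncodingException if the name is not a supported charset.
    contentBytes = content.getBytes(encodingParam.getString());
} else {
    // No _charset_ parameter: fall back to the platform default encoding.
    contentBytes = content.getBytes();
}
contentStream = new ByteArrayInputStream(contentBytes);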
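
A self-contained variant of the same idea is sketched below, outside of any Sling API. The class and method names are hypothetical. Unlike the snippet above, this sketch falls back to UTF-8 rather than the platform default when the parameter is missing or names an unknown charset; that fallback is an assumption for illustration, not something decided in this issue.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper class, not part of Sling.
final class CharsetParamSketch {

    // Map the raw _charset_ value to a Charset. A missing, blank, malformed,
    // or unsupported name falls back to UTF-8 (an assumption of this sketch).
    static Charset resolveCharset(String charsetParam) {
        if (charsetParam == null || charsetParam.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(charsetParam.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return StandardCharsets.UTF_8;
        }
    }

    // Convert the content to bytes using the resolved charset.
    static byte[] toBytes(String content, String charsetParam) {
        return content.getBytes(resolveCharset(charsetParam));
    }
}
```

Resolving the name to a Charset first avoids the checked UnsupportedEncodingException and makes the fallback explicit in one place.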



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Assigned] (SLING-2143) SlingPostServlet ImportOperation :content parameter overrides _charset_ parameter with system default encoding

Posted by "Eric Norman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SLING-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Norman reassigned SLING-2143:
----------------------------------

    Assignee: Eric Norman

> SlingPostServlet ImportOperation :content parameter overrides _charset_ parameter with system default encoding
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SLING-2143
>                 URL: https://issues.apache.org/jira/browse/SLING-2143
>             Project: Sling
>          Issue Type: Bug
>          Components: Servlets
>    Affects Versions: Servlets Post 2.1.0
>         Environment: OS: Windows 2k8, Windows 7
> Web Servers: Tomcat 6
> Sling 6
>            Reporter: Stephen Sanchez
>            Assignee: Eric Norman
>   Original Estimate: 2h
>  Remaining Estimate: 2h

       

[jira] [Commented] (SLING-2143) SlingPostServlet ImportOperation :content parameter overrides _charset_ parameter with system default encoding

Posted by "Eric Norman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072683#comment-13072683 ] 

Eric Norman commented on SLING-2143:
------------------------------------

Hi Stephen,

I believe that since the operation is now using the RequestParameter object to get the content, the _charset_ parameter handling should already be converting the values to the requested charset per what is documented in [1].  Do you have a scenario where this does not work?  Can you provide a test case?

1. http://sling.apache.org/site/request-parameters.html
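
For readers unfamiliar with that mechanism, the conversion can be pictured roughly as follows. This is only a sketch under the assumption that the servlet container first decodes parameter bytes as ISO-8859-1 and that the values are then re-decoded with the charset named in _charset_; the class and method names are hypothetical and this is not the actual Sling implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the Sling implementation.
final class ReDecodeSketch {

    // Recover the raw request bytes from a value that was decoded as
    // ISO-8859-1, then decode them again with the requested charset.
    static String reDecode(String containerDecoded, Charset requested) {
        byte[] raw = containerDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, requested);
    }

    public static void main(String[] args) {
        // Sample characters from the issue, as Unicode escapes.
        String original = "\u00f8\u7384";

        // What the client sends: the text encoded as UTF-8.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // What a container decoding as ISO-8859-1 would hand over.
        String containerDecoded = new String(wire, StandardCharsets.ISO_8859_1);

        System.out.println(reDecode(containerDecoded, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character.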



       

[jira] [Resolved] (SLING-2143) SlingPostServlet ImportOperation :content parameter overrides _charset_ parameter with system default encoding

Posted by "Eric Norman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SLING-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Norman resolved SLING-2143.
--------------------------------

       Resolution: Fixed
    Fix Version/s: Servlets Post 2.1.2

Fixed in r1150269.  Please verify when you have some free time.


       

[jira] [Commented] (SLING-2143) SlingPostServlet ImportOperation :content parameter overrides _charset_ parameter with system default encoding

Posted by "Stephen Sanchez (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072515#comment-13072515 ] 

Stephen Sanchez commented on SLING-2143:
----------------------------------------

Verified that Unicode characters are now properly saved through the import operation.

However, I should note that, based on the code changes, it appears that UTF-8 encoding is forced by building Strings from the properties (Strings being encoded to UTF-8 by default), instead of reading the bytes based on the _charset_ parameter. This means there would be no way to send a POST request in any other encoding.

I noticed this was fixed for Servlets Post 2.1.2. Is there any chance of a back-port to 2.1.0, so that Sling 6 is supported as well?


       

[jira] [Issue Comment Edited] (SLING-2143) SlingPostServlet ImportOperation :content parameter overrides _charset_ parameter with system default encoding

Posted by "Eric Norman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SLING-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072683#comment-13072683 ] 

Eric Norman edited comment on SLING-2143 at 7/29/11 5:28 AM:
-------------------------------------------------------------

Hi Stephen,

I believe that since the operation is now using the RequestParameter object to get the content, the _charset_ parameter handling should already be converting the values to the requested charset per what is documented in [1].  Do you have a scenario where this does not work?  Can you provide a test case?

1. http://sling.apache.org/site/request-parameters.html


      was (Author: edn):
    Hi Stephen,

I believe that since the operation is now using the RequestParameter object to get the content, the _charset_ parameter handling should already be converting the values to to the requested charset per what is documented in [1].  Do you have a scenario where this does not work?  Can you provide a test case?

1. http://sling.apache.org/site/request-parameters.html

  