Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2013/01/04 21:20:12 UTC

[jira] [Comment Edited] (SOLR-4265) Encoding problem from test console

    [ https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544217#comment-13544217 ] 

Uwe Schindler edited comment on SOLR-4265 at 1/4/13 8:18 PM:
-------------------------------------------------------------

Alex: Solr expects all URL parameters to be encoded as UTF-8 - PERIOD. The problem we are discussing here is that some servlet containers use ISO-8859-1 to decode the parameters, so although you pass UTF-8 URL-encoded values (e.g. your example would be "q=m%C3%AAme"), the servlet container may not use UTF-8 to decode the %-encoded parts. This causes the issue you have seen. Currently this is a configuration issue: in Tomcat you have to change the connector, in Jetty you have to set the body encoding (sorry, Dawid, in Jetty this definitely works).
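
To illustrate the mismatch described above, here is a minimal standalone sketch (not taken from the patch). It decodes the same %-encoded value once as UTF-8 and once as ISO-8859-1. The class name DecodeMismatchDemo and the helper decode() are made up for illustration; only java.net.URLDecoder is a real JDK class. Non-ASCII characters are written as \u escapes so the source file's own encoding cannot interfere.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; not part of Solr or of the attached patch.
class DecodeMismatchDemo {

    // Decodes a percent-encoded value using the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Only reachable if the charset name is unknown to the JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = "m%C3%AAme"; // the word from the report, UTF-8 bytes, %-encoded

        String asUtf8 = decode(encoded, "UTF-8");        // what Solr expects
        String asLatin1 = decode(encoded, "ISO-8859-1"); // what some containers do

        // UTF-8 turns the two bytes C3 AA into one character (e with circumflex).
        System.out.println(asUtf8.equals("m\u00eame"));         // true
        // ISO-8859-1 turns each byte into its own character, giving mojibake.
        System.out.println(asLatin1.equals("m\u00c3\u00aame")); // true
        System.out.println(asUtf8.length() + " vs " + asLatin1.length()); // 4 vs 5
    }
}
```

The second result is the kind of wrong string a query ends up with when the container decodes with ISO-8859-1.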

The HTTP protocol by itself has nothing to do with this. The whole issue is about the request URI and the decoding of the URL parameters (the URLDecoder Java class).

My proposal to fix this in a portable way (like we did with the InputStreams/OutputStreams instead of using Readers/Writers to avoid the buggy Jetty Readers/Writers): For POST requests, let us set the body encoding to UTF-8 (as demonstrated in the patch). And for the GET parameters, let's decode them manually. It's just a series of String.split() and URLDecoder.decode(..., "UTF-8") calls.
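
A rough sketch of what the manual decoding of GET parameters could look like, assuming the raw (still %-encoded) query string is available, e.g. from HttpServletRequest.getQueryString(). This is not the code from the patch. The class name QueryStringDecoder and the method parse() are hypothetical, and the choice of a map from names to value lists is an assumption made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative and not part of Solr.
class QueryStringDecoder {

    // Splits a raw query string on '&' and '=' and decodes every
    // name and value as UTF-8, independent of the servlet container.
    static Map<String, List<String>> parse(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            // Limit 2: only the first '=' separates name from value.
            String[] kv = pair.split("=", 2);
            String name = decodeUtf8(kv[0]);
            String value = kv.length > 1 ? decodeUtf8(kv[1]) : "";
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    private static String decodeUtf8(String s) {
        try {
            // Also turns '+' into a space, as is usual for query strings.
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> p = parse("q=m%C3%AAme&fq=a+b&fq=c");
        System.out.println(p.get("q").get(0).equals("m\u00eame")); // true
        System.out.println(p.get("fq")); // [a b, c]
    }
}
```

Note that URLDecoder.decode() throws IllegalArgumentException on malformed %-escapes, so a real implementation would have to decide how to report such requests.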
                
> Encoding problem from test console
> ----------------------------------
>
>                 Key: SOLR-4265
>                 URL: https://issues.apache.org/jira/browse/SOLR-4265
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 4.0
>         Environment: Windows, but environment independent
>            Reporter: Alex Rocher
>            Priority: Blocker
>         Attachments: SolrDispatchFilter.java.patch
>
>
> When you type an accented character (in French, for example) in the console query tester, no charset conversion is performed (servlet request charset conversion).
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail
> The reason: getCharacterEncoding from HTTPRequest is not tested. If it's null, it will assume a UTF-8 charset for the conversion.
