Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2013/01/05 04:30:14 UTC

[jira] [Commented] (SOLR-4265) Fix decoding of GET/POST parameters for servlet containers with non-UTF-8 URL parsing (Tomcat)

    [ https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544560#comment-13544560 ] 

Yonik Seeley commented on SOLR-4265:
------------------------------------

I did some manual testing, and one difference I noticed is that on IE10 (Windows 8), pasting this URL into the address bar:
  http://rogue.local:8983/solr/query?q=héllo
results in:
{code}
HTTP ERROR 500

Problem accessing /solr/query. Reason: 
    {msg=Not valid UTF8! byte 6c in state 3,trace=org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 6c in state 3
	at org.eclipse.jetty.util.Utf8Appendable.appendByte(Utf8Appendable.java:174)
	at org.eclipse.jetty.util.Utf8Appendable.append(Utf8Appendable.java:113)
	at org.eclipse.jetty.http.HttpURI.toUtf8String(HttpURI.java:503)
	at org.eclipse.jetty.http.HttpURI.getQuery(HttpURI.java:672)
	at org.eclipse.jetty.server.Request.getQueryString(Request.java:835)
	at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:395)
	at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
{code}

So it looks like IE10 doesn't percent-encode the international character, but it wouldn't matter even if it did, because the result would be percent-encoded Latin-1 rather than UTF-8. It would probably work with Tomcat, however (with or without the current patch).
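To make the failure concrete, here is a small standalone Java sketch (not Solr or Jetty code; the class and method names are made up for illustration) that dumps the bytes of "héllo" in both encodings. In ISO-8859-1, é is the single byte 0xE9, whose bit pattern 1110xxxx marks the start of a three-byte UTF-8 sequence. A UTF-8 decoder therefore expects a continuation byte (10xxxxxx) next and instead finds 0x6C ('l'), which matches the "byte 6c" in the exception above.

```java
import java.nio.charset.StandardCharsets;

public class Latin1VsUtf8Bytes {
    // Render bytes as space-separated two-digit uppercase hex, e.g. "68 E9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // A UTF-8 continuation byte has the bit pattern 10xxxxxx.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // A UTF-8 lead byte for a three-byte sequence has the bit pattern 1110xxxx.
    static boolean isThreeByteLead(byte b) {
        return (b & 0xF0) == 0xE0;
    }

    public static void main(String[] args) {
        String q = "h\u00E9llo"; // "héllo"
        byte[] latin1 = q.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = q.getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1: " + hex(latin1)); // 68 E9 6C 6C 6F
        System.out.println("UTF-8:      " + hex(utf8));   // 68 C3 A9 6C 6C 6F
        // 0xE9 announces a three-byte sequence, but the next byte, 0x6C ('l'),
        // is not a continuation byte, so a strict UTF-8 decoder must reject it.
        System.out.println(isThreeByteLead(latin1[1]) + " " + isContinuation(latin1[2])); // true false
    }
}
```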

The old behavior did not result in an HTTP error, but I actually think this new behavior is preferable!
Before this patch, the request was simply wrong and did not match the user's intentions. At least now it fails fast.
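A rough illustration of why failing is better than guessing, assuming plain JDK decoding rather than whatever the old Solr code path actually did (the class and method names are hypothetical): decoding the raw Latin-1 bytes with `new String(bytes, UTF_8)` quietly substitutes U+FFFD and hands back a query the user never typed, while a `CharsetDecoder` configured with `CodingErrorAction.REPORT` rejects the input up front.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientVsStrictDecode {
    // Lenient: the String constructor silently replaces malformed input with U+FFFD.
    static String lenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: a decoder set to REPORT throws a CharacterCodingException instead.
    static String strict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // What IE10 effectively sends for "héllo": raw ISO-8859-1 bytes.
        byte[] raw = "h\u00E9llo".getBytes(StandardCharsets.ISO_8859_1);
        // Silently wrong: the é becomes U+FFFD and the search proceeds anyway.
        System.out.println(lenient(raw).equals("h\uFFFDllo")); // true
        try {
            strict(raw);
        } catch (CharacterCodingException e) {
            // Fails fast, analogous to the HTTP 500 above.
            System.out.println("rejected: " + e);
        }
    }
}
```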

                
> Fix decoding of GET/POST parameters for servlet containers with non-UTF-8 URL parsing (Tomcat)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4265
>                 URL: https://issues.apache.org/jira/browse/SOLR-4265
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 4.0
>         Environment: Windows, but environment independent
>            Reporter: Alex Rocher
>            Assignee: Uwe Schindler
>         Attachments: SOLR-4265.patch, SOLR-4265.patch, SolrDispatchFilter.java.patch
>
>
> When you type an accented character (in French, for example) into the console query tester, there's no charset conversion (servlet request charset conversion).
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail
> The reason: getCharacterEncoding from HTTPRequest is not checked. If it's null, it will assume UTF-8 as the charset to convert from.
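For the Tomcat side of the report, a sketch using only `java.net.URLDecoder` shows the symptom: the UTF-8 percent-encoding of "même" is `m%C3%AAme`, and decoding those bytes as ISO-8859-1 yields the mojibake "mÃªme". The `decodeParam` helper below is hypothetical; it just follows the reporter's suggestion of using the request's declared charset and falling back to UTF-8 when `getCharacterEncoding()` returns null, and it is not the patch attached to this issue.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryParamCharset {
    // Hypothetical helper: decode one percent-encoded parameter value using the
    // request's declared charset, or UTF-8 if none was declared (null).
    static String decodeParam(String encoded, String requestCharset)
            throws UnsupportedEncodingException {
        String charset = (requestCharset != null) ? requestCharset : "UTF-8";
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String encoded = "m%C3%AAme"; // "même" percent-encoded as UTF-8
        // Decoding the UTF-8 bytes as ISO-8859-1 produces mojibake: "mÃªme"
        System.out.println(URLDecoder.decode(encoded, "ISO-8859-1"));
        // Falling back to UTF-8 recovers the original text: "même"
        System.out.println(decodeParam(encoded, null));
    }
}
```

One caveat on the design: in the Servlet API, `getCharacterEncoding()` describes the request body, while the container decodes the URL query string according to its own configuration (in Tomcat, the connector's `URIEncoding` attribute), so a body-charset check alone may not cover GET parameters.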
