Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2013/01/08 18:34:12 UTC

[jira] [Resolved] (SOLR-4283) Improve URL decoding (followup of SOLR-4265)

     [ https://issues.apache.org/jira/browse/SOLR-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved SOLR-4283.
---------------------------------

    Resolution: Fixed

Committed to trunk and 4.x.

A next step would be to make the encoding of GET URLs configurable (using the de facto standard "&ie=charset" URL parameter, as used by most REST web services of major search engines).
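
As a rough illustration of that idea only (nothing like this is part of the committed patch), a request handler could look for an "ie" parameter in the raw query string and fall back to UTF-8 when it is absent. The class and method names below are hypothetical, and the sketch assumes the charset name itself is plain ASCII and not %-escaped:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Solr code: picks the charset used to decode a GET URL.
final class InputEncodingSketch {

  /** Returns the charset named by the first "ie=" parameter, or UTF-8 if there is none. */
  static Charset charsetFor(String rawQuery) {
    if (rawQuery != null) {
      for (String pair : rawQuery.split("&")) {
        if (pair.startsWith("ie=")) {
          String name = pair.substring(3);
          try {
            return Charset.forName(name);
          } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            throw new IllegalArgumentException("Unsupported input encoding: " + name, e);
          }
        }
      }
    }
    return StandardCharsets.UTF_8;
  }
}
```

The charset found this way would then replace the fixed UTF-8 decoder when the %-decoded bytes are turned into Strings.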
                
> Improve URL decoding (followup of SOLR-4265)
> --------------------------------------------
>
>                 Key: SOLR-4283
>                 URL: https://issues.apache.org/jira/browse/SOLR-4283
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.1, 5.0
>
>         Attachments: index.jsp, request.http, SOLR-4283.patch, SOLR-4283.patch, SOLR-4283.patch, SOLR-4283.patch, SOLR-4283.patch
>
>
> Followup of SOLR-4265:
> SOLR-4265 has 2 problems:
> - it reads the whole InputStream into a String, which can be large. This wastes memory, especially when the query string from the POSTed form data is near the 2 megabyte limit. The String is then split and packed into a big Map.
> - it does not report corrupt UTF-8
> The attached patch will do 2 things:
> - The decoding of the POSTed form data is done on the ServletInputStream, directly parsing the bytes (not chars). Key/value pairs are extracted and %-decoded to byte[] on the fly. URL parameters from getQueryString() are parsed with the same code, using a ByteArrayInputStream over the original String encoded as UTF-8 (this is a hack, because the Servlet API does not give back the original bytes from the HTTP request). To conform to the standard, the query String should be interpreted as US-ASCII, but with the UTF-8 approach, UTF-8 in the HTTP request that is not fully escaped survives.
> - the byte[] key/value pairs are converted to Strings using CharsetDecoder
> This will be memory efficient and will report incorrectly escaped form data, so people will no longer complain that searches return no results or similar.
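
The approach described above can be sketched roughly as follows. This is not the code from SOLR-4283.patch; the class and method names are made up for illustration, and a real implementation would also need buffering and a size limit. It %-decodes key/value pairs to bytes while reading the stream, then converts them with a CharsetDecoder configured to report malformed input instead of silently replacing it:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the committed Solr code.
final class FormDecoderSketch {

  /** Parses application/x-www-form-urlencoded data byte by byte from the stream. */
  static Map<String, List<String>> parse(InputStream in) throws IOException {
    Map<String, List<String>> params = new LinkedHashMap<>();
    ByteArrayOutputStream key = new ByteArrayOutputStream();
    ByteArrayOutputStream value = new ByteArrayOutputStream();
    ByteArrayOutputStream current = key;
    for (;;) {
      int b = in.read();
      if (b == '&' || b == -1) {
        // End of one pair: convert the collected bytes to Strings.
        if (key.size() > 0 || value.size() > 0) {
          params.computeIfAbsent(decode(key.toByteArray()), k -> new ArrayList<>())
              .add(decode(value.toByteArray()));
        }
        if (b == -1) {
          break;
        }
        key.reset();
        value.reset();
        current = key;
      } else if (b == '=' && current == key) {
        current = value; // first '=' separates key from value
      } else if (b == '+') {
        current.write(' ');
      } else if (b == '%') {
        current.write((hex(in.read()) << 4) | hex(in.read()));
      } else {
        current.write(b);
      }
    }
    return params;
  }

  /** Converts one hex digit; also rejects end of stream (-1) inside an escape. */
  private static int hex(int c) throws IOException {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw new IOException("Invalid or truncated %-escape in form data");
  }

  /** Strict UTF-8 conversion: throws CharacterCodingException on corrupt input. */
  private static String decode(byte[] bytes) throws IOException {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    return decoder.decode(ByteBuffer.wrap(bytes)).toString();
  }
}
```

Because only one key/value pair is held as bytes at a time, the whole request body never has to exist as a single String. The same parse method can be fed a ByteArrayInputStream over the UTF-8 bytes of getQueryString() for GET parameters.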
