Posted to solr-dev@lucene.apache.org by Eric Pugh <ep...@opensourceconnections.com> on 2007/07/09 23:04:57 UTC

URL Encoding/Decoding

Hi all,

My patch for adding rich unstructured content
(https://issues.apache.org/jira/browse/SOLR-284) has a problem when some of
the extra field data passed in via the GET request contains spaces, etc.
The content comes through URL encoded.

Should the SolrParams object handle decoding of parameters, or should
that be the domain of my RichDocumentRequestHandler, since only some
parameters will have URL encoding?

Cheers,

Eric Pugh

-------------------------------------------------------
Principal
OpenSource Connections
Site: http://www.opensourceconnections.com
Blog: http://blog.opensourceconnections.com
Cell: 1-434-466-1467





Re: URL Encoding/Decoding

Posted by Eric Pugh <ep...@opensourceconnections.com>.
Thanks...  I am backing out my code!

On Jul 10, 2007, at 12:45 AM, Chris Hostetter wrote:

>
> The URL encoding/decoding in Solr only happens when dealing with
> HTTP-based requests.  When writing unit tests that deal with the
> SolrTestHarness (and LocalSolrQueryRequest, which is what the loadLocal()
> and req() methods use under the covers), you shouldn't be doing any URL
> escaping because no URLs are involved.
>
> : new code that showed they were being encoded....  But I think it may
> : have been because the unit tests don't operate through a regular HTTP
> : layer?
>
> bingo.
>
>
>
> -Hoss
>






Re: URL Encoding/Decoding

Posted by Chris Hostetter <ho...@fucit.org>.
The URL encoding/decoding in Solr only happens when dealing with
HTTP-based requests.  When writing unit tests that deal with the
SolrTestHarness (and LocalSolrQueryRequest, which is what the loadLocal()
and req() methods use under the covers), you shouldn't be doing any URL
escaping because no URLs are involved.

: new code that showed they were being encoded....  But I think it may
: have been because the unit tests don't operate through a regular HTTP
: layer?

bingo.



-Hoss
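
A minimal sketch of the distinction described above, using only the JDK
and no Solr classes. The class and method names (HttpLayerDecodingSketch,
viaHttp, viaLocal) are hypothetical stand-ins: viaHttp mimics the single
decode that an HTTP layer applies to a query-string value, and viaLocal
mimics a local test request that hands the string through untouched.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class HttpLayerDecodingSketch {
    // Assumption: stands in for the HTTP layer, which decodes each
    // query-string value once before a handler reads it.
    static String viaHttp(String rawQueryValue) {
        return URLDecoder.decode(rawQueryValue, StandardCharsets.UTF_8);
    }

    // Assumption: stands in for a local (non-HTTP) test request, where
    // no URL is involved and the value is passed through as-is.
    static String viaLocal(String value) {
        return value;
    }

    public static void main(String[] args) {
        String encoded = "My%20Name%20is%20Johnny";
        System.out.println(viaHttp(encoded));   // My Name is Johnny
        System.out.println(viaLocal(encoded));  // My%20Name%20is%20Johnny
    }
}
```

On the local path the percent signs arrive literally, so a test that
passes pre-encoded values sees them "still encoded" even though nothing
is wrong with the HTTP path.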


Re: URL Encoding/Decoding

Posted by Eric Pugh <ep...@opensourceconnections.com>.
It might have been...   I wrote some code to decode them, and then I
was told that it worked okay.  However, I wrote a unit test for my
new code that showed they were being encoded....  But I think it may
have been because the unit tests don't operate through a regular HTTP
layer?


This test (similar to what is in the CSVLoader test!)

   public void testPDFLoadWithExtraFieldsThatAreURLEncoded() throws Exception {
     makeFile("I love PDF documents.");
     // Extra field values are passed in URL-encoded form.
     loadLocal("stream.type", "pdf",
               "stream.file", filename,
               "stream.fieldname", "text",
               "id", "100",
               "fieldnames", "name,subject",
               "name", "My%20Name%20is%20Johnny",
               "subject", "A%20test%20document");
     assertU(commit());

     assertQ(req("text:Love"), "//*[@numFound='1']");
     assertQ(req("text:Hate"), "//*[@numFound='0']");

     // The encoded forms are expected not to match...
     assertQ(req("name:My%20Name%20is%20Johnny"), "//*[@numFound='0']");
     assertQ(req("subject:A%20test%20document"), "//*[@numFound='0']");

     // ...while the decoded forms are expected to match.
     assertQ(req("name:My Name is Johnny"), "//*[@numFound='1']");
     assertQ(req("subject:A test document"), "//*[@numFound='1']");
   }

was failing until I added an explicit decode....   I think I retract
my initial email!!

Eric




On Jul 9, 2007, at 5:24 PM, Yonik Seeley wrote:

> On 7/9/07, Eric Pugh <ep...@opensourceconnections.com> wrote:
>> My patch for adding rich unstructured content
>> (https://issues.apache.org/jira/browse/SOLR-284) has a problem when some of
>> the extra field data passed in via the get request have spaces etc..
>> The content comes through URL encoded.
>>
>> Should the SolrParams object handle decoding of parameters, or should
>> that be the domain of my RichDocumentRequestHandler since only some
>> parameters will have URL encoding.
>
> Any URL encoding should already be automatically decoded by the time
> the handler gets any data via SolrParams. Or was it double-encoded
> perhaps?
>
> -Yonik






Re: URL Encoding/Decoding

Posted by Yonik Seeley <yo...@apache.org>.
On 7/9/07, Eric Pugh <ep...@opensourceconnections.com> wrote:
> My patch for adding rich unstructured content
> (https://issues.apache.org/jira/browse/SOLR-284) has a problem when some of
> the extra field data passed in via the get request have spaces etc..
> The content comes through URL encoded.
>
> Should the SolrParams object handle decoding of parameters, or should
> that be the domain of my RichDocumentRequestHandler since only some
> parameters will have URL encoding.

Any URL encoding should already be automatically decoded by the time
the handler gets any data via SolrParams. Or was it double-encoded
perhaps?

-Yonik
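
To illustrate the double-encoding case raised above, here is a small
sketch using only java.net.URLEncoder and java.net.URLDecoder. The class
name DoubleEncodingSketch and its helper methods are hypothetical, and
the single decode call is an assumption standing in for whatever the
HTTP layer does before SolrParams is populated.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DoubleEncodingSketch {
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A value the client has already encoded by hand.
        String alreadyEncoded = "My%20Name%20is%20Johnny";

        // Encoding it again escapes each '%' as "%25".
        String doubleEncoded = encode(alreadyEncoded);
        System.out.println(doubleEncoded);                 // My%2520Name%2520is%2520Johnny

        // One decode (the assumed HTTP-layer step) only undoes the outer layer.
        System.out.println(decode(doubleEncoded));         // My%20Name%20is%20Johnny

        // A second decode would be needed to recover the plain text.
        System.out.println(decode(decode(doubleEncoded))); // My Name is Johnny
    }
}
```

If a handler sees values like the second line of output, the usual fix
is to stop encoding twice on the client rather than to decode again on
the server.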