
Posted to user@nutch.apache.org by Alaak <al...@gmx.de> on 2012/09/08 15:22:28 UTC

Query SolrIndex for Id

Hi,

I have a problem with the way Nutch (1.6 trunk) stores the page's id in
the Solr (3.6.1) index. It uses the page's URL. However, due to special
characters in the URL like ":" or "/", it is impossible to use the URL as
a query parameter to access documents on Solr. Of course it is possible to
apply URL encoding, but it seems Solr does not reverse the encoding on the
server side and thus is unable to match the queried URL to the URL in
the index. So does anyone know of a way to retrieve documents from a
Nutch-created Solr index by id?

Thanks and Regards

Re: Query SolrIndex for Id

Posted by Alaak <al...@gmx.de>.

*headdesk* Well, of course. With all the URL escaping, I totally
forgot about the Lucene escaping. Thanks for the tip. Now it works.

On Sat 08 Sep 2012 16:36:26 CEST, Markus Jelsma wrote:
> Special characters must be escaped:
> http://lucene.apache.org/core/3_6_1/queryparsersyntax.html#Escaping%20Special%20Characters

RE: Query SolrIndex for Id

Posted by Markus Jelsma <ma...@openindex.io>.

Special characters must be escaped:
http://lucene.apache.org/core/3_6_1/queryparsersyntax.html#Escaping%20Special%20Characters
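To illustrate the two layers of escaping involved, here is a minimal sketch in Python, not taken from Nutch or Solr. It first backslash-escapes the characters that the linked Lucene 3.6.1 page lists as special, then URL-encodes the result for use as the q parameter. The helper names and the example URL are hypothetical; the field name "id" follows the original question. Other Lucene versions may treat additional characters as special.

```python
from urllib.parse import quote

# Characters the Lucene 3.6 query parser treats as syntax.
# '&' and '|' are escaped individually, which also covers "&&" and "||".
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&')


def escape_lucene(term: str) -> str:
    """Prefix every Lucene special character with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term)


def build_id_query(url: str) -> str:
    """Build a URL-encoded q parameter that looks up a document by its id."""
    # Layer 1: Lucene escaping, so the query parser reads the URL as a literal term.
    lucene_query = "id:" + escape_lucene(url)
    # Layer 2: URL encoding, so the query survives transport over HTTP.
    return "q=" + quote(lucene_query, safe="")


if __name__ == "__main__":
    # Hypothetical page URL, used only for illustration.
    print(build_id_query("http://example.com/a-page?x=1"))
```

The order matters: the Lucene escaping is applied to the raw URL, and only the finished query string is URL-encoded. From Java, SolrJ's ClientUtils.escapeQueryChars should cover the first step.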

 
 

Re: Query SolrIndex for Id

Posted by Lewis John Mcgibbney <le...@gmail.com>.

You can of course change the source and destination field mappings so
that you don't need to query by URL ids.

This is a workaround, though, and doesn't fully address the issue of
querying by URL id.

Lewis
