Posted to solr-user@lucene.apache.org by Michael Ryan <mr...@moreover.com> on 2011/06/28 17:30:47 UTC

Using FieldCache in SolrIndexSearcher - crazy idea?

I am a user of Solr 3.2 and I make use of the distributed search capabilities of Solr using a fairly simple architecture of a coordinator + some shards.

Correct me if I am wrong:  In a standard distributed search with QueryComponent, the first query sent to the shards asks for fl=myUniqueKey or fl=myUniqueKey,score.  When the response is being generated to send back to the coordinator, SolrIndexSearcher.doc (int i, Set<String> fields) is called for each document.  As I understand it, this will read each document from the index _on disk_ and retrieve the myUniqueKey field value for each document.

My idea is to have a FieldCache for the myUniqueKey field in SolrIndexSearcher (or somewhere else?) that would be used in cases where the only field that needs to be retrieved is myUniqueKey.  Is this something that would improve performance?
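
As a rough illustration of the idea, the sketch below contrasts a slow per-document lookup (standing in for a stored-field read from disk) with an array of key values indexed by internal document id, which is roughly the shape of a FieldCache entry for a string field. The class and method names are hypothetical and this is not the Lucene FieldCache API; it only shows the access pattern under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical stand-in for an in-memory cache of one field's values.
// NOT the Lucene FieldCache API; it only illustrates the access pattern.
class UniqueKeyCacheSketch {
    private final String[] keysByDocId;

    // Populate the cache once by reading every document's key through the
    // slow path (in Solr this would be the stored-field read).
    UniqueKeyCacheSketch(int maxDoc, IntFunction<String> storedFieldLookup) {
        keysByDocId = new String[maxDoc];
        for (int docId = 0; docId < maxDoc; docId++) {
            keysByDocId[docId] = storedFieldLookup.apply(docId);
        }
    }

    // After construction, each lookup is a plain array read.
    String keyFor(int docId) {
        return keysByDocId[docId];
    }

    // Resolve the keys for a page of hits without touching the slow path.
    List<String> keysFor(int[] docIds) {
        List<String> keys = new ArrayList<>(docIds.length);
        for (int docId : docIds) {
            keys.add(keyFor(docId));
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] slowReads = {0};
        IntFunction<String> slowLookup = docId -> {
            slowReads[0]++; // count how often the slow path is used
            return "doc-" + docId;
        };
        UniqueKeyCacheSketch cache = new UniqueKeyCacheSketch(5, slowLookup);
        System.out.println(cache.keysFor(new int[] {3, 1, 4}));
        System.out.println("slow reads: " + slowReads[0]);
    }
}
```

The trade-off is that the array holds a value for every document in the index, not just the ones that match a query, so the memory cost grows with index size.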

In our actual setup, we are using an extended version of QueryComponent that queries for a couple of other fields besides myUniqueKey in the initial query to the shards, and it asks for a lot of rows when doing so, many more than what the user ends up getting back when they see the results.  (The reasons for this are complicated and aren't related much to this question.)  We already maintain FieldCaches for the fields that we are asking for, but for other purposes.  Would it make sense to utilize these FieldCaches in SolrIndexSearcher?  Is this something that anyone else has done before?

-Michael

Re: Using FieldCache in SolrIndexSearcher - crazy idea?

Posted by Ryan McKinley <ry...@gmail.com>.
>
> Ah, thanks Hoss - I had meant to respond to the original email, but
> then I lost track of it.
>
> Via pseudo-fields, we actually already have the ability to retrieve
> values via FieldCache.
> fl=id:{!func}id
>
> But using CSF would probably be better here - no memory overhead for
> the FieldCache entry.
>

Not sure if this is related, but we should also consider using the
memory codec for the id field
https://issues.apache.org/jira/browse/LUCENE-3209

Re: Using FieldCache in SolrIndexSearcher - crazy idea?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Jul 19, 2011 at 3:20 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : > Quite probably ... you typically can't assume that a FieldCache can be
> : > constructed for *any* field, but it should be a safe assumption for the
> : > uniqueKey field, so for that initial request of the multiphase distributed
> : > search it's quite possible it would speed things up.
> :
> : Ah, thanks Hoss - I had meant to respond to the original email, but
> : then I lost track of it.
> :
> : Via pseudo-fields, we actually already have the ability to retrieve
> : values via FieldCache.
> : fl=id:{!func}id
>
> isn't that kind of orthogonal to the question though? ... a user can use
> the new pseudo-field functionality to request values from the FieldCache
> instead of stored fields, but specifically in the case of distributed
> search, when the first request is only asking for the uniqueKey values and
> scores, shouldn't that use the FieldCache to get those values?  (w/o the
> user needing to jump through hoops in how the request is made/configured)

Well, I was pointing out that distributed search could be easily
modified to use the field-cache
by changing id to id:{!func}id

But I'm not sure we should do that by default - the memory of a full
fieldCache entry is non-trivial for some people.
Using a CSF id field would be better I think (the type where it doesn't
populate a fieldcache entry).

-Yonik
http://www.lucidimagination.com

Re: Using FieldCache in SolrIndexSearcher - crazy idea?

Posted by Chris Hostetter <ho...@fucit.org>.
: > Quite probably ... you typically can't assume that a FieldCache can be
: > constructed for *any* field, but it should be a safe assumption for the
: > uniqueKey field, so for that initial request of the multiphase distributed
: > search it's quite possible it would speed things up.
: 
: Ah, thanks Hoss - I had meant to respond to the original email, but
: then I lost track of it.
: 
: Via pseudo-fields, we actually already have the ability to retrieve
: values via FieldCache.
: fl=id:{!func}id

isn't that kind of orthogonal to the question though? ... a user can use
the new pseudo-field functionality to request values from the FieldCache
instead of stored fields, but specifically in the case of distributed
search, when the first request is only asking for the uniqueKey values and
scores, shouldn't that use the FieldCache to get those values?  (w/o the
user needing to jump through hoops in how the request is made/configured)


-Hoss

Re: Using FieldCache in SolrIndexSearcher - crazy idea?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Jul 5, 2011 at 5:13 PM, Chris Hostetter
<ho...@fucit.org> wrote:
> : Correct me if I am wrong:  In a standard distributed search with
> : QueryComponent, the first query sent to the shards asks for
> : fl=myUniqueKey or fl=myUniqueKey,score.  When the response is being
> : generated to send back to the coordinator, SolrIndexSearcher.doc (int i,
> : Set<String> fields) is called for each document.  As I understand it,
> : this will read each document from the index _on disk_ and retrieve the
> : myUniqueKey field value for each document.
> :
> : My idea is to have a FieldCache for the myUniqueKey field in
> : SolrIndexSearcher (or somewhere else?) that would be used in cases where
> : the only field that needs to be retrieved is myUniqueKey.  Is this
> : something that would improve performance?
>
> Quite probably ... you typically can't assume that a FieldCache can be
> constructed for *any* field, but it should be a safe assumption for the
> uniqueKey field, so for that initial request of the multiphase distributed
> search it's quite possible it would speed things up.

Ah, thanks Hoss - I had meant to respond to the original email, but
then I lost track of it.

Via pseudo-fields, we actually already have the ability to retrieve
values via FieldCache.
fl=id:{!func}id

But using CSF would probably be better here - no memory overhead for
the FieldCache entry.
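
For anyone wanting to try this from client code, a minimal sketch of building that fl value for an arbitrary uniqueKey field and URL-encoding it is below. The helper names are hypothetical and not part of Solr or SolrJ; only the fl=id:{!func}id form itself comes from this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
class ShardFieldListSketch {
    // Build "<key>:{!func}<key>", optionally followed by ",score", so the
    // key value is requested as a function pseudo-field rather than as a
    // stored field.
    static String functionFieldList(String uniqueKeyField, boolean includeScore) {
        String fl = uniqueKeyField + ":{!func}" + uniqueKeyField;
        return includeScore ? fl + ",score" : fl;
    }

    // URL-encode the value for use in a request URL.
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String asQueryParam(String fl) {
        return "fl=" + URLEncoder.encode(fl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fl = functionFieldList("id", true);
        System.out.println(fl);
        System.out.println(asQueryParam(fl));
    }
}
```

The braces, colon, and exclamation mark all need encoding when the parameter is placed in a URL by hand.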

-Yonik
http://www.lucidimagination.com



> if you want to try this and report back results, i'm sure a lot of people
> would be interested in a patch ... i would guess the best place to make
> the change would be in the QueryComponent so that it used the FieldCache
> (probably best to do it via getValueSource() on the uniqueKey's
> SchemaField) to put the ids in the response instead of using a
> SolrDocList.
>
> Hmm, actually...
>
> there's no reason why this kind of optimization would need to be specific
> to distributed queries, it could be done by the ResponseWriters directly
> -- if the field list they are being asked to return only contains the
> uniqueKeyField and computed values (like score) then don't bother calling
> SolrIndexSearcher.doc at all ... the only hitch is that with distributed
> search and using function values as pseudo fields and what not there are
> more places calling SolrIndexSearcher.doc than there used to be ... so
> maybe putting this change directly into SolrIndexSearcher.doc would make
> the most sense?
>
>
>
> -Hoss
>

Re: Using FieldCache in SolrIndexSearcher - crazy idea?

Posted by Chris Hostetter <ho...@fucit.org>.
: Correct me if I am wrong:  In a standard distributed search with 
: QueryComponent, the first query sent to the shards asks for 
: fl=myUniqueKey or fl=myUniqueKey,score.  When the response is being 
: generated to send back to the coordinator, SolrIndexSearcher.doc (int i, 
: Set<String> fields) is called for each document.  As I understand it, 
: this will read each document from the index _on disk_ and retrieve the 
: myUniqueKey field value for each document.
: 
: My idea is to have a FieldCache for the myUniqueKey field in 
: SolrIndexSearcher (or somewhere else?) that would be used in cases where 
: the only field that needs to be retrieved is myUniqueKey.  Is this 
: something that would improve performance?

Quite probably ... you typically can't assume that a FieldCache can be 
constructed for *any* field, but it should be a safe assumption for the 
uniqueKey field, so for that initial request of the multiphase distributed
search it's quite possible it would speed things up.

if you want to try this and report back results, i'm sure a lot of people
would be interested in a patch ... i would guess the best place to make
the change would be in the QueryComponent so that it used the FieldCache
(probably best to do it via getValueSource() on the uniqueKey's
SchemaField) to put the ids in the response instead of using a
SolrDocList.

Hmm, actually...

there's no reason why this kind of optimization would need to be specific 
to distributed queries, it could be done by the ResponseWriters directly 
-- if the field list they are being asked to return only contains the 
uniqueKeyField and computed values (like score) then don't bother calling 
SolrIndexSearcher.doc at all ... the only hitch is that with distributed
search and using function values as pseudo fields and what not there are
more places calling SolrIndexSearcher.doc than there used to be ... so
maybe putting this change directly into SolrIndexSearcher.doc would make 
the most sense?
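
The check described here could look something like the sketch below: decide from the requested field list whether a stored-document read can be skipped. The class, method, and the "title" field are hypothetical, and two things are assumptions rather than confirmed Solr behavior: that "score" is the only computed value to allow, and that a null or empty field set means "return all stored fields".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not existing Solr code.
class FieldListCheckSketch {
    // Computed values that never need a stored-field read. "score" is the
    // example named in the thread; treating it as the only one is an assumption.
    private static final Set<String> COMPUTED = new HashSet<>(Arrays.asList("score"));

    // True when every requested field is either the uniqueKey or a computed
    // value, so the key could come from a cache instead of the stored document.
    static boolean canSkipStoredFields(Set<String> requestedFields, String uniqueKeyField) {
        if (requestedFields == null || requestedFields.isEmpty()) {
            return false; // assumed to mean "all stored fields"
        }
        for (String field : requestedFields) {
            if (!field.equals(uniqueKeyField) && !COMPUTED.contains(field)) {
                return false; // some other stored field is needed
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> keyAndScore = new HashSet<>(Arrays.asList("myUniqueKey", "score"));
        Set<String> keyAndTitle = new HashSet<>(Arrays.asList("myUniqueKey", "title"));
        System.out.println(canSkipStoredFields(keyAndScore, "myUniqueKey"));
        System.out.println(canSkipStoredFields(keyAndTitle, "myUniqueKey"));
    }
}
```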



-Hoss