Posted to solr-user@lucene.apache.org by Jeff Wartes <jw...@whitepages.com> on 2010/10/21 19:32:05 UTC

DistributedSearchDesign and multiple requests

I'm using Solr 1.4. My observations and this page http://wiki.apache.org/solr/DistributedSearchDesign#line-254 indicate that the general strategy for Distributed Search is something like:
	1. Query the shards with the user's query and "fl=unique_field,score"
	2. Re-query (maybe a subset of) the shards for certain documents by unique_field with the field list the user requested.
	3. Maybe re-query the shards again to flesh out faceting info.
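To make the request count concrete, here is a minimal sketch of steps 1 and 2 as I understand them. This is not Solr's code: the in-memory SHARDS data and the function names (phase_one, phase_two, distributed_search) are hypothetical stand-ins for the per-shard HTTP requests, and the faceting refinement in step 3 is left out.

```python
from heapq import nlargest

# Hypothetical in-memory "shards": each maps the unique key to a stored document.
SHARDS = [
    {"a": {"id": "a", "score": 0.9, "title": "alpha"},
     "b": {"id": "b", "score": 0.2, "title": "beta"}},
    {"c": {"id": "c", "score": 0.7, "title": "gamma"},
     "d": {"id": "d", "score": 0.5, "title": "delta"}},
]


def phase_one(shard, rows):
    """Step 1: return only (id, score) pairs, like fl=unique_field,score."""
    top = nlargest(rows, shard.values(), key=lambda d: d["score"])
    return [(d["id"], d["score"]) for d in top]


def phase_two(shard, ids, fl):
    """Step 2: fetch the user's requested fields for specific ids."""
    return {i: {f: shard[i][f] for f in fl} for i in ids if i in shard}


def distributed_search(shards, rows, fl):
    """Run both phases and count how many shard requests were issued."""
    requests = 0
    candidates = []
    for n, shard in enumerate(shards):
        requests += 1  # one id/score request per shard
        candidates += [(score, doc_id, n) for doc_id, score in phase_one(shard, rows)]

    # Merge the per-shard results and keep the global top `rows` by score.
    winners = nlargest(rows, candidates)

    # Only shards that own a winning document are queried again.
    ids_by_shard = {}
    for _, doc_id, n in winners:
        ids_by_shard.setdefault(n, []).append(doc_id)

    fetched = {}
    for n, ids in ids_by_shard.items():
        requests += 1  # one stored-field request per contributing shard
        fetched.update(phase_two(shards[n], ids, fl))

    docs = [fetched[doc_id] for _, doc_id, _ in winners]
    return docs, requests


docs, requests = distributed_search(SHARDS, rows=2, fl=["id", "title"])
print(docs, requests)
```

With two shards and rows=2, both shards contribute a winning document, so the sketch issues four shard requests for a single user query.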

I'm encountering a significant performance penalty using DistributedSearch due to these additional queries, and it seems like there are some obvious optimizations that could avoid them in certain cases. 

For example, there could be a way to say "I claim the fields I'm requesting are small enough that querying again for stored fields costs more than just returning them in the first request" (e.g. assert_tiny_data=true&fl=tiny_stored_field,unique_field).
Or, "If the field list of the original query is already contained in the first round of shard requests, don't bother querying again for more fields" (e.g. fl=unique_field,score).
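A rough sketch of the decision these two shortcuts imply, assuming the first-round field list is exactly unique_field and score. The function names are hypothetical, assert_tiny_data is the parameter proposed above rather than an existing Solr option, and real fl handling (wildcards, other separators) is more involved than a comma split.

```python
# Assumed field list of the first round of shard requests.
FIRST_PHASE_FL = frozenset({"unique_field", "score"})


def parse_fl(fl):
    """Split a comma-separated fl value into a set of field names."""
    return frozenset(name.strip() for name in fl.split(",") if name.strip())


def plan_shard_requests(user_fl, assert_tiny_data=False):
    """Decide what the first round asks for and whether a second round is needed."""
    requested = parse_fl(user_fl)

    # Shortcut 2: everything the user wants already comes back in round one.
    if requested <= FIRST_PHASE_FL:
        return {"first_phase_fl": sorted(FIRST_PHASE_FL), "second_phase": False}

    # Shortcut 1: the caller asserts the stored fields are cheap to return,
    # so fold them into round one instead of re-querying by unique_field.
    if assert_tiny_data:
        return {"first_phase_fl": sorted(FIRST_PHASE_FL | requested),
                "second_phase": False}

    # Default: the existing behaviour, ids and scores first, then stored fields.
    return {"first_phase_fl": sorted(FIRST_PHASE_FL), "second_phase": True}


print(plan_shard_requests("unique_field,score"))
print(plan_shard_requests("tiny_stored_field,unique_field", assert_tiny_data=True))
print(plan_shard_requests("tiny_stored_field,unique_field"))
```

The trade-off in the first shortcut is that every shard returns stored fields for all of its top candidates, including those that lose the merge, which is why it only makes sense when the fields are small.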

Has anyone else looked into this? I'd be interested to learn whether there are issues that make these kinds of shortcuts difficult before I dig in.

Thanks,
  -Jeff Wartes