Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2008/09/03 16:53:36 UTC

[Solr Wiki] Update of "DistributedSearchDesign" by ShalinMangar

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DistributedSearchDesign

The comment on the change is:
Added notes on the current DistributedSearch approach

------------------------------------------------------------------------------
  Distributing the indexing will be up to users via a Multiple Master approach.
  In the future, we may want to migrate to "Consistency via Specifying Index Version" and lucene internal docids.
  
+ The query is executed in phases. In each phase, a request is sent to each relevant shard in a separate thread. The next phase starts only after the responses to all requests of the current phase have been received.
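
The phase barrier described above can be sketched as follows. This is only an illustration in Python, not Solr's actual code: `run_phases` and the `phase(shard, previous)` callables are hypothetical names, and for simplicity every phase contacts every shard.

```python
from concurrent.futures import ThreadPoolExecutor


def run_phases(shards, phases):
    """Run each phase against all shards in parallel, one phase at a time.

    `phases` is a list of hypothetical callables phase(shard, previous),
    where `previous` maps shard -> response from the preceding phase.
    """
    responses = {}
    for phase in phases:
        previous = responses
        with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
            # One request per shard, each in its own worker thread.
            futures = {s: pool.submit(phase, s, previous) for s in shards}
            # result() blocks, so the next phase cannot start until
            # every shard has answered the current one.
            responses = {s: f.result() for s, f in futures.items()}
    return responses
```

A phase that needs fewer shards could simply return an empty response for the shards it does not care about.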
+ 
+ ==== Phase 1: GET_TOP_IDS [& GET_FACETS] ====
+ Each shard is asked for the unique keys and sort fields of its top matching documents, together with facets, for the given query. The number of keys requested in this phase is 'N' (start=0&rows=N) regardless of the 'start' value specified, so that the results can be merged correctly.
+ 
+ Each response contains the unique keys of the matching documents and their scores. If GET_FACETS is requested, it also contains the top 'n' facets, where n=facet.count. Once all responses are received, they are merged and sorted by rank. The documents to return are then selected from the sorted list on the basis of the 'start' and 'rows' parameters.
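
A minimal sketch of the Phase 1 merge, in Python for illustration only. It assumes that N equals start + rows and that rank means descending score; neither is stated explicitly above. The function names and the response shape (a list of `(unique_key, score)` pairs per shard) are hypothetical.

```python
def shard_params(start, rows):
    # Assumption: N = start + rows, so that any shard could supply
    # the entire requested page on its own.
    return {"start": 0, "rows": start + rows}


def merge_top_ids(shard_responses, start, rows):
    """Merge per-shard (unique_key, score) lists and cut out one page.

    Returns a list of (unique_key, shard) pairs in rank order.
    """
    merged = []
    for shard, docs in shard_responses.items():
        for doc_id, score in docs:
            merged.append((score, doc_id, shard))
    # Highest score first; ties are broken by key to keep output stable.
    merged.sort(key=lambda item: (-item[0], item[1]))
    page = merged[start:start + rows]
    return [(doc_id, shard) for _, doc_id, shard in page]
```

Requesting from offset 0 on every shard matters because a document ranked low on its own shard can still fall inside the requested page once all lists are combined.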
+ 
+ ==== Phase 2 ====
+ Requests are sent to fetch fields, highlighting and MoreLikeThis information only for the documents identified in Phase 1. Each request contains the unique keys of the wanted documents and is sent only to the shard that holds them.
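
Routing the Phase 2 requests could look like the sketch below (Python, illustrative only). It takes `(unique_key, shard)` pairs as input; the function name and the `ids` request parameter are assumptions made for this example.

```python
from collections import defaultdict


def build_fetch_requests(page):
    """Group the selected documents by the shard that holds them.

    `page` is a list of (unique_key, shard) pairs. Shards that hold
    none of the selected documents get no request at all.
    """
    by_shard = defaultdict(list)
    for doc_id, shard in page:
        by_shard[shard].append(doc_id)
    # "ids" is an assumed parameter name carrying the unique keys.
    return {shard: {"ids": ",".join(ids)} for shard, ids in by_shard.items()}
```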
+ 
+ ==== Phase 3: REFINE_FACETS (only for faceted search) ====
+ The facets returned in Phase 1 may be incomplete, so further requests are sent to the shards to refine them. Note that the approach applied here gives accurate counts for the facet terms it returns, but it is theoretically possible to miss some facet terms.
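
One way to picture the refinement step is sketched below in Python. This is an assumption about how such a scheme can work, not a description of the actual implementation: every term seen in any shard's top list becomes a candidate, and each shard is asked for exact counts of the candidates it did not report. All names are hypothetical.

```python
def terms_to_refine(shard_facets):
    """For each shard, list the candidate terms it did not report."""
    candidates = set()
    for counts in shard_facets.values():
        candidates.update(counts)
    missing = {}
    for shard, counts in shard_facets.items():
        unknown = sorted(candidates.difference(counts))
        if unknown:
            missing[shard] = unknown
    return missing


def merge_facets(shard_facets, refinements):
    """Sum the Phase 1 counts and the refinement counts per term."""
    totals = {}
    for source in (shard_facets, refinements):
        for counts in source.values():
            for term, count in counts.items():
                totals[term] = totals.get(term, 0) + count
    return totals
```

Under this sketch the merged counts are exact for every candidate term, but a term that never appears in any shard's top list is never considered, which matches the caveat about missed facet terms.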
+ 
+ After the document fields and facets are obtained, the response is constructed and sent back to the client.
+ 
+ The index may change during the small window of time between Phase 1 and Phase 3. In that case the response may contain incorrect data. This is ignored for the time being.
+