Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2006/07/11 07:07:56 UTC

[Solr Wiki] Update of "FederatedSearch" by YonikSeeley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/FederatedSearch

The comment on the change is:
a start on design for federated search

New page:
= Federated Search Design =

Follow the basic Lucene design for MultiSearcher/RemoteSearcher as a template.
  * SolrSearchable as an interface that contains a subset of functionality common to all types of SolrSearchers
  * SolrIndexSearcher would implement SolrSearchable
  * SolrMultiSearcher would implement SolrSearchable via multiple SolrSearchers, implementing the logic of combining search results from multiple subsearchers.  The implementation should be network friendly (no HitCollectors, avoid passing around DocSets/BitSets, etc).
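
As a rough illustration of the merge step, here is a minimal Java sketch, not the actual design. The `ScoredHit` type and the `search(String, int)` signature are assumptions made for this example; only the names `SolrSearchable` and `SolrMultiSearcher` come from the notes above. Each sub-searcher returns just its top n hits (small and cheap to send over the wire), and the multi-searcher merges those lists.

```java
import java.util.ArrayList;
import java.util.List;

/** One scored hit (hypothetical type): a document id plus its score. */
final class ScoredHit {
    final String id;
    final float score;

    ScoredHit(String id, float score) {
        this.id = id;
        this.score = score;
    }
}

/** Assumed minimal subset of functionality shared by all searcher types. */
interface SolrSearchable {
    /** Returns up to n hits for the query. */
    List<ScoredHit> search(String query, int n);
}

/** Combines the top-n lists of several sub-searchers into one top-n list. */
class SolrMultiSearcher implements SolrSearchable {
    private final List<SolrSearchable> subSearchers;

    SolrMultiSearcher(List<SolrSearchable> subSearchers) {
        this.subSearchers = new ArrayList<>(subSearchers);
    }

    @Override
    public List<ScoredHit> search(String query, int n) {
        // Only n hits per sub-searcher are transferred: no DocSets or BitSets.
        List<ScoredHit> all = new ArrayList<>();
        for (SolrSearchable sub : subSearchers) {
            all.addAll(sub.search(query, n));
        }
        // Highest score first; ties broken by id so the order is deterministic.
        all.sort((a, b) -> {
            int c = Float.compare(b.score, a.score);
            return c != 0 ? c : a.id.compareTo(b.id);
        });
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

Scores from different sub-searchers are only directly comparable if they were computed with the same term statistics, which is where the optional global idf calculation would come in.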
   
Areas that will need change:
 * Solr's caches don't contain enough info to merge search results from subsearchers

Network Transports:
 * RMI
 * XML

Misc:
 * optional global idf calculations
 * new style APIs geared toward faceted browsing (avoid instantiating DocSets... pass around symbolic sets)
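
To make the global idf point concrete, here is a small Java sketch. It assumes each sub-searcher can report, per term, its document frequency and its total document count; the `TermStats` and `GlobalIdf` names are hypothetical. The formula is modeled on the classic Lucene default, `ln(numDocs / (docFreq + 1)) + 1`, applied to the summed statistics rather than to each sub-index separately.

```java
import java.util.List;

/** Statistics for one term in one sub-index (hypothetical type). */
final class TermStats {
    final long docFreq; // documents in this sub-index that contain the term
    final long numDocs; // total documents in this sub-index

    TermStats(long docFreq, long numDocs) {
        this.docFreq = docFreq;
        this.numDocs = numDocs;
    }
}

final class GlobalIdf {
    /** idf computed from the statistics of a single index. */
    static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    /** idf computed from the summed statistics of all sub-indices. */
    static double globalIdf(List<TermStats> perSubIndex) {
        long docFreq = 0;
        long numDocs = 0;
        for (TermStats s : perSubIndex) {
            docFreq += s.docFreq;
            numDocs += s.numDocs;
        }
        return idf(docFreq, numDocs);
    }
}
```

With purely local statistics, a term that is rare in one sub-index and common in another gets different weights on each server, so the same document could score differently depending on where it lives. Summing the counts first gives every sub-searcher the same weight for the term, at the cost of an extra round of communication.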


=== High Availability ===
How can High Availability be obtained on the query side?
 * sub-searchers could be identified by virtual IPs (VIPs): the top-level searcher would go through a load balancer to access sub-searchers.
 * could do it in code via HASolrMultiSearcher that takes a list of sub-servers for each 
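
A minimal Java sketch of the in-code option, under the assumption that a group of interchangeable servers holds the same index and any one of them can answer a query. The `SubSearcher` interface and `FailoverSearcher` class are hypothetical names for this example; this is one simple failover policy (try the servers in order), not a statement of how HASolrMultiSearcher would work.

```java
import java.util.ArrayList;
import java.util.List;

/** Assumed minimal view of a remote sub-searcher: returns matching doc ids. */
interface SubSearcher {
    List<String> search(String query, int n);
}

/** Tries each interchangeable server in turn until one answers. */
class FailoverSearcher implements SubSearcher {
    private final List<SubSearcher> replicas;

    FailoverSearcher(List<SubSearcher> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("need at least one replica");
        }
        this.replicas = new ArrayList<>(replicas);
    }

    @Override
    public List<String> search(String query, int n) {
        RuntimeException last = null;
        for (SubSearcher replica : replicas) {
            try {
                return replica.search(query, n);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next server
            }
        }
        throw new IllegalStateException("all replicas failed", last);
    }
}
```

A real implementation would also want timeouts and some way to avoid retrying a server that is known to be down; those are left out to keep the sketch short.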

=== Master ===
How should the collection be updated?  It would be complex for clients to partition the data themselves, since they would have to ensure that a particular document always goes to the same server.  Although user partitioning should be possible, there should be an easier default.

==== Single Master ====
A single master could partition the data into multiple local indices. Sub-searchers would only pull the local index they are configured to have.
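
One simple way a master could decide which local index a document belongs to is to hash the document's unique id. The Java sketch below assumes string ids and a fixed number of partitions; `HashPartitioner` is a hypothetical name, and CRC32 is used only because it is a stable hash available in the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Maps a document id to one of a fixed number of local indices. */
final class HashPartitioner {
    private final int numPartitions;

    HashPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /** The same id always maps to the same partition, in [0, numPartitions). */
    int partitionFor(String docId) {
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is non-negative.
        return (int) (crc.getValue() % numPartitions);
    }
}
```

Because the mapping depends only on the id, an update or delete for a document reaches the same index as the original add. The drawback of plain modulo hashing is that changing the number of partitions moves most documents, so the partition count would need to be chosen up front or the data re-partitioned.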

=== Commits ===
How to synchronize commits across subsearchers and top-level-searchers?