Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2006/07/11 07:07:56 UTC
[Solr Wiki] Update of "FederatedSearch" by YonikSeeley
The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/FederatedSearch
The comment on the change is:
a start on design for federated search
New page:
= Federated Search Design =
Follow the basic Lucene design for MultiSearcher/RemoteSearcher as a template.
* Introduce SolrSearchable, an interface containing the subset of functionality common to all types of SolrSearchers
* SolrIndexSearcher would implement SolrSearchable
* SolrMultiSearcher would implement SolrSearchable on top of multiple SolrSearchers, combining the search results returned by its subsearchers. The implementation should be network-friendly (no HitCollectors, avoid passing around DocSets/BitSets, etc.).
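A minimal sketch of how SolrMultiSearcher might combine results, assuming a hypothetical `search(query, n)` method on SolrSearchable that returns only (unique key, score) pairs. Each subsearcher returns its own top n and the top level keeps the best n overall with a min-heap, so no HitCollector or DocSet has to cross the network. `ScoredDoc` and the method signature are illustrative, not an existing Solr API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical result entry: a globally unique document key plus its score.
// Only these small values cross the network, never DocSets or BitSets.
final class ScoredDoc {
    final String id;
    final float score;
    ScoredDoc(String id, float score) { this.id = id; this.score = score; }
}

// Sketch of the common interface; the search(String, int) signature is an assumption.
interface SolrSearchable {
    // Returns at most n documents, sorted by descending score.
    List<ScoredDoc> search(String query, int n);
}

final class SolrMultiSearcher implements SolrSearchable {
    private final List<SolrSearchable> subSearchers;

    SolrMultiSearcher(List<SolrSearchable> subSearchers) {
        this.subSearchers = subSearchers;
    }

    @Override
    public List<ScoredDoc> search(String query, int n) {
        // Min-heap on score: the root is the weakest hit currently kept.
        PriorityQueue<ScoredDoc> top =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc d) -> d.score));
        for (SolrSearchable sub : subSearchers) {
            // Each subsearcher only needs to return its own top n.
            for (ScoredDoc d : sub.search(query, n)) {
                top.offer(d);
                if (top.size() > n) {
                    top.poll(); // evict the lowest-scoring hit
                }
            }
        }
        // Drain into a list ordered best-first.
        List<ScoredDoc> merged = new ArrayList<>(top);
        merged.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed());
        return merged;
    }
}
```

Comparing raw scores from different subsearchers is only meaningful if they weight terms the same way, which is the concern behind the optional global idf item under Misc.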
Areas that will need change:
* Solr's caches don't contain enough info to merge search results from subsearchers
Network transports:
* RMI
* XML
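For the RMI transport, the remote face of a subsearcher could look roughly like this. It is a sketch: `RemoteSolrSearchable`, `RemoteHit` and `FixedResultSearchable` are hypothetical names. What it does reflect are two real RMI rules: a remote interface extends `java.rmi.Remote` and every method declares `RemoteException`, and arguments and results are passed by value, so they must be `Serializable`.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wire type. RMI copies results to the caller, so it must be Serializable.
final class RemoteHit implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final float score;
    RemoteHit(String id, float score) { this.id = id; this.score = score; }
}

// Hypothetical remote view of a subsearcher; every method must declare RemoteException.
interface RemoteSolrSearchable extends Remote {
    List<RemoteHit> search(String query, int n) throws RemoteException;
}

// Trivial in-process implementation returning a fixed, pre-sorted hit list.
// A real server would export it (e.g. UnicastRemoteObject.exportObject) and bind it in a registry.
final class FixedResultSearchable implements RemoteSolrSearchable {
    private final List<RemoteHit> hits;

    FixedResultSearchable(List<RemoteHit> hits) { this.hits = hits; }

    @Override
    public List<RemoteHit> search(String query, int n) {
        int end = Math.max(0, Math.min(n, hits.size()));
        // Return a fresh ArrayList so the result is a concrete serializable list.
        return new ArrayList<>(hits.subList(0, end));
    }
}
```

An XML transport would carry the same small (key, score) payload, just encoded as text instead of Java serialization.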
Misc:
* optional global idf calculations
* new-style APIs geared toward faceted browsing (avoid instantiating DocSets; pass around symbolic sets instead)
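Global idf means combining term statistics from all subsearchers before computing a term's weight, so that the weight doesn't depend on which partition a document landed in. A minimal sketch, assuming each subsearcher can report docFreq and numDocs for a term (the `TermStats` type is hypothetical), using Lucene's classic DefaultSimilarity formula idf = ln(numDocs / (docFreq + 1)) + 1:

```java
import java.util.List;

final class GlobalIdf {
    // Hypothetical per-subsearcher statistics for one term.
    static final class TermStats {
        final long docFreq; // documents in this sub-index containing the term
        final long numDocs; // total documents in this sub-index
        TermStats(long docFreq, long numDocs) { this.docFreq = docFreq; this.numDocs = numDocs; }
    }

    // Classic Lucene DefaultSimilarity idf: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Sum the statistics over all subsearchers first, then apply the formula once,
    // so every subsearcher scores the term with the same weight.
    static double globalIdf(List<TermStats> perSubsearcher) {
        long docFreq = 0;
        long numDocs = 0;
        for (TermStats s : perSubsearcher) {
            docFreq += s.docFreq;
            numDocs += s.numDocs;
        }
        return idf(docFreq, numDocs);
    }
}
```

For example, with one sub-index of 100 documents where 9 contain the term and another of 100 where none do, the local idfs would be ln(10) + 1 ≈ 3.30 and ln(100) + 1 ≈ 5.61, while the global value is ln(20) + 1 ≈ 4.00 on both. Making it optional makes sense because gathering the statistics costs an extra round trip per query.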
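One possible reading of "pass around symbolic sets": instead of shipping DocSets, the top level sends the facet constraints as query strings and each subsearcher replies with counts only. A sketch under that assumption; `FacetCounter` and `FacetMerger` are hypothetical names, not Solr APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical API: a subsearcher reports, for each facet query (the "symbolic set"),
// how many of its matches for the main query also match that facet query.
interface FacetCounter {
    Map<String, Integer> facetCounts(String query, List<String> facetQueries);
}

final class FacetMerger {
    // Only counts cross the network; per-subsearcher counts are summed by facet query.
    static Map<String, Integer> merge(List<FacetCounter> subs, String query, List<String> facetQueries) {
        Map<String, Integer> total = new HashMap<>();
        for (FacetCounter sub : subs) {
            for (Map.Entry<String, Integer> e : sub.facetCounts(query, facetQueries).entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }
}
```

Summing is correct only because each document lives in exactly one partition; overlapping partitions would double-count.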
=== High Availability ===
How can High Availability be obtained on the query side?
* sub-searchers could be identified by VIPs (virtual IP addresses); the top-level searcher would access each sub-searcher through a load balancer.
* alternatively, do it in code with an HASolrMultiSearcher that takes a list of servers for each sub-searcher
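The in-code alternative could work per sub-searcher: hold the list of servers that serve the same partition and fall through to the next one on failure. A minimal sketch; `FailoverSearchable` is a hypothetical name, and each server is modeled as a plain function from query to result rather than a real network client.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical failover wrapper: one logical sub-searcher backed by several
// servers holding the same partition. Requests go to the first server that answers.
final class FailoverSearchable<R> {
    private final List<Function<String, R>> servers; // each server: query -> result

    FailoverSearchable(List<Function<String, R>> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("need at least one server");
        this.servers = servers;
    }

    R search(String query) {
        RuntimeException last = null;
        for (Function<String, R> server : servers) {
            try {
                return server.apply(query);
            } catch (RuntimeException e) {
                last = e; // server down or failed: try the next one
            }
        }
        throw new IllegalStateException("all servers failed", last);
    }
}
```

A production version would probably also spread load across the healthy servers (e.g. round-robin) instead of always trying them in the same order.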
=== Master ===
How should the collection be updated? It would be complex for clients to partition the data themselves, since they would have to ensure that a particular document always goes to the same server. User partitioning should remain possible, but there should be an easier default.
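One possible easier default is to hash each document's unique key, so routing is deterministic and nobody has to keep a lookup table of which document lives where. A sketch assuming a string unique key; `HashPartitioner` is a hypothetical name.

```java
// Hypothetical default partitioner: the same unique key always maps to the same partition.
final class HashPartitioner {
    private final int numPartitions;

    HashPartitioner(int numPartitions) {
        if (numPartitions <= 0) throw new IllegalArgumentException("numPartitions must be positive");
        this.numPartitions = numPartitions;
    }

    int partitionFor(String uniqueKey) {
        // floorMod keeps the result in [0, numPartitions) even when hashCode() is negative.
        return Math.floorMod(uniqueKey.hashCode(), numPartitions);
    }
}
```

`String.hashCode()` is fully specified by the Java platform, so every process computes the same partition for a key. The trade-off is that changing the number of partitions remaps most documents.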
==== Single Master ====
A single master could partition the data into multiple local indexes; each subsearcher would pull only the local index it is configured to serve.
=== Commits ===
How should commits be synchronized across subsearchers and top-level searchers?