You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/12/03 21:59:47 UTC
[Solr Wiki] Update of "SolrCloud" by YonikSeeley
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: a start on some high level requirements / design.
http://wiki.apache.org/solr/SolrCloud
--------------------------------------------------
New page:
=== High level design goals ===
These are long term goals for SolrCloud. Many of these features will not be developed in the first versions, but we're designing for the long haul.
===== High Availability and Fault Tolerance =====
No external load balancer should be required. We should eliminate any single points of failure (i.e. start with a design that will allow us to add this feature at a later point with minimal changes to things like the zookeeper schema)
===== Cluster resizing and rebalancing =====
To grow the cluster or to rebalance due to hotspots, shards should be resizable. Pieces of existing shards should be able to be split off and assigned to new servers.
===== Clients should not have to know about cluster layout. =====
A simple client (like a browser) should be able to hit any solr search server in the cluster with a request, and that search server should be able to execute a distributed search against the cluster as a whole, including load balancing and failover, to return the results to the client.
===== Open API =====
The zookeeper schema should be well defined and public, allowing other software components other than the master node to inspect and change the cluster via zookeeper. A task like creating a new collection eventually be achievable by creating the correct znodes/files in zookeeper, w/o having to talk to any solr servers.
===== Support various levels of custom clusters =====
Support various splits between how much the user manages and how much solr manages. One could have a set of servers in zookeeper with defined indexes (shards or complete copies) and want to just use the client search capabilities. One may also want a traditional master/slave relationship, even if more advanced options are available.
===== Support user specified partitioning =====
Partitioning of documents by geographic region, time, user, etc, brings huge performance benefits by allowing only a part of the cluster to be queried. This should be an option even in conjunction with the most advanced modes of operation (automatic document->shard assignment, index rebalancing, no single points of failure, etc) presumably by allowing user-specified hashcodes for documents.
=== Entities we want to model or record in zookeeper: ===
'''host'''
* The actual physical (or virtual, as it may be) machine
* operating system, RAM, disk
* some sort of metric for what level of load should be placed on this machine
'''node'''
* a single JVM running one or more solr cores
'''collection'''
* A collection of documents sharing the same schema
'''shard'''
* A piece of a collection
'''core'''
* A solr core (is there a better name we can use for this?)
* If a core and a shard have a one-to-one mapping, they could be redundant.
'''role'''
* is a core a master, a search node, a spell-checker node, etc
* we probably want a generic way to map from role to different configs or config overrides
'''network topology'''
* switch, rack, data center, etc
Resources
http://sourceforge.net/mailarchive/forum.php?forum_name=bailey-developers