You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/12/03 21:59:47 UTC

[Solr Wiki] Update of "SolrCloud" by YonikSeeley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: a start on some high level requirements / design.
http://wiki.apache.org/solr/SolrCloud

--------------------------------------------------

New page:
=== High level design goals ===
These are long term goals for SolrCloud.  Many of these features will not be developed in the first versions, but we're designing for the long haul.

===== High Availability and Fault Tolerance =====
No external load balancer should be required.  We should eliminate any single points of failure (i.e. start with a design that will allow us to add this feature at a later point with minimal changes to things like the zookeeper schema)

===== Cluster resizing and rebalancing =====
To grow the cluster or to rebalance due to hotspots, shards should be resizable.  Pieces of existing shards should be able to be split off and assigned to new servers.

===== Clients should not have to know about cluster layout. =====
A simple client (like a browser) should be able to hit any solr search server in the cluster with a request, and that search server should be able to execute a distributed search against the cluster as a whole, including load balancing and failover, to return the results to the client.

===== Open API =====
The zookeeper schema should be well defined and public, allowing other software components other than the master node to inspect and change the cluster via zookeeper.  A task like creating a new collection eventually be achievable by creating the correct znodes/files in zookeeper, w/o having to talk to any solr servers.

===== Support various levels of custom clusters =====
Support various splits between how much the user manages and how much solr manages.  One could have a set of servers in zookeeper with defined indexes (shards or complete copies) and want to just use the client search capabilities.  One may also want a traditional master/slave relationship, even if more advanced options are available.

===== Support user specified partitioning =====
Partitioning of documents by geographic region, time, user, etc, brings huge performance benefits by allowing only a part of the cluster to be queried.  This should be an option even in conjunction with the most advanced modes of operation (automatic document->shard assignment, index rebalancing, no single points of failure, etc) presumably by allowing user-specified hashcodes for documents.

=== Entities we want to model or record in zookeeper: ===
'''host'''

 * The actual physical (or virtual, as it may be) machine
 * operating system, RAM, disk
 * some sort of metric for what level of load should be placed on this machine

'''node'''

 * a single JVM running one or more solr cores

'''collection'''

 * A collection of documents sharing the same schema

'''shard'''

 * A piece of a collection

'''core'''

 * A solr core (is there a better name we can use for this?)
 * If a core and a shard have a one-to-one mapping, they could be redundant.

'''role'''

 * is a core a master, a search node, a spell-checker node, etc
 * we probably want a generic way to map from role to different configs or config overrides

'''network topology'''

 * switch, rack, data center, etc

Resources

http://sourceforge.net/mailarchive/forum.php?forum_name=bailey-developers