Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2012/11/30 14:42:01 UTC

[Solr Wiki] Update of "SolrCloud" by Per Steffensen

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by Per Steffensen:
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=79&rev2=80

Comment:
a little more details about create operation of Collections API - and preparing descriptions covering SOLR-4114 and SOLR-4120

   1. If you do colocate ZooKeeper with Solr, using separate disk drives for Solr and ZooKeeper will help with performance.
  
  == Managing collections via the Collections API ==
- The collections API let's you manage collections. Under the hood, it generally uses the CoreAdmin API to manage SolrCores on each server - it's essentially sugar for actions that you could handle yourself if you made individual CoreAdmin API calls to each server you wanted an action to take place on.
+ The collections API lets you manage collections. Under the hood, it generally uses the CoreAdmin API to asynchronously (through the Overseer) manage SolrCores on each server - it's essentially sugar for actions that you could handle yourself if you made individual CoreAdmin API calls to each server you wanted an action to take place on.
  
  Create http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4
+ 
+ About the params
+  * '''name''': The name of the collection to be created
+  * '''numShards''': The number of slices (sometimes called shards) to be created as part of the collection
+  * '''replicationFactor''': The number of "additional" shards (sometimes called replicas) to be created for each slice. Set it to 0 to have "one shard for each of your slices". Set it to 1 to have "two shards for each of your slices", etc. With a value of 0 your data will not be replicated
+  * '''maxShardsPerNode''' (not in 4.0.0 and not even committed yet - see SOLR-4114): A create operation will spread numShards*(replicationFactor+1) shards across your live Solr nodes - fairly distributed, and never two shards of the same slice on the same Solr node. If a Solr node is not live at the point in time when the create operation is carried out, it will not get any shards of the new collection. To prevent too many shards from being created on a single Solr node, use maxShardsPerNode to limit how many shards the create operation is allowed to create on each node - default is 1. If it cannot fit the entire collection (numShards*(replicationFactor+1) shards) on your live Solr nodes, it will not create anything at all. Unfortunately, since the create operation is carried out asynchronously, you will not get any feedback about a decision not to create the collection.
+  * '''createNodeSet''' (not in 4.0.0 and not even committed yet - see SOLR-4120): If not provided, the create operation will create shards spread across all of your live Solr nodes. You can provide the "createNodeSet" parameter to change the set of nodes to spread the shards across. The format of values for this param is "<node-name1>,<node-name2>,...,<node-nameN>" - e.g. "localhost:8983_solr,localhost:8984_solr,localhost:8985_solr"
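
The parameters above can be assembled into a CREATE request URL. The following is a minimal Python sketch, not part of Solr: the helper name build_create_url and its argument names are hypothetical, and maxShardsPerNode and createNodeSet are only sent when given, since the text above describes them as not yet committed (SOLR-4114, SOLR-4120).

```python
from urllib.parse import urlencode


def build_create_url(base_url, name, num_shards, replication_factor,
                     max_shards_per_node=None, create_node_set=None):
    """Build a Collections API CREATE URL (hypothetical helper, not a Solr API)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Optional parameters: described above as pending in SOLR-4114 / SOLR-4120,
    # so they are only included when the caller supplies them.
    if max_shards_per_node is not None:
        params["maxShardsPerNode"] = max_shards_per_node
    if create_node_set:
        # Node names are joined with commas, e.g. "localhost:8983_solr,localhost:8984_solr".
        params["createNodeSet"] = ",".join(create_node_set)
    # Keep ":" and "," unescaped so node names stay readable in the URL.
    return base_url + "/admin/collections?" + urlencode(params, safe=":,")


url = build_create_url("http://localhost:8983/solr", "mycollection", 3, 4)
print(url)
```

Called with the values from the example request, this should reproduce the CREATE URL shown above; the request itself can then be issued with any HTTP client.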
  
  Note: replicationFactor defines the maximum number of replicas created in addition to the leader from amongst the nodes currently running (i.e. nodes added later will not be used for this collection). Imagine you have a cluster with 20 nodes and want to add an additional smaller collection to your installation with 2 shards, each shard with a leader and two replicas. You would specify a replicationFactor=2. Now six of your nodes will host this new collection and the other 14 will not host the new collection.
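
The arithmetic in the note can be checked with a short Python sketch. It only restates the numShards*(replicationFactor+1) formula and two necessary conditions taken from the parameter descriptions; the function names are hypothetical, and this is not Solr's actual placement logic.

```python
def total_shards(num_shards, replication_factor):
    """Shards a create operation needs: numShards * (replicationFactor + 1)."""
    return num_shards * (replication_factor + 1)


def may_fit(num_shards, replication_factor, live_nodes, max_shards_per_node=1):
    """Check two necessary conditions for the collection to fit (a sketch only).

    1. No two shards of the same slice share a node, so each slice needs
       replicationFactor + 1 distinct live nodes.
    2. The total shard count must not exceed live_nodes * max_shards_per_node.
    """
    shards_per_slice = replication_factor + 1
    if shards_per_slice > live_nodes:
        return False
    capacity = live_nodes * max_shards_per_node
    return total_shards(num_shards, replication_factor) <= capacity


# The example from the note: 20 nodes, 2 slices, replicationFactor=2.
needed = total_shards(2, 2)
print(needed, may_fit(2, 2, live_nodes=20))
```

For the note's cluster this gives 6 shards, so with at most one shard per node six nodes host the collection and the remaining 14 do not.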