Posted to commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2013/04/15 09:09:01 UTC

[Solr Wiki] Trivial Update of "SolrCloud" by TimVaillancourt

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by TimVaillancourt:
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=98&rev2=99

Comment:
Re-ordered my addition + reworded configName to be consistent.

  
  {{{
  rm -r example/solr/collection1/data/*
  cp -r example example2
  }}}
  This command starts up a Solr server and bootstraps a new Solr cluster.
  
  {{{
  cd example
  java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
  }}}
   * {{{-DzkRun}}} causes an embedded Zookeeper server to be run as part of this Solr server.
   * {{{-Dbootstrap_confdir=./solr/collection1/conf}}} Since we don't yet have a config in Zookeeper, this parameter causes the local configuration directory {{{./solr/collection1/conf}}} to be uploaded as the "myconf" config.  The name "myconf" is taken from the "collection.configName" param below.
@@ -54, +58 @@

  
  {{{
  cd example2
  java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
  }}}
   * {{{-Djetty.port=7574}}}  is just one way to tell the Jetty servlet container to use a different port.
   * {{{-DzkHost=localhost:9983}}} points to the Zookeeper ensemble containing the cluster state.  In this example we're running a single Zookeeper server embedded in the first Solr server.  By default, an embedded Zookeeper server runs at the Solr port plus 1000, so 9983.
@@ -65, +71 @@

  
  {{{
  cd exampledocs
  java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar ipod_video.xml
  java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor.xml
  java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar mem.xml
  }}}
  And now, a request to either server results in a distributed search that covers the entire collection:
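  For example, the same query returns documents from both shards no matter which of the two servers it is sent to (the query shown here is illustrative):
  {{{
  http://localhost:8983/solr/collection1/select?q=*:*
  }}}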
  
@@ -84, +94 @@

  
  {{{
  cp -r example exampleB
  cp -r example2 example2B
  }}}
  Then start the two new servers on different ports, each in its own window:
  
  {{{
  cd exampleB
  java -Djetty.port=8900 -DzkHost=localhost:9983 -jar start.jar
  }}}
  {{{
  cd example2B
  java -Djetty.port=7500 -DzkHost=localhost:9983 -jar start.jar
  }}}
  Refresh the Zookeeper browser page [[http://localhost:8983/solr/#/~cloud|Solr Zookeeper Admin UI]] and verify that 4 Solr nodes are up, and that each shard has two replicas.
  
@@ -125, +141 @@

  
  {{{
  rm -r example*/solr/zoo_data
  }}}
  We will be running the servers again at ports 8983, 7574, 8900, and 7500.  The default is to run an embedded Zookeeper server at hostPort+1000, so if we run an embedded Zookeeper on the first three servers, the ensemble address will be {{{localhost:9983,localhost:8574,localhost:9900}}}.
  
@@ -132, +149 @@

  
  {{{
  cd example
  java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2 -jar start.jar
  }}}
  {{{
  cd example2
  java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
  }}}
  {{{
  cd exampleB
  java -Djetty.port=8900 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
  }}}
  {{{
  cd example2B
  java -Djetty.port=7500 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
  }}}
  Now since we are running three embedded Zookeeper servers as an ensemble, everything can keep working even if a server is lost. To demonstrate this, kill the exampleB server by pressing CTRL+C in its window and then browse to the [[http://localhost:8983/solr/#/~cloud|Solr Zookeeper Admin UI]] to verify that the Zookeeper service still works.
  
@@ -167, +192 @@

  '''Create''' http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4
  
  About the params:
+ 
   * '''name''': The name of the collection to be created.
   * '''numShards''': The number of logical shards (sometimes called slices) to be created as part of the collection.
   * '''replicationFactor''': The number of copies of each document (or, the number of physical replicas to be created for each logical shard of the collection.)  A replicationFactor of 3 means that there will be 3 replicas (one of which is normally designated to be the leader) for each logical shard.  NOTE: in Solr 4.0, replicationFactor was the number of *additional* copies as opposed to the total number of copies.
   * '''maxShardsPerNode''': A create operation will spread numShards*replicationFactor shard replicas across your live Solr nodes, fairly distributed and never with two replicas of the same shard on the same Solr node. If a Solr node is not live when the create operation is carried out, it will not get any part of the new collection. To prevent too many replicas from being created on a single Solr node, use maxShardsPerNode to set a limit on how many replicas the create operation is allowed to create on each node - the default is 1. If the entire collection (numShards*replicationFactor replicas) cannot fit on your live Solr nodes, nothing will be created at all (see the worked example after this list).
   * '''createNodeSet''': If not provided, the create operation will spread the shard replicas across all of your live Solr nodes. You can provide the "createNodeSet" parameter to change the set of nodes the shard replicas are spread across. The format of values for this param is "<node-name1>,<node-name2>,...,<node-nameN>" - e.g. "localhost:8983_solr,localhost:8984_solr,localhost:8985_solr"
-  * '''collection.configName''': The name of the config (must be already stored in zookeeper) to use for this new collection. If not provided, Solr will default to the collection name as the config name.
+  * '''collection.configName''': The name of the config (which must already be stored in Zookeeper) to use for this new collection. If not provided, the create operation will default to the collection name as the config name.
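  As a worked example of how these params interact: the CREATE call above asks for numShards=3 and replicationFactor=4, so the operation tries to place 3*4 = 12 shard replicas across the live nodes; with 4 live Solr nodes that requires a maxShardsPerNode of at least 3. Combining several of the params in one call might look like the following sketch (the collection and config names are illustrative):
  {{{
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4&maxShardsPerNode=3&collection.configName=myconf'
  }}}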
  
- '''Delete''' http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection
+ '''CreateAlias''' (''added in Solr 4.2'') http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias&collections=collection1,collection2,…
  
  About the params:
-  * '''name''': The name of the collection to be deleted.
  
- '''Reload''' http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection
- 
- About the params:
-  * '''name''': The name of the collection to be reloaded.
- 
- '''CreateAlias''' (''added in Solr 4.2'') http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias&collections=collection1,collection2,…
- 
- About the params:
   * '''name''': The name of the collection alias to be created.
   * '''collections''': A comma-separated list of one or more collections to alias to.
+ 
+ '''Delete''' http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection
+ 
+ About the params:
+ 
+  * '''name''': The name of the collection to be deleted.
+ 
+ '''Reload''' http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection
+ 
+ About the params:
+ 
+  * '''name''': The name of the collection to be reloaded.
  
  == Creating cores via CoreAdmin ==
  New Solr cores may also be created and associated with a collection via CoreAdmin.
@@ -205, +234 @@

  
  {{{
  curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=collection1&shard=shard2'
  }}}
  == Distributed Requests ==
  Explicitly specify the addresses of shards you want to query:
  
  {{{
  shards=localhost:8983/solr,localhost:7574/solr
  }}}
  Explicitly specify the addresses of shards you want to query, giving alternatives (delimited by `|`) used for load balancing and fail-over:
  
  {{{
  shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr
  }}}
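  Either form can be passed as the shards parameter of an ordinary query; a sketch (the query itself is illustrative):
  {{{
  http://localhost:8983/solr/collection1/select?q=*:*&shards=localhost:8983/solr,localhost:7574/solr
  }}}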
  Query all shards of a collection (the collection is implicit in the URL):
  
  {{{
  http://localhost:8983/solr/collection1/select?
  }}}
  Query specific shard ids. In this example, the user has partitioned the index by date, creating a new shard every month.
  
  {{{
  http://localhost:8983/solr/collection1/select?shards=shard_200812,shard_200912,shard_201001
  }}}
  Query all shards of a compatible collection, explicitly specified:
  
  {{{
  http://localhost:8983/solr/collection1/select?collection=collection1_recent
  }}}
  Query all shards of multiple compatible collections, explicitly specified:
  
  {{{
  http://localhost:8983/solr/collection1/select?collection=collection1_NY,collection1_NJ,collection1_CT
  }}}
  == Required Config ==
  All of the required config is already set up in the example configs shipped with Solr. The following is what you need to add if you are migrating old config files, or what you should not remove if you are starting with new config files.
@@ -245, +281 @@

  
  {{{
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  }}}
  
  === solrconfig.xml ===
  You must have an UpdateLog defined - it belongs in the updateHandler section.
  
  {{{
      <!-- Enables a transaction log, currently used for real-time get.
           "dir" - the target directory for transaction logs, defaults to the
           solr data directory.  -->
      <updateLog>
        <str name="dir">${solr.data.dir:}</str>
      </updateLog>
  }}}
  You must have a replication handler called /replication defined:
  
  {{{
      <requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
  }}}
  You must have a realtime get handler called /get defined:
  
  {{{
      <requestHandler name="/get" class="solr.RealTimeGetHandler">
        <lst name="defaults">
          <str name="omitHeader">true</str>
        </lst>
      </requestHandler>
  }}}
  You must have the admin handlers defined:
  
  {{{
      <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
  }}}
  The DistributedUpdateProcessor is part of the default update chain and is automatically injected into any of your custom update chains. You can still explicitly add it yourself as follows:
  
  {{{
     <updateRequestProcessorChain name="sample">
       <processor class="solr.LogUpdateProcessorFactory" />
       <processor class="solr.DistributedUpdateProcessorFactory"/>
       <processor class="my.package.UpdateFactory"/>
       <processor class="solr.RunUpdateProcessorFactory" />
     </updateRequestProcessorChain>
  }}}
  If you do not want the '''DistributedUpdateProcessorFactory''' auto-injected into your chain (say you want to use SolrCloud functionality, but you want to distribute updates yourself) then specify the following update processor factory in your chain: '''NoOpDistributingUpdateProcessorFactory'''
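  A minimal sketch of such a chain, adapted from the example above (the chain name is illustrative):
  {{{
     <updateRequestProcessorChain name="nodistrib">
       <processor class="solr.LogUpdateProcessorFactory" />
       <processor class="solr.NoOpDistributingUpdateProcessorFactory"/>
       <processor class="solr.RunUpdateProcessorFactory" />
     </updateRequestProcessorChain>
  }}}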
  
@@ -294, +349 @@

  
  {{{
      <cores adminPath="/admin/cores"
  }}}
  
  == Re-sizing a Cluster ==
  You can control cluster size by passing the numShards parameter when you start up the first SolrCore in a collection. This parameter is used to auto-assign which shard each instance should be part of. Any SolrCores that you start after starting numShards instances are evenly added to each shard as replicas (as long as they all belong to the same collection).
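  For example, continuing the two-shard example above, copying and starting one more node without any numShards setting simply adds it as a replica of one of the existing shards (the directory name, port, and single zkHost address here are illustrative):
  {{{
  cp -r example exampleC
  cd exampleC
  java -Djetty.port=8600 -DzkHost=localhost:9983 -jar start.jar
  }}}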
  
@@ -360, +415 @@

  
  {{{
  usage: ZkCLI
   -c,--collection <arg>   for linkconfig: name of the collection
   -cmd <arg>              cmd to run: bootstrap, upconfig, downconfig,
                           linkconfig, makepath, clear
   -d,--confdir <arg>      for upconfig: a directory of configuration files
   -h,--help               bring up this help page
   -n,--confname <arg>     for upconfig, linkconfig: name of the config set
   -r,--runzk <arg>        run zk internally by passing the solr run port -
                           only for clusters on one machine (tests, dev)
   -s,--solrhome <arg>     for bootstrap, runzk: solrhome location
   -z,--zkhost <arg>       ZooKeeper host address
  }}}
  ==== Examples ====
  {{{
  # try uploading a conf dir
  java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1
  }}}
  {{{
  # try linking a collection to a conf set
  java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -zkhost 127.0.0.1:9983 -collection collection1 -confname conf1
  }}}
  {{{
  # try bootstrapping all the conf dirs in solr.xml
  java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd bootstrap -zkhost 127.0.0.1:9983 -solrhome example/solr
  }}}
  ==== Scripts ====
  There are scripts in example/cloud-scripts that handle the classpath and class name for you if you are using Solr out of the box with Jetty. Commands then become:
  
  {{{
  sh zkcli.sh -cmd linkconfig -zkhost 127.0.0.1:9983 -collection collection1 -confname conf1
  }}}
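  The other ZkCLI examples above translate the same way; for instance, the upconfig example might become (assuming the same conf directory and Zookeeper address):
  {{{
  sh zkcli.sh -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1
  }}}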
  === Zookeeper chroot ===
  If you are already using Zookeeper for other applications and you want to keep the ZNodes organized by application, or if you want to have multiple separate SolrCloud clusters sharing one Zookeeper ensemble, you can use Zookeeper's "chroot" option. From Zookeeper's documentation: http://zookeeper.apache.org/doc/r3.3.6/zookeeperProgrammers.html#ch_zkSessions
  
  {{{
  An optional "chroot" suffix may also be appended to the connection string. This will run the client commands while interpreting all paths relative to this root (similar to the unix chroot command). If used the example would look like: "127.0.0.1:4545/app/a" or "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a" where the client would be rooted at "/app/a" and all paths would be relative to this root - ie getting/setting/etc... "/foo/bar" would result in operations being run on "/app/a/foo/bar" (from the server perspective).
  }}}
  To use this Zookeeper feature, simply start Solr with the "chroot" suffix in the zkHost parameter. For example:
  
  {{{
  java -DzkHost=localhost:9983/foo/bar -jar start.jar
  }}}
  or
  
  {{{
  java -DzkHost=zoo1:9983,zoo2:9983,zoo3:9983/foo/bar -jar start.jar
  }}}
  '''NOTE:''' With Solr 4.0 you'll need to create the initial path in Zookeeper before starting Solr. Since Solr 4.1, the initial path will automatically be created if you are using either ''bootstrap_conf'' or ''bootstrap_confdir''.
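  With Solr 4.0, one way to create that initial path is the ZkCLI makepath command described above; a sketch, assuming the classpath from the earlier ZkCLI examples and an illustrative chroot path:
  {{{
  java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd makepath /foo/bar -zkhost 127.0.0.1:9983
  }}}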