Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2008/02/27 19:50:13 UTC

[Solr Wiki] Update of "DistributedSearch" by YonikSeeley

The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/DistributedSearch

New page:
<!> ["Solr1.3"]

== What is Distributed Search? ==
When an index becomes too large to fit on a single system, or when a single query takes too long to execute, an index can be split into multiple shards, and Solr can query and merge results across those shards.

If single queries are already fast enough and the goal is simply to increase the capacity (queries/sec) of the search system, use standard whole-index [wiki:CollectionDistribution index replication] instead.

== Distributed Searching ==
The presence of the '''shards''' parameter in a request will cause that request to be distributed across all shards in the list.  The syntax of '''shards''' is host:port/base_url[,host:port/base_url]*
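
As an illustration of the syntax above, here is a minimal Java sketch that assembles a '''shards''' value and a request URL. The class {{{ShardsParam}}} and its methods are hypothetical helpers written for this page, not part of Solr, and the host names are the ones used in the example further down.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Solr): builds the value of the shards parameter.
final class ShardsParam {

    // Joins shard addresses of the form host:port/base_url with commas.
    static String join(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        return String.join(",", shards);
    }

    // Builds a select URL on the server that receives the request.
    // Assumes that server exposes its handler at <base_url>/select.
    static String selectUrl(String receivingServer, List<String> shards, String query) {
        try {
            return "http://" + receivingServer + "/select?shards=" + join(shards)
                    + "&q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("localhost:8983/solr", "localhost:7574/solr");
        System.out.println(selectUrl("localhost:8983/solr", shards, "ipod solr"));
    }
}
```

The server that receives the request may itself appear in the shard list, as it does here.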

Currently, only query requests are distributed.  This includes requests to the standard request handler (and subclasses such as the dismax request handler), and to any other handler (org.apache.solr.handler.component.SearchHandler) whose standard components support distributed search.

The components that currently support distributed search are:
   * The Query component that returns documents matching a query
   * The Facet component, for facet.query and facet.field requests where facet.sort=true (the default)
   * The Highlighting component
   * The Debug component

== Distributed Indexing ==
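It's up to the user to distribute documents across shards.  The easiest way to determine which server a document should be indexed on is to use something like '''uniqueId.hashCode() % numServers''' (keeping in mind that hashCode() can be negative in Java, so the result may need to be normalized to a non-negative index).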
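
A minimal sketch of that idea in Java, assuming the unique id is a String. The class {{{ShardRouter}}}, the document ids, and the server list are hypothetical and only for illustration; they are not part of Solr.

```java
// Hypothetical helper (not part of Solr): picks a shard index for a document id.
final class ShardRouter {

    // Returns an index in [0, numServers) derived from the id's hash code.
    static int shardFor(String uniqueId, int numServers) {
        if (numServers <= 0) {
            throw new IllegalArgumentException("numServers must be positive");
        }
        // String.hashCode() can be negative, and so can the result of %.
        // Math.floorMod (Java 8+) always yields a non-negative result for a positive divisor.
        return Math.floorMod(uniqueId.hashCode(), numServers);
    }

    public static void main(String[] args) {
        String[] servers = {"localhost:8983/solr", "localhost:7574/solr"};
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> " + servers[shardFor(id, servers.length)]);
        }
    }
}
```

With a scheme like this, the same id always maps to the same server, so updates to a document replace the earlier copy instead of creating a duplicate on another shard. Note that changing numServers changes the mapping for most ids, so documents would generally need to be reindexed.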

== Example ==
For simple functionality testing, it's easiest to just set up two local Solr servers on different ports.
{{{
#make a copy 
cd solr
cp -r example example7574

#change the port number
perl -pi -e 's/8983/7574/g' example7574/etc/jetty.xml example7574/exampledocs/post.sh

#in window 1, start up the server on port 8983
cd example
java -server -jar start.jar

#in window 2, start up the server on port 7574
cd example7574
java -server -jar start.jar

#in window 3, index some example documents to each server
cd example/exampledocs
./post.sh [a-m]*.xml
cd ../../example7574/exampledocs
./post.sh [n-z]*.xml

#now do a distributed search across both servers with your browser or curl
curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
}}}