Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/02/17 00:02:54 UTC

[Solr Wiki] Update of "DeploymentofSolrCoreswithZookeeper" by JasonRutherglen

The "DeploymentofSolrCoreswithZookeeper" page has been changed by JasonRutherglen.
http://wiki.apache.org/solr/DeploymentofSolrCoreswithZookeeper

--------------------------------------------------

New page:
= Deployment of Solr Cores with Zookeeper =

https://issues.apache.org/jira/browse/SOLR-1724
== Architecture ==

Zookeeper may be used as a distributed filesystem that records which cores each Solr server should be running. GSON is the library used to serialize objects to JSON and deserialize them back. Ephemeral nodes are intentionally not used. Zookeeper serves as a transactional, redundant filesystem, not as a system for tracking connections to servers; monitoring which servers are alive is best left to dedicated monitoring services.
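To make the "filesystem, not connection tracker" model concrete, here is a minimal in-memory sketch of the tree as this page uses it. The class `PersistentTree` and its methods are hypothetical. With the real Zookeeper Java client, the equivalent write would be something like `zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)`, which throws `KeeperException.NodeExistsException` if the node already exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the Zookeeper tree as this page uses it:
// persistent nodes only (no ephemeral nodes) and no watches. Unlike real
// Zookeeper, parent nodes are neither required nor listed unless created.
class PersistentTree {
    private final Map<String, byte[]> nodes = new TreeMap<>();

    // Create-if-absent, like Zookeeper's create(): returns false when the node
    // already exists, so two writers cannot both publish the same version.
    synchronized boolean create(String path, byte[] data) {
        if (nodes.containsKey(path)) {
            return false;
        }
        nodes.put(path, data.clone());
        return true;
    }

    // Returns a copy of the node's data, or null if the node does not exist.
    synchronized byte[] read(String path) {
        byte[] data = nodes.get(path);
        return data == null ? null : data.clone();
    }

    // Names of the nodes created directly under parent, in sorted order.
    synchronized List<String> children(String parent) {
        String prefix = parent.endsWith("/") ? parent : parent + "/";
        List<String> result = new ArrayList<>();
        for (String path : nodes.keySet()) {
            if (path.startsWith(prefix)) {
                String rest = path.substring(prefix.length());
                if (!rest.isEmpty() && !rest.contains("/")) {
                    result.add(rest);
                }
            }
        }
        return result;
    }
}
```

Because nothing is ephemeral, a cores file or host file stays in place when the server that wrote it disconnects. Only an explicit write changes the recorded state.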

== Supported File Types ==

Zipped cores are the standard format because they are easier to manage, download, and transfer across the network.

 * Zipped core accessible via HDFS
 * Zipped core accessible via HTTP
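A host that installs a zipped core has to pick a transport from the core's location and then extract the archive. The sketch below shows one way to do both using only the JDK. The class `ZippedCores`, the scheme-based dispatch, and the "zip slip" check are illustrative assumptions, not the SOLR-1724 code. The actual HDFS and HTTP downloads are left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helpers for the two supported zipped-core locations.
class ZippedCores {
    // Chooses a transport from the URL scheme of a zipped core's location.
    static String transportFor(String location) {
        String scheme = URI.create(location).getScheme();
        if ("hdfs".equals(scheme)) {
            return "hdfs";
        }
        if ("http".equals(scheme) || "https".equals(scheme)) {
            return "http";
        }
        throw new IllegalArgumentException("Unsupported core location: " + location);
    }

    // Extracts a zipped core into targetDir, rejecting entries whose paths
    // would escape the target directory ("zip slip").
    static void unzip(InputStream in, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry; ZipInputStream reports
                    // end-of-stream at the end of each entry.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
                zip.closeEntry();
            }
        }
    }
}
```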

== Zookeeper Filesystem ==

=== Cores ===

Each "cores" file is written to Zookeeper under a name of the form cores_N, where N is the file's version. This naming is purposefully similar to the segment infos (segments_N) files written by Lucene. The cores file is stored in JSON format.

Contents of the cores file:

||Name||Type||
||name||string||
||version||long||
||array||coresinfo||

Each coresinfo entry in the array contains:
||Name||Type||
||host||string||
||name||string||
||instanceDir||string||
||configFile||string||
||schemaFile||string||
||dataDir||string||
||url||string||
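A hypothetical Java model of these two tables, with the JSON it might serialize to, is sketched below. The page uses GSON for this; with Gson on the classpath, `new Gson().toJson(coresFile)` and `new Gson().fromJson(json, CoresFile.class)` would replace the hand-written `toJson()`. That method exists only so the sketch has no dependencies. The table does not say what the array field is called, so the JSON key `cores` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a cores_N file, following the two tables above.
class CoresFile {
    // One coresinfo entry: which core a given host should run, and where it lives.
    static final class CoreInfo {
        final String host, name, instanceDir, configFile, schemaFile, dataDir, url;

        CoreInfo(String host, String name, String instanceDir, String configFile,
                 String schemaFile, String dataDir, String url) {
            this.host = host;
            this.name = name;
            this.instanceDir = instanceDir;
            this.configFile = configFile;
            this.schemaFile = schemaFile;
            this.dataDir = dataDir;
            this.url = url;
        }
    }

    final String name;
    final long version;
    // The table lists this field only as "array"; the key "cores" is assumed.
    final List<CoreInfo> cores = new ArrayList<>();

    CoresFile(String name, long version) {
        this.name = name;
        this.version = version;
    }

    // Dependency-free serializer. Gson's output for an equivalent class would be
    // similar but not identical (for example, Gson omits null fields by default).
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":").append(quote(name))
          .append(",\"version\":").append(version)
          .append(",\"cores\":[");
        for (int i = 0; i < cores.size(); i++) {
            CoreInfo c = cores.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"host\":").append(quote(c.host))
              .append(",\"name\":").append(quote(c.name))
              .append(",\"instanceDir\":").append(quote(c.instanceDir))
              .append(",\"configFile\":").append(quote(c.configFile))
              .append(",\"schemaFile\":").append(quote(c.schemaFile))
              .append(",\"dataDir\":").append(quote(c.dataDir))
              .append(",\"url\":").append(quote(c.url))
              .append('}');
        }
        return sb.append("]}").toString();
    }

    // JSON string literal with quotes, backslashes and control characters escaped.
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            if (ch == '"' || ch == '\\') {
                sb.append('\\').append(ch);
            } else if (ch < 0x20) {
                sb.append(String.format("\\u%04x", (int) ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }
}
```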

=== Host ===

Each Solr server (also called a host or CoreContainer) must report to Zookeeper which cores it has installed by writing a host file named in the form host_version, for example servera_2. Each Solr host is responsible for bringing its installed cores in line with the state of the cores_N file.

Contents of a host file:

||Name||Type||
||name||string||
||version||long||
||array||hostinfo||

Each hostinfo entry in the array contains:
||Name||Type||
||name||string||
||instanceDir||string||
||configFile||string||
||schemaFile||string||
||dataDir||string||
||size||long||
||lastModified||long||
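The host's job of matching the cores_N file can be read as a set difference between the cores assigned to it and the cores it has installed. The sketch below is a minimal, hypothetical version of that step (`HostReconciler` and its methods are illustrative names). It compares cores by name only, so a change to `instanceDir`, `dataDir`, and so on would not be detected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical reconciliation step a host could run after reading cores_N.
// Cores are compared by name only; other coresinfo fields are ignored.
class HostReconciler {
    // Host file name of the form host_version, e.g. servera_2.
    static String hostFileName(String host, long version) {
        return host + "_" + version;
    }

    // Cores assigned to this host in cores_N that are not installed yet.
    static List<String> toInstall(Set<String> desired, Set<String> installed) {
        return missing(desired, installed);
    }

    // Installed cores that cores_N no longer assigns to this host.
    static List<String> toRemove(Set<String> desired, Set<String> installed) {
        return missing(installed, desired);
    }

    // Elements of source (in iteration order) that are absent from other.
    private static List<String> missing(Set<String> source, Set<String> other) {
        List<String> result = new ArrayList<>();
        for (String item : source) {
            if (!other.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

Once both lists have been applied, the host would write its host file for that version, listing the cores it now has installed.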

=== Sample Directory Layout ===

This sample directory layout contains two cores files. Under /production/hosts, each of the four hosts (servera through serverd) has written a host file for both versions, indicating that the deployments defined by cores_1 and cores_2 have completed.

/production/cores_1<<BR>>
/production/cores_2<<BR>>
/production/hosts/servera_1<<BR>>
/production/hosts/serverb_1<<BR>>
/production/hosts/serverc_1<<BR>>
/production/hosts/serverd_1<<BR>>
/production/hosts/servera_2<<BR>>
/production/hosts/serverb_2<<BR>>
/production/hosts/serverc_2<<BR>>
/production/hosts/serverd_2<<BR>>
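Under the naming shown above, checking whether version N has fully deployed only requires confirming that every expected host has written its host_N file. The sketch below assumes the "necessary" hosts are exactly those named in cores_N. `DeploymentStatus.isComplete` is a hypothetical helper.

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical completion check over the host file names under /production/hosts.
class DeploymentStatus {
    // True when every host named in cores_N has written a host file host_N.
    static boolean isComplete(long version, Collection<String> hostsInCoresFile,
                              Set<String> hostFileNames) {
        for (String host : hostsInCoresFile) {
            if (!hostFileNames.contains(host + "_" + version)) {
                return false;
            }
        }
        return true;
    }
}
```

The file name is built from the host and version rather than parsed, so host names that contain underscores are handled correctly.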

== CoreController ==

The CoreController is a core deployment client that lives inside a CoreContainer. It listens for events on a given path and finds its hostname in the latest cores file by version. Each cores file is like Lucene's segment infos file: just as segment infos describe the set of segments that make up the current index, the cores file defines the set of cores that should be installed on each Solr host.

A default root path must be defined; the unit tests use /production.
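The CoreController's lookup can be sketched as three steps: pick the highest cores_N among the root's children, keep the entries for this host, and build the paths it reads and writes. All names below are hypothetical. With the real Zookeeper client, the child names could come from `zk.getChildren(root, watcher)`, and the lookup would re-run whenever the watcher fires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers for the CoreController's "latest cores file" lookup.
class CoreControllerSketch {
    // Default root path; the unit tests use /production.
    static final String DEFAULT_ROOT = "/production";
    private static final Pattern CORES_FILE = Pattern.compile("cores_(\\d+)");

    // A (host, core name) pair taken from a parsed cores file.
    static final class Entry {
        final String host;
        final String core;

        Entry(String host, String core) {
            this.host = host;
            this.core = core;
        }
    }

    // Highest N among child names of the form cores_N, compared numerically.
    static OptionalLong latestVersion(List<String> childNames) {
        long best = -1;
        for (String child : childNames) {
            Matcher m = CORES_FILE.matcher(child);
            if (m.matches()) {
                best = Math.max(best, Long.parseLong(m.group(1)));
            }
        }
        return best < 0 ? OptionalLong.empty() : OptionalLong.of(best);
    }

    // Names of the cores that the cores file assigns to this host.
    static List<String> coresFor(String hostname, List<Entry> entries) {
        List<String> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.host.equals(hostname)) {
                result.add(e.core);
            }
        }
        return result;
    }

    static String coresPath(String root, long version) {
        return root + "/cores_" + version;
    }

    static String hostPath(String root, String host, long version) {
        return root + "/hosts/" + host + "_" + version;
    }
}
```

The version is parsed as a number rather than sorted as a string, because lexicographic order would put cores_10 before cores_9.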