Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2006/04/15 01:14:16 UTC

[Solr Wiki] Update of "IndexPartitioning" by HossMan

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by HossMan:
http://wiki.apache.org/solr/IndexPartitioning

New page:
= Overview of Idea =

In addition to the goal of allowing a single Application Server instance (ie: port) to run multiple Solr webapp instances (ie: the same war, duplicated several times with different names, each with its own configs/data), there has been some discussion about the idea of allowing a single Solr webapp to support "partitioning" of the index -- either based on a particular field value, or based on an explicit "partition name" used when issuing updates.

= Detailed Description =

solrconfig.xml would have new information indicating there are index partitions, something like this perhaps...

{{{
   <partitions default="foo">
     <partition>foo</partition>
     <partition>bar</partition>
     <partition>baz</partition>
     <partition>bax</partition>
   </partitions>
}}}

...which indicates that there are 4 partitions, which can be referenced by name.  Each partition would be backed by a physical index on disk in the data directory.  The directory name would be the same as the partition name (or maybe the syntax would be `<partition dir="f_o_o">`).
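
To make the proposed configuration a little more concrete, here is a minimal Java sketch of how such a block might be read, using only the JDK's XML parser. The `<partitions>` element is the proposal from this page and not existing Solr syntax; `PartitionConfig` and its fields are hypothetical names, and the optional `dir` attribute follows the alternative syntax floated above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical holder for the proposed <partitions> block; not part of Solr. */
final class PartitionConfig {
    final String defaultPartition;
    /** Partition name -> directory name under the data directory. */
    final Map<String, String> directories;

    private PartitionConfig(String defaultPartition, Map<String, String> directories) {
        this.defaultPartition = defaultPartition;
        this.directories = directories;
    }

    static PartitionConfig parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse partition config", e);
        }
        Element root = doc.getDocumentElement();
        if (!"partitions".equals(root.getTagName())) {
            throw new IllegalArgumentException("expected a <partitions> root element");
        }
        Map<String, String> dirs = new LinkedHashMap<>();
        NodeList nodes = root.getElementsByTagName("partition");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String name = e.getTextContent().trim();
            // Assumption: an optional dir="..." attribute overrides the directory name.
            String dir = e.hasAttribute("dir") ? e.getAttribute("dir") : name;
            dirs.put(name, dir);
        }
        String def = root.getAttribute("default");
        if (!dirs.containsKey(def)) {
            throw new IllegalArgumentException("default partition is not declared: " + def);
        }
        return new PartitionConfig(def, dirs);
    }

    public static void main(String[] args) {
        PartitionConfig cfg = parse(
              "<partitions default=\"foo\">"
            + "<partition dir=\"f_o_o\">foo</partition>"
            + "<partition>bar</partition>"
            + "</partitions>");
        System.out.println(cfg.defaultPartition + " " + cfg.directories);
    }
}
```

Rejecting a default that names an undeclared partition is a choice made for this sketch; the proposal itself does not say what should happen in that case.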

The `<add>` command would support a new `partition="..."` attribute indicating which partition the document(s) should be added to; if no partition is specified, they would go in the default partition.  If the index has a uniqueKey field and allowDups is false, then old docs with the same key should be deleted in *all* of the partitions.  (This is necessary to allow data to be moved from one partition to another ... but perhaps a new `overwriteOtherPartitions` attribute on the add would help here?)

Deleting by id should (probably) delete across all partitions (see above) but deleting by query could also be confined to a single partition using a similar `partition="..."` attribute.
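
The add and delete semantics described above can be modelled with a small in-memory toy. This is only a sketch of the proposed behaviour, not Solr code: `PartitionedStore` and all of its methods are hypothetical, documents are plain strings, and a `null` partition argument stands in for an omitted `partition="..."` attribute.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Toy in-memory model of the proposed update semantics; not Solr code. */
final class PartitionedStore {
    private final String defaultPartition;
    // partition name -> (uniqueKey -> document body)
    private final Map<String, Map<String, String>> partitions = new LinkedHashMap<>();

    PartitionedStore(String defaultPartition, String... names) {
        for (String n : names) {
            partitions.put(n, new LinkedHashMap<>());
        }
        if (!partitions.containsKey(defaultPartition)) {
            throw new IllegalArgumentException("unknown default partition: " + defaultPartition);
        }
        this.defaultPartition = defaultPartition;
    }

    /** Add with allowDups=false: the key is removed from every partition before the insert. */
    void add(String partition, String key, String doc) {
        String name = (partition == null) ? defaultPartition : partition;
        Map<String, String> target = partitions.get(name);
        if (target == null) {
            throw new IllegalArgumentException("unknown partition: " + name);
        }
        deleteById(key);
        target.put(key, doc);
    }

    /** Delete by id always spans all partitions. */
    void deleteById(String key) {
        for (Map<String, String> p : partitions.values()) {
            p.remove(key);
        }
    }

    /** Delete by query; a null partition means "all partitions". Returns the number removed. */
    int deleteByQuery(String partition, Predicate<String> matches) {
        int removed = 0;
        for (Map.Entry<String, Map<String, String>> e : partitions.entrySet()) {
            if (partition != null && !partition.equals(e.getKey())) {
                continue;
            }
            int before = e.getValue().size();
            e.getValue().values().removeIf(matches);
            removed += before - e.getValue().size();
        }
        return removed;
    }

    /** Name of the partition currently holding the key, or null if none does. */
    String partitionOf(String key) {
        for (Map.Entry<String, Map<String, String>> e : partitions.entrySet()) {
            if (e.getValue().containsKey(key)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore("foo", "foo", "bar");
        store.add(null, "doc1", "first version");   // lands in the default partition
        store.add("bar", "doc1", "second version"); // moves the document to "bar"
        System.out.println(store.partitionOf("doc1"));
    }
}
```

Re-adding `doc1` with a different partition moves it, which is the data-migration case the cross-partition delete is meant to support.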

`SolrQueryRequest.getSearcher()` should continue to behave as it always has, returning a Searcher across the entire "index" -- by making an !IndexSearcher over a !MultiReader of all partitions.  But a new method could be added: `SolrQueryRequest.getSearcher(String partitionName)`, which would allow request handlers to confine their searches to a single partition.
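
The relationship between the two `getSearcher` variants described above might look roughly like the following. This is a toy model written against plain Java collections rather than Lucene: `PartitionSearchers`, its nested `Searcher` interface, and `register` are hypothetical stand-ins, and substring matching takes the place of a real query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for per-partition searchers; names are hypothetical, not Solr's or Lucene's API. */
final class PartitionSearchers {
    /** Minimal search abstraction: returns the documents containing the term. */
    interface Searcher {
        List<String> search(String term);
    }

    private final Map<String, Searcher> byPartition = new LinkedHashMap<>();

    /** Registers a partition backed by a fixed list of documents. */
    void register(String partitionName, List<String> docs) {
        List<String> copy = new ArrayList<>(docs);
        byPartition.put(partitionName, term -> {
            List<String> hits = new ArrayList<>();
            for (String d : copy) {
                if (d.contains(term)) {
                    hits.add(d);
                }
            }
            return hits;
        });
    }

    /** Existing behaviour: one searcher spanning every partition registered at call time. */
    Searcher getSearcher() {
        List<Searcher> snapshot = new ArrayList<>(byPartition.values());
        return term -> {
            List<String> hits = new ArrayList<>();
            for (Searcher s : snapshot) {
                hits.addAll(s.search(term));
            }
            return hits;
        };
    }

    /** Proposed overload: a searcher confined to a single named partition. */
    Searcher getSearcher(String partitionName) {
        Searcher s = byPartition.get(partitionName);
        if (s == null) {
            throw new IllegalArgumentException("unknown partition: " + partitionName);
        }
        return s;
    }

    public static void main(String[] args) {
        PartitionSearchers searchers = new PartitionSearchers();
        List<String> fooDocs = new ArrayList<>();
        fooDocs.add("red apple");
        fooDocs.add("green pear");
        List<String> barDocs = new ArrayList<>();
        barDocs.add("red car");
        searchers.register("foo", fooDocs);
        searchers.register("bar", barDocs);
        System.out.println(searchers.getSearcher().search("red"));
        System.out.println(searchers.getSearcher("bar").search("red"));
    }
}
```

The combined searcher simply concatenates per-partition hits here; a real implementation over a !MultiReader would also have to merge scores and document ids, which this sketch does not attempt.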

== Alternate Dynamic Partitioning Idea ==

Partitions could be created on the fly, based on field values.  Consider configuration like this...

{{{
   <partitionField default="foo">objectType</partitionField>
}}}

...which would indicate that anytime a document is added, the `objectType` field should be inspected, and used as the name of the partition (and directory of the underlying index).  If a document does not have a value for the objectType field, the default partition of "foo" should be assumed.
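
The lookup described above is simple enough to sketch directly. `FieldPartitioner` is a hypothetical name, and a document is modelled as a plain field-to-value map rather than any real Solr type; treating a blank value the same as a missing one is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that derives a partition name from a document field. */
final class FieldPartitioner {
    private final String fieldName;
    private final String defaultPartition;

    FieldPartitioner(String fieldName, String defaultPartition) {
        this.fieldName = fieldName;
        this.defaultPartition = defaultPartition;
    }

    /** Returns the value of the partition field, or the default when it is missing or blank. */
    String partitionFor(Map<String, String> doc) {
        String value = doc.get(fieldName);
        if (value == null || value.trim().isEmpty()) {
            return defaultPartition;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        FieldPartitioner partitioner = new FieldPartitioner("objectType", "foo");
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "42");
        doc.put("objectType", "book");
        System.out.println(partitioner.partitionFor(doc));
        System.out.println(partitioner.partitionFor(new HashMap<>()));
    }
}
```

Since the value doubles as a directory name, a real implementation would probably also need to validate or escape it; that is left out here.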

= Things that need to be considered =

   * !SolrQueryRequestBase currently grabs a searcher as soon as it is constructed so that it's guaranteed to have a consistent view of the index.  Would it need to grab a searcher across every partition to ensure this, without knowing in advance which partition(s) the plugin wants to look at?
   * how do the cache configurations work regarding the various searchers/indexreaders?  
      * does each searcher on each partition have its own caches with the same config as the main searcher?
      * is there a way to configure the caches on a per-partition basis?
   * how would the replication scripts need to change?
      * should it be possible to replicate individual partitions separately?