Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2006/04/15 01:14:16 UTC
[Solr Wiki] Update of "IndexPartitioning" by HossMan
The following page has been changed by HossMan:
http://wiki.apache.org/solr/IndexPartitioning
New page:
= Overview of Idea =
In addition to the goal of allowing a single Application Server instance (i.e. one port) to run multiple Solr webapp instances (i.e. the same war, duplicated several times with different names, each with its own configs/data), there has been some discussion about the idea of allowing a single Solr webapp to support "partitioning" of the index -- either based on a particular field value, or based on an explicit "partition name" used when issuing updates.
= Detailed Description =
solrconfig.xml would have new information indicating there are index partitions, something like this perhaps...
{{{
<partitions default="foo">
<partition>foo</partition>
<partition>bar</partition>
<partition>baz</partition>
<partition>bax</partition>
</partitions>
}}}
...which indicates that there are 4 partitions, which can be referenced by name. Each partition would be backed by a physical index on disk in the data directory. The directory name would be the same as the partition name (or maybe the syntax would be `<partition dir="f_o_o">`).
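To make the proposed config concrete, here is a rough sketch of how the `<partitions>` block might be read. `PartitionConfig` is a hypothetical helper, not an existing Solr class, and two choices in it are assumptions rather than part of the proposal: the default partition must be one of the declared partitions, and the directory name is simply the partition name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Solr): reads the proposed <partitions> block.
class PartitionConfig {
    final String defaultPartition;
    final List<String> names = new ArrayList<>();

    PartitionConfig(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse partition config", e);
        }
        defaultPartition = root.getAttribute("default");
        // Collect the text of every <partition> child as a partition name.
        NodeList children = root.getElementsByTagName("partition");
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getTextContent().trim());
        }
        // Assumption: the default must be one of the declared partitions.
        if (!names.contains(defaultPartition)) {
            throw new IllegalArgumentException(
                    "default partition is not declared: " + defaultPartition);
        }
    }

    // Assumption: the on-disk directory name equals the partition name.
    String directoryFor(String partition) {
        if (!names.contains(partition)) {
            throw new IllegalArgumentException("unknown partition: " + partition);
        }
        return partition;
    }
}
```

If the `dir="..."` variant were adopted instead, `directoryFor` would be the one place that needs to change.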
The `<add>` command would support a new `partition="..."` attribute indicating which partition the document(s) should be added to. If no partition is specified, they would go in the default partition. If the index has a uniqueKey field and allowDups is false, then old docs with the same key should be deleted in *all* of the partitions. (This is necessary to allow data to be moved from one partition to another ... but perhaps a new `overwriteOtherPartitions` attribute on the add would help here?)
Deleting by id should (probably) delete across all partitions (see above) but deleting by query could also be confined to a single partition using a similar `partition="..."` attribute.
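The add and delete rules above can be sketched with a small in-memory model. This is only an illustration of the proposed semantics, not Solr code: `PartitionedIndex` and its methods are hypothetical names, documents are plain strings keyed by their uniqueKey, and a `null` partition stands in for an omitted `partition="..."` attribute.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory model of the proposed update semantics (not Solr code).
class PartitionedIndex {
    private final String defaultPartition;
    // partition name -> (uniqueKey -> document body)
    private final Map<String, Map<String, String>> partitions = new LinkedHashMap<>();

    PartitionedIndex(String defaultPartition, List<String> names) {
        this.defaultPartition = defaultPartition;
        for (String name : names) {
            partitions.put(name, new LinkedHashMap<>());
        }
    }

    // Models <add partition="...">; a null partition means "use the default".
    void add(String partition, String id, String doc) {
        String target = (partition == null) ? defaultPartition : partition;
        Map<String, String> docs = partitions.get(target);
        if (docs == null) {
            throw new IllegalArgumentException("unknown partition: " + target);
        }
        // uniqueKey with allowDups=false: drop old copies in *all* partitions,
        // which is what lets a document move from one partition to another.
        deleteById(id);
        docs.put(id, doc);
    }

    // Delete by id applies across every partition.
    void deleteById(String id) {
        for (Map<String, String> docs : partitions.values()) {
            docs.remove(id);
        }
    }

    // Delete by query; a null partition means "all partitions".
    void deleteByQuery(String partition, Predicate<String> query) {
        for (Map.Entry<String, Map<String, String>> e : partitions.entrySet()) {
            if (partition == null || partition.equals(e.getKey())) {
                e.getValue().values().removeIf(query);
            }
        }
    }

    // Returns the partition currently holding the id, or null if none does.
    String partitionOf(String id) {
        for (Map.Entry<String, Map<String, String>> e : partitions.entrySet()) {
            if (e.getValue().containsKey(id)) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

For example, adding id "1" with no partition and then adding it again with `partition="bar"` leaves exactly one copy, in "bar".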
`SolrQueryRequest.getSearcher()` should continue to behave as it always has, returning a Searcher across the entire "index" -- by making an !IndexSearcher over a !MultiReader of all partitions. But a new method could be added: `SolrQueryRequest.getSearcher(String partitionName)`, which would allow request handlers to confine their searches to a single partition.
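The shape of that API might look something like the sketch below. Everything here is a stand-in: `PartitionedRequest` and `SearcherView` are hypothetical names, and `SearcherView` only records which partitions it covers. In a real implementation the whole-index view would wrap an !IndexSearcher over a !MultiReader of the partition readers. Rejecting an unknown partition name is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a Lucene searcher: it only records which partitions it spans.
class SearcherView {
    final List<String> partitions;

    SearcherView(List<String> partitions) {
        this.partitions = Collections.unmodifiableList(new ArrayList<>(partitions));
    }
}

// Hypothetical request object showing the existing method plus the proposed overload.
class PartitionedRequest {
    private final SearcherView all;
    private final Map<String, SearcherView> byName = new LinkedHashMap<>();

    PartitionedRequest(List<String> partitionNames) {
        all = new SearcherView(partitionNames);
        for (String name : partitionNames) {
            byName.put(name, new SearcherView(Collections.singletonList(name)));
        }
    }

    // Existing behaviour: one searcher across the entire index.
    SearcherView getSearcher() {
        return all;
    }

    // Proposed overload: a searcher confined to a single partition.
    SearcherView getSearcher(String partitionName) {
        SearcherView view = byName.get(partitionName);
        if (view == null) {
            // Assumption: unknown names are an error rather than an empty result.
            throw new IllegalArgumentException("unknown partition: " + partitionName);
        }
        return view;
    }
}
```

Existing request handlers keep calling `getSearcher()` and see no change; only handlers that know about partitions would use the overload.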
== Alternate Dynamic Partitioning Idea ==
Partitions could be created on the fly, based on field values. Consider configuration like this...
{{{
<partitionField default="foo">objectType</partitionField>
}}}
...which would indicate that any time a document is added, the `objectType` field should be inspected and used as the name of the partition (and the directory of the underlying index). If a document does not have a value for the `objectType` field, the default partition of "foo" should be assumed.
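A minimal sketch of that routing rule, assuming documents are available as simple field-to-value maps. `DynamicPartitionRouter` is a hypothetical name, and treating a blank value the same as a missing one is an assumption the proposal does not address.

```java
import java.util.Map;

// Hypothetical router for the <partitionField> idea (not Solr code).
class DynamicPartitionRouter {
    private final String partitionField;
    private final String defaultPartition;

    DynamicPartitionRouter(String partitionField, String defaultPartition) {
        this.partitionField = partitionField;
        this.defaultPartition = defaultPartition;
    }

    // Returns the partition (and index directory) name for a document.
    String partitionFor(Map<String, String> doc) {
        String value = doc.get(partitionField);
        // Assumption: a blank value counts as "no value" and falls back to the default.
        if (value == null || value.trim().isEmpty()) {
            return defaultPartition;
        }
        return value;
    }
}
```

With `new DynamicPartitionRouter("objectType", "foo")`, a document whose `objectType` is "book" goes to the "book" partition, and a document without that field goes to "foo".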
= Things that need to be considered =
* !SolrQueryRequestBase currently grabs a searcher as soon as it is constructed so that it's guaranteed to have a consistent view of the index. Would it need to grab a searcher across every partition to ensure this, without knowing in advance which partition(s) the plugin wants to look at?
* how do the cache configurations work regarding the various searchers/indexreaders?
* does each searcher on each partition have its own caches with the same config as the main searcher?
* is there a way to specifically configure the caches on a per-partition basis?
* how would the replication scripts need to change?
* should it be possible to replicate individual partitions separately?