Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/09/11 19:33:39 UTC

[Solr Wiki] Update of "KattaIntegration" by JasonRutherglen

The following page has been changed by JasonRutherglen:
http://wiki.apache.org/solr/KattaIntegration

New page:
= Introduction =

Katta integration with Solr lets Hadoop build an index as a set of shards,
each of which is replicated to N nodes (servers) of a Solr cluster. This is
useful for large Solr clusters that require failover,
replication, and the ability to provision shards dynamically.
Katta uses ZooKeeper to coordinate the creation and deployment
of shards to Solr servers.
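
As a rough illustration of what "replicated to N nodes" and failover mean in practice, here is a toy model in Python. It is not Katta's actual placement policy, which this page does not describe: the round-robin assignment and the names `place_replicas` and `pick_live_node` are assumptions made up for this sketch.

```python
def place_replicas(shards, nodes, replicas):
    """Assign each shard to `replicas` distinct nodes, round-robin.

    Hypothetical policy for illustration only; Katta's real
    distribution logic may differ.
    """
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    placement = {}
    for i, shard in enumerate(shards):
        # Start at node i and take the next `replicas` nodes, wrapping around.
        placement[shard] = [nodes[(i + r) % len(nodes)] for r in range(replicas)]
    return placement


def pick_live_node(placement, shard, live_nodes):
    """Return the first node holding `shard` that is still alive, or None."""
    for node in placement[shard]:
        if node in live_nodes:
            return node
    return None
```

With three shards, three nodes and N = 2, every shard lives on two different nodes, so a query for a shard can still be served after any single node goes down.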

See http://issues.apache.org/jira/browse/SOLR-1395

See http://sourceforge.net/projects/katta/

See http://hadoop.apache.org/zookeeper

= Features =

* Uses Hadoop RPC, which is implemented with non-blocking (NIO) sockets underneath. This should scale better than the current HTTP approach when there are hundreds of nodes, because HTTP can add unnecessary overhead.

* All current distributed Solr requests function properly with no changes.

* Incremental indexing may be accomplished by creating new shards and deploying them into the Katta cluster. The alternative is to update a shard already deployed on a Solr server (using Solr's normal XML-over-HTTP interface). On commit, the newly updated shard would be uploaded back into the Katta cluster and the old version of the shard removed.

* Solr Katta has built-in failover.
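
The second incremental-indexing path above (updating a deployed shard through Solr's XML-over-HTTP interface) might look roughly like the Python sketch below. It covers only the Solr side: posting `<add>` messages followed by `<commit/>` to the update handler. The upload of the updated shard back into Katta is not shown, since this page does not describe that API. The URL in the usage note, the field names, and the helper names are assumptions for illustration.

```python
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(doc):
    """Build a Solr XML <add> message for one document (dict of field -> value)."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def post_update(update_url, xml_body):
    """POST one XML update message to a Solr update handler; return the response text."""
    request = Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def update_shard(update_url, docs):
    """Add each document to the shard, then commit so the changes become visible."""
    for doc in docs:
        post_update(update_url, build_add_xml(doc))
    post_update(update_url, "<commit/>")
```

A call such as `update_shard("http://localhost:8983/solr/update", [{"id": "doc1", "title": "hello"}])` would target a Solr server on a hypothetical local host and port; substitute the address of the server that holds the shard.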