Posted to solr-user@lucene.apache.org by Li Li <fa...@gmail.com> on 2010/07/08 13:44:14 UTC

Distributed Indexing

    Are there any tools for "Distributed Indexing"? The wiki page
http://wiki.apache.org/solr/DistributedSearch mentions
KattaIntegration and ZooKeeperIntegration,
    but they seem to be more concerned with error handling and
replication. I need a dispatcher that routes docs by
uniqueKey (such as url) to different machines, so that when a doc is
updated, it is sent to the machine that already holds that url. I
also need the docs to be spread randomly across all the machines, so
that when I do a distributed search the idfs on the different machines
are similar, because the current distributed search uses local idf.

RE: Distributed Indexing

Posted by Yuval Feinstein <yu...@answers.com>.
Li, 
as far as I know, you still have to do this part yourself.
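One possible way to shard is to number the shards from 0 to numShards-1,
calculate hash(uniqueKey) % numShards for each document,
and send the document to the shard with that number.
As long as numShards does not change, the mapping is consistent (the same
uniqueKey always goes to the same shard, so updates reach the machine that
already holds the doc), and it spreads documents roughly uniformly across the shards.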
-- Yuval
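
The scheme described above can be sketched in a few lines. This is only an illustration, not something that ships with Solr: the function names shard_for and route are made up for this example, and the choice of MD5 as the hash is an assumption (any hash that gives the same value on every machine and every run would do). Python's built-in hash() is deliberately avoided because it is randomized per process for strings, which would break the "same key, same shard" property.

```python
import hashlib


def shard_for(unique_key: str, num_shards: int) -> int:
    """Return a shard number in [0, num_shards) for a document's uniqueKey."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A digest is stable across processes and machines, unlike built-in hash().
    digest = hashlib.md5(unique_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(unique_keys, shard_urls):
    """Group uniqueKeys by the (hypothetical) shard URL they hash to."""
    batches = {url: [] for url in shard_urls}
    for key in unique_keys:
        batches[shard_urls[shard_for(key, len(shard_urls))]].append(key)
    return batches


if __name__ == "__main__":
    # Placeholder shard addresses, purely for illustration.
    shards = ["http://host0:8983/solr", "http://host1:8983/solr"]
    docs = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
    for url, keys in route(docs, shards).items():
        print(url, keys)
```

Each batch would then be posted to its shard's normal update handler. Note that changing the number of shards changes the result of the modulo for most keys, so existing documents would have to be re-indexed or moved.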
