Posted to solr-user@lucene.apache.org by Dmitry Kan <so...@gmail.com> on 2013/04/17 16:27:30 UTC

Re: Rejecting document already existing in different shard.

Hi,

Although we use logical sharding, we do run into cases like the one you
describe in our environment. We handle them manually:

0. prepare the new version of the document
1. remove the old version of the document
2. post the new version and commit

With logical sharding it is relatively easy, but we do need to store
location metadata in a DB.
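
The steps above, together with the location metadata, could be sketched
roughly like this. It is only an illustration under assumptions: FakeShard
is a hypothetical in-memory stand-in for a real shard client, and the
locations dict stands in for the DB table that records which shard holds
each document. None of these names come from Solr or from our setup.

```python
class FakeShard:
    """Hypothetical in-memory stand-in for one shard (not a Solr API)."""

    def __init__(self, name):
        self.name = name
        self.pending = {}     # adds waiting for a commit
        self.deletes = set()  # deletes waiting for a commit
        self.docs = {}        # committed, searchable documents

    def add(self, doc):
        self.deletes.discard(doc["id"])
        self.pending[doc["id"]] = doc

    def delete_by_id(self, doc_id):
        self.pending.pop(doc_id, None)
        self.deletes.add(doc_id)

    def commit(self):
        # Apply buffered deletes first, then buffered adds.
        for doc_id in self.deletes:
            self.docs.pop(doc_id, None)
        self.docs.update(self.pending)
        self.deletes.clear()
        self.pending.clear()


def reindex(doc, target, shards, locations):
    """Step 0 is done by the caller (doc is the prepared new version)."""
    doc_id = doc["id"]
    old = locations.get(doc_id)  # location metadata lookup
    # Step 1: remove the old version if it lives in another shard.
    if old is not None and old != target:
        shards[old].delete_by_id(doc_id)
        shards[old].commit()
    # Step 2: post the new version and commit.
    shards[target].add(doc)
    shards[target].commit()
    locations[doc_id] = target   # keep the metadata up to date
```

With real shards the delete and the add are separate requests to separate
servers, so there is a short window in which the document is in neither
shard (or in both, if the order is reversed).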

In your case, have you had a look at this:

http://wiki.apache.org/solr/Deduplication

Another thing that comes to mind: store the hashing parameters for each
document, and then look for a link between the new parameters and the
stored ones of the "same" document.
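
A minimal sketch of that idea, assuming the shard is chosen by hashing a
single routing key. The MD5-modulo scheme, shard_for, find_stale_shard and
the stored_params dict are all hypothetical placeholders for whatever your
custom hashing and storage actually look like.

```python
import hashlib


def shard_for(routing_key, num_shards):
    """Assumed routing rule: MD5 of the key, modulo the shard count."""
    digest = hashlib.md5(routing_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def find_stale_shard(doc_id, new_key, num_shards, stored_params):
    """Return the shard still holding an old copy of doc_id, or None.

    stored_params maps document ID to the routing key used when the
    document was last indexed; it is updated as a side effect.
    """
    old_key = stored_params.get(doc_id)
    stored_params[doc_id] = new_key
    if old_key is None:
        return None  # first time this ID is seen
    old_shard = shard_for(old_key, num_shards)
    new_shard = shard_for(new_key, num_shards)
    return old_shard if old_shard != new_shard else None
```

If find_stale_shard returns a shard number, the indexing program knows
where to delete the old copy before (or right after) posting the new one.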

Dmitry


On Wed, Mar 13, 2013 at 11:34 PM, Marcin Rzewucki <mr...@gmail.com> wrote:

> Hi there,
>
> Let's say we use a custom hashing algorithm and there is a document already
> indexed in "shard1". After some time the same document has changed and
> should be indexed to "shard2" (because of the routing rules used in the
> indexing program). It has been indexed without issues, and as a result two
> "almost" identical documents are in different shards. In my case, they are
> duplicates for the end user. Is it possible to reject a document if it
> already exists in a different shard? It would be even easier to handle such
> cases prior to adding a new one with the same ID.
>
> Regards.
>