Posted to solr-user@lucene.apache.org by Pankaj Gurumukhi <Pa...@talentica.com> on 2017/09/15 17:38:31 UTC

Adding UniqueKey to an existing Solr 6.4 Index

Hello,

I have a single-node Solr 6.4 server with an index of 100 million documents. The default "id" field is the primary key of this index. I would now like to set up an update process that inserts new documents and updates existing ones based on the value of another field (say ProductId), which is different from the default "id". To do this, I want to use Solr's built-in de-duplication by adding a new field, SignatureField, that uses ProductId and serves as the UniqueKey. Given the millions of documents I have, I would like to ask whether it is possible to set up de-duplication in an existing Solr index with the following steps:

a.     Add new field SignatureField, and configure it as UniqueKey in Solr schema.

b.    Run an Atomic Update process on all documents, to update the value of this new field SignatureField.

Is there an easier/better way to add a SignatureField to an existing large index?

Thx,
Pankaj


Re: Adding UniqueKey to an existing Solr 6.4 Index

Posted by Erick Erickson <er...@gmail.com>.
Not really. Do note that atomic updates require
1> all _original_ fields (i.e. fields that are _not_ destinations for
copyFields) have stored=true
2> no destination of a copyField has stored=true

Under the covers, an atomic update composes the original document from
its stored fields and re-indexes the doc. That just means atomic updates
are actually slightly more work, as far as Solr is concerned, than just
re-indexing the doc from the system-of-record.
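
The two stored-field requirements above could be checked before starting a long atomic-update run. The following is only a sketch: the function name and the input shape (a dict of field name to stored flag, plus a set of copyField destinations) are assumptions made for illustration, not anything Solr provides.

```python
from typing import Dict, List, Set


def atomic_update_problems(fields: Dict[str, bool], copy_dests: Set[str]) -> List[str]:
    """Return the fields that break the atomic-update requirements.

    fields maps each field name to its stored flag (hypothetical input,
    e.g. transcribed by hand from the schema); copy_dests holds the names
    of fields that are copyField destinations.
    """
    problems = []
    for name, stored in sorted(fields.items()):
        if name in copy_dests:
            # Requirement 2: a copyField destination must not be stored.
            if stored:
                problems.append(f"{name}: copyField destination must not be stored")
        elif not stored:
            # Requirement 1: every original field must be stored.
            problems.append(f"{name}: original field must be stored")
    return problems


if __name__ == "__main__":
    example_fields = {"id": True, "ProductId": True, "text": True, "title": False}
    for line in atomic_update_problems(example_fields, {"text"}):
        print(line)
```

An empty result means the field list, as supplied, meets both requirements; it says nothing about anything that was left out of the input.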

The decision to use atomic updates is up to you, of course; the slight
extra work may be better than getting the docs from the original
source...
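
If you do go the atomic-update route, the request bodies might be built roughly like this. It is a minimal sketch under assumptions: the field name SignatureField comes from your message, the helper names are made up, and the value to store for each document is whatever you decide to derive from ProductId. Only Solr's standard {"set": value} atomic-update modifier is relied on; sending the batches to the update handler is left out.

```python
import json
from typing import Dict, Iterator, List


def atomic_set_docs(values: Dict[str, str], field: str = "SignatureField") -> List[dict]:
    """Build one atomic-update document per id, setting a single field.

    values maps the existing "id" of each document to the new field value
    (hypothetical input; how the value is derived is up to the caller).
    """
    return [{"id": doc_id, field: {"set": value}} for doc_id, value in values.items()]


def batches(docs: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield the documents in fixed-size chunks so each request stays small."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


if __name__ == "__main__":
    docs = atomic_set_docs({"1": "P-100", "2": "P-200", "3": "P-300"})
    for batch in batches(docs, 2):
        # Each printed string is a JSON array suitable as an update request body.
        print(json.dumps(batch))
```

With 100 million documents the batch size matters more than anything else here; it is worth trying a few sizes on a copy of the index first.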

Best,
Erick
