Posted to solr-user@lucene.apache.org by Pratyul Kapoor <pr...@gmail.com> on 2012/11/24 21:30:34 UTC

Indexing only on change

Hi,

I just discovered that when a single field of a document is edited, Solr
removes the entire document and recreates it.

I have a list of thousands of documents to be indexed, but I know that only
some of them will have changed; the rest will already be in the index
unchanged. Is there any way to check whether an incoming document is the
same as the existing one, so that it does not need to be indexed again?

Pratyul

Re: Indexing only on change

Posted by Otis Gospodnetic <ot...@gmail.com>.
I'm not sure there is an automated way, but you could compute a hash of
some or all fields at index time and later use it to compare before
updating. You can hide this in an UpdateRequestProcessor. It could be a
generally useful feature, so consider contributing it.
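
A minimal client-side sketch of the hashing step, in Python. It is not
Solr code and not an UpdateRequestProcessor: the document is assumed to be
a plain dict, and the function name field_hash and its fields parameter
are made up for illustration. The one point it shows is that the fields
need to be serialised in a fixed order so the same content always gives
the same hash.

```python
import hashlib
import json


def field_hash(doc, fields=None):
    """Return a SHA-256 hex digest over the chosen fields of a document.

    doc is a plain dict standing in for a document to be indexed
    (an assumption for this sketch). fields lists the field names to
    include; None means all fields present in doc.
    """
    names = sorted(doc) if fields is None else sorted(fields)
    selected = {name: doc.get(name) for name in names}
    # Sorted keys and fixed separators give one canonical string per
    # content, regardless of the order the fields were supplied in.
    canonical = json.dumps(
        selected, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    doc = {"id": "42", "title": "Indexing only on change", "body": "Hi"}
    print(field_hash(doc))
    # Hash only some fields, e.g. to ignore a timestamp-like field.
    print(field_hash(doc, fields=["title", "body"]))
```

The digest would be kept in an extra field of the indexed document so it
can be compared against on the next run.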

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm

Re: Indexing only on change

Posted by François Schiettecatte <fs...@gmail.com>.
I would create a hash of the document content and store it in Solr along with any other document info you wish to store. When a document is presented for indexing, hash it and compare the result to the hash of the stored document: index it if the hashes differ, and skip it if they are the same.
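
The compare-and-skip step might look roughly like the Python sketch below. It makes several assumptions: stored_hashes is an ordinary dict of document id to stored hash, standing in for whatever lookup you run against Solr; the field name content_hash and the function names are hypothetical; and documents are reduced to an (id, content) pair.

```python
import hashlib


def content_hash(content):
    """SHA-256 hex digest of the document content (a str)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def docs_to_index(incoming, stored_hashes):
    """Return only the documents that are new or whose content changed.

    incoming      -- iterable of (doc_id, content) pairs
    stored_hashes -- dict of doc_id -> hash already stored in the index
                     (a stand-in for a real lookup against Solr)
    """
    changed = []
    for doc_id, content in incoming:
        digest = content_hash(content)
        # A missing id means a new document; a different hash means the
        # content changed. Either way the document has to be (re)indexed.
        if stored_hashes.get(doc_id) != digest:
            changed.append(
                {"id": doc_id, "content": content, "content_hash": digest}
            )
    return changed


if __name__ == "__main__":
    stored = {"a": content_hash("hello"), "b": content_hash("old text")}
    incoming = [("a", "hello"), ("b", "new text"), ("c", "brand new")]
    for doc in docs_to_index(incoming, stored):
        print(doc["id"])
```

Only the documents returned by docs_to_index would be sent for indexing, each carrying its new hash so the next run can compare against it.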

François