
Posted to solr-user@lucene.apache.org by SolrUser1543 <os...@gmail.com> on 2015/01/28 07:54:13 UTC

Reindex data without creating new index.

I want to reindex my data in order to change the value of one field according
to the value of another (both fields already exist).

For this purpose, I ran the "clue" utility to get a list of IDs.
Then I created an update processor that sets the value of field A
according to the value of field B.
I added a new request handler, like the classic update handler, but with a new
update chain containing the new update processor.
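The field-derivation step can be sketched as plain Java over a Map, independent of Solr's classes. In a real processor this logic would run inside the processAdd method of an UpdateRequestProcessor subclass, after the stored document has been loaded. The field names fieldA/fieldB and the mapping rule below are hypothetical placeholders, because the post does not say how A depends on B.

```java
import java.util.HashMap;
import java.util.Map;

public class DeriveFieldSketch {
    // Hypothetical field names; substitute the real schema fields.
    static final String SOURCE_FIELD = "fieldB";
    static final String TARGET_FIELD = "fieldA";

    // Hypothetical rule: map a status code in fieldB to a label in fieldA.
    static final Map<String, String> RULE = Map.of(
            "1", "active",
            "0", "inactive");

    /**
     * Returns a copy of the document with TARGET_FIELD set from SOURCE_FIELD.
     * The input map is not modified; documents with no source value or an
     * unmapped value are returned unchanged.
     */
    static Map<String, Object> derive(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        Object source = doc.get(SOURCE_FIELD);
        if (source != null) {
            String target = RULE.get(source.toString());
            if (target != null) {
                out.put(TARGET_FIELD, target);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "42");
        doc.put("fieldB", "1");
        System.out.println(derive(doc).get("fieldA")); // prints active
    }
}
```

Returning a copy keeps the rule easy to test in isolation; inside a processor you would instead set the value on the document carried by the update command.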

I want to send an HTTP POST request for each ID to the new handler, containing
only the item ID.
This will trigger my update processor, which will fetch the existing document
from the index and apply the logic.
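The per-ID requests could be sent by a small client. This sketch uses the Java 11 java.net.http client. The URL http://localhost:8983/solr/collection1/update/enrich and the one-document JSON body are assumptions, since the post does not name the core, the handler path, or the content type the new handler expects.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class EnrichClientSketch {
    // Assumed core name and handler path for the custom update chain.
    static final String HANDLER_URL =
            "http://localhost:8983/solr/collection1/update/enrich";

    /** Builds a JSON body containing only the document ID, e.g. [{"id":"42"}]. */
    static String bodyFor(String id) {
        // Escape backslashes and double quotes so the ID stays valid JSON.
        String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
        return "[{\"id\":\"" + escaped + "\"}]";
    }

    /** Builds (but does not send) one POST request for a single ID. */
    static HttpRequest requestFor(String id) {
        return HttpRequest.newBuilder(URI.create(HANDLER_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(bodyFor(id)))
                .build();
    }

    /** Sends one request per ID and stops at the first non-2xx response. */
    static void sendAll(List<String> ids) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        for (String id : ids) {
            HttpResponse<String> resp =
                    client.send(requestFor(id), HttpResponse.BodyHandlers.ofString());
            if (resp.statusCode() / 100 != 2) {
                throw new IOException("Update failed for id " + id
                        + ": HTTP " + resp.statusCode());
            }
        }
    }
}
```

Each ID costs one HTTP round trip. Because the JSON array can hold several documents, batching IDs into one request and committing once at the end, rather than per request, would likely reduce overhead, assuming the custom processor handles multi-document requests.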

This way I can do some enrichment without a full data import and
without creating a new index.

What do you think about it?
Could it cause performance degradation? Can Solr handle it, or will it
rebalance the index?
Does Solr have a built-in feature that can do this?

--
View this message in context: http://lucene.472066.n3.nabble.com/Reindex-data-without-creating-new-index-tp4182464.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Reindex data without creating new index.

Posted by SolrUser1543 <os...@gmail.com>.

By rebalancing I mean that such a large number of updates will create a
situation that requires optimizing the index, because each document will be
added again in place of the original one.

But according to what you say, it should not be a problem. Am I correct?

Re: Reindex data without creating new index.

Posted by Shawn Heisey <ap...@elyograg.org>.

On 1/27/2015 11:54 PM, SolrUser1543 wrote:
> I want to reindex my data in order to change a value of some field according
> to value of another. ( both field are existing ) 
> 
> For this purpose I run a "clue" utility in order to get a list of IDs.  
> Then I created an update processor , which can set a value of field A
> according to value of field B.
> I added a new request handler ,like a classic update , but with new update
> chain with a new update processor
> 
> I want to run a http post request for each ID , to a new handler ,with item
> id only. 
> This will trigger my update processor , which will get an existing doc from
> the index and do the logic. 
> 
> So in this way I can do some enrichment , without full data import and
> without creating a new index .
> 
> What do you think about it ?
> Could it cause a performance degradation because of it? SOLR can handle it
> or it will rebalance the index ?
> Does SOLR has some built in feature which can do it ?

This is likely possible, with some caveats.  You'll need to write all
the code yourself, extending the UpdateRequestProcessorFactory and
UpdateRequestProcessor classes.

This will be similar to the atomic update feature, so you'll likely need
to find that source code and model yours on its operation.  It will have
the same requirements -- all fields must be 'stored="true"' except those
which are copyField destinations, which must be 'stored="false"'.  With
Atomic Updates, this requirement is not *enforced*, but it must be met,
or there will be data loss.

https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
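The stored-field requirement can be illustrated with a simplified model in plain Java. This is not Solr's atomic update code, just the rule it depends on: only stored values can be read back for reindexing, and copyField destinations are skipped because Solr refills them from their sources. The field names and schema in main are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StoredFieldRebuildSketch {
    /** Values recoverable from the index: stored fields that are not copyField destinations. */
    static Map<String, Object> rebuild(Map<String, Object> indexedDoc,
                                       Set<String> stored,
                                       Set<String> copyFieldDests) {
        Map<String, Object> rebuilt = new HashMap<>();
        for (Map.Entry<String, Object> e : indexedDoc.entrySet()) {
            String field = e.getKey();
            // Skip copyField destinations: Solr repopulates them from their sources
            // on reindex, so resending stored copies would add the values twice.
            if (stored.contains(field) && !copyFieldDests.contains(field)) {
                rebuilt.put(field, e.getValue());
            }
        }
        return rebuilt;
    }

    /** Fields whose data is lost: not stored and not regenerated by copyField. */
    static Set<String> lostFields(Map<String, Object> indexedDoc,
                                  Set<String> stored,
                                  Set<String> copyFieldDests) {
        Set<String> lost = new TreeSet<>();
        for (String field : indexedDoc.keySet()) {
            if (!stored.contains(field) && !copyFieldDests.contains(field)) {
                lost.add(field);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Hypothetical schema: "body" is indexed but not stored; "text" is a copyField destination.
        Map<String, Object> doc = Map.of(
                "id", "42", "title", "Hello", "body", "full text", "text", "Hello");
        Set<String> stored = Set.of("id", "title");
        Set<String> dests = Set.of("text");
        System.out.println(lostFields(doc, stored, dests)); // prints [body]
    }
}
```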

What do you mean by "rebalance" the index?  This could mean almost
anything, but most of the meanings I can come up with would not apply to
this situation at all.

The effect on Solr for each document you process will be the sum of:  A
query for that document, a tiny bit for the update processor itself,
followed by a reindex of that document.

Thanks,
Shawn