Posted to solr-user@lucene.apache.org by erolagnab <tr...@gmail.com> on 2011/11/17 15:34:45 UTC

What is the best approach to do reindexing on the fly?

Hi all,

I'm using Solr 3.2 with DataImportHandler, which updates the index every
5 minutes.
There's a housekeeping script that runs weekly and deletes some data from
the database.
I'd like to tie a reindexing strategy to this housekeeping script by:
1. Locking the DataImportHandler so that it cannot update the index. A flag
in the database does this: every time the scheduled job triggers, it first
checks the flag before performing the incremental index.
2. Running a separate Solr instance, pointing to the same index, to perform
a clean index.
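
To make the two steps concrete, here is a minimal sketch of the flag check,
assuming a hypothetical one-row table reindex_lock(locked) and a
DataImportHandler mounted at /dataimport on localhost:8983 (both are
placeholders, not part of my actual setup). It only builds the request URLs
and makes no HTTP call; sqlite3 stands in for the real database.

```python
import sqlite3
from urllib.parse import urlencode

# Assumed DataImportHandler endpoint; adjust host, port and path as needed.
DIH_URL = "http://localhost:8983/solr/dataimport"


def is_locked(conn):
    """Read the hypothetical reindex_lock flag set by the housekeeping script."""
    row = conn.execute("SELECT locked FROM reindex_lock LIMIT 1").fetchone()
    return bool(row and row[0])


def delta_import_url(conn, base_url=DIH_URL):
    """URL for the 5-minute job, or None while the lock flag is set."""
    if is_locked(conn):
        return None  # skip this run; the clean index is in progress
    return base_url + "?" + urlencode({"command": "delta-import"})


def clean_import_url(base_url=DIH_URL):
    """URL for the clean index triggered by the housekeeping script."""
    params = {"command": "full-import", "clean": "true", "commit": "true"}
    return base_url + "?" + urlencode(params)
```

The housekeeping script would set the flag, call the full-import URL on the
second instance, and clear the flag once the import finishes.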

Before settling on this setup, I considered other options, but they didn't
fit very well:
1. Trigger reindexing directly in the running Solr instance. I wrap Solr
with our own authentication mechanism, and reindexing would cause a spike
in memory usage that affects the other running apps (sitting in the same
J2EE container), which is the last thing I want.
2. Master/slave setup. I think this is the most proper way to do it, but I
see it as a long-term solution; we have a time constraint, so it won't
work for now.

For the strategy selected above, would searches be affected by the
reindexing from the second Solr instance?
Do we need to tell Solr to pick up the new index once it's available?
Is there any better option that I can try?

Many thanks,

Ero


Re: What is the best approach to do reindexing on the fly?

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, the master/slave setup takes about a day to get completely
running assuming that you don't have any experience to start with,
so you may be able to fit that in your schedule. Otherwise, you won't
be able to avoid the memory and CPU spikes.

But there's another option. It's actually quite easy to write a SolrJ
program in which you can do anything you want, including examining
your tables for locking.

There's yet another option. Create a trigger on your tables
that inserts the value you use for Solr's <uniqueKey> into a
"modified" table. Have your SolrJ program simply query that table
and delete/update as required to keep the single index in sync with
the database.
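
As a rough sketch of the trigger idea (in Python with sqlite3 rather than
SolrJ, so it runs standalone): the item table, the trigger names and the op
column in "modified" are all made up for illustration. The function only
works out which keys need re-adding and which need deleting; a real SolrJ
program would then send the corresponding add and deleteById requests.

```python
import sqlite3


def create_schema(conn):
    """Create a sample table plus triggers that log each changed key."""
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE modified (doc_id INTEGER NOT NULL, op TEXT NOT NULL);
        CREATE TRIGGER item_ins AFTER INSERT ON item BEGIN
            INSERT INTO modified VALUES (NEW.id, 'update');
        END;
        CREATE TRIGGER item_upd AFTER UPDATE ON item BEGIN
            INSERT INTO modified VALUES (NEW.id, 'update');
        END;
        CREATE TRIGGER item_del AFTER DELETE ON item BEGIN
            INSERT INTO modified VALUES (OLD.id, 'delete');
        END;
    """)


def drain_changes(conn):
    """Return (ids_to_update, ids_to_delete) and clear the processed rows."""
    rows = conn.execute(
        "SELECT rowid, doc_id, op FROM modified ORDER BY rowid"
    ).fetchall()
    last_op = {}
    for _, doc_id, op in rows:
        last_op[doc_id] = op  # the most recent change to a key wins
    if rows:
        # Remove only the rows read above, not ones added in the meantime.
        conn.execute("DELETE FROM modified WHERE rowid <= ?", (rows[-1][0],))
    updates = sorted(d for d, op in last_op.items() if op == "update")
    deletes = sorted(d for d, op in last_op.items() if op == "delete")
    return updates, deletes
```

Keeping only the last operation per key means a row that was updated and
then deleted between two runs results in a single delete.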

Of course, all that depends on how long it takes to re-index from scratch.
If it's reasonably quick, perhaps simply re-indexing at 3:00 AM (or whatever)
would work.

Best
Erick
