Posted to java-user@lucene.apache.org by Dan OConnor <do...@acquiremedia.com> on 2009/05/15 22:41:12 UTC

is there a way to control when merges happen?

All:

I would like to be able to control when an index merge happens (by wall clock time) so that merges do not occur in the middle of the business day.

I have a Lucene system based on v2.3.2. We add a couple hundred thousand documents per day, and we allow searching while documents are being added; we reopen an IndexReader periodically to expose newly arrived content.

There are times when merging significantly hurts search performance: I've seen cases where merging causes 200% load on a system (dual quad-core x86_64 running CentOS) with a RAID-5 disk subsystem of 15k drives.

I've seen some information on MergeScheduler and ConcurrentMergeScheduler, but not enough to attempt a coding effort.

Looking through the code for ConcurrentMergeScheduler.java, is it as straightforward as overriding the MergeScheduler.merge() method with one that checks whether a merge is allowed (by wall clock time)? If a merge is not allowed at that time, can I just return? Or do I have to sleep the thread until the merge is allowed?

Thanks,
Dan


Dan O'Connor
SVP, Engineering
Acquire Media <http://www.acquiremedia.com/>
77 South Bedford Street, Suite 350
Burlington, MA 01803
e: doconnor@acquiremedia.com
o: 781-250-0565
f: 877-861-7724


Re: is there a way to control when merges happen?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Whenever a new segment is flushed or a merge completes, the
MergePolicy and MergeScheduler are invoked.

You can also invoke them at any time by calling
IndexWriter.maybeMerge() yourself.
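Because IndexWriter.maybeMerge() can be called at any time, one way to push merge work into a quiet period is to call it from your own timer rather than from an external scheduler such as Quartz. A minimal sketch using only java.util.concurrent, assuming a nightly run at 02:00 (an arbitrary choice); the class name NightlyMergeTrigger is hypothetical, and the scheduled task is a placeholder for your own wrapper around writer.maybeMerge().

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: fires a task once a day at an assumed off-hours time.
class NightlyMergeTrigger {
    // Assumed off-hours time at which merges should be kicked off.
    static final LocalTime RUN_AT = LocalTime.of(2, 0);

    // Time remaining from 'now' until the next occurrence of 'target'.
    // LocalDateTime arithmetic ignores daylight-saving shifts.
    static Duration delayUntil(LocalDateTime now, LocalTime target) {
        LocalDateTime next = now.toLocalDate().atTime(target);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has passed, use tomorrow's
        }
        return Duration.between(now, next);
    }

    // Runs 'mergeTask' at RUN_AT and then every 24 hours. A fixed 24-hour
    // period can drift by an hour across daylight-saving changes.
    static ScheduledFuture<?> schedule(ScheduledExecutorService ses, Runnable mergeTask) {
        long initialDelayMs = delayUntil(LocalDateTime.now(), RUN_AT).toMillis();
        return ses.scheduleAtFixedRate(mergeTask, initialDelayMs,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }
}
```

For example, pass Executors.newSingleThreadScheduledExecutor() together with a Runnable that calls writer.maybeMerge() inside a try/catch for IOException, since Runnable cannot throw checked exceptions.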

Mike McCandless

http://blog.mikemccandless.com


On Wed, Aug 1, 2012 at 7:03 AM, Konstantyn Smirnov <in...@yahoo.com> wrote:
> Hi Mike.
>
> I have a LogDocMergePolicy + ConcurrentMergeScheduler in my setup.
> I tried adding several new segments in a row, with 800-5000 documents in
> each, but the scheduler seemed to ignore them at first; only after some
> time did it manage to merge some of them.
>
> I have the option of using a Quartz scheduler to trigger my merges, but I
> would like to keep that logic where it really belongs: in Lucene's
> MergeScheduler.
>
> Is there a way to control merge scheduling now (with 3.6.0)?
> When exactly is the scheduler triggered: upon adding a new segment, or
> every n hours? Can I configure the scheduler to do both?
>
> TIA
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-control-when-merges-happen-tp560736p3998571.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>




Re: is there a way to control when merges happen?

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think you could subclass ConcurrentMergeScheduler, overriding
merge() to only call super.merge() if the time is right?  (And just
return right away if it's not the right time).
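The time check itself can be kept separate from Lucene so it is easy to test. A minimal sketch, assuming a business day of 08:00 to 18:00 on weekdays; the class MergeGate and its methods are hypothetical names, and the Clock is injected only so the check can be pinned to a fixed time in tests.

```java
import java.time.Clock;
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper deciding whether a merge may run right now.
class MergeGate {
    // Assumed business day during which merges should be deferred.
    static final LocalTime DAY_START = LocalTime.of(8, 0);
    static final LocalTime DAY_END = LocalTime.of(18, 0);

    private final Clock clock;

    // Production code can pass Clock.systemDefaultZone().
    MergeGate(Clock clock) {
        this.clock = clock;
    }

    // True on weekdays from DAY_START (inclusive) to DAY_END (exclusive).
    static boolean isBusinessHours(LocalDateTime t) {
        DayOfWeek d = t.getDayOfWeek();
        if (d == DayOfWeek.SATURDAY || d == DayOfWeek.SUNDAY) {
            return false;
        }
        LocalTime time = t.toLocalTime();
        return !time.isBefore(DAY_START) && time.isBefore(DAY_END);
    }

    // The check a merge() override would make before delegating to super.merge().
    boolean mergeAllowedNow() {
        return !isBusinessHours(LocalDateTime.now(clock));
    }
}
```

In a ConcurrentMergeScheduler subclass, the merge() override would start with `if (!gate.mergeAllowedNow()) return;` and then call super.merge(writer); the exact merge() signature has changed across Lucene versions, so check the one you are on. Since the scheduler is only consulted on flushes, completed merges, or explicit maybeMerge() calls, something may need to call maybeMerge() after hours so that deferred merges actually run.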

Though you might want to allow small merges to run in real time, and
big merges to wait until after hours.
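One way to express that split is a size threshold: merges below some size run immediately, larger ones only off-hours. A minimal sketch, assuming a 50 MB cut-off and a 20:00 to 06:00 night window (both arbitrary); MergeSizeRule is a hypothetical name, and because how you estimate the size of a pending merge depends on the Lucene version, the caller simply passes a byte count.

```java
import java.time.LocalTime;

// Hypothetical rule: small merges may run any time, big ones only at night.
class MergeSizeRule {
    // Assumed cut-off below which a merge is considered cheap.
    static final long SMALL_MERGE_BYTES = 50L * 1024 * 1024;
    // Assumed off-hours window, which wraps past midnight.
    static final LocalTime NIGHT_START = LocalTime.of(20, 0);
    static final LocalTime NIGHT_END = LocalTime.of(6, 0);

    // Off-hours means [20:00, 24:00) or [00:00, 06:00).
    static boolean isOffHours(LocalTime t) {
        return !t.isBefore(NIGHT_START) || t.isBefore(NIGHT_END);
    }

    static boolean mayRunNow(long estimatedMergeBytes, LocalTime now) {
        return estimatedMergeBytes < SMALL_MERGE_BYTES || isOffHours(now);
    }
}
```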

Mike




Re: is there a way to control when merges happen?

Posted by Jason Rutherglen <ja...@gmail.com>.
Hi Dan,

You are looking to throttle the merging? I'd recommend setting
ConcurrentMergeScheduler.setMaxThreadCount(1). This way IndexWriter.addDocument
doesn't wait while a merge runs (as it would with SerialMergeScheduler), but it
should not use as much CPU, since only one merge will run at a time.
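The effect described here (the indexing thread hands a merge off and keeps going, while at most one merge runs at a time) can be illustrated with a single-thread executor. This is a simplified, stdlib-only model for illustration, not ConcurrentMergeScheduler's actual internals; SingleMergeThreadModel is a hypothetical name, and in Lucene itself the setting is the setMaxThreadCount(1) call above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: one background "merge thread", non-blocking submission.
class SingleMergeThreadModel {
    private final ExecutorService mergeThread = Executors.newSingleThreadExecutor();
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxConcurrent = new AtomicInteger();

    // Called from the "indexing" thread: queues the merge and returns at once.
    Future<?> submitMerge(Runnable mergeWork) {
        return mergeThread.submit(() -> {
            int active = running.incrementAndGet();
            maxConcurrent.accumulateAndGet(active, Math::max); // track peak overlap
            try {
                mergeWork.run();
            } finally {
                running.decrementAndGet();
            }
        });
    }

    int maxConcurrentMerges() {
        return maxConcurrent.get();
    }

    // Stops accepting merges and waits up to a minute for queued ones to finish.
    boolean shutdownAndWait() {
        mergeThread.shutdown();
        try {
            return mergeThread.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```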

As for overriding the MergeScheduler.merge() method, either way you mentioned
would work.
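If you take the "sleep until allowed" route instead of returning, you need to know how long to wait. A minimal sketch, assuming a weekday 08:00 to 18:00 business window (an assumption) and a hypothetical class name MergeWait; it returns zero when merging is already allowed. Keep in mind that sleeping inside merge() may hold up whichever thread invoked the scheduler, which could be an indexing thread, whereas returning right away does not.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper for the "sleep until allowed" variant.
class MergeWait {
    // Assumed business window (weekdays only) during which merges are deferred.
    static final LocalTime DAY_START = LocalTime.of(8, 0);
    static final LocalTime DAY_END = LocalTime.of(18, 0);

    // How long to wait from 'now' until merging is allowed; zero if it already is.
    static Duration untilMergeAllowed(LocalDateTime now) {
        DayOfWeek d = now.getDayOfWeek();
        boolean weekday = d != DayOfWeek.SATURDAY && d != DayOfWeek.SUNDAY;
        LocalTime t = now.toLocalTime();
        if (weekday && !t.isBefore(DAY_START) && t.isBefore(DAY_END)) {
            return Duration.between(now, now.toLocalDate().atTime(DAY_END));
        }
        return Duration.ZERO;
    }
}
```

A sleeping override would call Thread.sleep(MergeWait.untilMergeAllowed(LocalDateTime.now()).toMillis()) before delegating to super.merge(writer).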

-J
