Posted to java-user@lucene.apache.org by Gaurav gupta <gu...@gmail.com> on 2014/08/04 16:33:59 UTC

Does housekeeping of Lucene indexes block index updates but allow search?

Hi,

We are planning to use Lucene 4.8.1 over Oracle (1 to 2 TB of data) and are
seeking information on how Lucene conducts housekeeping or maintenance of
its indexes over time. *Is merging a blocking operation for writes and
searches, or does nothing block while a merge is in progress?*

I found this: *"Since Lucene adds the updated document to the index and marks
all previous versions as deleted. So to get rid of deleted documents Lucene
needs to do some housekeeping over a period of time. Under the hood is that
from time to time segments are merged into (usually) bigger segments
using configurable MergePolicy
<http://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/index/MergePolicy>
(TieredMergePolicy)."*

1- Is merging a blocking operation for both writes and searches, or does
nothing block while a merge is in progress?

2- What is the best practice to avoid any blocking on production servers?
I am not sure how Solr or Elasticsearch handles it.
Should we control merging by calling *forceMerge(int) at a low-traffic
time* to avoid any unpredictable blocking? Is that recommended, or does
Lucene merge intelligently without blocking anything (updates and
searches)? Or are there ways to reduce the blocking time to a very short
duration (1-2 minutes) using some API, a daemon thread, etc.?

Looking for your professional guidance on it.

Regards
Gaurav

Re: Does housekeeping of Lucene indexes block index updates but allow search?

Posted by Gaurav gupta <gu...@gmail.com>.
Thanks, Kumaran and Erick, for answering my questions.

Kumaran,
You are right that only one IndexWriter can write at a time, since it
acquires the lock, but using the NRT manager APIs (TrackingIndexWriter
<http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/TrackingIndexWriter.html>),
multiple concurrent updates, deletes, and appends are possible.
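
To make the locking point concrete, here is a minimal sketch of a single-writer lock using only the JDK. It is not Lucene code: `WriteLockDemo` and `ToyWriter` are hypothetical names, and the only detail borrowed from Lucene is the lock file name `write.lock`. The assumption is that a writer holds an exclusive file lock for as long as it is open, so a second writer on the same directory is refused, while readers never touch the lock. Separately, a single IndexWriter instance is itself thread-safe, so several threads can share one writer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WriteLockDemo {
    /** Hypothetical stand-in for a writer that owns an index directory. */
    static final class ToyWriter implements AutoCloseable {
        private final FileChannel channel;
        private final FileLock lock;

        private ToyWriter(FileChannel channel, FileLock lock) {
            this.channel = channel;
            this.lock = lock;
        }

        /** Returns a writer, or null if the directory is already locked. */
        static ToyWriter tryOpen(Path indexDir) {
            try {
                FileChannel ch = FileChannel.open(indexDir.resolve("write.lock"),
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                FileLock held = null;
                try {
                    held = ch.tryLock(); // null if another process holds the lock
                } catch (OverlappingFileLockException e) {
                    // This JVM already holds the lock through another channel.
                }
                if (held == null) {
                    ch.close();
                    return null;
                }
                return new ToyWriter(ch, held);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public void close() {
            try {
                lock.release();
                channel.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Returns {first writer opened, second writer opened, writer opened after close}. */
    static boolean[] run() {
        Path dir;
        try {
            dir = Files.createTempDirectory("toy-index");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ToyWriter first = ToyWriter.tryOpen(dir);
        ToyWriter second = ToyWriter.tryOpen(dir); // refused while the first is open
        boolean firstOk = first != null;
        boolean secondOk = second != null;
        if (second != null) second.close();
        if (first != null) first.close();
        ToyWriter third = ToyWriter.tryOpen(dir);  // the lock is free again
        boolean thirdOk = third != null;
        if (third != null) third.close();
        return new boolean[] { firstOk, secondOk, thirdOk };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("first=" + r[0] + ", second=" + r[1] + ", afterClose=" + r[2]);
    }
}
```

In this toy, the second writer is turned away only while the first one is open; nothing in it stops a reader from opening the same directory at any time.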

Thanks again!

On Mon, Aug 4, 2014 at 10:29 PM, Erick Erickson <er...@gmail.com>
wrote:

> Right.
> 1> Occasionally the merge will require 2x the disk space. (3x in compound
> file system). The merging is, indeed, done in the background, it is NOT a
> blocking operation.
>
> 2> n/a. It shouldn't block at all.
>
> Here's a cool video by Mike McCandless on the merging process, plus some
> explanations:
>
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> Best,
> Erick

Re: Does housekeeping of Lucene indexes block index updates but allow search?

Posted by Erick Erickson <er...@gmail.com>.
Right.
1> Occasionally the merge will require 2x the disk space (3x when using the
compound file format). The merging is, indeed, done in the background; it is
NOT a blocking operation.

2> n/a. It shouldn't block at all.
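
A rough way to picture why a background merge neither blocks searches nor avoids the temporary 2x disk usage is the toy model below. It is a sketch under stated assumptions, not Lucene's implementation: `SnapshotMergeDemo`, `ToyIndex`, and `Segment` are hypothetical names, segments are treated as immutable, a searcher keeps whatever segment list it opened, and the merged segment is assumed to be as large as the sum of its inputs (no space reclaimed from deletes).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotMergeDemo {
    /** An immutable "segment": once written, it is never modified. */
    static final class Segment {
        final String name;
        final long sizeBytes;

        Segment(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Toy index whose live view is an immutable list that is swapped atomically. */
    static final class ToyIndex {
        private final AtomicReference<List<Segment>> live =
                new AtomicReference<List<Segment>>(Collections.<Segment>emptyList());
        private volatile long peakBytes = 0;

        /** Adds a new segment, as a flush of buffered documents would. */
        void flush(String name, long sizeBytes) {
            List<Segment> next = new ArrayList<Segment>(live.get());
            next.add(new Segment(name, sizeBytes));
            live.set(Collections.unmodifiableList(next));
            peakBytes = Math.max(peakBytes, totalBytes(next));
        }

        /** A searcher keeps this point-in-time view; later merges do not change it. */
        List<Segment> openSnapshot() {
            return live.get();
        }

        /** Merges every live segment into one new segment. */
        void mergeAll(String mergedName) {
            List<Segment> old = live.get();
            // Assumption: merged size equals the sum of the inputs.
            Segment merged = new Segment(mergedName, totalBytes(old));
            // Until the swap, the old segments and the merged copy both occupy disk.
            peakBytes = Math.max(peakBytes, totalBytes(old) + merged.sizeBytes);
            live.set(Collections.singletonList(merged));
        }

        long peakBytes() {
            return peakBytes;
        }
    }

    static long totalBytes(List<Segment> segments) {
        long sum = 0;
        for (Segment s : segments) {
            sum += s.sizeBytes;
        }
        return sum;
    }

    /** Returns {segments in snapshot, live segments after merge, bytes seen by searcher, peak bytes}. */
    static long[] run() {
        final ToyIndex index = new ToyIndex();
        index.flush("_0", 100);
        index.flush("_1", 200);
        index.flush("_2", 300);

        List<Segment> snapshot = index.openSnapshot(); // searcher opened before the merge
        Thread merger = new Thread(() -> index.mergeAll("_3"));
        merger.start();
        long seenBySearcher = totalBytes(snapshot);    // unaffected by merge progress
        try {
            merger.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] {
            snapshot.size(), index.openSnapshot().size(), seenBySearcher, index.peakBytes()
        };
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("snapshot segments=" + r[0] + ", live segments=" + r[1]
                + ", bytes seen=" + r[2] + ", peak bytes=" + r[3]);
    }
}
```

The searcher's snapshot still lists the three original segments after the merge thread finishes, and the recorded peak is twice the index size because the inputs and the merged output coexist until the swap.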

Here's a cool video by Mike McCandless on the merging process, plus some
explanations:

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Best,
Erick




On Mon, Aug 4, 2014 at 8:45 AM, Kumaran R <ku...@gmail.com> wrote:

> Hi Gaurav
>
> 1.When you opened index to write,till you close that index, there will
> be a lock to do further write. But not for search. During merge, index
> needs 3X ( not sure 2X?) of more storage space, i believe that is the
> reason for no blocking for search. ( any other experts can clarify you
> more on this )
>
> 2. Merge will be taken care by default values( merge factor 2) of
> lucene. If u need to control more on merge policy, please go through
> about merge by size or by number of segments or many merge policies.
>
>
> Hope this will help you a little bit.
>
> --
> Kumaran R
> Sent from Phone

Re: Does housekeeping of Lucene indexes block index updates but allow search?

Posted by Kumaran R <ku...@gmail.com>.
Hi Gaurav

1. From the time you open an index for writing until you close it, a lock
prevents any other writer from opening it, but searches are not locked out.
During a merge, the index needs up to 3x (or perhaps 2x?) its usual storage
space; I believe that is the reason searches are not blocked. (Other experts
can clarify this further.)

2. Merging is taken care of by Lucene's default settings (the default merge
factor is 10). If you need more control over the merge policy, please read
about merging by size, merging by number of segments, and the other
available merge policies.
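
As an illustration of what a merge factor means, here is a small simulation in plain Java. It is a simplified model of a log-style merge policy and makes several assumptions: every flush creates one level-0 segment, and as soon as `mergeFactor` segments sit on the same level they are merged into a single segment on the next level. `MergeFactorDemo` and its methods are hypothetical names, and the real policies (TieredMergePolicy in particular) choose merges with more elaborate rules.

```java
import java.util.ArrayList;
import java.util.List;

class MergeFactorDemo {
    /** Returns the number of segments on each level after the given number of flushes. */
    static List<Integer> simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> perLevel = new ArrayList<Integer>();
        for (int i = 0; i < flushes; i++) {
            int level = 0;
            addSegment(perLevel, level); // each flush writes one small segment
            // Cascade: a full level collapses into one segment on the next level.
            while (perLevel.get(level) == mergeFactor) {
                perLevel.set(level, 0);
                level++;
                addSegment(perLevel, level);
            }
        }
        return perLevel;
    }

    private static void addSegment(List<Integer> perLevel, int level) {
        while (perLevel.size() <= level) {
            perLevel.add(0);
        }
        perLevel.set(level, perLevel.get(level) + 1);
    }

    /** Total number of segments a searcher would have to visit. */
    static int totalSegments(List<Integer> perLevel) {
        int total = 0;
        for (int count : perLevel) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] flushCounts = { 25, 100 };
        for (int flushes : flushCounts) {
            List<Integer> levels = simulate(flushes, 10);
            System.out.println(flushes + " flushes, mergeFactor 10: per-level counts "
                    + levels + ", total segments " + totalSegments(levels));
        }
    }
}
```

In this model a larger merge factor means fewer, less frequent merges but more segments to search, while a smaller one keeps the segment count low at the cost of more merge I/O.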


I hope this helps a little.

--
Kumaran R
Sent from Phone

> On 04-Aug-2014, at 8:04 pm, Gaurav gupta <gu...@gmail.com> wrote:
>
> Hi,
>
> We are planning to use Lucene 4.8.1 over Oracle (1 to 2 TB data) and
> seeking information on  "How Lucene conduct housekeeping or maintenance of
> indexes over a period of time". *Is it a blocking operation for write and
> search or it will not block anything while merging is going on? *
>
> I found :- *"Since Lucene adds the updated document to the index and marks
> all previous versions as deleted. So to get rid of deleted documents Lucene
> needs to do some housekeeping over a period of time. Under the hood is that
> from time to time segments are merged into (usually) bigger segments
> using configurable MergePolicy
> <http://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/index/MergePolicy>
> (TieredMergePolicy).
> "*
>
> 1- Is it's a blocking operation for write and search both or it will not
> block anything while merging is going on?
>
> 2- What is the best practice to avoid any blocking in production servers?
> Not sure how Solr or Elasticsearch is handling it.
> Should we control the merging by calling *forcemerge(int) at low traffic
> time *to avoid any unpredictable blocking operation? Is it recommended or
> Lucene do intelligent merging and don't block anything (updates and
> searches) or there are ways to reduce the blocking time to a very small
> duration (1 -2 minutes) using some API or demon thread etc.
>
> Looking for your professional guidance on it.
>
> Regards
> Gaurav

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org