Posted to java-user@lucene.apache.org by Sascha Janz <Sa...@gmx.net> on 2014/08/07 12:14:38 UTC

improve indexing speed with nomergepolicy

hi,

i am trying to speed up our indexing process. we use SearcherManager with applyDeletes to get a near-real-time reader.

we do not have really "many" incoming documents, but the documents must be updated from time to time, and the number of documents to be updated can be quite large.

i ran some tests with NoMergePolicy and the indexing process was 25 % faster.

so i am thinking of changing our code to use NoMergePolicy during a specific time interval, when users are active, and to do a forceMerge(20) every night, which takes about 2 - 5 minutes.

is this a good idea? or will i perhaps get into trouble?

Sascha


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Re: improve indexing speed with nomergepolicy

Posted by Sascha Janz <Sa...@gmx.net>.
many thanks again. this was a good tip.

after switching from FSDirectory to NRTCachingDirectory, queries run twice as fast.

Sascha
 
 



Re: Re: improve indexing speed with nomergepolicy

Posted by Sascha Janz <Sa...@gmx.net>.
many thanks for the tip about NRTCachingDirectory. i didn't know about it.

i will try it.

Sascha
 



Re: improve indexing speed with nomergepolicy

Posted by Shai Erera <se...@gmail.com>.
I opened https://issues.apache.org/jira/browse/LUCENE-5883 to handle that.

Shai



RE: improve indexing speed with nomergepolicy

Posted by Uwe Schindler <uw...@thetaphi.de>.
This is a good idea, because sometimes it's nice to change the MergePolicy on the fly without reopening! One example is https://issues.apache.org/jira/browse/LUCENE-5526
In my case, I would like to open an IndexWriter, set its merge policy to UpgradeIndexMergePolicy, force a merge to upgrade all segments, and then proceed with normal indexing and other work. Currently you have to close the IW, which is bad in multithreaded environments: for example, if you start an index upgrade after installing a new version of your favourite Solr/ES/... server but still need to index documents in parallel (a real-time system), you want as little downtime as possible.
The proposal in the above issue is to allow passing a MergePolicy to forceMerge().

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: improve indexing speed with nomergepolicy

Posted by Shai Erera <se...@gmail.com>.
Yes, currently an MP isn't a "live" setting on IndexWriter, meaning you
pass it at construction time and don't change it afterwards. I wonder if
after LUCENE-5711 we can move MergePolicy to LiveIndexWriterConfig and fix
IndexWriter to not hold on to it, but rather pull it from the config.

Not sure what others think about it.

Shai
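
The difference between a writer that "holds on to" its merge policy and one that "pulls it from the config" can be sketched in plain Java. This is only an illustration of the design idea discussed above: `LiveConfig`, `HoldingWriter`, `PullingWriter` and `Policy` are hypothetical stand-ins, not Lucene classes, and nothing here reflects how IndexWriter is actually implemented.

```java
/** Hypothetical stand-ins for illustration; none of these are Lucene classes. */
final class LiveConfigSketch {
    interface Policy { String name(); }

    /** Mutable configuration object shared between the caller and the writer. */
    static final class LiveConfig {
        private volatile Policy policy;
        LiveConfig(Policy initial) { this.policy = initial; }
        Policy getPolicy() { return policy; }
        void setPolicy(Policy p) { this.policy = p; }
    }

    /** Captures the policy once, at construction: later config changes are ignored. */
    static final class HoldingWriter {
        private final Policy policy;
        HoldingWriter(LiveConfig config) { this.policy = config.getPolicy(); }
        String policyForNextMerge() { return policy.name(); }
    }

    /** Pulls the policy from the config each time a merge decision is needed. */
    static final class PullingWriter {
        private final LiveConfig config;
        PullingWriter(LiveConfig config) { this.config = config; }
        String policyForNextMerge() { return config.getPolicy().name(); }
    }

    public static void main(String[] args) {
        LiveConfig config = new LiveConfig(() -> "no-merge");
        HoldingWriter holding = new HoldingWriter(config);
        PullingWriter pulling = new PullingWriter(config);
        config.setPolicy(() -> "tiered"); // switch policy while both writers stay open
        System.out.println(holding.policyForNextMerge()); // still the construction-time policy
        System.out.println(pulling.policyForNextMerge()); // sees the new policy
    }
}
```

In this sketch only the pulling variant notices the policy change without being closed and reopened, which is the behaviour the proposal is after.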



Re: Re: improve indexing speed with nomergepolicy

Posted by Sascha Janz <Sa...@gmx.net>.
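it can only be set when opening the IndexWriter:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.*;   // IndexWriter, IndexWriterConfig, NoMergePolicy
import org.apache.lucene.store.*;   // Directory, FSDirectory, NRTCachingDirectory
import org.apache.lucene.util.Version;
// the merge policy is part of the config that is handed to the IndexWriter constructor
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46));
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
iwc.setRAMBufferSizeMB(250);
iwc.setMergePolicy(NoMergePolicy.INSTANCE);
// wrap the on-disk directory so that small new segments are cached in RAM
Directory fsDir = FSDirectory.open(new File(sourcedir));
NRTCachingDirectory cachedFSDir = new NRTCachingDirectory(fsDir, 5.0, 60.0);
IndexWriter writer = new IndexWriter(cachedFSDir, iwc);

so to change the policy, the IndexWriter must be closed and opened again.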

Sascha
 



Re: improve indexing speed with nomergepolicy

Posted by Jon Stewart <jo...@lightboxtechnologies.com>.
Related, how does one change the MergePolicy on an IndexWriter (e.g.,
use NoMergePolicy during batch indexing, then change to something
better once finished with batch)? It looks like the MergePolicy is set
through IndexWriterConfig but I don't see a way to update an IWC on an
IW.

Thanks,

Jon





-- 
Jon Stewart, Principal
(646) 719-0317 | jon@lightboxtechnologies.com | Arlington, VA



Re: improve indexing speed with nomergepolicy

Posted by Shai Erera <se...@gmail.com>.
Using NoMergePolicy for online indexes is usually not recommended. You want
to use NoMP in cases where you build an index in a batch job, then at the
end, before the index is "published", you run a forceMerge or maybeMerge
(with a real MergePolicy).

For online indexes, i.e. indexes that are being searched while they are
updated, if you use NoMP you will accumulate many segments in the index.
This means higher resource consumption overall: file handles, RAM,
potentially disk space, and usually results in slower searches.

You may want to tweak the default MP's settings though, to not kick off a
merge unless there are a large number of segments in the index. E.g. the
default MP merges segments when there are 10 at the same level (i.e.
roughly the same size). You can increase that.
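
To make the trade-off above concrete, here is a toy simulation in plain Java. It is not Lucene's actual merge logic: it assumes every flush creates one small segment and that a merge fires as soon as `mergeFactor` segments sit at the same level. The class `SegmentSim` and its parameter names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of level-based segment merging; not Lucene's actual algorithm. */
final class SegmentSim {
    /** Outcome of a simulated indexing run. */
    static final class Result {
        final int segments; // live segments at the end
        final int merges;   // merges performed along the way
        Result(int segments, int merges) { this.segments = segments; this.merges = merges; }
    }

    /**
     * Simulates `flushes` flushes, each creating one level-0 segment.
     * When `mergeFactor` segments sit at the same level, they are merged into
     * one segment at the next level. A mergeFactor of 1 or less means "never merge".
     */
    static Result run(int flushes, int mergeFactor) {
        List<Integer> perLevel = new ArrayList<>(); // perLevel.get(k) = segments at level k
        int merges = 0;
        for (int i = 0; i < flushes; i++) {
            int level = 0;
            add(perLevel, level);
            // cascade: a merge may fill up the next level and trigger another merge
            while (mergeFactor > 1 && perLevel.get(level) == mergeFactor) {
                perLevel.set(level, 0);
                level++;
                add(perLevel, level);
                merges++;
            }
        }
        int segments = 0;
        for (int n : perLevel) segments += n;
        return new Result(segments, merges);
    }

    private static void add(List<Integer> perLevel, int level) {
        while (perLevel.size() <= level) perLevel.add(0);
        perLevel.set(level, perLevel.get(level) + 1);
    }

    public static void main(String[] args) {
        for (int mf : new int[] {0, 10, 30}) {
            Result r = run(999, mf);
            System.out.println("mergeFactor=" + mf + " segments=" + r.segments + " merges=" + r.merges);
        }
    }
}
```

Under these assumptions, 999 flushes leave 999 segments without merging, 27 segments after 108 merges with a threshold of 10, and 13 segments after 34 merges with a threshold of 30. A higher threshold means fewer merges while indexing, at the price of allowing up to mergeFactor - 1 segments to pile up at each level.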

Also, do you use NRTCachingDirectory? It's usually recommended for NRT,
even with default MP, since the tiny segments are merged in-memory, and
your NRT reopens don't result in flushing new segments to disk.

Shai
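
The caching idea behind the NRTCachingDirectory advice can be pictured with a small stand-alone model. This is a simplified sketch and an assumption about the general principle, not the library's implementation: `NrtCacheSketch` and its two limits are hypothetical, and the values 5.0 and 60.0 are simply the ones used elsewhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a RAM cache in front of a disk directory; not Lucene code. */
final class NrtCacheSketch {
    private final double maxFileMB;    // largest single file that may be cached
    private final double maxCachedMB;  // cap on the total cached size
    private final Map<String, Double> ram = new LinkedHashMap<>();
    private final Map<String, Double> disk = new LinkedHashMap<>();

    NrtCacheSketch(double maxFileMB, double maxCachedMB) {
        this.maxFileMB = maxFileMB;
        this.maxCachedMB = maxCachedMB;
    }

    private double ramUsedMB() {
        double total = 0;
        for (double mb : ram.values()) total += mb;
        return total;
    }

    /** Writes a new file: small files go to RAM while there is room, the rest to disk. */
    void write(String name, double sizeMB) {
        if (sizeMB <= maxFileMB && ramUsedMB() + sizeMB <= maxCachedMB) {
            ram.put(name, sizeMB);
        } else {
            disk.put(name, sizeMB);
        }
    }

    /** Models a commit/sync: everything cached is moved to disk to become durable. */
    void sync() {
        disk.putAll(ram);
        ram.clear();
    }

    boolean inRam(String name) { return ram.containsKey(name); }
    boolean onDisk(String name) { return disk.containsKey(name); }

    public static void main(String[] args) {
        NrtCacheSketch dir = new NrtCacheSketch(5.0, 60.0);
        dir.write("tiny-flush", 0.5);   // small: stays in RAM
        dir.write("big-merge", 200.0);  // too large: goes straight to disk
        System.out.println(dir.inRam("tiny-flush") + " " + dir.onDisk("big-merge"));
        dir.sync();
        System.out.println(dir.onDisk("tiny-flush"));
    }
}
```

In this model the many tiny files produced by frequent reopens never touch the disk until a sync, while large files bypass the cache, which is the effect described in the reply above.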

