You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Naveen Kumar <id...@gmail.com> on 2010/09/30 13:29:33 UTC

Merge policy, optimization for small frequently changing indexes.

Hi
I have a Very large number (say 3 million) of frequently changing Small
indexes. 90% of these indexes contain about 50 documents, while a few 2-3%
indexes have about 100,000 documents each (these being the more frequently
used indexes).
Each index belongs to a signed in user, thus can have unpredictable index
changes, in short periods of time, or can be stagnant for a long time.
What kind of indexing policy (mergefactor, maxmergedocs) would be optimal
for this kind of index. Is optimizing for this kind of index needed? or
wise?, if yes, what would be a good way to optimize (please note number of
indexes are very large).

Any suggestions would be very helpful.

-- 
Thanks
Naveen Kumar

Re: Merge policy, optimization for small frequently changing indexes.

Posted by Ian Lea <ia...@gmail.com>.
Have you considered having fewer indexes, each storing data for
multiple users?  Obviously with some indexed field that you can use
for restricting searches to data for that user.

I believe that is more common practice for this sort of scenario and
is known to work well.  You seem to be adding possibly unnecessary
complexity and overhead for all the tiny indexes.


--
Ian.


On Tue, Oct 5, 2010 at 11:18 AM, Naveen Kumar <id...@gmail.com> wrote:
> Thank you, Ian
> I have a large number of dynamically changing Index, so calling
> expungeDeletes() and optimize() is very costly.
> At this point I am opting to just set a optimum merge factor and skip
> optimize()
>
> On Tue, Oct 5, 2010 at 2:54 PM, Ian Lea <ia...@gmail.com> wrote:
>
>> Deleted docs will be removed by lucene at some point - there is no
>> need to run optimize.
>> Read the javadocs for IndexWriter for details.  See also
>> expungeDeletes().  That may be just what you need.
>>
>>
>> --
>> Ian.
>>
>>
>> On Tue, Oct 5, 2010 at 7:48 AM, Naveen Kumar <id...@gmail.com> wrote:
>> > Hi
>> > I have one more question, does Lucene purge the deleted documents before
>> > merging the segments, or purging of deleted  documents done only when
>> > optimized?
>> >
>> > On Thu, Sep 30, 2010 at 4:59 PM, Naveen Kumar <id...@gmail.com> wrote:
>> >
>> >> Hi
>> >> I have a Very large number (say 3 million) of frequently changing Small
>> >> indexes. 90% of these indexes contain about 50 documents, while a few
>> 2-3%
>> >> indexes have about 100,000 documents each (these being the more
>> frequently
>> >> used indexes).
>> >> Each index belongs to a signed in user, thus can have unpredictable
>> index
>> >> changes, in short periods of time, or can be stagnant for a long time.
>> >> What kind of indexing policy (mergefactor, maxmergedocs) would be
>> optimal
>> >> for this kind of index. Is optimizing for this kind of index needed? or
>> >> wise?, if yes, what would be a good way to optimize (please note number
>> of
>> >> indexes are very large).
>> >>
>> >> Any suggestions would be very helpful.
>> >>
>> >> --
>> >> Thanks
>> >> Naveen Kumar
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > Thanks
>> > Naveen Kumar
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> Naveen Kumar
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Merge policy, optimization for small frequently changing indexes.

Posted by Naveen Kumar <id...@gmail.com>.
Thank you, Ian
I have a large number of dynamically changing Index, so calling
expungeDeletes() and optimize() is very costly.
At this point I am opting to just set a optimum merge factor and skip
optimize()

On Tue, Oct 5, 2010 at 2:54 PM, Ian Lea <ia...@gmail.com> wrote:

> Deleted docs will be removed by lucene at some point - there is no
> need to run optimize.
> Read the javadocs for IndexWriter for details.  See also
> expungeDeletes().  That may be just what you need.
>
>
> --
> Ian.
>
>
> On Tue, Oct 5, 2010 at 7:48 AM, Naveen Kumar <id...@gmail.com> wrote:
> > Hi
> > I have one more question, does Lucene purge the deleted documents before
> > merging the segments, or purging of deleted  documents done only when
> > optimized?
> >
> > On Thu, Sep 30, 2010 at 4:59 PM, Naveen Kumar <id...@gmail.com> wrote:
> >
> >> Hi
> >> I have a Very large number (say 3 million) of frequently changing Small
> >> indexes. 90% of these indexes contain about 50 documents, while a few
> 2-3%
> >> indexes have about 100,000 documents each (these being the more
> frequently
> >> used indexes).
> >> Each index belongs to a signed in user, thus can have unpredictable
> index
> >> changes, in short periods of time, or can be stagnant for a long time.
> >> What kind of indexing policy (mergefactor, maxmergedocs) would be
> optimal
> >> for this kind of index. Is optimizing for this kind of index needed? or
> >> wise?, if yes, what would be a good way to optimize (please note number
> of
> >> indexes are very large).
> >>
> >> Any suggestions would be very helpful.
> >>
> >> --
> >> Thanks
> >> Naveen Kumar
> >>
> >>
> >>
> >>
> >
> >
> > --
> > Thanks
> > Naveen Kumar
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


Naveen Kumar

Re: Merge policy, optimization for small frequently changing indexes.

Posted by Ian Lea <ia...@gmail.com>.
Deleted docs will be removed by lucene at some point - there is no
need to run optimize.
Read the javadocs for IndexWriter for details.  See also
expungeDeletes().  That may be just what you need.


--
Ian.


On Tue, Oct 5, 2010 at 7:48 AM, Naveen Kumar <id...@gmail.com> wrote:
> Hi
> I have one more question, does Lucene purge the deleted documents before
> merging the segments, or purging of deleted  documents done only when
> optimized?
>
> On Thu, Sep 30, 2010 at 4:59 PM, Naveen Kumar <id...@gmail.com> wrote:
>
>> Hi
>> I have a Very large number (say 3 million) of frequently changing Small
>> indexes. 90% of these indexes contain about 50 documents, while a few 2-3%
>> indexes have about 100,000 documents each (these being the more frequently
>> used indexes).
>> Each index belongs to a signed in user, thus can have unpredictable index
>> changes, in short periods of time, or can be stagnant for a long time.
>> What kind of indexing policy (mergefactor, maxmergedocs) would be optimal
>> for this kind of index. Is optimizing for this kind of index needed? or
>> wise?, if yes, what would be a good way to optimize (please note number of
>> indexes are very large).
>>
>> Any suggestions would be very helpful.
>>
>> --
>> Thanks
>> Naveen Kumar
>>
>>
>>
>>
>
>
> --
> Thanks
> Naveen Kumar
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Merge policy, optimization for small frequently changing indexes.

Posted by Naveen Kumar <id...@gmail.com>.
Hi
I have one more question, does Lucene purge the deleted documents before
merging the segments, or purging of deleted  documents done only when
optimized?

On Thu, Sep 30, 2010 at 4:59 PM, Naveen Kumar <id...@gmail.com> wrote:

> Hi
> I have a Very large number (say 3 million) of frequently changing Small
> indexes. 90% of these indexes contain about 50 documents, while a few 2-3%
> indexes have about 100,000 documents each (these being the more frequently
> used indexes).
> Each index belongs to a signed in user, thus can have unpredictable index
> changes, in short periods of time, or can be stagnant for a long time.
> What kind of indexing policy (mergefactor, maxmergedocs) would be optimal
> for this kind of index. Is optimizing for this kind of index needed? or
> wise?, if yes, what would be a good way to optimize (please note number of
> indexes are very large).
>
> Any suggestions would be very helpful.
>
> --
> Thanks
> Naveen Kumar
>
>
>
>


-- 
Thanks
Naveen Kumar