Posted to solr-user@lucene.apache.org by mklprasad <mk...@gmail.com> on 2010/02/12 11:31:19 UTC

optimize is taking too much time

Hi,
In my Solr index I have 14,245,223 records, about 50 GB in total.
Now, whenever I load a new record and it tries to optimize the index, it
takes too much memory and time.

Can anybody please tell me whether there is a property in Solr to get rid of this?

Thanks in advance

-- 
View this message in context: http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: optimize is taking too much time

Posted by "David Smiley @MITRE.org" <DS...@mitre.org>.
Your response contradicts the wiki's description of mergeFactor:
http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor
-- which clearly states that the indexes are merged into a single segment. 
It makes no reference to "optimize" to trigger this condition.  If what you
say is true, and we agree that the mergeFactor is the upper bound of the
number of segments, then what is the lower bound of the number of segments
seen for an index that has not been optimized?  Always 2, or some function
of mergeFactor?

~ David Smiley


Jay Hill wrote:
> 
> With a mergeFactor set to anything > 1 you would never have only one
> segment - unless you optimized. So Lucene will never naturally merge all the
> segments into one. Unless, I suppose, the mergeFactor was set to 1, but
> I've never tested that. It's hard to picture how that would work.



Re: optimize is taking too much time

Posted by Jay Hill <ja...@gmail.com>.
With a mergeFactor set to anything > 1 you would never have only one segment
- unless you optimized. So Lucene will never naturally merge all the
segments into one. Unless, I suppose, the mergeFactor was set to 1, but I've
never tested that. It's hard to picture how that would work.

If I understand correctly, the same actions occur (deleted documents are
removed, etc.) because an optimize is only a multiway merge down to one
segment, whereas normal merging is triggered by the mergeFactor, but does
not have a "target" segment count to merge down to.

-Jay
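A toy model of the distinction Jay draws, under loudly stated assumptions: a segment is just a (live, deleted) pair of doc counts, the class and method names are hypothetical, and real Lucene chooses merge candidates by size level rather than simply taking the first segments in the list. What it shows is that a natural merge step and an optimize run the same merge operation (deleted docs are dropped either way); they differ only in how many segments are merged and whether there is a target count.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Lucene code: a segment holds live docs plus deleted docs that
// have not been purged yet. Every merge copies only the live docs.
final class MergeModel {
    static final class Segment {
        final int live;
        final int deleted;

        Segment(int live, int deleted) {
            this.live = live;
            this.deleted = deleted;
        }
    }

    // Merge the first n segments into one new segment (deleted docs are dropped)
    // and keep the remaining segments unchanged.
    static List<Segment> mergeFirst(List<Segment> segments, int n) {
        if (n < 1 || n > segments.size()) {
            throw new IllegalArgumentException("n must be between 1 and " + segments.size());
        }
        int live = 0;
        for (Segment s : segments.subList(0, n)) {
            live += s.live;
        }
        List<Segment> out = new ArrayList<>();
        out.add(new Segment(live, 0));
        out.addAll(segments.subList(n, segments.size()));
        return out;
    }

    // Natural merge step: fires only once mergeFactor segments exist and merges
    // just those; there is no target segment count.
    static List<Segment> naturalStep(List<Segment> segments, int mergeFactor) {
        return segments.size() >= mergeFactor ? mergeFirst(segments, mergeFactor) : segments;
    }

    // Optimize: one multiway merge of every segment down to a single segment.
    static List<Segment> optimize(List<Segment> segments) {
        return segments.isEmpty() ? segments : mergeFirst(segments, segments.size());
    }

    public static void main(String[] args) {
        List<Segment> segs = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            segs.add(new Segment(100, 5));
        }
        System.out.println("natural step: " + naturalStep(segs, 10).size() + " segments");
        System.out.println("optimize:     " + optimize(segs).size() + " segment");
    }
}
```

With 12 segments and mergeFactor=10, the natural step leaves 3 segments (one merged, two untouched, still carrying their deletes), while the optimize leaves 1 segment with every delete purged.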

On Sun, Feb 21, 2010 at 11:20 AM, David Smiley @MITRE.org <DSMILEY@mitre.org> wrote:

> I've always thought that these two events were effectively equivalent --
> the results of an optimize vs. the results of Lucene _naturally_ merging all
> segments together into one. If they don't have the same effect, then what
> is the difference?
>
> ~ David Smiley

Re: optimize is taking too much time

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Feb 22, 2010 at 6:39 PM, Jay Hill <ja...@gmail.com> wrote:
> It's just that, in a
> running system, it's probably very rare that there is only a single segment
> for any meaningful length of time.

Right - but the performance impact of a huge merge can be non-trivial.
People wishing to avoid the biggest of merges at unpredictable times
should look at parameters such as maxMergeDocs that prevent merging
segments over a certain size.

-Yonik
http://www.lucidimagination.com
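To see why a cap like maxMergeDocs bounds the worst-case merge, here is an idealized simulation (an assumption, not Lucene's merge policy code; real segment sizes vary and the check is per segment): every flush writes an equal segment of docsPerFlush docs, mergeFactor equal segments on a level are merged into one segment on the next level, and a level is only merged while its segments hold at most maxMergeDocs docs. In this model no merge ever writes more than mergeFactor × maxMergeDocs docs, however large the index grows. The flush size of 1,000 docs in main is a made-up figure chosen to land near the 14 million docs from the original post.

```java
import java.util.ArrayList;
import java.util.List;

// Idealized model (an assumption, not Lucene's LogMergePolicy code).
final class CappedMergeSim {
    static final class Result {
        final int segments;       // segments left in the index after the run
        final long largestMerge;  // doc count of the biggest segment any merge produced

        Result(int segments, long largestMerge) {
            this.segments = segments;
            this.largestMerge = largestMerge;
        }
    }

    static Result run(int flushes, long docsPerFlush, int mergeFactor, long maxMergeDocs) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> counts = new ArrayList<>(); // counts.get(L) = segments on level L
        long largestMerge = 0;
        for (int i = 0; i < flushes; i++) {
            increment(counts, 0);
            int level = 0;
            long levelSize = docsPerFlush;
            // Cascade like an odometer, but stop at levels too big to be merged.
            while (counts.get(level) == mergeFactor && levelSize <= maxMergeDocs) {
                counts.set(level, 0);
                increment(counts, level + 1);
                levelSize *= mergeFactor;
                largestMerge = Math.max(largestMerge, levelSize);
                level++;
            }
        }
        int segments = 0;
        for (int c : counts) {
            segments += c;
        }
        return new Result(segments, largestMerge);
    }

    private static void increment(List<Integer> counts, int level) {
        while (counts.size() <= level) {
            counts.add(0);
        }
        counts.set(level, counts.get(level) + 1);
    }

    public static void main(String[] args) {
        Result uncapped = run(14_245, 1_000, 10, Long.MAX_VALUE);
        Result capped = run(14_245, 1_000, 10, 100_000);
        System.out.println("uncapped: " + uncapped.segments + " segments, largest merge " + uncapped.largestMerge);
        System.out.println("capped:   " + capped.segments + " segments, largest merge " + capped.largestMerge);
    }
}
```

For 14,245 flushes with mergeFactor=10, the uncapped run ends with 16 segments and a largest merge of 10,000,000 docs; capping at 100,000 leaves 25 segments, but no merge writes more than 1,000,000 docs. That is the trade-off Yonik points at: smaller worst-case merges in exchange for more segments left in the index.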

Re: optimize is taking too much time

Posted by Jay Hill <ja...@gmail.com>.
Thanks for clearing that up, guys; I misspoke slightly. It's just that, in a
running system, it's probably very rare that there is only a single segment
for any meaningful length of time. Unless that merge-down-to-one occurs
right when indexing stops there will almost always be a new (small) segment
following immediately after the merge. It would be interesting to observe,
over a long time, how often and for how long everything is merged down to a
single segment.

Probably, with a very low mergeFactor (2 or 3?), merges-to-one might occur
often enough to make optimizing unnecessary. But I'm guessing that the
merge-to-one happens so infrequently in most situations that optimizing is
more important.

-Jay
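Jay's guess can be checked with a little arithmetic under the same idealized model as Yonik's odometer analogy in this thread (assumptions: every flush writes an equal-sized segment, nothing is deleted, and merging follows the mergeFactor levels exactly). Writing the flush count n in base m = mergeFactor, the segment count is the digit sum, and it equals 1 only at powers of m:

```latex
\begin{aligned}
n &= \sum_{i \ge 0} d_i\, m^i, \qquad 0 \le d_i < m
  &&\text{(flush count $n$ in base $m$)}\\
s_m(n) &= \sum_{i \ge 0} d_i
  &&\text{(segments after $n$ flushes)}\\
s_m(n) = 1 &\iff n = m^k \text{ for some integer } k \ge 0\\
\bigl|\{\, 1 \le n \le N : s_m(n) = 1 \,\}\bigr| &= \lfloor \log_m N \rfloor + 1\\
N = 10^6 &:\quad m=2 \mapsto 20,\quad m=3 \mapsto 13,\quad m=10 \mapsto 7
\end{aligned}
```

So even with mergeFactor=2, a million flushes pass through the single-segment state only 20 times, and the gap between consecutive single-segment moments, (m-1)·m^k flushes, grows geometrically. That is consistent with Jay's guess that merge-to-one is rare in practice.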



Re: optimize is taking too much time

Posted by Mark Miller <ma...@gmail.com>.
Also, a mergeFactor of 1 is actually invalid - 2 is the lowest you can go.



-- 
- Mark

http://www.lucidimagination.com




Re: optimize is taking too much time

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sun, Feb 21, 2010 at 2:20 PM, David Smiley @MITRE.org <DS...@mitre.org> wrote:
> I've always thought that these two events were effectively equivalent.  --
> the results of an optimize vs the results of Lucene _naturally_ merging all
> segments together into one.

Correct. Occasionally one hits a "major" merge, and as few as 1
segment is produced.

Think about the odometer in your car (if you have one that spins).
Each digit is the number of segments of that size... so the total
number of segments in the index is the total of the digits.  You
naturally get back to a single segment whenever it rolls over to the
next highest power of 10 (mergeFactor=10).

-Yonik
http://www.lucidimagination.com
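Yonik's odometer can be checked directly. A minimal sketch, assuming an idealized index where each flush writes one equal-sized segment and nothing is deleted (the class and method names are made up): simulate applies the merge rule flush by flush, digitSum reads the odometer, and the two agree. In this model it also answers David's lower-bound question: the count drops to 1 exactly when the flush count is a power of mergeFactor. The mergeFactor >= 2 check echoes Mark's point that 1 is invalid; with 1, the merge rule would never terminate.

```java
// Idealized model of the odometer analogy (an assumption, not Lucene code).
final class SegmentOdometer {
    // Odometer reading: sum of the digits of flushes in base mergeFactor.
    static int digitSum(long flushes, int mergeFactor) {
        requireValid(mergeFactor);
        int sum = 0;
        for (long n = flushes; n > 0; n /= mergeFactor) {
            sum += (int) (n % mergeFactor);
        }
        return sum;
    }

    // Direct simulation: each flush adds a level-0 segment; whenever a level holds
    // mergeFactor segments, they are merged into one segment on the next level.
    static int simulate(int flushes, int mergeFactor) {
        requireValid(mergeFactor);
        int[] perLevel = new int[33]; // int flush counts need at most 31 levels when mergeFactor >= 2
        for (int i = 0; i < flushes; i++) {
            int level = 0;
            perLevel[level]++;
            while (perLevel[level] == mergeFactor) {
                perLevel[level] = 0;
                level++;
                perLevel[level]++;
            }
        }
        int total = 0;
        for (int c : perLevel) {
            total += c;
        }
        return total;
    }

    private static void requireValid(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {999, 1000, 1234}) {
            System.out.println(n + " flushes -> " + simulate(n, 10)
                    + " segments (digit sum " + digitSum(n, 10) + ")");
        }
    }
}
```

With mergeFactor=10, 999 flushes leave 27 segments, the 1000th flush rolls everything over to a single segment, and 1234 flushes leave 1+2+3+4 = 10.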

Re: optimize is taking too much time

Posted by "David Smiley @MITRE.org" <DS...@mitre.org>.
I've always thought that these two events were effectively equivalent --
the results of an optimize vs. the results of Lucene _naturally_ merging all
segments together into one. If they don't have the same effect, then what is
the difference?

~ David Smiley


Otis Gospodnetic wrote:
> 
> Hello,
> 
> Solr will never optimize the whole index without somebody explicitly
> asking for it.
> Lucene will merge index segments on the master as documents are indexed. 
> How often it does that depends on mergeFactor.



Re: optimize is taking too much time

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

Solr will never optimize the whole index without somebody explicitly asking for it.
Lucene will merge index segments on the master as documents are indexed.  How often it does that depends on mergeFactor.

See:
http://search-lucene.com/?q=mergeFactor+segment+merge&fc_project=Lucene&fc_project=Solr&fc_type=mail+_hash_+user


Otis ----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: mklprasad <mk...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Fri, February 19, 2010 1:02:11 AM
> Subject: Re: optimize is taking too much time


Re: optimize is taking too much time

Posted by mklprasad <mk...@gmail.com>.


Jagdish Vasani-2 wrote:
> 
> Hi,
> 
> you should not optimize the index after each document insert; instead you
> should optimize it after inserting a good number of documents,
> because optimize merges all segments into one, according to the settings
> of the Lucene index.

Yes,
thanks for the reply.
I have removed the optimize() call from my code, but I have a doubt:
1. Will mergeFactor internally do any optimization, or do we have to specify
<autoOptimize>?
2. Even if Solr initiates the optimize, will it take a huge time if I have
large data like 52 GB?

Thanks,
Prasad





Re: optimize is taking too much time

Posted by NarasimhaRaju <ra...@yahoo.com>.
Hi,
You can also make use of the autocommit feature of Solr.
There are two possibilities: commit based on the maximum number of uncommitted docs, or based on time.
See <updateHandler> in your solrconfig.xml.

Example:

<autoCommit>
  <!--
  <maxDocs>10000</maxDocs>
  -->

  <!-- maximum time (in ms) after adding a doc before an autocommit is triggered -->
  <maxTime>600000</maxTime>
</autoCommit>


Once you're done adding documents, run a final optimize/commit.
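A toy model of the two triggers in the autoCommit config above, for reasoning about when commits happen (an assumption about the behavior, based on the config comment, not Solr's implementation; the class and method names are hypothetical): a commit fires once maxDocs docs are uncommitted, or once maxTime ms have passed since the first uncommitted add. Note that 600000 ms is 10 minutes, and that neither trigger optimizes, which is why the final optimize/commit is still a separate step.

```java
// Toy model of the autoCommit triggers (hypothetical, not Solr code).
final class AutoCommitModel {
    private final int maxDocs;     // <= 0 disables the doc-count trigger (like the commented-out <maxDocs>)
    private final long maxTimeMs;  // <= 0 disables the time trigger
    private int pending = 0;
    private long firstPendingAtMs = -1;
    private int commits = 0;

    AutoCommitModel(int maxDocs, long maxTimeMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
    }

    // Stands in for a background timer: commits if the time trigger is due.
    void tick(long nowMs) {
        if (maxTimeMs > 0 && pending > 0 && nowMs - firstPendingAtMs >= maxTimeMs) {
            commit();
        }
    }

    // Records one added doc at time nowMs.
    void add(long nowMs) {
        tick(nowMs); // a time-based commit may already be due before this add
        if (pending == 0) {
            firstPendingAtMs = nowMs;
        }
        pending++;
        if (maxDocs > 0 && pending >= maxDocs) {
            commit();
        }
    }

    int commits() { return commits; }

    int pending() { return pending; }

    private void commit() {
        commits++;
        pending = 0;
        firstPendingAtMs = -1;
    }

    public static void main(String[] args) {
        // maxTime = 600000 ms, maxDocs disabled, one doc per second for 30 minutes.
        AutoCommitModel model = new AutoCommitModel(0, 600_000);
        for (int s = 0; s < 1800; s++) {
            model.add(s * 1000L);
        }
        model.tick(1_800_000);
        System.out.println("commits: " + model.commits());
    }
}
```

With one doc per second for 30 minutes, the model commits at the 10- and 20-minute marks, and the timer fires a third commit at 30 minutes for the remaining 600 docs.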

Regards, 
P.N.Raju, 




________________________________
From: Jagdish Vasani <jv...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Thu, February 18, 2010 3:12:15 PM
Subject: Re: optimize is taking too much time


Re: optimize is taking too much time

Posted by Jagdish Vasani <jv...@gmail.com>.
Hi,

you should not optimize the index after each document insert; instead you
should optimize it after inserting a good number of documents,
because optimize merges all segments into one, according to the settings
of the Lucene index.

thanks,
Jagdish
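Jagdish's advice, sketched as a load loop: commit every batchSize docs and optimize at most once after the whole load, instead of calling optimize() on every load. The IndexClient interface and its method names are hypothetical stand-ins, not the SolrJ API; whether to optimize at all is a separate decision (see Otis's and Hoss's replies).

```java
import java.util.List;

// Hypothetical client interface standing in for whatever the loader really uses
// (for example a SolrJ server object); these method names are assumptions.
interface IndexClient {
    void add(String doc);

    void commit();

    void optimize();
}

final class BatchLoader {
    // Adds every doc, commits once per batchSize docs, and optimizes at most once,
    // after the whole load.
    static void load(IndexClient client, List<String> docs, int batchSize, boolean optimizeAtEnd) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int sinceCommit = 0;
        for (String doc : docs) {
            client.add(doc);
            if (++sinceCommit == batchSize) {
                client.commit();
                sinceCommit = 0;
            }
        }
        if (sinceCommit > 0) {
            client.commit(); // make the final partial batch visible
        }
        if (optimizeAtEnd) {
            client.optimize();
        }
    }

    public static void main(String[] args) {
        IndexClient printing = new IndexClient() {
            public void add(String doc) { }

            public void commit() { System.out.println("commit"); }

            public void optimize() { System.out.println("optimize"); }
        };
        load(printing, List.of("a", "b", "c"), 2, true);
    }
}
```

With 2,500 docs and batchSize=1000, this issues 2,500 adds, 3 commits and a single optimize.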

Re: optimize is taking too much time

Posted by mklprasad <mk...@gmail.com>.


hossman wrote:
> Solr isn't going to optimize the index unless you tell it to -- how are 
> you indexing your docs? are you sure you don't have something programmed 
> to send an optimize command?
> 
> 
> -Hoss
> 
Yes.
From my code, I call the server.optimize() method for every load
(now I am planning to remove this from the code).
At the config level I have mergeFactor=10.
My doubt is whether mergeFactor will only do a merge, or will also perform
the optimization. If not, do I need to specify <autoOptimize> in my
solrconfig? In that case, will it take less time for my 50 GB index?

Please clarify.
Thanks in advance



Re: optimize is taking too much time

Posted by Chris Hostetter <ho...@fucit.org>.
: in my solr u have 1,42,45,223 records having some 50GB .
: Now when iam loading a new record and when its trying optimize the docs its
: taking 2 much memory and time 

: can any body please tell do we have any property in solr to get rid of this.

Solr isn't going to optimize the index unless you tell it to -- how are 
you indexing your docs? Are you sure you don't have something programmed 
to send an optimize command?


-Hoss