Posted to solr-user@lucene.apache.org by Kalika Mishra <ka...@germinait.com> on 2011/11/11 13:50:42 UTC

Using solr during optimization

Hi,

I would like to optimize a Solr core which is in Reader Writer mode. Since
the Solr cores are huge (above 100 GB), the optimization takes hours to
complete.

While the optimization is running on, say, the Writer core, the application
wants to keep using the indexes for both query and write purposes. What is
the best approach to do this?

I was thinking of using a temporary index (an empty core) to write the
documents to, and using the same Reader to read the documents. (Please note
that the temp index and the Reader cannot be made Reader Writer, as the
Reader is already set up for the Writer on which the optimization is taking
place.) But there could be some updates to the temp index which I would
like to see reflected in the Reader. What's the best setup to support this?

Thanks,
Kalika
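One way this could be wired up is a sketch only: the core names, base URL,
and workflow below are my assumptions, while the CoreAdmin SWAP and
MERGEINDEXES actions are real Solr features (MERGEINDEXES with srcCore
since Solr 3.3). The idea is to route writes to a temporary core while the
Writer core optimizes, then fold the temp core back in:

```python
# Hypothetical sketch: keep indexing during a long optimize by writing to a
# temporary core, then merging it back. The core names ("writer", "temp",
# "reader") and the base URL are placeholders, not from this thread.
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # assumed Solr location

def coreadmin_url(action, **params):
    """Build a CoreAdmin request URL for the given action and parameters."""
    return f"{BASE}/admin/cores?{urlencode({'action': action, **params})}"

# 1. Start the optimize on the "writer" core; point new updates at "temp".
# 2. When the optimize completes, merge temp's index into the writer core
#    (CoreAdmin MERGEINDEXES, available with srcCore since Solr 3.3):
merge_url = coreadmin_url("MERGEINDEXES", core="writer", srcCore="temp")
# 3. Swap cores so the Reader serves the merged index:
swap_url = coreadmin_url("SWAP", core="reader", other="writer")

print(merge_url)
print(swap_url)
```

Until the merge completes, queries that also need the not-yet-merged
documents could be sent as a distributed request across both cores using
the shards parameter. Again, this is a sketch under assumptions, not a
tested recipe.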

Re: Using solr during optimization

Posted by Isan Fulia <is...@germinait.com>.
Hi Mark,

Thanks for the reply.

You are right. We first need to test with a lower mergeFactor, measure both
indexing and searching performance, and have some numbers in hand. We also
need to see, after a partial optimize with the same mergeFactor, how long
the performance gain lasts (for both searching and indexing) while
continuously adding more documents.

Thanks,
Isan Fulia

On 14 November 2011 19:41, Mark Miller <ma...@gmail.com> wrote:

>
> On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:
>
> > Hi Mark,
> >
> > In the above case , what if  the index is optimized partly ie. by
> > specifying the max no of segments we want.
> > It has been observed that after optimizing(even partly optimization), the
> > indexing as well as searching had been faster than in case of an
> > unoptimized one.
>
> Yes, this remains true - searching against fewer segments is faster than
> searching against many segments. Unless you have a really high merge
> factor, this is just generally not a big deal IMO.
>
> It tends to be something like, a given query is say 10-30% slower. If you
> have good performance though, this should often be something like a 50ms
> query goes to 80 or 90ms. You really have to decide/test if there is a
> practical difference to your users.
>
> You should also pay attention to how long that perf improvement lasts
> while you are continuously adding more documents. Is it a super high cost
> for a short perf boost?
>
> > Decreasing the merge factor will affect  the performance as it will
> > increase the indexing time due to the frequent merges.
>
> True - it will essentially amortize the cost of reducing segments. Have
> you tested lower merge factors though? Does it really slow down indexing to
> the point where you find it unacceptable? I've been surprised in the past.
> Usually you can find a pretty nice balance.
>
> > So is it good that we optimize partly(let say once in a month), rather
> than
> > decreasing the merge factor and affect  the indexing speed.Also since we
> > will be sharding, that 100 GB index will be divided in different shards.
>
> Partial optimize is a good option, and optimize is an option. They both
> exist for a reason ;) Many people pay the price because they assume they
> have to though, when they really have no practical need.
>
> Generally, the best way to manage the number of segments in your index is
> through the merge policy IMO - not necessarily optimize calls.
>
> I'm pretty sure optimize also blocks adds in previous version of Solr as
> well - it grabs the commit lock. It won't do that in Solr 4, but that is
> another reason I wouldn't recommend it under normal circumstances.
>
> I look at optimize as a last option, or when creating a static index
> personally.
>
> >
> > Thanks,
> > Isan Fulia.
> >
> >
> >
> > On 14 November 2011 11:28, Kalika Mishra <kalika.mishra@germinait.com
> >wrote:
> >
> >> Hi Mark,
> >>
> >> Thanks for your reply.
> >>
> >> What you saying is interesting; so are you suggesting that optimizations
> >> should be done usually when there not many updates. Also can you please
> >> point out further under what conditions optimizations might be
> beneficial.
> >>
> >> Thanks.
> >>
> >> On 11 November 2011 20:30, Mark Miller <ma...@gmail.com> wrote:
> >>
> >>> I would not optimize - it's very expensive. With 11,000 updates a day,
> I
> >>> think it makes sense to completely avoid optimizing.
> >>>
> >>> That should be your default move in any case. If you notice performance
> >>> suffers more than is acceptable (good chance you won't), then I'd use a
> >>> lower merge factor. It defaults to 10 - lower numbers will lower the
> >> number
> >>> of segments in your index, and essentially amortize the cost of an
> >> optimize.
> >>>
> >>> Optimize is generally only useful when you will have a mostly static
> >> index.
> >>>
> >>> - Mark Miller
> >>> lucidimagination.com
> >>>
> >>>
> >>> On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
> >>>
> >>>> Hi Mark,
> >>>>
> >>>> We are performing almost 11,000 updates a day, we have around 50
> >> million
> >>>> docs in the index (i understand we will need to shard) the core seg
> >> will
> >>>> get fragmented over a period of time. We will need to do optimize
> every
> >>> few
> >>>> days or once in a month; do you have any reason not to optimize the
> >> core.
> >>>> Please let me know.
> >>>>
> >>>> Thanks.
> >>>>
> >>>> On 11 November 2011 18:51, Mark Miller <ma...@gmail.com> wrote:
> >>>>
> >>>>> Do a you have something forcing you to optimize, or are you just
> doing
> >>> it
> >>>>> for the heck of it?
> >>>>>
> >>>>> On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I would like to optimize solr core which is in Reader Writer mode.
> >>> Since
> >>>>>> the Solr cores are huge in size (above 100 GB) the optimization
> takes
> >>>>> hours
> >>>>>> to complete.
> >>>>>>
> >>>>>> When the optimization is going on say. on the Writer core, the
> >>>>> application
> >>>>>> wants to continue using the indexes for both query and write
> >> purposes.
> >>>>> What
> >>>>>> is the best approach to do this.
> >>>>>>
> >>>>>> I was thinking of using a temporary index (empty core) to write the
> >>>>>> documents and use the same Reader to read the documents. (Please
> note
> >>>>> that
> >>>>>> temp index and the Reader cannot be made Reader Writer as Reader is
> >>>>> already
> >>>>>> setup for the Writer on which optimization is taking place) But
> there
> >>>>> could
> >>>>>> be some updates to the temp index which I would like to get
> reflected
> >>> in
> >>>>>> the Reader. Whats the best setup to support this.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Kalika
> >>>>>
> >>>>> - Mark Miller
> >>>>> lucidimagination.com
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Thanks & Regards,
> >>>> Kalika
> >>>
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> Kalika
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.
>
> - Mark Miller
> lucidimagination.com
>


-- 
Thanks & Regards,
Isan Fulia.

Re: Using solr during optimization

Posted by Mark Miller <ma...@gmail.com>.
On Nov 14, 2011, at 8:27 AM, Isan Fulia wrote:

> Hi Mark,
> 
> In the above case , what if  the index is optimized partly ie. by
> specifying the max no of segments we want.
> It has been observed that after optimizing(even partly optimization), the
> indexing as well as searching had been faster than in case of an
> unoptimized one.

Yes, this remains true - searching against fewer segments is faster than searching against many segments. Unless you have a really high merge factor, this is just generally not a big deal IMO.

It tends to be something like: a given query is, say, 10-30% slower. If you have good performance though, this often means something like a 50ms query going to 80 or 90ms. You really have to decide/test whether there is a practical difference to your users.

You should also pay attention to how long that perf improvement lasts while you are continuously adding more documents. Is it a super high cost for a short perf boost?

> Decreasing the merge factor will affect  the performance as it will
> increase the indexing time due to the frequent merges.

True - it will essentially amortize the cost of reducing segments. Have you tested lower merge factors though? Does it really slow down indexing to the point where you find it unacceptable? I've been surprised in the past. Usually you can find a pretty nice balance.

> So is it good that we optimize partly(let say once in a month), rather than
> decreasing the merge factor and affect  the indexing speed.Also since we
> will be sharding, that 100 GB index will be divided in different shards.

Partial optimize is a good option, and optimize is an option. They both exist for a reason ;) Many people pay the price because they assume they have to though, when they really have no practical need.

Generally, the best way to manage the number of segments in your index is through the merge policy IMO - not necessarily optimize calls.

I'm pretty sure optimize also blocks adds in previous versions of Solr - it grabs the commit lock. It won't do that in Solr 4, but that is another reason I wouldn't recommend it under normal circumstances.

Personally, I look at optimize as a last option, or as something for when creating a static index.

> 
> Thanks,
> Isan Fulia.
> 
> 
> 
> On 14 November 2011 11:28, Kalika Mishra <ka...@germinait.com>wrote:
> 
>> Hi Mark,
>> 
>> Thanks for your reply.
>> 
>> What you saying is interesting; so are you suggesting that optimizations
>> should be done usually when there not many updates. Also can you please
>> point out further under what conditions optimizations might be beneficial.
>> 
>> Thanks.
>> 
>> On 11 November 2011 20:30, Mark Miller <ma...@gmail.com> wrote:
>> 
>>> I would not optimize - it's very expensive. With 11,000 updates a day, I
>>> think it makes sense to completely avoid optimizing.
>>> 
>>> That should be your default move in any case. If you notice performance
>>> suffers more than is acceptable (good chance you won't), then I'd use a
>>> lower merge factor. It defaults to 10 - lower numbers will lower the
>> number
>>> of segments in your index, and essentially amortize the cost of an
>> optimize.
>>> 
>>> Optimize is generally only useful when you will have a mostly static
>> index.
>>> 
>>> - Mark Miller
>>> lucidimagination.com
>>> 
>>> 
>>> On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
>>> 
>>>> Hi Mark,
>>>> 
>>>> We are performing almost 11,000 updates a day, we have around 50
>> million
>>>> docs in the index (i understand we will need to shard) the core seg
>> will
>>>> get fragmented over a period of time. We will need to do optimize every
>>> few
>>>> days or once in a month; do you have any reason not to optimize the
>> core.
>>>> Please let me know.
>>>> 
>>>> Thanks.
>>>> 
>>>> On 11 November 2011 18:51, Mark Miller <ma...@gmail.com> wrote:
>>>> 
>>>>> Do a you have something forcing you to optimize, or are you just doing
>>> it
>>>>> for the heck of it?
>>>>> 
>>>>> On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I would like to optimize solr core which is in Reader Writer mode.
>>> Since
>>>>>> the Solr cores are huge in size (above 100 GB) the optimization takes
>>>>> hours
>>>>>> to complete.
>>>>>> 
>>>>>> When the optimization is going on say. on the Writer core, the
>>>>> application
>>>>>> wants to continue using the indexes for both query and write
>> purposes.
>>>>> What
>>>>>> is the best approach to do this.
>>>>>> 
>>>>>> I was thinking of using a temporary index (empty core) to write the
>>>>>> documents and use the same Reader to read the documents. (Please note
>>>>> that
>>>>>> temp index and the Reader cannot be made Reader Writer as Reader is
>>>>> already
>>>>>> setup for the Writer on which optimization is taking place) But there
>>>>> could
>>>>>> be some updates to the temp index which I would like to get reflected
>>> in
>>>>>> the Reader. Whats the best setup to support this.
>>>>>> 
>>>>>> Thanks,
>>>>>> Kalika
>>>>> 
>>>>> - Mark Miller
>>>>> lucidimagination.com
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Thanks & Regards,
>>>> Kalika
>>> 
>> 
>> 
>> --
>> Thanks & Regards,
>> Kalika
>> 
> 
> 
> 
> -- 
> Thanks & Regards,
> Isan Fulia.

- Mark Miller
lucidimagination.com

Re: Using solr during optimization

Posted by Isan Fulia <is...@germinait.com>.
Hi Mark,

In the above case, what if the index is optimized partially, i.e. by
specifying the max number of segments we want?
It has been observed that after optimizing (even a partial optimization),
indexing as well as searching is faster than with an unoptimized index.
Decreasing the merge factor will affect performance, as it will increase
the indexing time due to more frequent merges.
So is it better to optimize partially (say once a month), rather than
decrease the merge factor and hurt the indexing speed? Also, since we will
be sharding, that 100 GB index will be divided into different shards.

Thanks,
Isan Fulia.
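For reference, a partial optimize is requested by capping the number of
resulting segments instead of merging all the way down to one. A minimal
sketch follows; the base URL, core name, and cap value are placeholders,
while the maxSegments and waitSearcher parameters on Solr's optimize
command are real, but verify them against your Solr version:

```python
# Sketch: build a partial-optimize request that merges the index down to at
# most `max_segments` segments instead of a single one. The base URL and
# core name below are hypothetical.
from urllib.parse import urlencode

def partial_optimize_request(base_url, core, max_segments):
    """Return (url, xml_body) for an update request doing a partial optimize."""
    url = f"{base_url}/{core}/update?{urlencode({'wt': 'json'})}"
    # waitSearcher=false returns before a new searcher is fully opened.
    body = f'<optimize maxSegments="{max_segments}" waitSearcher="false"/>'
    return url, body

url, body = partial_optimize_request("http://localhost:8983/solr", "core0", 8)
# POST `body` to `url` with Content-Type: text/xml to start the merge-down.
print(url)
print(body)
```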



On 14 November 2011 11:28, Kalika Mishra <ka...@germinait.com>wrote:

> Hi Mark,
>
> Thanks for your reply.
>
> What you saying is interesting; so are you suggesting that optimizations
> should be done usually when there not many updates. Also can you please
> point out further under what conditions optimizations might be beneficial.
>
> Thanks.
>
> On 11 November 2011 20:30, Mark Miller <ma...@gmail.com> wrote:
>
> > I would not optimize - it's very expensive. With 11,000 updates a day, I
> > think it makes sense to completely avoid optimizing.
> >
> > That should be your default move in any case. If you notice performance
> > suffers more than is acceptable (good chance you won't), then I'd use a
> > lower merge factor. It defaults to 10 - lower numbers will lower the
> number
> > of segments in your index, and essentially amortize the cost of an
> optimize.
> >
> > Optimize is generally only useful when you will have a mostly static
> index.
> >
> > - Mark Miller
> > lucidimagination.com
> >
> >
> > On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
> >
> > > Hi Mark,
> > >
> > > We are performing almost 11,000 updates a day, we have around 50
> million
> > > docs in the index (i understand we will need to shard) the core seg
> will
> > > get fragmented over a period of time. We will need to do optimize every
> > few
> > > days or once in a month; do you have any reason not to optimize the
> core.
> > > Please let me know.
> > >
> > > Thanks.
> > >
> > > On 11 November 2011 18:51, Mark Miller <ma...@gmail.com> wrote:
> > >
> > >> Do a you have something forcing you to optimize, or are you just doing
> > it
> > >> for the heck of it?
> > >>
> > >> On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I would like to optimize solr core which is in Reader Writer mode.
> > Since
> > >>> the Solr cores are huge in size (above 100 GB) the optimization takes
> > >> hours
> > >>> to complete.
> > >>>
> > >>> When the optimization is going on say. on the Writer core, the
> > >> application
> > >>> wants to continue using the indexes for both query and write
> purposes.
> > >> What
> > >>> is the best approach to do this.
> > >>>
> > >>> I was thinking of using a temporary index (empty core) to write the
> > >>> documents and use the same Reader to read the documents. (Please note
> > >> that
> > >>> temp index and the Reader cannot be made Reader Writer as Reader is
> > >> already
> > >>> setup for the Writer on which optimization is taking place) But there
> > >> could
> > >>> be some updates to the temp index which I would like to get reflected
> > in
> > >>> the Reader. Whats the best setup to support this.
> > >>>
> > >>> Thanks,
> > >>> Kalika
> > >>
> > >> - Mark Miller
> > >> lucidimagination.com
> > >>
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Kalika
> >
>
>
> --
> Thanks & Regards,
> Kalika
>



-- 
Thanks & Regards,
Isan Fulia.

Re: Using solr during optimization

Posted by Kalika Mishra <ka...@germinait.com>.
Hi Mark,

Thanks for your reply.

What you are saying is interesting; so are you suggesting that optimization
should usually be done when there are not many updates? Also, can you
please point out under what conditions optimization might be beneficial?

Thanks.

On 11 November 2011 20:30, Mark Miller <ma...@gmail.com> wrote:

> I would not optimize - it's very expensive. With 11,000 updates a day, I
> think it makes sense to completely avoid optimizing.
>
> That should be your default move in any case. If you notice performance
> suffers more than is acceptable (good chance you won't), then I'd use a
> lower merge factor. It defaults to 10 - lower numbers will lower the number
> of segments in your index, and essentially amortize the cost of an optimize.
>
> Optimize is generally only useful when you will have a mostly static index.
>
> - Mark Miller
> lucidimagination.com
>
>
> On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:
>
> > Hi Mark,
> >
> > We are performing almost 11,000 updates a day, we have around 50 million
> > docs in the index (i understand we will need to shard) the core seg will
> > get fragmented over a period of time. We will need to do optimize every
> few
> > days or once in a month; do you have any reason not to optimize the core.
> > Please let me know.
> >
> > Thanks.
> >
> > On 11 November 2011 18:51, Mark Miller <ma...@gmail.com> wrote:
> >
> >> Do a you have something forcing you to optimize, or are you just doing
> it
> >> for the heck of it?
> >>
> >> On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
> >>
> >>> Hi,
> >>>
> >>> I would like to optimize solr core which is in Reader Writer mode.
> Since
> >>> the Solr cores are huge in size (above 100 GB) the optimization takes
> >> hours
> >>> to complete.
> >>>
> >>> When the optimization is going on say. on the Writer core, the
> >> application
> >>> wants to continue using the indexes for both query and write purposes.
> >> What
> >>> is the best approach to do this.
> >>>
> >>> I was thinking of using a temporary index (empty core) to write the
> >>> documents and use the same Reader to read the documents. (Please note
> >> that
> >>> temp index and the Reader cannot be made Reader Writer as Reader is
> >> already
> >>> setup for the Writer on which optimization is taking place) But there
> >> could
> >>> be some updates to the temp index which I would like to get reflected
> in
> >>> the Reader. Whats the best setup to support this.
> >>>
> >>> Thanks,
> >>> Kalika
> >>
> >> - Mark Miller
> >> lucidimagination.com
> >>
> >>
> >
> >
> > --
> > Thanks & Regards,
> > Kalika
>


-- 
Thanks & Regards,
Kalika

Re: Using solr during optimization

Posted by Mark Miller <ma...@gmail.com>.
I would not optimize - it's very expensive. With 11,000 updates a day, I think it makes sense to completely avoid optimizing.

That should be your default move in any case. If you notice performance suffers more than is acceptable (good chance you won't), then I'd use a lower merge factor. It defaults to 10 - lower numbers will lower the number of segments in your index, and essentially amortize the cost of an optimize.

Optimize is generally only useful when you will have a mostly static index.

- Mark Miller
lucidimagination.com
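The amortization point above can be made concrete with a toy simulation of
a logarithmic merge policy. This is a deliberate simplification of Lucene's
LogMergePolicy (the real policy also weighs segment sizes and deletions),
not the actual algorithm: each flush adds a level-0 segment, and whenever
mergeFactor segments pile up at a level they merge into one segment at the
next level.

```python
# Toy simulation of a logarithmic merge policy, to see how mergeFactor
# trades merge work against live segment count. A simplification of
# Lucene's LogMergePolicy, not the real thing.

def simulate(flushes, merge_factor):
    """Return (final_segment_count, total_merges) after `flushes` flushes."""
    levels = {}  # level -> number of segments currently at that level
    merges = 0
    for _ in range(flushes):
        levels[0] = levels.get(0, 0) + 1  # each flush adds a level-0 segment
        level = 0
        # Whenever merge_factor segments accumulate at a level, merge them
        # into a single segment at the next level, cascading upward.
        while levels.get(level, 0) >= merge_factor:
            levels[level] -= merge_factor
            levels[level + 1] = levels.get(level + 1, 0) + 1
            merges += 1
            level += 1
    return sum(levels.values()), merges

for mf in (10, 3):
    segs, merges = simulate(12_345, mf)
    print(f"mergeFactor={mf}: {segs} segments after 12,345 flushes, {merges} merges")
```

With these particular numbers, mergeFactor=3 keeps fewer segments alive
(11 vs 15) than mergeFactor=10 but performs far more merges (6167 vs 1370):
the cost of an optimize, paid continuously in small installments instead of
one big hit.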


On Nov 11, 2011, at 9:12 AM, Kalika Mishra wrote:

> Hi Mark,
> 
> We are performing almost 11,000 updates a day, we have around 50 million
> docs in the index (i understand we will need to shard) the core seg will
> get fragmented over a period of time. We will need to do optimize every few
> days or once in a month; do you have any reason not to optimize the core.
> Please let me know.
> 
> Thanks.
> 
> On 11 November 2011 18:51, Mark Miller <ma...@gmail.com> wrote:
> 
>> Do a you have something forcing you to optimize, or are you just doing it
>> for the heck of it?
>> 
>> On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
>> 
>>> Hi,
>>> 
>>> I would like to optimize solr core which is in Reader Writer mode. Since
>>> the Solr cores are huge in size (above 100 GB) the optimization takes
>> hours
>>> to complete.
>>> 
>>> When the optimization is going on say. on the Writer core, the
>> application
>>> wants to continue using the indexes for both query and write purposes.
>> What
>>> is the best approach to do this.
>>> 
>>> I was thinking of using a temporary index (empty core) to write the
>>> documents and use the same Reader to read the documents. (Please note
>> that
>>> temp index and the Reader cannot be made Reader Writer as Reader is
>> already
>>> setup for the Writer on which optimization is taking place) But there
>> could
>>> be some updates to the temp index which I would like to get reflected in
>>> the Reader. Whats the best setup to support this.
>>> 
>>> Thanks,
>>> Kalika
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
> 
> 
> -- 
> Thanks & Regards,
> Kalika


Re: Using solr during optimization

Posted by Kalika Mishra <ka...@germinait.com>.
Hi Mark,

We are performing almost 11,000 updates a day and have around 50 million
docs in the index (I understand we will need to shard); the core segments
will get fragmented over a period of time. We would need to optimize every
few days or once a month; do you have any reason not to optimize the core?
Please let me know.

Thanks.

On 11 November 2011 18:51, Mark Miller <ma...@gmail.com> wrote:

> Do a you have something forcing you to optimize, or are you just doing it
> for the heck of it?
>
> On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:
>
> > Hi,
> >
> > I would like to optimize solr core which is in Reader Writer mode. Since
> > the Solr cores are huge in size (above 100 GB) the optimization takes
> hours
> > to complete.
> >
> > When the optimization is going on say. on the Writer core, the
> application
> > wants to continue using the indexes for both query and write purposes.
> What
> > is the best approach to do this.
> >
> > I was thinking of using a temporary index (empty core) to write the
> > documents and use the same Reader to read the documents. (Please note
> that
> > temp index and the Reader cannot be made Reader Writer as Reader is
> already
> > setup for the Writer on which optimization is taking place) But there
> could
> > be some updates to the temp index which I would like to get reflected in
> > the Reader. Whats the best setup to support this.
> >
> > Thanks,
> > Kalika
>
> - Mark Miller
> lucidimagination.com
>


-- 
Thanks & Regards,
Kalika

Re: Using solr during optimization

Posted by Mark Miller <ma...@gmail.com>.
Do you have something forcing you to optimize, or are you just doing it for the heck of it?

On Nov 11, 2011, at 7:50 AM, Kalika Mishra wrote:

> Hi,
> 
> I would like to optimize solr core which is in Reader Writer mode. Since
> the Solr cores are huge in size (above 100 GB) the optimization takes hours
> to complete.
> 
> When the optimization is going on say. on the Writer core, the application
> wants to continue using the indexes for both query and write purposes. What
> is the best approach to do this.
> 
> I was thinking of using a temporary index (empty core) to write the
> documents and use the same Reader to read the documents. (Please note that
> temp index and the Reader cannot be made Reader Writer as Reader is already
> setup for the Writer on which optimization is taking place) But there could
> be some updates to the temp index which I would like to get reflected in
> the Reader. Whats the best setup to support this.
> 
> Thanks,
> Kalika

- Mark Miller
lucidimagination.com