Posted to solr-user@lucene.apache.org by James Mason <ja...@hotmail.com> on 2016/01/25 19:50:05 UTC

unmerged index segments

Hi,

I have a large index that has been added to over several years, and I’ve discovered that I have many segments that haven’t been updated for well over a year, even though I’m adding, updating and deleting records daily. None of my five largest segments has been updated for over a year.

Meanwhile, the number of segments I have keeps on increasing, and I have hundreds of segment files that don’t seem to be getting merged past a certain size (e.g. the largest of the newer ones is 2 GB, while my older segments are over 100 GB).

My understanding was that background merges should be merging these older segments with newer data over time, but this doesn’t seem to be the case.

I’m using Solr 4.9, but I was using an older version at the time that these ‘older’ segments were created. 

Any help or suggestions on what’s happening would be very much appreciated, as would any suggestions on how I can monitor what’s happening with the background merges.

Thanks,

James
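
On the monitoring question, one low-tech option is to watch the index directory itself. The sketch below is only an illustration and is not taken from this thread. It assumes the usual Lucene file naming, where per-segment files start with an underscore followed by the segment name (for example _a.fdt or _a_1.del), and it groups files by that name to report each segment's total size and most recent modification time. The function name summarize_segments and the index path given on the command line are hypothetical.

```python
import os
import re
import sys
import time
from collections import defaultdict

# Assumption: per-segment files look like "_<segment>.<ext>" or
# "_<segment>_<suffix>.<ext>"; files such as "segments_5" or "write.lock"
# do not start with an underscore and are skipped.
SEGMENT_FILE = re.compile(r"^_([0-9a-z]+)[._]")


def summarize_segments(index_dir):
    """Return {segment_name: (total_bytes, newest_mtime)} for one index directory."""
    totals = defaultdict(lambda: [0, 0.0])
    for entry in os.scandir(index_dir):
        match = SEGMENT_FILE.match(entry.name)
        if match is None or not entry.is_file():
            continue
        info = entry.stat()
        record = totals[match.group(1)]
        record[0] += info.st_size                   # accumulate bytes per segment
        record[1] = max(record[1], info.st_mtime)   # keep the newest timestamp
    return {name: (size, mtime) for name, (size, mtime) in totals.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python segments.py /path/to/core/data/index
    report = summarize_segments(sys.argv[1])
    # Largest segments first.
    for name, (size, mtime) in sorted(report.items(), key=lambda kv: -kv[1][0]):
        stamp = time.strftime("%Y-%m-%d", time.localtime(mtime))
        print(f"_{name}\t{size / 2**30:8.2f} GiB\tlast modified {stamp}")
```

Running this on a schedule and comparing the output over time should show whether segments are disappearing (merged away) or just accumulating.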

Re: unmerged index segments

Posted by Jack Krupansky <ja...@gmail.com>.
Sorry I don't have any specific guidance since the results are so
unpredictable. But a much lower mergeFactor should result in more frequent
merges, which should reduce segment count but may slow indexing down.

If you make the change and then add enough documents to exceed the flush
thresholds (ramBufferSizeMB or maxBufferedDocs) so that new segments get
written, then it should trigger the merge, we hope.

You may also have to use your own explicit <mergePolicy> in order to get
control over more of the parameters of TieredMergePolicy, which is the
default. Solr uses <mergeFactor> to set the maxMergeAtOnce and
segmentsPerTier options to the same value, but you may want to set them
to different values.
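
To make the effect of mergeFactor a little more concrete, here is a rough model of the segment budget that TieredMergePolicy works towards. It is a simplified sketch, not the Lucene code: it assumes the Javadoc defaults of a 2 MB floor segment size and a 5 GB maximum merged segment size, it ignores deletes, and the function name allowed_segment_count is made up for illustration. Roughly speaking, background merges are only selected once the index holds about this many segments or more.

```python
import math


def allowed_segment_count(index_mb, segments_per_tier=10, max_merge_at_once=10,
                          floor_segment_mb=2.0, max_merged_segment_mb=5120.0):
    """Approximate how many segments the policy tolerates for an index of index_mb."""
    level_size = floor_segment_mb   # segment size of the current tier
    remaining = float(index_mb)     # index size not yet assigned to a tier
    allowed = 0
    while True:
        segments_at_level = remaining / level_size
        if segments_at_level < segments_per_tier:
            # Last, partially filled tier.
            allowed += math.ceil(segments_at_level)
            break
        # A full tier contributes segments_per_tier segments to the budget.
        allowed += segments_per_tier
        remaining -= segments_per_tier * level_size
        # Each tier is max_merge_at_once times larger, up to the size cap.
        level_size = min(max_merged_segment_mb, level_size * max_merge_at_once)
    return allowed


if __name__ == "__main__":
    # mergeFactor sets both segments_per_tier and max_merge_at_once.
    for merge_factor in (10, 45):
        budget = allowed_segment_count(100, merge_factor, merge_factor)
        print(f"mergeFactor={merge_factor}: about {budget} segments allowed")
```

For a 100 MB index this model gives a budget of 14 segments with mergeFactor 10 and 46 with mergeFactor 45, which is the sense in which a lower mergeFactor should reduce the segment count.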

Some doc to read:
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
https://wiki.apache.org/solr/SolrPerformanceFactors

The official Solr doc doesn't detail all the merge policy settings,
pointing you to the Javadoc, which for Tiered is here:
http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/index/TieredMergePolicy.html

I did doc all of these options (as of Solr 4.4) in my Solr 4.x Deep Dive
e-book and I don't think much of that has changed since then:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

On Tue, Jan 26, 2016 at 3:37 AM, James Mason <ja...@hotmail.com>
wrote:

> Hi Jack,
>
> Sorry, I should have put them on my original message.
>
> All merge policy settings are at their default except mergeFactor, which I
> now notice is quite high at 45. Unfortunately I don’t have the full history
> to see when this setting was changed, but I do know the settings haven’t
> been changed for well over a year, and that we did originally run Solr
> using the default settings.
>
> So reading about mergeFactor it sounds like this is likely the problem,
> and we’re simply not asking Solr to merge into these old and large segments
> yet?
>
> If I was to change this back down to the default of 10, would you expect
> we’d get quite an immediate and intense period of merging?
>
> If I was to launch a duplicate test Solr instance, change the merge
> factor, and simply leave it for a few days, would it perform the background
> merge (so I can test to see if there’s enough memory etc. for the merge to
> complete)?
>
> Thanks,
>
> James

Re: unmerged index segments

Posted by James Mason <ja...@hotmail.com>.
Hi Jack,

Sorry, I should have put them on my original message.

All merge policy settings are at their default except mergeFactor, which I now notice is quite high at 45. Unfortunately I don’t have the full history to see when this setting was changed, but I do know the settings haven’t been changed for well over a year, and that we did originally run Solr using the default settings.

So reading about mergeFactor it sounds like this is likely the problem, and we’re simply not asking Solr to merge into these old and large segments yet?

If I was to change this back down to the default of 10, would you expect we’d get quite an immediate and intense period of merging? 

If I was to launch a duplicate test Solr instance, change the merge factor, and simply leave it for a few days, would it perform the background merge (so I can test to see if there’s enough memory etc. for the merge to complete)?

Thanks,

James



> On 25 Jan 2016, at 21:39, Jack Krupansky <ja...@gmail.com> wrote:
> 
> What exactly are your merge policy settings in solrconfig? They control
> when the background merges will be performed. Sometimes they do need to be
> tweaked.
> 
> -- Jack Krupansky

Re: unmerged index segments

Posted by Jack Krupansky <ja...@gmail.com>.
What exactly are your merge policy settings in solrconfig? They control
when the background merges will be performed. Sometimes they do need to be
tweaked.

-- Jack Krupansky
