Posted to solr-user@lucene.apache.org by Arkadi Colson <ar...@smartbit.be> on 2016/10/27 08:19:36 UTC

Merge policy

Hi

As you can see in the screenshot above, the oldest segments contain 
a lot of deletions. In total the shard has about 26% deleted documents. 
How can I get rid of them so the index becomes smaller again?
Can this only be done with an optimize, or does it also depend on the 
merge policy? If it depends on the merge policy, which one should I 
choose?

Thanks!

BR,
Arkadi


Re: Merge policy

Posted by Arkadi Colson <ar...@smartbit.be>.
It's a default installation using the default settings and parameters. 
Should I perhaps change the segment size or something like that? Is it 
possible to do this live, without re-indexing? If you need more info, just let me know...

Thx!


On 27-10-16 19:03, Walter Underwood wrote:
> That distribution of segment sizes seems odd. Why so many medium-large segments?
>
> Are there custom settings for merge policy? I think the default policy would avoid so many segments that are mostly deleted documents.


Re: Merge policy

Posted by Walter Underwood <wu...@wunderwood.org>.
That distribution of segment sizes seems odd. Why so many medium-large segments?

Are there custom settings for merge policy? I think the default policy would avoid so many segments that are mostly deleted documents.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Merge policy

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/27/2016 9:50 AM, Yonik Seeley wrote:
> On Thu, Oct 27, 2016 at 9:56 AM, Arkadi Colson <ar...@smartbit.be>
> wrote:
>> Thanks for the answer! Do you know if there is a way to trigger an
>> optimize for only 1 shard and not the whole collection at once? 
> Adding a "distrib=false" parameter should work I think. 

Last time I checked, which I admit has been a little while, optimize
ignored distrib and proceeded with a sequential optimize of every core
in the collection.

Thanks,
Shawn


Re: Merge policy

Posted by Yonik Seeley <ys...@gmail.com>.
On Thu, Oct 27, 2016 at 9:56 AM, Arkadi Colson <ar...@smartbit.be> wrote:

> Thanks for the answer!
> Do you know if there is a way to trigger an optimize for only 1 shard and
> not the whole collection at once?
>

Adding a "distrib=false" parameter should work I think.

-Yonik
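Yonik's suggestion amounts to sending the optimize request straight to one core's update handler with distrib=false. The minimal Python sketch below only builds such a URL. The host, port and core name are hypothetical placeholders. Shawn reports that optimize may ignore distrib in some versions, so watch which cores actually start merging before relying on this.

```python
from urllib.parse import urlencode


def core_update_url(base_url, core_name, **params):
    """Build an update-handler URL aimed at a single Solr core.

    base_url and core_name are placeholders (hypothetical values below).
    Booleans are rendered as lowercase "true"/"false" strings.
    """
    query = urlencode(
        {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in params.items()}
    )
    return f"{base_url.rstrip('/')}/{core_name}/update?{query}"


# Hypothetical core name: shard 1, replica 1 of a collection called "mycoll".
url = core_update_url(
    "http://localhost:8983/solr/",
    "mycoll_shard1_replica1",
    optimize=True,
    distrib=False,
)
print(url)
# http://localhost:8983/solr/mycoll_shard1_replica1/update?optimize=true&distrib=false
```

The sketch stops at building the URL so nothing is sent to a live cluster by accident. It could then be requested with urllib.request.urlopen(url) or with curl.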

Re: Merge policy

Posted by Emir Arnautovic <em...@sematext.com>.
I got a notification from the mailer, so I'm not sure my reply reached you:

"If you are using TieredMergePolicy, you can try setting 
reclaimDeletesWeight."

HTH,
Emir


On 28.10.2016 09:20, Arkadi Colson wrote:
> Do you have any idea why the merge policy does not merge away the 
> deletions? Should I tweak some parameters somehow? It's a default 
> installation using the default settings and parameters. If you need 
> more info, just let me know...

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
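Emir's tip concerns TieredMergePolicy's reclaimDeletesWeight. As I understand it, this setting defaults to 2.0 and acts as the exponent on the ratio of surviving bytes to original bytes in the merge score, where a lower score is better. The Python sketch below is a simplified model of that scoring, not Lucene's implementation. It drops details such as size flooring, and the segment sizes are invented. It shows why raising the weight makes merges of deletion-heavy segments more attractive.

```python
def merge_score(sizes_bytes, deleted_fractions, reclaim_deletes_weight=2.0):
    """Score a candidate merge; lower means more attractive.

    Simplified model loosely after Lucene 6.x TieredMergePolicy (not its code):
    skew * (merged size ** 0.05) * (non-deleted ratio ** reclaim_deletes_weight).
    """
    before = float(sum(sizes_bytes))
    # Live bytes per segment: raw size scaled by the fraction not deleted.
    live = [s * (1.0 - d) for s, d in zip(sizes_bytes, deleted_fractions)]
    after = sum(live)
    # Skew: share of the merged result contributed by the largest input.
    skew = max(live) / after
    score = skew * after ** 0.05
    # Fraction of bytes that survive the merge; low means many deletes reclaimed.
    non_del_ratio = after / before
    return score * non_del_ratio ** reclaim_deletes_weight


GB = 1024 ** 3
clean = ([4 * GB] * 10, [0.0] * 10)  # ten 4 GB segments, no deletions
dirty = ([4 * GB] * 10, [0.4] * 10)  # same sizes, 40% deleted
for weight in (2.0, 4.0):
    print(weight, merge_score(*clean, weight), merge_score(*dirty, weight))
```

In this model the clean merge's score does not change with the weight. The dirty merge's score shrinks as the weight grows, so the policy would choose it sooner.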


Re: Merge policy

Posted by Walter Underwood <wu...@wunderwood.org>.
25% overhead is pretty good. It is easy for a merge to need almost double the space of a minimum-sized index. It is possible to use 3X the space.

Don’t try to use the least possible disk space. If there isn’t enough free space on the disk, Solr cannot merge the big indexes. Ever. That may be what has happened here.

Make sure the nodes have at least 100 GB of free space on the volumes, maybe 150. That space is not “wasted” or “unused”. It is necessary for merges.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 28, 2016, at 12:20 AM, Arkadi Colson <ar...@smartbit.be> wrote:
> 
> The index size of 1 shard is about 125GB and we are running 11 shards with replication factor 2 so it's a lot of data. The deletions percentage at the bottom of the segment page is around 25%. So it's quite some space which we could recover. That's why I was looking for an optimize.

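Walter's rule of thumb can be turned into a back-of-the-envelope calculation for the roughly 125 GB shard with about 25% deletions from Arkadi's message. Reading "minimum-sized index" as the live data with deletions removed is my assumption. The 2X and 3X multipliers are Walter's.

```python
def merge_disk_estimates(index_gb, deleted_fraction, multipliers=(2.0, 3.0)):
    """Rough peak-disk estimates for merging one shard.

    Assumption: "minimum-sized index" = live data only (deletions removed),
    and a merge can temporarily need `multiplier` times that amount on disk.
    Returns the live size and, per multiplier, (peak_gb, extra_free_gb), where
    extra_free_gb is what must be free beyond the current index size.
    """
    live_gb = index_gb * (1.0 - deleted_fraction)
    estimates = {}
    for m in multipliers:
        peak_gb = live_gb * m
        estimates[m] = (peak_gb, max(0.0, peak_gb - index_gb))
    return live_gb, estimates


# Figures from Arkadi's message: ~125 GB per shard, ~25% deleted documents.
live, est = merge_disk_estimates(125.0, 0.25)
print(live)      # 93.75
print(est[2.0])  # (187.5, 62.5)
print(est[3.0])  # (281.25, 156.25)
```

Under these assumptions, roughly 60 to 160 GB of free space beyond the current index would be needed. That range is broadly consistent with Walter's suggestion of 100 to 150 GB.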

Re: Merge policy

Posted by Arkadi Colson <ar...@smartbit.be>.
The index size of 1 shard is about 125GB and we are running 11 shards 
with replication factor 2, so it's a lot of data. The deletion 
percentage at the bottom of the segment page is around 25%, so there is 
quite a lot of space we could recover. That's why I was looking for an 
optimize.

Do you have any idea why the merge policy does not merge away the 
deletions? Should I tweak some parameters? It's a default 
installation using the default settings and parameters. If you need more 
info, just let me know...

Thx!


On 27-10-16 17:40, Erick Erickson wrote:
> Why do you think you need to get rid of the deleted data? During normal
> indexing, these will be "merged away". Optimizing has some downsides
> for continually changing indexes, in particular since the default
> tieredmergepolicy tries to merge "like size" segments, deletions will
> accumulate in your one large segment and the percentage of
> deleted documents may get even higher.

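The following quick calculation puts Arkadi's numbers in perspective. It assumes every replica is about the same size with the same ~25% deletions, and it reads "replication factor 2" as two copies of each shard in total. These are my assumptions, not figures from the thread. The result is an upper bound on what an optimize could reclaim.

```python
def reclaimable_space_gb(shard_gb, deleted_fraction, num_shards, replication_factor):
    """Disk held by deleted documents, per core and across the whole collection.

    Assumes every replica of every shard has the same size and deletion ratio,
    and that replication_factor counts all copies of a shard (leader included).
    """
    per_core = shard_gb * deleted_fraction
    total = per_core * num_shards * replication_factor
    return per_core, total


per_core, total = reclaimable_space_gb(125.0, 0.25, 11, 2)
print(per_core, total)  # 31.25 687.5
```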

Re: Merge policy

Posted by Erick Erickson <er...@gmail.com>.
Why do you think you need to get rid of the deleted data? During normal
indexing, these will be "merged away". Optimizing has some downsides
for continually changing indexes. In particular, since the default
TieredMergePolicy tries to merge segments of similar size, deletions will
accumulate in the one large segment an optimize leaves you with, and the
percentage of deleted documents may get even higher.

Unless there's some measurable performance gain that the users
will notice, I'd just leave this alone.

The exception is an index that changes rarely, in which case
optimizing makes more sense.

Best,
Erick

On Thu, Oct 27, 2016 at 6:56 AM, Arkadi Colson <ar...@smartbit.be> wrote:

> Thanks for the answer!
> Do you know if there is a way to trigger an optimize for only 1 shard and
> not the whole collection at once?

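Erick's point about the one large segment can be quantified with a rough model. The model rests on an assumption that should be checked against the Lucene version in use: Lucene 6.x's TieredMergePolicy leaves a segment out of natural merges while its live size is at least half of maxMergedSegmentMB, which defaults to 5000 MB (roughly 5 GB). Under that model, a near-full 5 GB segment keeps its deletions until about half of its documents are gone, which might also explain the deletion-heavy, medium-large segments Walter noticed. A single optimized 125 GB segment would need about 98% of its documents deleted before it was merged again.

```python
def deletes_needed_for_natural_merge(segment_gb, max_merged_segment_gb=5.0,
                                     eligible_fraction=0.5):
    """Fraction of a segment's documents that must be deleted before it is
    considered for a natural merge again.

    Rough model (an assumption, loosely after Lucene 6.x TieredMergePolicy):
    a segment is skipped while its live size, i.e. its size scaled by the
    non-deleted fraction, is at least eligible_fraction * max_merged_segment_gb.
    """
    threshold_gb = max_merged_segment_gb * eligible_fraction
    if segment_gb < threshold_gb:
        return 0.0  # already small enough to be merge-eligible
    return 1.0 - threshold_gb / segment_gb


print(deletes_needed_for_natural_merge(5.0))    # 0.5: a full-size 5 GB segment
print(deletes_needed_for_natural_merge(125.0))  # about 0.98: one optimized 125 GB segment
```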
Re: Merge policy

Posted by Arkadi Colson <ar...@smartbit.be>.
Thanks for the answer!
Do you know if there is a way to trigger an optimize for only 1 shard 
and not the whole collection at once?


On 27-10-16 15:30, Pushkar Raste wrote:
>
> Try commit with expungeDeletes="true"
>
> I am not sure if it will merge old segments that have deleted documents.
>
> In the worst case you can 'optimize' your index which should take care 
> of removing deleted document


Re: Merge policy

Posted by Pushkar Raste <pu...@gmail.com>.
Try a commit with expungeDeletes="true".

I am not sure whether it will merge old segments that have deleted documents.

In the worst case you can 'optimize' your index, which should take care of
removing deleted documents.

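Pushkar's uncertainty about which segments a commit with expungeDeletes="true" touches can be illustrated with a small selection sketch. It assumes, based on my understanding of TieredMergePolicy, that only segments whose deleted-document percentage exceeds forceMergeDeletesPctAllowed (believed to default to 10%) are rewritten. The segment names and counts are hypothetical, and the sketch models only the selection step, not the merge itself.

```python
def expunge_candidates(segments, pct_allowed=10.0):
    """Return names of segments whose deleted-doc percentage exceeds pct_allowed.

    segments: iterable of (name, max_doc, del_count) tuples (hypothetical data).
    pct_allowed: assumed to mirror TieredMergePolicy's forceMergeDeletesPctAllowed.
    """
    selected = []
    for name, max_doc, del_count in segments:
        pct = 100.0 * del_count / max_doc if max_doc else 0.0
        if pct > pct_allowed:
            selected.append(name)
    return selected


# Hypothetical segments: 30%, 6.25% and 15% deleted respectively.
segs = [("_a", 1_000_000, 300_000), ("_b", 800_000, 50_000), ("_c", 200_000, 30_000)]
print(expunge_candidates(segs))  # ['_a', '_c']
```

Under this model, a segment with only a few percent deletions stays untouched, so an expungeDeletes commit can leave some deletions behind, whereas an optimize rewrites everything.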