Posted to solr-user@lucene.apache.org by Kayak28 <ka...@gmail.com> on 2020/04/15 17:28:43 UTC

Defaults Merge Policy

Hello, Solr Community:

I would like to ask about the default merge policy in Solr 8.3.0.
My client (SolrJ) issues a commit every 10,000 documents.
I have not explicitly configured a merge policy in solrconfig.xml.
On each indexing run, some documents are updated or deleted.
I believe the default merge policy merges segments automatically
when there are too many of them.
However, the number of deleted documents keeps increasing.

Is there a default merge policy configuration,
or do I have to configure one myself?

Sincerely,
Kaya Ota



-- 

Sincerely,
Kaya
github: https://github.com/28kayak

Re: Defaults Merge Policy

Posted by Kayak28 <ka...@gmail.com>.
Thank you for responding.
I will keep your words in mind.

Thank you again.



On Thu, Apr 23, 2020 at 20:38, Erick Erickson <er...@gmail.com> wrote:

> Glad those articles helped, I remember them well ;)
>
> Do note that 30% (well, actually 33%) is usually the ceiling.
> But as I mentioned, it’s soft, not absolute. So your index
> might have a higher percentage temporarily.
>
> Best,
> Erick

-- 

Sincerely,
Kaya
github: https://github.com/28kayak

Re: Defaults Merge Policy

Posted by Erick Erickson <er...@gmail.com>.
Glad those articles helped, I remember them well ;)

Do note that 30% (well, actually 33%) is usually the ceiling.
But as I mentioned, it’s soft, not absolute. So your index
might have a higher percentage temporarily.
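
Because the ceiling is soft, a monitoring check probably should not react to a single high reading. The sketch below is one hypothetical way to express that: it flags the index only when the last few samples of the deleted-docs percentage all exceed the ceiling. The function name, the sample values, and the choice of three consecutive readings are assumptions made for illustration; only the 33% figure comes from this thread.

```python
from typing import Sequence


def sustained_above(samples: Sequence[float],
                    ceiling: float = 33.0,
                    min_consecutive: int = 3) -> bool:
    """Return True if the most recent `min_consecutive` samples all exceed `ceiling`.

    `samples` holds deleted-docs percentages, oldest first.
    A single temporary spike above the soft ceiling does not count.
    """
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    recent = samples[-min_consecutive:]
    return len(recent) == min_consecutive and all(s > ceiling for s in recent)


# Made-up readings: one spike to 35% that later drops back.
print(sustained_above([20.0, 35.0, 28.0, 31.0]))
# Made-up readings: the last three all sit above the ceiling.
print(sustained_above([30.0, 34.0, 36.0, 35.0]))
```

How many consecutive readings to require depends on how often the percentage is sampled, so treat the default here as a placeholder.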

Best,
Erick

> On Apr 23, 2020, at 4:01 AM, Kayak28 <ka...@gmail.com> wrote:
> 
> Hello, Erick Erickson:
> 
> Thank you for answering my questions.
> 
> Deleted docs in Solr 8.3.0 have not reached 30% of the entire index,
> so I will keep monitoring for now.
> Again, thank you for your response.
> 
> Actually, the articles below helped me a lot.
> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
> 
> 
> Sincerely,
> Kaya Ota


Re: Defaults Merge Policy

Posted by Kayak28 <ka...@gmail.com>.
Hello, Erick Erickson:

Thank you for answering my questions.

Deleted docs in Solr 8.3.0 have not reached 30% of the entire index,
so I will keep monitoring for now.
Again, thank you for your response.

Actually, the articles below helped me a lot.
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/


Sincerely,
Kaya Ota

On Thu, Apr 16, 2020 at 2:41, Erick Erickson <er...@gmail.com> wrote:

> The number of deleted documents will bounce around.
> The default TieredMergePolicy has a rather complex
> algorithm that decides which segments to
> merge, and the percentage of deleted docs in any
> given segment is a factor, but not the sole determinant.
>
> Merging is not really based on the raw number of segments,
> but rather on the number of segments of similar size.
>
> But the short answer is “no, you don’t have to configure
> anything explicitly”. The percentage of deleted docs
> should max out at around 30% or so, although that’s a
> soft number; it’s usually lower.
>
> Unless you have some provable performance problem,
> I wouldn’t worry about it. And don’t infer anything
> until you’ve indexed a _lot_ of docs.
>
> Oh, and I kind of dislike using the number of docs as the commit
> trigger and tend to use time, on the theory that it’s easier to
> explain: with maxDocs, when commits happen varies depending on
> the throughput rate.
>
> Best,
> Erick

-- 

Sincerely,
Kaya
github: https://github.com/28kayak

Re: Defaults Merge Policy

Posted by Erick Erickson <er...@gmail.com>.
The number of deleted documents will bounce around.
The default TieredMergePolicy has a rather complex
algorithm that decides which segments to 
merge, and the percentage of deleted docs in any
given segment is a factor, but not the sole determinant.

Merging is not really based on the raw number of segments,
but rather on the number of segments of similar size.

But the short answer is “no, you don’t have to configure
anything explicitly”. The percentage of deleted docs
should max out at around 30% or so, although that’s a
soft number; it’s usually lower.
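
As a rough illustration of what that percentage means: Solr reports numDocs (live documents) and maxDoc (live documents plus deleted ones that have not yet been merged away) for an index, so the deleted share is the difference between the two divided by maxDoc. The sketch below assumes you have already read those two counts from somewhere such as the admin UI; the function name and the sample counts are made up, and the 33 figure is the soft ceiling mentioned elsewhere in this thread.

```python
def deleted_pct(num_docs: int, max_doc: int) -> float:
    """Return the percentage of documents in the index that are deleted.

    num_docs: live (searchable) documents.
    max_doc:  live documents plus deleted ones still present in segments.
    """
    if num_docs < 0 or num_docs > max_doc:
        raise ValueError("expected 0 <= num_docs <= max_doc")
    if max_doc == 0:
        return 0.0  # empty index: nothing is deleted
    return 100.0 * (max_doc - num_docs) / max_doc


SOFT_CEILING_PCT = 33.0  # figure quoted in this thread; soft, not absolute

# Made-up sample counts, for illustration only.
pct = deleted_pct(num_docs=850_000, max_doc=1_000_000)
print(f"{pct:.1f}% deleted, above soft ceiling: {pct > SOFT_CEILING_PCT}")
```

A reading below the ceiling, as in this made-up sample, matches the "I wouldn’t worry about it" case.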

Unless you have some provable performance problem,
I wouldn’t worry about it. And don’t infer anything
until you’ve indexed a _lot_ of docs.

Oh, and I kind of dislike using the number of docs as the commit
trigger and tend to use time, on the theory that it’s easier to
explain: with maxDocs, when commits happen varies depending on
the throughput rate.
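
The arithmetic behind that point can be sketched as follows. With a document-count trigger, the time between commits is the batch size divided by the indexing rate, so it moves whenever throughput moves. The 10,000 figure is the commit interval from the original question; the indexing rates and the function name are hypothetical.

```python
def seconds_between_commits(docs_per_commit: int, docs_per_second: float) -> float:
    """Time between commits when a commit fires every `docs_per_commit` documents."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return docs_per_commit / docs_per_second


DOCS_PER_COMMIT = 10_000  # from the original question

# Hypothetical indexing rates, for illustration only.
for rate in (50, 500, 5_000):
    gap = seconds_between_commits(DOCS_PER_COMMIT, rate)
    print(f"{rate:>5} docs/s -> one commit every {gap:g} s")
```

With a time-based trigger, the gap is simply the configured interval no matter what the indexing rate is, which is what makes it easier to explain.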

Best,
Erick

> On Apr 15, 2020, at 1:28 PM, Kayak28 <ka...@gmail.com> wrote:
> 
> Hello, Solr Community:
> 
> I would like to ask about the default merge policy in Solr 8.3.0.
> My client (SolrJ) issues a commit every 10,000 documents.
> I have not explicitly configured a merge policy in solrconfig.xml.
> On each indexing run, some documents are updated or deleted.
> I believe the default merge policy merges segments automatically
> when there are too many of them.
> However, the number of deleted documents keeps increasing.
>
> Is there a default merge policy configuration,
> or do I have to configure one myself?
> 
> Sincerely,
> Kaya Ota
> 
> 
> 
> -- 
> 
> Sincerely,
> Kaya
> github: https://github.com/28kayak