Posted to users@solr.apache.org by Satya Nand <sa...@indiamart.com.INVALID> on 2022/10/03 10:04:32 UTC

removing deleted documents in solr cloud (8.10)

Hi,

This is what our SolrCloud merge policy looks like (see the configuration
below), and we have approximately 30% deleted documents in the index.

   - Is this normal?
   - If not, how can I decrease the number of deleted documents?
   - Will the above help us with response time?

<ramBufferSizeMB>200</ramBufferSizeMB>
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">5</int>
  <int name="segmentsPerTier">3</int>
</mergePolicyFactory>

Re: removing deleted documents in solr cloud (8.10)

Posted by Satya Nand <sa...@indiamart.com.INVALID>.
Thanks, Markus,

I will try it and let you know.

On Mon, Oct 3, 2022 at 6:05 PM Markus Jelsma <ma...@openindex.io>
wrote:

> Hello Satya,
>
> > This is what our SolrCloud merge policy looks like, and we have
> > approximately 30% deleted documents in the index.
> >
> >    - Is this normal?
>
> It depends on how often you delete or overwrite existing documents,
> although I find 30% a little too high for my comfort. Our various Solr
> collections are all very different from each other: the share of deleted
> documents ranges from 0.3% to 9%, 16%, and even 24%, which is all quite
> normal for the way they are used.
>
> >    - If not, how can I decrease the number of deleted documents?
>
> You can try setting <double name="deletesPctAllowed"> on the
> TieredMergePolicyFactory [1]; it defaults to 33%. I am not sure whether it
> will work, though, so please report back if you can.
>
> >    - Will the above help us with response time?
>
> It depends, but possibly. A lower value means more merging and thus more I/O
> on the leader node. If the leader is separate from the follower node and you
> only query the follower, then you should have smaller indexes and so see
> better response times.
>
> Regards,
> Markus
>
> [1]
> https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/index/TieredMergePolicy.html#setDeletesPctAllowed-double-

Re: removing deleted documents in solr cloud (8.10)

Posted by Markus Jelsma <ma...@openindex.io>.
Hello Satya,

> This is what our SolrCloud merge policy looks like, and we have
> approximately 30% deleted documents in the index.
>
>    - Is this normal?

It depends on how often you delete or overwrite existing documents,
although I find 30% a little too high for my comfort. Our various Solr
collections are all very different from each other: the share of deleted
documents ranges from 0.3% to 9%, 16%, and even 24%, which is all quite
normal for the way they are used.
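
For comparing such percentages, here is a small worked example. It assumes the share of deleted documents is derived from a core's maxDoc and numDocs counts, as Solr reports them; the document counts are hypothetical and merely chosen to match the 30% from the question.

```latex
\[
\text{deleted\%} = \frac{\text{maxDoc} - \text{numDocs}}{\text{maxDoc}} \times 100,
\qquad \text{e.g.} \qquad
\frac{1\,000\,000 - 700\,000}{1\,000\,000} \times 100 = 30\%
\]
```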


>    - If not, how can I decrease the number of deleted documents?

You can try setting <double name="deletesPctAllowed"> on the
TieredMergePolicyFactory [1]; it defaults to 33%. I am not sure whether it
will work, though, so please report back if you can.
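
As a rough sketch of what that could look like, here is the merge policy from the original post with the extra setting added, assuming it lives in the <indexConfig> section of solrconfig.xml. The value 20 is only an illustrative assumption, not a tested recommendation; as far as I know, Lucene 8.x rejects values outside the range 20 to 50.

```xml
<indexConfig>
  <ramBufferSizeMB>200</ramBufferSizeMB>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">3</int>
    <!-- Assumed value for illustration only; the default is 33 -->
    <double name="deletesPctAllowed">20</double>
  </mergePolicyFactory>
</indexConfig>
```

The change only takes effect after the configuration is reloaded, and the deleted-document share should then drop gradually as new merges run rather than immediately.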


>    - Will the above help us with response time?

It depends, but possibly. A lower value means more merging and thus more I/O
on the leader node. If the leader is separate from the follower node and you
only query the follower, then you should have smaller indexes and so see
better response times.

Regards,
Markus

[1]
https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/index/TieredMergePolicy.html#setDeletesPctAllowed-double-
