Posted to solr-user@lucene.apache.org by Jostein Elvaker Haande <je...@gmail.com> on 2016/04/01 10:49:47 UTC

Re: Deleted documents and expungeDeletes

On 30 March 2016 at 17:46, Erick Erickson <er...@gmail.com> wrote:
> through a clever bit of reflection, you can set the
> reclaimDeletesWeight variable from solrconfig by including something
> like
> <double name="reclaimDeletesWeight">5</double> (going from memory
> here, you'll get an error on startup if I've messed it up.....)

I added the following to my solrconfig a couple of days ago:

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">8</int>
      <int name="segmentsPerTier">8</int>
      <double name="reclaimDeletesWeight">5.0</double>
    </mergePolicy>

There have been several commits and the core is current according to
SOLR admin, but I'm still seeing a lot of deleted docs. These are
my current core statistics:

Last Modified: 4 minutes ago
Num Docs: 1 675 255
Max Doc: 2 353 476
Heap Memory Usage: 208 464 267
Deleted Docs: 678 221
Version: 1 870 539
Segment Count: 39

Index size is close to 149GB.

So at the moment, the ratio of deleted docs to max docs is about
28.8%. With 'reclaimDeletesWeight' set to 5, it doesn't seem to be
reclaiming any of the deleted docs.
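
For reference, the percentage can be recomputed from the core statistics
above. This is only a small sketch: the variable names are made up for
illustration, and the two numbers are copied from the Deleted Docs and
Max Doc lines.

```shell
#!/bin/sh
# Values copied from the core statistics in this message.
num_deleted=678221   # Deleted Docs
max_doc=2353476      # Max Doc (Num Docs + Deleted Docs)

# Share of the index that is taken up by deleted documents, in percent.
pct=$(awk -v d="$num_deleted" -v m="$max_doc" 'BEGIN { printf "%.2f", d / m * 100 }')
echo "deleted: ${pct}%"   # prints "deleted: 28.82%"
```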

Anything obvious I'm missing?

-- 
Yours sincerely Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular"
- Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net

Re: Deleted documents and expungeDeletes

Posted by David Santamauro <da...@gmail.com>.
The docs on reclaimDeletesWeight say:

"Controls how aggressively merges that reclaim more deletions are 
favored. Higher values favor selecting merges that reclaim deletions."

I can't imagine you would notice anything after only a few commits. I 
have many shards that size or larger and what I do occasionally is to 
loop an optimize, setting maxSegments with decremented values, e.g.,

for maxSegments in $( seq 40 -1 20 ); do
   # optimize maxSegments=$maxSegments
done

It's definitely a poor man's hack and is clearly not the most efficient 
way of optimizing, but it does remove deletes without requiring the 
double or triple disk space that a full optimize requires. I can usually 
reclaim 100-300GB of disk space in a collection that is currently ~ 2TB 
-- not inconsequential.
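
A minimal sketch of what the body of that loop could look like, assuming
the optimize is sent to the update handler over HTTP with curl. The base
URL, the core name "mycore", the DRY_RUN switch and the optimize_to
helper are placeholders for illustration, not part of any setup
described here. With DRY_RUN=1 (the default) it only prints the
requests it would send.

```shell
#!/bin/sh
# Assumed placeholders: adjust to your own host and core.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CORE="${CORE:-mycore}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the URLs only, 0 = actually call Solr

# Ask Solr to merge the index down to at most $1 segments.
optimize_to() {
    url="$SOLR_URL/$CORE/update?optimize=true&maxSegments=$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$url"
    else
        curl -sS --fail "$url"
    fi
}

# Step down one segment at a time, from 40 to 20; stop on the first failure.
for maxSegments in $(seq 40 -1 20); do
    optimize_to "$maxSegments" || break
done
```

Stepping down gradually keeps each merge small, which is where the 
disk space saving over a single full optimize comes from.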

Seeing as you only have 1.6M documents, perhaps an index rebuild isn't 
out of the question? I did just that on a test collection with 100M 
documents. It started with 0 deleted docs and reclaimDeletesWeight=5.0, 
and has seen probably about 1-3% document turnover per week (updates) 
over the last 3 months. My deleted percentage is staying below 10%.

If that's not an option, keeping reclaimDeletesWeight at 5.0 and using 
expungeDeletes=true on commit will get that percentage down over time.
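
As a rough sketch, a commit with expungeDeletes could be issued like 
this. The base URL, the core name "mycore" and the DRY_RUN switch are 
assumed placeholders; commit and expungeDeletes are the update handler 
parameters referred to above.

```shell
#!/bin/sh
# Assumed placeholders: adjust to your own host and core.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CORE="${CORE:-mycore}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the URL only, 0 = actually call Solr

# Commit and ask Solr to merge away segments that carry deleted documents.
commit_url="$SOLR_URL/$CORE/update?commit=true&expungeDeletes=true"

if [ "$DRY_RUN" = "1" ]; then
    echo "$commit_url"
else
    curl -sS --fail "$commit_url"
fi
```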

//

