Posted to solr-user@lucene.apache.org by Jostein Elvaker Haande <je...@gmail.com> on 2016/04/01 10:49:47 UTC
Re: Deleted documents and expungeDeletes
On 30 March 2016 at 17:46, Erick Erickson <er...@gmail.com> wrote:
> through a clever bit of reflection, you can set the
> reclaimDeletesWeight variable from solrconfig by including something
> like
> <double name="reclaimDeletesWeight">5</double> (going from memory
> here, you'll get an error on startup if I've messed it up.....)
I added the following to my solrconfig a couple of days ago:
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">8</int>
  <int name="segmentsPerTier">8</int>
  <double name="reclaimDeletesWeight">5.0</double>
</mergePolicy>
There have been several commits and the core is current according to
the Solr admin UI, however I'm still seeing a lot of deleted docs. These
are my current core statistics.
Last Modified: 4 minutes ago
Num Docs: 1 675 255
Max Doc: 2 353 476
Heap Memory Usage: 208 464 267
Deleted Docs: 678 221
Version: 1 870 539
Segment Count: 39
Index size is close to 149GB.
So at the moment, I'm seeing a ratio of deleted docs to max docs of
28.81%. With 'reclaimDeletesWeight' set to 5, it doesn't seem to be
reclaiming any deleted docs.
Anything obvious I'm missing?
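The ratio quoted above can be checked directly from the core statistics. A minimal sketch, assuming awk is available; the function name deleted_pct is made up for illustration and is not part of Solr:

```shell
# Percentage of deleted docs relative to Max Doc.
# deleted_pct is a hypothetical helper name, not a Solr tool.
deleted_pct() {
    # $1 = Deleted Docs, $2 = Max Doc
    awk -v deleted="$1" -v max="$2" 'BEGIN { printf "%.2f\n", 100 * deleted / max }'
}

# Values taken from the statistics above
deleted_pct 678221 2353476
```

Rounded to two decimals this prints 28.82, which agrees with the roughly 28.8% quoted above.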
--
Yours sincerely Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular"
- Adlai Stevenson
http://tolecnal.net -- tolecnal at tolecnal dot net
Re: Deleted documents and expungeDeletes
Posted by David Santamauro <da...@gmail.com>.
The docs on reclaimDeletesWeight say:
"Controls how aggressively merges that reclaim more deletions are
favored. Higher values favor selecting merges that reclaim deletions."
I can't imagine you would notice anything after only a few commits. I
have many shards that size or larger and what I do occasionally is to
loop an optimize, setting maxSegments with decremented values, e.g.,
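for maxSegments in $( seq 40 -1 20 ); do
    # optimize down to at most $maxSegments segments
    # (localhost:8983 and "mycore" are placeholders for your host and core)
    curl "http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=$maxSegments"
done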
It's definitely a poor-man's hack and is clearly not the most efficient
way of optimizing, but it does remove deletes without requiring double
or triple the disk space that a full optimize requires. I can usually
reclaim 100-300GB of disk space in a collection that is currently ~2TB
-- not inconsequential.
Seeing as you only have 1.6M documents, perhaps an index rebuild isn't
out of the question? I did just that on a test collection with 100M
documents. Starting with 0 deleted docs, reclaimDeletesWeight=5.0 and
roughly 1-3% document turnover per week (updates) over the last 3
months, my deleted percentage is staying below 10%.
If that's not an option, keeping reclaimDeletesWeight at 5.0 and using
expungeDeletes=true on commit will get that percentage down over time.
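As a rough sketch of what such a commit could look like: the request goes to the core's update handler with commit=true and expungeDeletes=true. The host, port and core name below are placeholders, and expunge_commit_url is a made-up helper name, not something from this thread:

```shell
# Build the update URL for a commit that also expunges deletes.
# $1 = base URL (placeholder), $2 = core name (placeholder)
# expunge_commit_url is a hypothetical helper, not a Solr tool.
expunge_commit_url() {
    printf '%s/solr/%s/update?commit=true&expungeDeletes=true\n' "$1" "$2"
}

# Usage (uncomment to actually send the request):
# curl "$(expunge_commit_url http://localhost:8983 mycore)"
```

Building the URL separately from sending it makes it easy to check what will be requested before running it against a large index.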