Posted to issues@lucene.apache.org by "Suhan Mao (Jira)" <ji...@apache.org> on 2021/08/05 10:19:00 UTC
[jira] [Commented] (LUCENE-10025)
SoftDeletesRetentionMergePolicy#numDeletesToMerge caused indexing
backlogged
[ https://issues.apache.org/jira/browse/LUCENE-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393782#comment-17393782 ]
Suhan Mao commented on LUCENE-10025:
------------------------------------
[~dnhatn] I think [~zhangchao.es]'s question probably refers to this code:
{code:java}
// @Override
public int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier<CodecReader> readerSupplier) throws IOException {
  final int numDeletesToMerge = super.numDeletesToMerge(info, delCount, readerSupplier);
  if (numDeletesToMerge != 0 && info.getSoftDelCount() > 0) {
    final CodecReader reader = readerSupplier.get();
    if (reader.getLiveDocs() != null) {
      BooleanQuery.Builder builder = new BooleanQuery.Builder();
      builder.add(new DocValuesFieldExistsQuery(field), BooleanClause.Occur.FILTER);
      builder.add(retentionQuerySupplier.get(), BooleanClause.Occur.FILTER);
      Scorer scorer = getScorer(builder.build(), FilterCodecReader.wrapLiveDocs(reader, null, reader.maxDoc()));
      if (scorer != null) {
        DocIdSetIterator iterator = scorer.iterator();
        Bits liveDocs = reader.getLiveDocs();
        int numDeletedDocs = reader.numDeletedDocs();
        while (iterator.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
          if (liveDocs.get(iterator.docID()) == false) {
            numDeletedDocs--;
          }
        }
        return numDeletedDocs;
      }
    }
  }
{code}
Why do we have to iterate the scorer and check whether each doc ID is absent from liveDocs?
Since every doc ID returned by the scorer must have the soft-deletes field, it should never be in liveDocs, so why do we need the *_liveDocs.get(iterator.docID()) == false_* check?
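To make the question concrete, here is a minimal toy model of what the loop computes, using plain java.util.BitSet instead of Lucene readers and scorers. Everything in it is a hypothetical stand-in (the class name, the method signature, the three bit sets); it is a sketch of the counting logic only, not the Lucene implementation. The example data assumes a doc (doc 5) that has the soft-deletes field and is still set in liveDocs; whether such a doc can actually occur is exactly what is being asked above.

```java
import java.util.BitSet;

public class NumDeletesToMergeSketch {

    /**
     * Toy model of the counting loop. All parameters are hypothetical stand-ins:
     *   liveDocs      - bit set for every live (not deleted) doc, no bits at or beyond maxDoc
     *   hasSoftDelete - bit set for every doc that has the soft-deletes field
     *   retained      - bit set for every doc matched by the retention query
     */
    static int numDeletesToMerge(int maxDoc, BitSet liveDocs, BitSet hasSoftDelete, BitSet retained) {
        // Start from every deleted doc in the segment.
        int numDeletedDocs = maxDoc - liveDocs.cardinality();

        // Both clauses are FILTER clauses, so a doc is visited only if it satisfies both.
        BitSet matches = (BitSet) hasSoftDelete.clone();
        matches.and(retained);

        for (int doc = matches.nextSetBit(0); doc >= 0 && doc < maxDoc; doc = matches.nextSetBit(doc + 1)) {
            // Only a doc that is currently counted as deleted can be subtracted.
            if (!liveDocs.get(doc)) {
                numDeletedDocs--;
            }
        }
        return numDeletedDocs;
    }

    public static void main(String[] args) {
        BitSet liveDocs = new BitSet();
        liveDocs.set(0);
        liveDocs.set(1);
        liveDocs.set(5);               // docs 2, 3, 4 are deleted

        BitSet hasSoftDelete = new BitSet();
        hasSoftDelete.set(2);
        hasSoftDelete.set(3);
        hasSoftDelete.set(5);          // assumed case: field present, doc still live

        BitSet retained = new BitSet();
        retained.set(2);
        retained.set(5);

        // 3 deleted docs, minus doc 2 (deleted and retained); doc 5 is skipped.
        System.out.println(numDeletesToMerge(6, liveDocs, hasSoftDelete, retained)); // prints 2
    }
}
```

In this model the check only changes the result when a matched doc is still live: without it, doc 5 would also be subtracted and the method would return 1 instead of 2. If no such doc can exist, the check is redundant and the result equals numDeletedDocs minus the number of matches.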
> SoftDeletesRetentionMergePolicy#numDeletesToMerge caused indexing backlogged
> ----------------------------------------------------------------------------
>
> Key: LUCENE-10025
> URL: https://issues.apache.org/jira/browse/LUCENE-10025
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 8.4
> Reporter: zhangchao.es
> Priority: Major
> Labels: indexing, soft-delete
> Attachments: flamegraph.html, image-2021-07-14-16-52-34-740.png
>
>
> In LUCENE-8246, numDeletesToMerge was added to SoftDeletesRetentionMergePolicy.
> If there are a very large number of soft-deleted docs and they are also held by a retention lease, the numDeletesToMerge function has a performance issue.
> For instance, while an update-heavy indexing workload is writing to Elasticsearch, we move a shard to another node. If the move continues for a long time, the old shard becomes very big, because soft-deleted operations need to be held by the retention lease. The more soft-deleted documents there are, the slower the indexing. With a shard size of about 20GB, we get the flamegraph below.
>
> !image-2021-07-14-16-52-34-740.png!
>