Posted to server-dev@james.apache.org by "René Cordier (Jira)" <se...@james.apache.org> on 2020/06/25 04:15:00 UTC

[jira] [Commented] (JAMES-3202) ReIndexing "filtering" for only outdated indexed data

    [ https://issues.apache.org/jira/browse/JAMES-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144633#comment-17144633 ] 

René Cordier commented on JAMES-3202:
-------------------------------------

[https://github.com/linagora/james-project/pull/3452] was the first part of this work

> ReIndexing "filtering" for only outdated indexed data
> -----------------------------------------------------
>
>                 Key: JAMES-3202
>                 URL: https://issues.apache.org/jira/browse/JAMES-3202
>             Project: James Server
>          Issue Type: Improvement
>            Reporter: René Cordier
>            Priority: Major
>             Fix For: 3.6.0
>
>
> *Why?*
> ReIndexing can be slow: it requires reading all messages in the DB and then triggering a full reIndexing, even when the indexed document is not outdated.
> All these document changes create a lot of deleted documents. Lucene "marks them as deleted", polluting the entire index until segment merging happens (yet another costly operation). The fewer updates we do, the better. Note that a partial update still leads to a full new document in Lucene; it only optimises bandwidth and avoids reads.
> *Need specification*
> As an admin, I want to run a reIndex.
> We furthermore handle `RunningOptions`, which allow specifying the attempted message rate. See [https://github.com/linagora/james-project/pull/3394]
> We still need, given a message, to get its search index representation (at least for its mutable data). From this we will be able to restrict the reindexing to outdated/non-existing data, significantly speeding up the reindexing process on mostly valid indexes. The admin could then request this option via a query parameter (carried over in running options).
> *MessageSearchIndex API changes*:
> {code:java}
> interface MessageSearchIndex {
>    //...
>    Mono<Flags> retrieveIndexedFlags(MailboxId mailboxId, MessageUid uid);
>    //...
> }
> {code}
> ElasticSearch will rely on the _GET_ verb (not search).
> Unit tests will be written for this new method.
> ReIndexing `RunningOptions` will then carry over the option, which ReIndexerPerformer will need to take into account.
> Sample webadmin API:
> {code:bash}
> curl -XPOST 'http://james:8000/mailboxes?action=reindex&filter=outdatedIndex'
> {code}
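
The "only outdated" filtering described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the James implementation: `OutdatedIndexFilter`, `Flag`, `FlagsIndex`, `needsReindex` and `inMemory` are hypothetical names, plain `String`/`long` stand in for `MailboxId`/`MessageUid`, and `Optional` stands in for the `Mono<Flags>` of the proposed API so that the sketch only needs the JDK.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: decide whether a message must be reindexed by
// comparing the flags stored in the mailbox with the flags found in the index.
final class OutdatedIndexFilter {

    // Hypothetical stand-in for the mutable message data (the flags).
    enum Flag { SEEN, FLAGGED, ANSWERED, DELETED, DRAFT }

    // Hypothetical stand-in for the proposed retrieveIndexedFlags method.
    // Optional.empty() means the message has no document in the index.
    interface FlagsIndex {
        Optional<Set<Flag>> retrieveIndexedFlags(String mailboxId, long uid);
    }

    // True when the indexed document is missing or its flags differ from
    // the stored ones; up-to-date documents are skipped.
    static boolean needsReindex(FlagsIndex index, String mailboxId, long uid, Set<Flag> storedFlags) {
        return index.retrieveIndexedFlags(mailboxId, uid)
            .map(indexedFlags -> !indexedFlags.equals(storedFlags))
            .orElse(true);
    }

    // In-memory index keyed by "mailboxId:uid", for demonstration only.
    static FlagsIndex inMemory(Map<String, Set<Flag>> documents) {
        return (mailboxId, uid) -> Optional.ofNullable(documents.get(mailboxId + ":" + uid));
    }

    private OutdatedIndexFilter() {
    }
}
```

With such a check, a reindexing task run with the outdated-only option would call `needsReindex` for each message and skip the index write when it returns false, which avoids creating new deleted documents for entries that are already valid.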



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
