Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <se...@james.apache.org> on 2023/04/14 23:46:00 UTC
[jira] [Commented] (JAMES-3777) Event sourcing - O[n²] storage for filters

    [ https://issues.apache.org/jira/browse/JAMES-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712579#comment-17712579 ] 

Benoit Tellier commented on JAMES-3777:
---------------------------------------

In https://github.com/apache/james-project/pull/1519 I propose to use an optional read projection for filters. Included is a way to reset the read projection.
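
For readers unfamiliar with the pattern, here is a minimal Python sketch of what an optional, resettable read projection could look like. It is not the code from the pull request: `InMemoryEventStore`, `FilterProjection`, `rules_for`, `reset` and `current_rules` are hypothetical names, storage is an in-memory dict rather than a table, and every event is assumed to be a "reset to this full rule list" event as described in the issue.

```python
from collections import defaultdict


class InMemoryEventStore:
    """Hypothetical event store: one append-only history per user."""

    def __init__(self):
        self._history = defaultdict(list)
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def append(self, user, rules):
        # Assumption: each event carries the full rule list ("reset" event).
        event = tuple(rules)
        self._history[user].append(event)
        for callback in self._subscribers:
            callback(user, event)

    def history(self, user):
        return list(self._history[user])


class FilterProjection:
    """Hypothetical read projection, kept up to date by a subscriber."""

    def __init__(self, store):
        self._store = store
        self._current = {}
        store.subscribe(self._on_event)

    def _on_event(self, user, event):
        self._current[user] = event

    def rules_for(self, user):
        # Fast path: answer from the projection without replaying history.
        if user in self._current:
            return self._current[user]
        return self.reset(user)

    def reset(self, user):
        # Rebuild the projected entry for this user from the event history.
        history = self._store.history(user)
        rules = history[-1] if history else ()
        self._current[user] = rules
        return rules


def current_rules(store, user, projection=None):
    """Use the projection when one is configured, else replay the history."""
    if projection is not None:
        return projection.rules_for(user)
    history = store.history(user)
    return history[-1] if history else ()
```

With this shape, the mail processing path only touches the projection, and the full history is read on writes or when the projection is reset.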

> Event sourcing - O[n²] storage for filters
> ------------------------------------------
>
>                 Key: JAMES-3777
>                 URL: https://issues.apache.org/jira/browse/JAMES-3777
>             Project: James Server
>          Issue Type: Improvement
>    Affects Versions: 3.7.0
>            Reporter: Benoit Tellier
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Symptoms
> ```  
> Largest Partitions:     
> [FilteringRule/xxx@linagora.com] 44952069 (45.0 MB)
> ```
> Every time this user sends an email we load 45 MB of JSON, which can have a big performance impact.
> h2. What?
> We implemented event sourcing with reset events. Given rules A and B, if we want to persist rule C then we store a "reset to A, B, C" event.
> So, if we want to store N filters, the resulting structure will have a size in O[n²], which proves to be barely sustainable.
> h2. How to fix
> Coming back to O[n] would likely help.
> Implement filter addition / removal at both the storage and JMAP layers.
> h2.  Alternatives
> h3. The read projection
> Currently we load the full history, build the aggregate each time we process an email, and perform SERIAL lightweight transactions. Processing emails is very common, so this is impactful.
> It would be possible to introduce a read projection, maintained by a subscriber to the event source, that would allow efficiently reading the current filters for a given user.
> This means the history would be loaded only upon writes, which are rare.
> Impact: yet another table. Also the solution is local to this usage and does not help other event sourcing usages.
> h3. Event sourcing snapshots
> Augment James event sourcing implementation with a Snapshot mechanism.
> Upon reading the history, we would start by reading the latest available snapshot, then read the history from that snapshot onwards.
> The event store would be responsible for taking snapshots. Even one snapshot every 10 changes would do the job here.
> This implies being able to serialize state. This implies an additional table for storing event sourcing snapshots.
> My take on it: going from `O[n²]` to `O[n]` will likely be a good enough mitigation that we don't need to grow the complexity of the event sourcing code.
> On the other hand, this would harden the event sourcing code and likely lift most of the limitations preventing its adoption on the mailbox write path (to enforce the mailbox name uniqueness constraint).
> Note that the two solutions are not mutually exclusive.
> h3. The dirty fix
> For filters, the history prior to a reset event can be dropped. This can be used to solve the immediate problem, even if it is not very clean.
> h1. Proposal
>  - Implement a read projection
>  - Implement addition / removal patches to filtering event sourcing aggregate
>  - Don't implement event sourcing snapshots now
> And also... Remove the obligation to configure JMAP filtering mailet inside JMAP servers: after all this extension is not standard...
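
To make the O[n²] versus O[n] argument from the quoted description concrete, the following Python sketch counts how many rule entries end up stored after adding n rules one at a time, once with "reset to the full list" events and once with incremental "add one rule" events. The function names are illustrative only, and every rule is assumed to have the same serialized size.

```python
def stored_rules_with_reset_events(n):
    """Rule entries stored when each change persists the full rule list."""
    # Event k (1-based) carries k rules: 1 + 2 + ... + n = n * (n + 1) / 2.
    return sum(range(1, n + 1))


def stored_rules_with_incremental_events(n):
    """Rule entries stored when each change persists only the added rule."""
    return n


for n in (10, 100, 1000):
    print(n,
          stored_rules_with_reset_events(n),
          stored_rules_with_incremental_events(n))
```

For 1000 rules this gives 500500 stored entries with reset events against 1000 with incremental events, which is the quadratic growth the issue describes.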



