You are viewing a plain text version of this content. The canonical link for it is here.
Posted to server-dev@james.apache.org by bt...@apache.org on 2020/04/21 02:56:54 UTC

[james-project] 18/19: JAMES-2997 [ADR] Separate attachment content and metadata

This is an automated email from the ASF dual-hosted git repository.

btellier pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git

commit d3022fd4ed0bed025b63fea7711aab7fa3485758
Author: Benoit Tellier <bt...@linagora.com>
AuthorDate: Mon Apr 13 10:47:42 2020 +0700

    JAMES-2997 [ADR] Separate attachment content and metadata
---
 ...030-separate-attachment-content-and-metadata.md | 99 ++++++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/src/adr/0030-separate-attachment-content-and-metadata.md b/src/adr/0030-separate-attachment-content-and-metadata.md
new file mode 100644
index 0000000..704d603
--- /dev/null
+++ b/src/adr/0030-separate-attachment-content-and-metadata.md
@@ -0,0 +1,99 @@
+# 30. Separate attachment content and metadata
+
+Date: 2020-04-13
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+Some mailbox implementations of James store already parsed attachments for faster retrieval.
+
+This attachment storage capabilities are required for two features:
+
+ - JMAP attachment download
+ - JMAP message search "attachment content" criteria
+ 
+Only Memory and Cassandra backends can be relied upon as a JMAP backend.
+
+Other protocols relies on dynamic EML parsing to expose message subparts (IMAP)
+
+Here are the POJOs related to these attachments:
+
+ - **Attachment** : holds an attachmentId, the attachment content, as well as the content type
+ - **MessageAttachment** : composes an attachment with its disposition within a message (cid, inline and name)
+ - **Message** exposes its list of MessageAttachment when it is read with FetchType Full.
+ - **Blob** represents some downloadable content, and can be either an attachment or a message. Blob has a byte array 
+ payload too.
+ 
+The following classes work with the aforementioned POJOs:
+ 
+ - **AttachmentMapper** and **AttachmentManager** are responsible of storing and retrieving an attachment content.
+ - **BlobManager** is used by JMAP to allow blob downloads.
+ - Mailbox search exposes attachment content related criteria. These criteria are used by the JMAP protocol.
+
+This organisation causes attachment content to be loaded every time a message is fully read (which happens for instance
+when you open a message using JMAP) despite the fact that it is not needed, as attachments are downloadable through a 
+separate JMAP endpoint, their content is not attached to the JMAP message JSON.
+
+Also, the content being loaded "at once", we allocate memory space to store the whole attachment, which is sub-optimal. We
+want to keep the consumed memory low per-message because a given server should be able to handle a high number of messages 
+at a given time.
+
+To be noted that JPA and maildir mailbox implementations do not support attachment storage. To retrieve attachments of a 
+message, these implementations parse the messages to extract their attachments.
+
+Cassandra mailbox prior schema version 4 stored attachment and its metadata in the same table, but from version 5 relies 
+on the blobStore to store the attachment content.
+
+## Decision
+
+Enforce cassandra schema version to be 5 from James release 3.5.0. This allows to drop attachment management prior version
+5.
+
+We will re-organize the attachment POJOs: 
+
+ - **Attachment** should hold an attachmentId, a content type, and a size. It will no longer hold the content. The 
+ content can be loaded from its **AttachmentId** via the **AttachmentLoader** API that the **AttachmentManager** 
+ implements.
+ - **MessageAttachment** : composes an attachment with its disposition within a message (cid, inline and name)
+ - **Blob** would no longer hold the content as a byte array but rather a content retriever (`Supplier<InputStream>`)
+ - **ParsedAttachment** is the direct result of attachment parsing, and composes a **MessageAttachment** and the 
+ corresponding content as byte array. This class is only relied upon when saving a message in mailbox. This is used as 
+ an output of `MessageParser`.
+
+Some adjustments are needed on class working with attachment:
+
+ - **AttachmentMapper** and **AttachmentManager** need to allow from an attachmentId to retrieve the attachment content
+ as an `InputStream`. This is done through a separate `AttachmentLoader` interface.
+ - **AttachmentMapper** and **AttachmentManager** need the Attachment and its content to persist an attachment
+ - **MessageManager** then needs to return attachment metadata as a result of Append operation.
+ - **InMemoryAttachmentMapper** needs to store attachment content separately.
+ - **MessageStorer** will take care of storing a message on the behalf of `MessageManager`. This enables to determine if 
+ attachment should be parsed or not on an implementation aware fashion, saving attachment parsing upon writes for JPA 
+ and Maildir.
+ 
+Maildir and JPA no longer support attachment content loading. Only the JMAP protocol requires attachment content loading,
+which is not supported on top of these technologies.
+
+Mailbox search attachment content criteria will be supported only on implementation supporting attachment storage.
+
+## Consequences
+
+Users running Cassandra schema version prior version 5 will have to go through James release 3.5.0 to upgrade to a 
+version after version 5 before proceeding with their update.
+
+We noticed performance enhancement when using IMAP FETCH and JMAP GetMessages. Running a gatling test suite exercising 
+JMAP getMessages on a dataset containing attachments leads to the following observations:
+
+ - Overall better average performance for all JMAP queries (10% global p50 improvement)
+ - Sharp decrease in tail latency of getMessages (x40 time faster)
+
+We also expect improvements in James memory allocation.
+
+## References
+
+ - [Contribution on this topic](https://github.com/linagora/james-project/pull/3061). Also contains benchmark for this 
+ proposal.
+ - [JIRA](https://issues.apache.org/jira/browse/JAMES-2997)
\ No newline at end of file


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org