Posted to server-dev@james.apache.org by rc...@apache.org on 2019/11/25 10:47:37 UTC

[james-project] branch master updated (248b0b7 -> 31bbd65)

This is an automated email from the ASF dual-hosted git repository.

rcordier pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git.


    from 248b0b7  [Refactoring] ModSeq are never used in MetaDataWithContent
     new 11cc859  JAMES-2919 ADRs for JMAP-draft GetMessages partial reads
     new 31bbd65  JAMES-2921 ADR for ObjectStorage usage improvements

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/adr/0012-jmap-partial-reads.md         | 40 ++++++++++++++++++++
 src/adr/0013-precompute-jmap-preview.md    | 48 ++++++++++++++++++++++++
 src/adr/0014-blobstore-storage-policies.md | 57 +++++++++++++++++++++++++++++
 src/adr/0015-objectstorage-blobid-list.md  | 59 ++++++++++++++++++++++++++++++
 4 files changed, 204 insertions(+)
 create mode 100644 src/adr/0012-jmap-partial-reads.md
 create mode 100644 src/adr/0013-precompute-jmap-preview.md
 create mode 100644 src/adr/0014-blobstore-storage-policies.md
 create mode 100644 src/adr/0015-objectstorage-blobid-list.md


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

[james-project] 02/02: JAMES-2921 ADR for ObjectStorage usage improvements

Posted by rc...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

rcordier pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git

commit 31bbd652c1550f6a6bfad725709d05dff285ad26
Author: Benoit Tellier <bt...@linagora.com>
AuthorDate: Thu Oct 17 09:28:18 2019 +0700

    JAMES-2921 ADR for ObjectStorage usage improvements
---
 src/adr/0014-blobstore-storage-policies.md | 57 +++++++++++++++++++++++++++++
 src/adr/0015-objectstorage-blobid-list.md  | 59 ++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/src/adr/0014-blobstore-storage-policies.md b/src/adr/0014-blobstore-storage-policies.md
new file mode 100644
index 0000000..655b1eb
--- /dev/null
+++ b/src/adr/0014-blobstore-storage-policies.md
@@ -0,0 +1,57 @@
+# 14. Add storage policies for BlobStore
+
+Date: 2019-10-09
+
+## Status
+
+Proposed
+
+Adoption needs to be backed by performance tests, as well as by measuring how the data distribution between Cassandra and object storage shifts.
+
+## Context
+
+James exposes a simple BlobStore API for storing raw data. However, such raw data often varies in size and access patterns.
+
+As an example:
+
+ - Mailbox message headers are expected to be small and frequently accessed
+ - Mailbox message bodies are expected to range in size from small to big but are infrequently accessed
+ - DeletedMessageVault message headers are expected to be small and infrequently accessed
+
+Also, the various BlobStore implementations have different strengths:
+
+ - CassandraBlobStore is efficient for small blobs and offers low latency. However, it is known to be expensive for big blobs, as Cassandra storage is costly.
+ - The Object Storage blob store is good at storing big blobs, but for small blobs it induces higher latencies than Cassandra, for a cost gain that isn't worth it.
+
+Thus, a significantly better performance-to-cost ratio could be achieved by using the right blob store for each blob.
+
+## Decision
+
+Introduce StoragePolicies at the level of the BlobStore API.
+
+The proposed policies include:
+
+ - SizeBasedStoragePolicy: The underlying storage medium of the blob will be chosen depending on its size.
+ - LowCostStoragePolicy: The blob is expected to be saved in low-cost storage. Access is expected to be infrequent.
+ - PerformantStoragePolicy: The blob is expected to be saved in performant storage. Access is expected to be frequent.
+
+A HybridBlobStore will replace the current UnionBlobStore and will allow choosing between the Cassandra and ObjectStorage implementations depending on the policies.
+
+DeletedMessageVault, BlobExport & MailRepository will rely on LowCostStoragePolicy. Other BlobStore users will rely on SizeBasedStoragePolicy.
+
+Some performance tests will be run in order to evaluate the improvements.
+
+## Consequences
+
+We expect small frequently accessed blobs to be located in Cassandra, allowing ObjectStorage to be used mainly for large costly blobs.
+
+In case of a less than 5% improvement, the code will not be added to the codebase and the proposal will get the status 'rejected'.
+
+We expect more data to be stored in Cassandra. We need to quantify this for adoption.
+
+As reads will query both blob stores, no migration is required to use this composite blob store on top of an existing implementation;
+however, we will benefit from the performance enhancements only for newly stored blobs.
+
+## References
+
+ - [JIRA](https://issues.apache.org/jira/browse/JAMES-2921)
\ No newline at end of file
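
The routing described in ADR 14 above can be illustrated with a minimal sketch. It is not part of the committed file and is not the James implementation: the Python class names, the in-memory backends and the 32 KiB threshold are all assumptions made for illustration (the ADR does not specify a threshold). Writes are routed by policy, and reads try the performant store first and then fall back to the low-cost one, which is why no migration is needed.

```python
import hashlib
from enum import Enum, auto


class StoragePolicy(Enum):
    SIZE_BASED = auto()
    LOW_COST = auto()
    PERFORMANT = auto()


# Hypothetical threshold: the ADR does not give a value.
SIZE_THRESHOLD_BYTES = 32 * 1024


class InMemoryBlobStore:
    """Stand-in for a real backend (Cassandra or object storage)."""

    def __init__(self):
        self.blobs = {}

    def save(self, blob_id, data):
        self.blobs[blob_id] = data

    def read(self, blob_id):
        return self.blobs.get(blob_id)


class HybridBlobStore:
    """Routes writes by policy; reads consult both backends."""

    def __init__(self, performant, low_cost, threshold=SIZE_THRESHOLD_BYTES):
        self.performant = performant
        self.low_cost = low_cost
        self.threshold = threshold

    def _choose(self, data, policy):
        if policy is StoragePolicy.PERFORMANT:
            return self.performant
        if policy is StoragePolicy.LOW_COST:
            return self.low_cost
        # SIZE_BASED: small blobs go to the performant store.
        return self.performant if len(data) <= self.threshold else self.low_cost

    def save(self, data, policy):
        blob_id = hashlib.sha256(data).hexdigest()
        self._choose(data, policy).save(blob_id, data)
        return blob_id

    def read(self, blob_id):
        # Try the low-latency store first, then fall back.
        data = self.performant.read(blob_id)
        if data is None:
            data = self.low_cost.read(blob_id)
        return data
```

With such a structure, a caller like the DeletedMessageVault would pass `StoragePolicy.LOW_COST` on every save, while other callers would pass `StoragePolicy.SIZE_BASED` and let the blob size decide.
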
diff --git a/src/adr/0015-objectstorage-blobid-list.md b/src/adr/0015-objectstorage-blobid-list.md
new file mode 100644
index 0000000..ef1523b
--- /dev/null
+++ b/src/adr/0015-objectstorage-blobid-list.md
@@ -0,0 +1,59 @@
+# 15. Persist BlobIds to avoid persisting the same blobs several times within ObjectStorage
+
+Date: 2019-10-09
+
+## Status
+
+Proposed
+
+Adoption needs to be backed by some performance tests.
+
+## Context
+
+A given mail is often written to the blob store by different components, and mail traffic is heavily duplicated (several recipients receiving similar emails, identical attachments). This causes a given blob to often be persisted several times.
+
+Cassandra was the first implementation of the blobStore. Cassandra is a heavily write-optimized NoSQL database. One can assume writes to be fast on top of Cassandra. Thus we assumed we could always overwrite blobs.
+
+This usage pattern was also adopted for BlobStore on top of ObjectStorage.
+
+However, writing to object storage:
+ - Takes time
+ - Is billed by most cloud providers
+
+Thus, choosing the right strategy to avoid writing a blob twice is desirable.
+
+However, the ObjectStorage (OpenStack Swift) `exist` method was not efficient enough to be a real cost and performance saver.
+
+## Decision
+
+Rely on a StoredBlobIdsList API to know whether a blob is already persisted in object storage. Provide a Cassandra implementation of it.
+Located in blob-api for convenience, this is not a top-level API. It is intended to be used by some blobStore implementations
+(here only ObjectStorage). We will provide a CassandraStoredBlobIdsList in the blob-cassandra project so that Guice products combining
+object storage and Cassandra can define a binding to it.
+
+ - When saving a blob with a precomputed blobId, we can check whether the blob already exists in storage, possibly avoiding the expensive "save".
+ - When saving a blob too big to precompute its blobId, once the blob has been streamed using a temporary random blobId, the copy operation can be avoided and the temporary blob directly removed if the computed blobId is already stored.
+
+Cassandra is probably faster doing "write every time" rather than "read before write", so we should not use the stored blob projection for it.
+
+Some performance tests will be run in order to evaluate the improvements.
+
+## Consequences
+
+We expect to reduce the number of writes to the object storage. This is expected to improve:
+ - operational costs on cloud providers
+ - performance
+ - latency under load
+
+As id persistence in StoredBlobIdsList will be done once the blob has been successfully saved, inconsistencies in StoredBlobIdsList
+will lead to duplicated saved blobs, which is the current behaviour.
+
+In case of a less than 5% improvement, the code will not be added to the codebase and the proposal will get the status 'rejected'.
+
+## Reference
+
+A previous optimization proposal used blob existence checks before persisting. This work relied on the ObjectStorage exist method and proved not efficient enough.
+
+https://github.com/linagora/james-project/pull/2011 (V2)
+
+ - [JIRA](https://issues.apache.org/jira/browse/JAMES-2921)
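
The "check before save" idea in ADR 15 above can be sketched as follows. This is a minimal illustration, not part of the committed file and not the James code: the Python class and method names are hypothetical, a Python set stands in for the Cassandra-backed StoredBlobIdsList, and only the precomputed-blobId case is covered (the streamed, temporary-blobId case is omitted).

```python
import hashlib


class InMemoryStoredBlobIdsList:
    """Hypothetical stand-in for a Cassandra-backed list of stored blob ids."""

    def __init__(self):
        self._ids = set()

    def is_stored(self, blob_id):
        return blob_id in self._ids

    def add(self, blob_id):
        self._ids.add(blob_id)


class CountingObjectStorage:
    """Fake object storage that counts (billable) writes."""

    def __init__(self):
        self.blobs = {}
        self.writes = 0

    def put(self, blob_id, data):
        self.writes += 1
        self.blobs[blob_id] = data


class DeduplicatingBlobStore:
    def __init__(self, storage, stored_ids):
        self.storage = storage
        self.stored_ids = stored_ids

    def save(self, data):
        # The blob id is derived from the content, so equal blobs share an id.
        blob_id = hashlib.sha256(data).hexdigest()
        if self.stored_ids.is_stored(blob_id):
            return blob_id  # skip the expensive write
        self.storage.put(blob_id, data)
        # Record the id only after the write succeeded: a failure in between
        # costs at most one duplicated write later, never a missing blob.
        self.stored_ids.add(blob_id)
        return blob_id
```

Saving the same attachment for several recipients then results in a single `put` on the object storage, with the remaining saves answered from the id list.
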



[james-project] 01/02: JAMES-2919 ADRs for JMAP-draft GetMessages partial reads

Posted by rc...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

rcordier pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git

commit 11cc859752560bab52db76c2b3fb57531f40ce4b
Author: Benoit Tellier <bt...@linagora.com>
AuthorDate: Thu Oct 17 09:27:43 2019 +0700

    JAMES-2919 ADRs for JMAP-draft GetMessages partial reads
---
 src/adr/0012-jmap-partial-reads.md      | 40 +++++++++++++++++++++++++++
 src/adr/0013-precompute-jmap-preview.md | 48 +++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/src/adr/0012-jmap-partial-reads.md b/src/adr/0012-jmap-partial-reads.md
new file mode 100644
index 0000000..310cd2e
--- /dev/null
+++ b/src/adr/0012-jmap-partial-reads.md
@@ -0,0 +1,40 @@
+# 12. Projections for JMAP Messages
+
+Date: 2019-10-09
+
+## Status
+
+Proposed
+
+Adoption needs to be backed by some performance tests.
+
+## Context
+
+JMAP core RFC8620 requires that the server respond with only the properties requested by the client.
+
+James currently computes all of the properties, regardless of their cost and of whether the client asked for them.
+
+Clearly we can save latency and resources by avoiding reading or computing expensive properties that have not been explicitly requested by the client.
+
+## Decision
+
+Introduce two new data structures representing JMAP messages:
+ - One with only metadata
+ - One with metadata + headers
+
+Given the properties requested by the client, the most appropriate message data structure will be computed, on top of
+existing message storage APIs that should remain unchanged.
+
+Some performance tests will be run in order to evaluate the improvements.
+
+## Consequences
+
+GetMessages with a limited set of requested properties will no longer necessarily result in a full database message read. We
+thus expect a significant improvement, for instance when only metadata is requested.
+
+In case of a less than 5% improvement, the code will not be added to the codebase and the proposal will get the status 'rejected'.
+
+## References
+
+ - /get method: https://tools.ietf.org/html/rfc8620#section-5.1
+ - [JIRA](https://issues.apache.org/jira/browse/JAMES-2919)
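
Choosing the cheapest sufficient representation, as ADR 12 above proposes, can be sketched as below. This is an illustration that is not part of the committed file: the grouping of property names into "metadata" and "headers" is an assumption, as are the function and enum names. A `None` property list is treated as "all properties", following the /get method of RFC 8620, and unknown properties conservatively trigger a full read.

```python
from enum import IntEnum


class ReadLevel(IntEnum):
    METADATA = 1  # metadata only
    HEADERS = 2   # metadata + headers
    FULL = 3      # full message read


# Illustrative classification only; the real split is up to the implementation.
METADATA_PROPERTIES = {"id", "blobId", "threadId", "mailboxIds", "keywords", "size"}
HEADER_PROPERTIES = {"subject", "from", "to", "cc", "bcc", "replyTo", "date", "headers"}


def read_level_for(requested_properties):
    """Return the cheapest read level able to serve the requested properties."""
    if requested_properties is None:
        # RFC 8620 /get: a null property list means all properties.
        return ReadLevel.FULL
    level = ReadLevel.METADATA
    for prop in requested_properties:
        if prop in METADATA_PROPERTIES:
            needed = ReadLevel.METADATA
        elif prop in HEADER_PROPERTIES:
            needed = ReadLevel.HEADERS
        else:
            # Bodies, attachments and anything unknown need the full message.
            needed = ReadLevel.FULL
        level = max(level, needed)
    return level
```

A request for `["id", "size"]` would then be served from metadata alone, while adding `"subject"` raises the level to headers and adding `"textBody"` forces a full read.
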
diff --git a/src/adr/0013-precompute-jmap-preview.md b/src/adr/0013-precompute-jmap-preview.md
new file mode 100644
index 0000000..e683ab2
--- /dev/null
+++ b/src/adr/0013-precompute-jmap-preview.md
@@ -0,0 +1,48 @@
+# 13. Precompute JMAP Email preview
+
+Date: 2019-10-09
+
+## Status
+
+Proposed
+
+Adoption needs to be backed by some performance tests.
+
+## Context
+
+JMAP messages have a handy preview property displaying the first 256 characters of meaningful text of a message.
+
+This property is often displayed for message listing in JMAP clients, thus it is queried a lot.
+
+Currently, to get the preview, James retrieves the full message body, parses it using MIME parsers, removes HTML and keeps meaningful text.
+
+## Decision
+
+We should precompute the message preview.
+
+A MailboxListener will compute the preview and store it in a MessagePreviewStore.
+
+We should have a Cassandra implementation and an in-memory implementation.
+
+When the preview is precomputed, we can consider the "preview" property of these messages as metadata.
+
+When the preview is not precomputed, we should compute the preview for these messages and save the result for later.
+
+We should provide a webAdmin task for rebuilding the projection. Computing and storing previews in MessagePreviewStore
+is idempotent, so the task can be run live without any concurrency problem.
+
+Some performance tests will be run in order to evaluate the improvements.
+
+## Consequences
+
+We expect a huge performance enhancement for JMAP clients relying on preview for listing mails.
+
+In case of a less than 5% improvement, the code will not be added to the codebase and the proposal will get the status 'rejected'.
+
+## References
+
+ - https://jmap.io/server.html#1-emails JMAP client guide states that preview needs to be quick to retrieve
+
+ - A similar decision was taken at FastMail: https://fastmail.blog/2014/12/15/dec-15-putting-the-fast-in-fastmail-loading-your-mailbox-quickly/
+
+ - [JIRA](https://issues.apache.org/jira/browse/JAMES-2919)
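
The flow in ADR 13 above (a listener that precomputes the preview, plus compute-and-save on a miss) can be sketched as follows. This is not part of the committed file and makes several assumptions: the Python names are hypothetical, a dict stands in for the Cassandra or in-memory MessagePreviewStore, and a naive regular expression replaces the real MIME parsing and HTML removal.

```python
import re

PREVIEW_LENGTH = 256  # the preview holds the first 256 characters


def compute_preview(body):
    """Naive stand-in for MIME parsing and HTML removal."""
    text = re.sub(r"<[^>]+>", " ", body)  # drop anything that looks like a tag
    text = " ".join(text.split())         # collapse whitespace
    return text[:PREVIEW_LENGTH]


class InMemoryMessagePreviewStore:
    def __init__(self):
        self._previews = {}

    def store(self, message_id, preview):
        # Storing the same preview twice leaves the same state (idempotent).
        self._previews[message_id] = preview

    def retrieve(self, message_id):
        return self._previews.get(message_id)


class PreviewListener:
    """Plays the role of the MailboxListener: called when a message is added."""

    def __init__(self, store):
        self.store = store

    def on_message_added(self, message_id, body):
        self.store.store(message_id, compute_preview(body))


def preview_for(message_id, store, load_body):
    """Serve the stored preview, or compute and save it on a miss."""
    preview = store.retrieve(message_id)
    if preview is None:
        preview = compute_preview(load_body(message_id))
        store.store(message_id, preview)
    return preview
```

Because storing is a plain overwrite with a deterministic value, a rebuild task can recompute previews while the listener is running without coordination between the two.
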

