Posted to notifications@james.apache.org by GitBox <gi...@apache.org> on 2020/11/13 09:36:36 UTC

[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522822322



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either

Review comment:
       ```suggestion
    - `Email/get` to retrieve various levels of details. Depending on requested properties, this is either
   ```

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance

Review comment:
       > So, ElasticSearch is queried on every JMAP interaction
   
   Not exactly. It's queried for listing emails.

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep

Review comment:
       This is not an alternative to this decision.
   Maintaining an in-memory cache would be another solution.

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.

Review comment:
       ```suggestion
   Relying on more services for every read also harms our resiliency as ElasticSearch outages have major impacts.
   ```

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.

Review comment:
       I don't understand this sentence

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.

Review comment:
       any clue why?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.

Review comment:
       ```suggestion
   This view will be stored into Cassandra, and updated asynchronously via a MailboxListener.
   ```
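
A minimal sketch of the view described in the quoted Decision section, assuming an in-memory structure in place of the Cassandra table. The class `EmailQueryView` and its methods `on_added`, `on_deleted` and `list_by_sent_at` are hypothetical names chosen for illustration; they are not part of James. The idea is that a listener keeps, per mailbox, a list of message ids ordered by `sentAt`, so that a simple `Email/query` becomes a plain ordered read.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime


class EmailQueryView:
    """Hypothetical in-memory stand-in for the Cassandra-backed view."""

    def __init__(self):
        # mailbox id -> list of (sent_at, message_id), kept sorted by sent_at
        self._rows = defaultdict(list)

    def on_added(self, mailbox_id, message_id, sent_at):
        # Would be called by the listener when a message is added to a mailbox.
        insort(self._rows[mailbox_id], (sent_at, message_id))

    def on_deleted(self, mailbox_id, message_id):
        # Would be called by the listener when a message is removed.
        self._rows[mailbox_id] = [
            row for row in self._rows[mailbox_id] if row[1] != message_id
        ]

    def list_by_sent_at(self, mailbox_id, position=0, limit=10):
        # Newest first, then apply position and limit as Email/query does.
        newest_first = reversed(self._rows[mailbox_id])
        ids = [message_id for _, message_id in newest_first]
        return ids[position:position + limit]


view = EmailQueryView()
view.on_added("inbox", "m1", datetime(2020, 11, 1))
view.on_added("inbox", "m2", datetime(2020, 11, 3))
view.on_added("inbox", "m3", datetime(2020, 11, 2))
print(view.list_by_sent_at("inbox", position=0, limit=2))
```

Only message ids and the sort key are kept, which matches the ADR's remark that the view stores only `messageId`.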

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.

Review comment:
       ```suggestion
    retrieved from Cassandra alone or from ObjectStorage.
   ```

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.

Review comment:
       What do you expect? If you lose any service you lose James availability: S3, Cassandra, RabbitMQ, ElasticSearch.
   Why would we want to support unavailability of highly available services in the first place?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       Untrue: you can't assume people won't trigger complex queries.
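
The concern can be made concrete with a small sketch, assuming a router that sends a query to the view only when every filter key is supported and the sort is on `sentAt`. The function `choose_backend` and the set `VIEW_SUPPORTED_KEYS` are hypothetical; the ADR does not list which filters the view would cover. Any other query, such as a full-text search, still reaches ElasticSearch, so ElasticSearch remains on the read path for those users.

```python
# Hypothetical set of filter keys the view could answer; the ADR does not give the real list.
VIEW_SUPPORTED_KEYS = {"inMailbox", "after", "before"}


def choose_backend(filter_conditions, sort_property="sentAt", view_enabled=True):
    """Return "cassandra-view" for simple queries, "elasticsearch" otherwise."""
    if not view_enabled:
        return "elasticsearch"
    if sort_property != "sentAt":
        return "elasticsearch"
    # Any filter key outside the supported set forces the fallback.
    if set(filter_conditions) - VIEW_SUPPORTED_KEYS:
        return "elasticsearch"
    return "cassandra-view"


print(choose_backend({"inMailbox": "abc"}))
print(choose_backend({"inMailbox": "abc", "text": "invoice"}))
```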
