Posted to notifications@james.apache.org by GitBox <gi...@apache.org> on 2020/11/12 09:23:46 UTC

[GitHub] [james-project] rouazana commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

rouazana commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r521952649



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.

Review comment:
       And object storage?
   (By the way, I don't see in which case `Email/get` uses ElasticSearch.)

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.

Review comment:
       And Cassandra (in particular for rights/ACL checking).

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter: {
+       "inMailbox":"abcd"
+   }
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter: {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   }
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```

Review comment:
       No Draft example here?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter: {
+       "inMailbox":"abcd"
+   }
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date

Review comment:
       Same question here?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.

Review comment:
       What does this sentence mean?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.

Review comment:
       Do we have an idea of the load this will add to Cassandra, and of the size of the data it will store there?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit

Review comment:
       Where is the limit? Is it implicit?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter: {
+       "inMailbox":"abcd"
+   }
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter: {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   }
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+### C: Email list sorted by sentAt, with limit, after a given sentAt date
+
+Draft:

Review comment:
       No RFC-8621 example here?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter: {
+       "inMailbox":"abcd"
+   }
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter: {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   }
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+### C: Email list sorted by sentAt, with limit, after a given sentAt date
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"after":"aDate", "inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+

Review comment:
       What about pagination requests? Will they be handled, and if so, how?
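
To make the pagination question concrete, here is a minimal Python sketch (not part of the PR) of what explicitly paginated `Email/query` calls could look like. It assumes the standard `position` and `limit` arguments that RFC 8620 defines for `/query` methods, and it uses the RFC's `sort` argument name rather than the `comparator` key shown in the ADR examples. `build_email_query` and `page_requests` are hypothetical helper names chosen for illustration. Whether the Cassandra view would serve such requests remains the open question.

```python
import json


def build_email_query(account_id, mailbox_id, position=0, limit=None, call_id="c1"):
    """Build one Email/query invocation sorted by sentAt, newest first.

    Hypothetical helper: only the argument names (accountId, filter, sort,
    position, limit) come from RFC 8620 / RFC 8621.
    """
    arguments = {
        "accountId": account_id,
        "filter": {"inMailbox": mailbox_id},
        "sort": [{"property": "sentAt", "isAscending": False}],
        "position": position,
    }
    # When "limit" is omitted, the server is free to apply its own maximum.
    if limit is not None:
        arguments["limit"] = limit
    return ["Email/query", arguments, call_id]


def page_requests(account_id, mailbox_id, page_size, pages):
    """Build one invocation per page, advancing position by page_size each time."""
    return [
        build_email_query(
            account_id,
            mailbox_id,
            position=index * page_size,
            limit=page_size,
            call_id=f"c{index + 1}",
        )
        for index in range(pages)
    ]


if __name__ == "__main__":
    account = "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6"
    for invocation in page_requests(account, "abcd", page_size=20, pages=2):
        print(json.dumps(invocation))
```

Each printed line is one method call. A client would wrap them in the `methodCalls` array of a JMAP request. The same shape with only `limit` set would make the limit in examples A to C explicit.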



