Posted to notifications@james.apache.org by GitBox <gi...@apache.org> on 2020/11/11 08:23:13 UTC

[GitHub] [james-project] chibenwa opened a new pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

chibenwa opened a new pull request #259:
URL: https://github.com/apache/james-project/pull/259


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522867355



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.

Review comment:
       Oh yes, I now understand. The ambiguity comes from the fact that I expected responsibilities in this list, not details about how Cassandra works.






[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r523442001



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       This sentence relies on a lot of assumptions: we know the usage pattern of users, we know the client they use, we don't care about QoS of services (ES in this case), etc.
   
   The discussion we just had makes these things apparent, but the ADR does not, AFAICT.
   
   I'm afraid people will draw false conclusions by reading the ADR without the comments.






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522569096



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+### C: Email list sorted by sentAt, with limit, after a given sentAt date
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"after":"aDate", "inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+

Review comment:
       by reading position+limit items from the view then doing soft filtering as needed.
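A minimal sketch of the read path described in the comment above, assuming the view rows are already sorted by `sentAt` descending and that the soft filter is an `after` bound on that same field. The names `ViewEntry` and `query_view` are hypothetical and not taken from the James code base. Under these assumptions the matching rows form a prefix of the sorted view, so reading `position + limit` rows is enough before filtering in memory.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ViewEntry:
    # Hypothetical view row: only the message id and the sort key are stored.
    message_id: str
    sent_at: datetime


def query_view(entries: List[ViewEntry], position: int, limit: int,
               after: Optional[datetime] = None) -> List[str]:
    """Resolve a simple listing from a view sorted by sent_at descending."""
    # Read only the first position + limit rows of the view.
    window = entries[: position + limit]
    # Soft filtering: drop rows that do not match the optional 'after' bound.
    if after is not None:
        window = [entry for entry in window if entry.sent_at > after]
    # Apply position and limit to what remains.
    return [entry.message_id for entry in window[position: position + limit]]


# Example data, newest first (hypothetical ids and dates).
ENTRIES = [
    ViewEntry("m5", datetime(2020, 11, 5)),
    ViewEntry("m4", datetime(2020, 11, 4)),
    ViewEntry("m3", datetime(2020, 11, 3)),
    ViewEntry("m2", datetime(2020, 11, 2)),
    ViewEntry("m1", datetime(2020, 11, 1)),
]
```

A filter on a field other than the sort key would not have this prefix property, so this sketch does not cover that case.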






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522877330



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending on requested properties, this is either
+ retrieved from Cassandra alone or from ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction for listing emails. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more services for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored into Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.

Review comment:
       Why would we prefer the Cassandra solution to it?






[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522874294



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending on requested properties, this is either
+ retrieved from Cassandra alone or from ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction for listing emails. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more services for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored into Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.

Review comment:
       ```suggestion
   Another solution is to implement the projection using an in-memory data grid such as Infinispan. The projection would be computed using a MailboxListener, and the data would first be fetched from this cache, falling back to ElasticSearch.
   ```
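A rough sketch of the cache-first read with fallback proposed in the suggestion above. Everything here is an assumption for illustration: `ProjectionCache` is a plain in-process dictionary standing in for a data grid, `on_message_added` stands in for what a MailboxListener would do, and `search_fallback` stands in for the ElasticSearch query. None of these names come from James or Infinispan.

```python
from typing import Callable, Dict, List, Optional


class ProjectionCache:
    """Hypothetical stand-in for an in-memory data grid holding the projection."""

    def __init__(self) -> None:
        self._by_mailbox: Dict[str, List[str]] = {}

    def get(self, mailbox_id: str) -> Optional[List[str]]:
        cached = self._by_mailbox.get(mailbox_id)
        return list(cached) if cached is not None else None

    def put(self, mailbox_id: str, message_ids: List[str]) -> None:
        self._by_mailbox[mailbox_id] = list(message_ids)

    def on_message_added(self, mailbox_id: str, message_id: str) -> None:
        # Listener-style update: only mailboxes already loaded are kept current;
        # the others are loaded in full from the fallback on their first read.
        if mailbox_id in self._by_mailbox:
            self._by_mailbox[mailbox_id].insert(0, message_id)


def list_messages(mailbox_id: str, cache: ProjectionCache,
                  search_fallback: Callable[[str], List[str]]) -> List[str]:
    """Serve the listing from the cache, falling back to the search backend."""
    cached = cache.get(mailbox_id)
    if cached is not None:
        return cached
    result = search_fallback(mailbox_id)
    cache.put(mailbox_id, result)
    return result
```

In this sketch the fallback is queried only on a cache miss, so the search backend stays on the read path for cold mailboxes. That is one trade-off to weigh against a persisted Cassandra view.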






[GitHub] [james-project] chibenwa commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-729336392


   Merged




[GitHub] [james-project] rouazana commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
rouazana commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r521952649



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.

Review comment:
       and object storage?
   (btw, I don't see in which case Email/get uses ES)

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.

Review comment:
       and Cassandra (in particular for rights/acl checking)

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```

Review comment:
       no draft?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date

Review comment:
       idem?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.

Review comment:
       what does this sentence mean?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.

Review comment:
       do we have an idea of the load that will be added to Cassandra? and of the size of the data we add to Cassandra?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit

Review comment:
       where is the limit? implicit?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd"
+   },
+   "sort": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   },
+   "sort": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+### C: Email list sorted by sentAt, with limit, after a given sentAt date
+
+Draft:

Review comment:
       no rfc?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd"
+   },
+   "sort": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   },
+   "sort": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+### C: Email list sorted by sentAt, with limit, after a given sentAt date
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"after":"aDate", "inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+

Review comment:
       what about pagination requests? will they be handled? how?






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r521996112



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd"
+   },
+   "sort": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   },
+   "sort": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```

Review comment:
       Intended. Draft has only a single "date" field; there is no distinction between sentAt and receivedAt.






[GitHub] [james-project] chibenwa closed pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa closed pull request #259:
URL: https://github.com/apache/james-project/pull/259


   




[GitHub] [james-project] Arsnael commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
Arsnael commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r521809406



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,55 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffer the following flows:
+ - Updates of flags leads to updates of the all Email object, leading to sparse segments

Review comment:
       s/leads/lead

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,55 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffer the following flows:

Review comment:
       s/flows/flaws

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,55 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffer the following flows:

Review comment:
       s/suffer/suffers

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,55 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffer the following flows:
+ - Updates of flags leads to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage need to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and perform full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expected better performances by resolving such queries against Cassandra.

Review comment:
       s/expected/expect

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,55 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffer the following flows:
+ - Updates of flags leads to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage need to be adapted to known access patterns.

Review comment:
       s/need/needs

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,55 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffer the following flows:
+ - Updates of flags leads to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage need to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and perform full text search.

Review comment:
       s/perform/performs






[GitHub] [james-project] mbaechler commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-726698393


   To solve the scrolling issue, we can design a (git-like) DAG to store entries and associate a DAG node with a scrolling state by using the state feature of JMAP.
   By the way, we'll need to implement `state` at some point.




[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522865726



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.

Review comment:
       Cassandra is the source of truth for metadata
   
    -> I think you have no problem understanding this 
   
   its storage needs to be adapted to known access patterns.
   
    -> This comes from Cassandra storage constraints. You need to plan your reads ahead (or use `ALLOW FILTERING` and kill your cluster).
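
   To illustrate what "planning reads ahead" means for this view, here is a minimal in-memory sketch in Java. It is not the James implementation and not a Cassandra schema: the class `EmailQueryViewSketch` and its methods are hypothetical names chosen for illustration. The map key stands in for a partition key (the mailbox) and the sorted set stands in for a clustering order (sentAt descending), so the one supported read never needs filtering.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical model of a view planned for a single read:
// "emails of one mailbox, newest sentAt first, with a limit".
class EmailQueryViewSketch {
    static final class Entry {
        final Instant sentAt;
        final String messageId;

        Entry(Instant sentAt, String messageId) {
            this.sentAt = sentAt;
            this.messageId = messageId;
        }
    }

    // Newest first; messageId breaks ties so two emails sent at the same instant are both kept.
    private static final Comparator<Entry> NEWEST_FIRST = (a, b) -> {
        int bySentAt = b.sentAt.compareTo(a.sentAt);
        return bySentAt != 0 ? bySentAt : a.messageId.compareTo(b.messageId);
    };

    // Key: mailbox id (the "partition"). Value: entries kept in read order (the "clustering").
    private final Map<String, NavigableSet<Entry>> entriesByMailbox = new HashMap<>();

    // Write path: in the ADR this would be driven asynchronously by a MailboxListener.
    void save(String mailboxId, Instant sentAt, String messageId) {
        entriesByMailbox
            .computeIfAbsent(mailboxId, id -> new TreeSet<>(NEWEST_FIRST))
            .add(new Entry(sentAt, messageId));
    }

    // Read path: a single lookup by mailbox, already sorted, truncated to the limit.
    List<String> listNewestFirst(String mailboxId, int limit) {
        return entriesByMailbox
            .getOrDefault(mailboxId, Collections.emptyNavigableSet())
            .stream()
            .limit(limit)
            .map(entry -> entry.messageId)
            .collect(Collectors.toList());
    }
}
```

   Any query outside this access pattern (another sort property, a text filter) has no efficient path in such a structure, which is why the ADR keeps ElasticSearch for arbitrary queries.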






[GitHub] [james-project] chibenwa commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-726699644


   What is a DAG?
   
   Or we can wait for scrolling to become a problem before over-engineering it.
   
   So far, we just don't know if the current proposal is good enough or not.
   




[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522914910



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending on requested properties, this is either
+ retrieved from Cassandra alone or from ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction for listing emails. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more services for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored into Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.

Review comment:
       Can you write your proposal down as a suggestion so I can one-click apply it?
   
   Also, memory is more expensive than disk.






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r524192956



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       Clarified: `given clients following well defined Email/query requests` + `of basic usages (mailbox content listing)`
   
   Outside of these assumptions, ElasticSearch is of course a critical dependency.






[GitHub] [james-project] rouazana commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
rouazana commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522844247



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -156,4 +156,6 @@ PRIMARY KEY mailboxId
 CLUSTERING COLUMN messageId
 COLUMN sentAt
 COLUMN receivedAt
-```
\ No newline at end of file
+```
+
+Note that to handle position & limit, we need to fetch `position + limit` ordered items then removing `position` firsts items.

Review comment:
       So if I scroll quickly n times, I will generate 1+2+...+n = n*(n+1)/2 Cassandra requests, i.e. O(n²).
   
   That's pretty bad, no? Couldn't it be a cause of the ElasticSearch slowness? Could it slow down Cassandra?
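
A minimal sketch of the arithmetic in this comment, assuming the view is read as the note describes (fetch `position + limit` ordered rows, then drop the first `position`). The function names and the in-memory list standing in for one mailbox partition are hypothetical; this is not James code, and it counts rows read rather than driver-level requests.

```python
from typing import List, Tuple


def fetch_page(ordered_ids: List[str], position: int, limit: int) -> Tuple[List[str], int]:
    """Return (page, rows_read) for a position/limit read.

    `ordered_ids` is a hypothetical stand-in for one mailbox partition,
    already sorted in the requested order.
    """
    fetched = ordered_ids[: position + limit]  # rows read from the store
    return fetched[position:], len(fetched)    # rows actually returned


def rows_read_while_scrolling(ordered_ids: List[str], pages: int, limit: int) -> int:
    """Total rows read when a client requests `pages` consecutive pages."""
    total = 0
    for page_index in range(pages):
        _, rows_read = fetch_page(ordered_ids, page_index * limit, limit)
        total += rows_read
    return total
```

Under these assumptions, n consecutive pages of size `limit` read `limit * n * (n + 1) / 2` rows in total, as long as the mailbox holds at least `n * limit` messages. That is the quadratic growth raised above.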






[GitHub] [james-project] rouazana commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
rouazana commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522789892



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+### C: Email list sorted by sentAt, with limit, after a given sentAt date
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"after":"aDate", "inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+

Review comment:
       I think that's key information (maybe it is one of the main causes of the slowdown on ElasticSearch), so could you elaborate a little and explain it in the document, please?
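
A minimal sketch of how the request shapes quoted in examples A and B above could be recognised before routing them to the Cassandra view. The function name, the supported-key set, and the fallback behaviour are assumptions made for illustration; the `comparator` key is taken from the quoted examples as written, not from any James implementation.

```python
from typing import Any, Dict

# Hypothetical set of filter keys the view is assumed to serve
# (taken from examples A and B in the quoted ADR).
SUPPORTED_FILTER_KEYS = {"inMailbox", "after"}


def can_use_view(arguments: Dict[str, Any]) -> bool:
    """Return True when `Email/query` arguments match example A or B."""
    query_filter = arguments.get("filter", {})
    comparators = arguments.get("comparator", [])
    if "inMailbox" not in query_filter:
        return False  # the view is keyed by mailbox
    if not set(query_filter) <= SUPPORTED_FILTER_KEYS:
        return False  # any other criterion still needs ElasticSearch
    return comparators == [{"property": "sentAt", "isAscending": False}]
```

Anything that fails this check (full-text search, other sorts, extra filters) would keep going to ElasticSearch, which matches the ADR's statement that only the most common requests are served from the view.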






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522859979



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep

Review comment:
       I'll let you open a PR documenting this alternative in this ADR if you are willing to :-)






[GitHub] [james-project] chibenwa commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-726686135


   > the problem of position and limit is hard, it could have consequences on Cassandra.
   
   We are returning a full list of metadata on every IMAP synchronisation (which does a **full** fetch because we do not support QRESYNC). Clients trigger this every 15 minutes or so, and it gets executed (with extra metadata on mutable data) in 1-2 seconds for mailboxes of around 200,000 mails.
   
   This is a VERY rare operation in JMAP.
   
   I'm not scared ;-)
   
   If you are (or other people are), they can turn that off.
   
   If users run into issues on a production platform, they can disable this.
   
   Of course, if that turns out to be a bad idea, it could be removed from the code base and this ADR abandoned. But let's give this experimental feature a chance first, because I really believe that it is the best decision we can take about ElasticSearch.




[GitHub] [james-project] rouazana commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
rouazana commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-727059258


   > Exactly why writing an ADR before doing a PoC may not be the best idea
   
   And doing a PoC without a proper ADR is often misunderstood.
   
   Here we have some kind of feature flag, so it can be easily tried and removed if not conclusive. The ADR is interesting because without it the first question I would have asked would have been "why do you want to do this?" and the second one "how do you handle pagination?". That would have meant long debates, which are really better explained here.




[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522877081



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       We just want to have a simple webmail that always works **.** If that webmail exposes some advanced features that have a lower availability, then that is fine too.






[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522822322



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either

Review comment:
       ```suggestion
    - `Email/get` to retrieve various levels of details. Depending on requested properties, this is either
   ```

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance

Review comment:
       > So, ElasticSearch is queried on every JMAP interaction
   
   Not exactly. It's queried for listing emails.

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep

Review comment:
       It's not an alternative to this decision.
   Maintaining an in-memory cache would be another solution.

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.

Review comment:
       ```suggestion
   Relying on more services for every read also harms our resiliency as ElasticSearch outages have major impacts.
   ```

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.

Review comment:
       I don't understand this sentence

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.

Review comment:
       any clue why?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.

Review comment:
       ```suggestion
   This view will be stored into Cassandra, and updated asynchronously via a MailboxListener.
   ```
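To make the "view updated asynchronously via a MailboxListener" idea concrete, here is a minimal in-memory sketch. It is not the James implementation: the event shape (`type`, `mailboxId`, `messageId`, `sentAt`, `receivedAt`) and the class name `EmailQueryView` are assumptions made for illustration, and events are applied synchronously here, whereas the real view may lag behind the mailbox.

```python
from collections import defaultdict


class EmailQueryView:
    """Hypothetical stand-in for the Cassandra view: one partition per mailbox."""

    def __init__(self):
        # mailbox_id -> {message_id: (sent_at, received_at)}
        self.rows = defaultdict(dict)

    def on_event(self, event):
        """Apply one mailbox event (assumed shape) to the view."""
        kind = event["type"]
        if kind == "added":
            self.rows[event["mailboxId"]][event["messageId"]] = (
                event["sentAt"],
                event["receivedAt"],
            )
        elif kind == "expunged":
            self.rows[event["mailboxId"]].pop(event["messageId"], None)
        elif kind == "mailboxDeleted":
            self.rows.pop(event["mailboxId"], None)

    def list_by_sent_at(self, mailbox_id, limit):
        """Return up to `limit` message ids of a mailbox, newest sentAt first."""
        items = self.rows.get(mailbox_id, {})
        ordered = sorted(items.items(), key=lambda kv: kv[1][0], reverse=True)
        return [message_id for message_id, _ in ordered[:limit]]


view = EmailQueryView()
view.on_event({"type": "added", "mailboxId": "inbox", "messageId": "m1",
               "sentAt": 10, "receivedAt": 11})
view.on_event({"type": "added", "mailboxId": "inbox", "messageId": "m2",
               "sentAt": 20, "receivedAt": 21})
```

Because the listener only appends or removes small rows, replaying events is cheap; the cost is that a read may briefly miss a message whose event has not been processed yet.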

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.

Review comment:
       ```suggestion
    retrieved from Cassandra alone or from ObjectStorage.
   ```

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.

Review comment:
       What do you expect? If you lose any service you lose James availability: S3, Cassandra, RabbitMQ, ElasticSearch.
   Why would we want to support unavailability of highly available services in the first place?

##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       untrue, you can't assume people won't trigger complex queries






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522863940



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -156,4 +156,6 @@ PRIMARY KEY mailboxId
 CLUSTERING COLUMN messageId
 COLUMN sentAt
 COLUMN receivedAt
-```
\ No newline at end of file
+```
+
+Note that to handle position & limit, we need to fetch `position + limit` ordered items then removing `position` firsts items.

Review comment:
       True for Cassandra.
   
   True for ElasticSearch.
   
   JMAP includes some limits (concurrent calls, rate limiting) that can help mitigate these concerns in the future.
   
   >  Couldn't it be a cause of ElasticSearch slowness?
   
   Maybe for some.
   
   I also managed to clearly link some of them to reindexing, thanks to @tuanlc.
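The `position + limit` handling discussed above can be sketched as follows. This is only an illustration under stated assumptions: `fetch_ordered` is a hypothetical stand-in for a store read that returns at most `fetch_size` rows in sort order, and rows are plain `(message_id, sent_at)` tuples.

```python
from itertools import islice


def fetch_ordered(view_rows, fetch_size):
    """Hypothetical store read: at most `fetch_size` rows, newest sentAt first."""
    ordered = sorted(view_rows, key=lambda row: row[1], reverse=True)
    return list(islice(ordered, fetch_size))


def query_page(view_rows, position, limit):
    """Fetch `position + limit` ordered rows, then drop the first `position`."""
    fetched = fetch_ordered(view_rows, position + limit)
    return [message_id for message_id, _sent_at in fetched[position:]]


rows = [("m1", 10), ("m2", 30), ("m3", 20), ("m4", 40)]
```

Note that the amount of data read grows with `position`, whichever store answers the query, which is the concern raised in this thread.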






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r521995324



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit

Review comment:
       Yes. JMAP has an implicit limit of 256 if nothing is provided.






[GitHub] [james-project] chibenwa commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-727934770


   > JMAP state maps to this concept. (DAG)
   
   @mbaechler I would be curious to know why you think that. Can you elaborate a bit?
   
   I think that, before starting complicated developments, a flat, ordered list of changes served from oldest to newest is way easier to implement than the "from newest to oldest using some intermediate temporary states" approach documented as an optimization by the spec.
   
   Would that be what you reference as a DAG?
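A flat, ordered list of changes served from oldest to newest could look like the sketch below. It is an assumption-laden illustration, not James code: states are modelled as increasing integers, `ChangeLog` is a hypothetical name, and the returned triple loosely mirrors the `newState` / `hasMoreChanges` idea of JMAP `/changes`.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    # Each entry is (state, changed_ids), appended in chronological order.
    entries: list = field(default_factory=list)

    def record(self, changed_ids):
        """Append a change and return the new state (assumed: a simple counter)."""
        state = len(self.entries) + 1
        self.entries.append((state, list(changed_ids)))
        return state

    def changes_since(self, since_state, max_entries):
        """Serve at most `max_entries` log entries newer than `since_state`, oldest first."""
        newer = [(s, ids) for s, ids in self.entries if s > since_state]
        served = newer[:max_entries]
        new_state = served[-1][0] if served else since_state
        changed = []
        for _, ids in served:
            for item_id in ids:
                if item_id not in changed:  # report each id once
                    changed.append(item_id)
        return changed, new_state, len(newer) > len(served)


log = ChangeLog()
log.record(["a"])
log.record(["b", "a"])
log.record(["c"])
```

A client that is far behind simply calls `changes_since` repeatedly with the returned state until the last element of the result is false.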




[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522868265



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       What's the point of having global queries if you don't expect to build valuable features on them? Having only simple queries sounds to me like: we just want a simple webmail.






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522859493



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       > untrue, you can't assume people won't trigger complex queries
   
   I agree with this statement.
   
   However, complex queries are not involved in the carefully crafted "basic email display queries" that our users rely on for their basic use cases <3
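The split between "basic email display queries" and everything else can be sketched as a routing decision. This is a minimal illustration, assuming the optimized pattern is "a single `inMailbox` filter sorted on `sentAt`"; the function names and the backend labels are hypothetical, and the actual set of optimized requests is defined by the ADR, not by this sketch.

```python
def can_use_view(filter_condition, sort):
    """Assumed pattern: only an inMailbox filter, sorted on sentAt alone."""
    only_mailbox_filter = set(filter_condition.keys()) == {"inMailbox"}
    sorted_on_sent_at = len(sort) == 1 and sort[0].get("property") == "sentAt"
    return only_mailbox_filter and sorted_on_sent_at


def choose_backend(filter_condition, sort, view_enabled):
    """Serve matching queries from the view; anything else still goes to search."""
    if view_enabled and can_use_view(filter_condition, sort):
        return "cassandra-view"
    return "elasticsearch"


listing_sort = [{"property": "sentAt", "isAscending": False}]
```

Under this split, arbitrary or full-text queries still depend on ElasticSearch, so the view reduces its load on common reads rather than removing the dependency.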






[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522891334



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending on requested properties, this is either
+ retrieved from Cassandra alone or from ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction for listing emails. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more services for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored into Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.

Review comment:
       Cassandra:
   + already there
   + well mastered
   In-Memory datagrid:
   + much faster
   + far fewer restrictions on what can be done (retrieving data and filtering with some code is less of a problem)






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r521994982



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.

Review comment:
       Very low. Reading 30 rows is a very small task.






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522880077



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.

Review comment:
       Its responsibility is to handle known, common data access patterns; that's not mutually exclusive.






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r524185708



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending on requested properties, this is either
+ retrieved from Cassandra alone or from ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction for listing emails. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more services for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored into Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.

Review comment:
       Done. Documenting alternatives is important. The commit history will link to this thread for the curious.






[GitHub] [james-project] mbaechler commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-726897847


   > This proposal is a small implementation effort. Discarding it when needed won't be a problem
   
   Exactly why writing an ADR before doing a PoC may not be the best idea




[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r523108888



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       I don't agree at all, but it's not a problem because which webmail you want to offer to your customers is not a James matter (and this feature is optional).
   Could we just drop this sentence, as there is no consensus on it?






[GitHub] [james-project] chibenwa commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-725281716


   https://issues.apache.org/jira/browse/JAMES-3440 is the JIRA entry for this...




[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522865726



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.

Review comment:
       Cassandra is the source of truth for metadata
   
    -> I think you have no problem understanding this 
   
   its storage needs to be adapted to known access patterns.
   
    -> This comes from Cassandra storage constraints. You need to plan your reads ahead (or allow filtering and kill your cluster)
   
   It seems pretty clear to me as it is, please do not hesitate to suggest enhancements.
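
A minimal sketch of what "plan your reads ahead" can mean for the view discussed in this thread. It is an in-memory model only, not the James implementation: the class `EmailQueryViewSketch`, its `Row` type and both method names are hypothetical. Each mailbox plays the role of a partition, and rows inside it are kept sorted by `sentAt`, newest first, so that listing a mailbox is a plain ordered read with no filtering.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory model of a query-oriented view (not James code).
final class EmailQueryViewSketch {
    // One row of the view: only the sort key and the messageId are stored.
    static final class Row {
        final Instant sentAt;
        final String messageId;

        Row(Instant sentAt, String messageId) {
            this.sentAt = sentAt;
            this.messageId = messageId;
        }
    }

    // Rows are ordered newest first; messageId breaks ties between equal dates.
    private static final Comparator<Row> NEWEST_FIRST =
        Comparator.comparing((Row row) -> row.sentAt).reversed()
            .thenComparing(row -> row.messageId);

    // mailboxId -> sorted rows, standing in for "partition key -> clustered rows".
    private final Map<String, TreeSet<Row>> partitions = new HashMap<>();

    void save(String mailboxId, Instant sentAt, String messageId) {
        partitions.computeIfAbsent(mailboxId, id -> new TreeSet<>(NEWEST_FIRST))
            .add(new Row(sentAt, messageId));
    }

    // Reads at most `limit` messageIds of one mailbox, newest first.
    List<String> listBySentAtDesc(String mailboxId, int limit) {
        List<String> result = new ArrayList<>();
        TreeSet<Row> partition = partitions.get(mailboxId);
        if (partition == null) {
            return result;
        }
        for (Row row : partition) {
            if (result.size() >= limit) {
                break;
            }
            result.add(row.messageId);
        }
        return result;
    }
}
```

The point of the layout is that the only supported question ("newest emails of this mailbox") is answered by reading the head of one sorted partition; any other question would need another view or a search engine.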






[GitHub] [james-project] chibenwa commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-726726768


   > The problem is, if you don't include the needed complexity from the start, you won't know how it will behave once you include the complexity and thus you may lose your time.
   
   I take the risk.
   
   This proposal is a small implementation effort. Discarding it when needed won't be a problem
   
   > A DAG is a directed acyclic graph, like git.
   
   Thanks for the explanation.




[GitHub] [james-project] mbaechler commented on pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on pull request #259:
URL: https://github.com/apache/james-project/pull/259#issuecomment-726710922


   The problem is, if you don't include the needed complexity from the start, you won't know how it will behave once you include the complexity and thus you may lose your time.
   
   A DAG is a directed acyclic graph, like git.
   
   Whatever the implementation (a DAG may not be the best idea), the idea is to have a "persistent structure" (every change creates a new immutable state) so that a scroll is bound to a given structure. RDBMSs usually implement that using MVCC. JMAP state maps to this concept.
   
   I don't know what the best implementation for that in Cassandra is, to be honest.
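
To illustrate the "persistent structure" idea above, here is a small sketch in which every change produces a new immutable state and a paginated read is bound to the state it started from. Everything in it is an assumption made for illustration: `VersionedListSketch`, its integer states and its methods are hypothetical, and it uses naive copy-on-write rather than a DAG or real MVCC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical copy-on-write model: each state number maps to an immutable snapshot.
final class VersionedListSketch {
    private final Map<Integer, List<String>> snapshots = new HashMap<>();
    private int currentState = 0;

    VersionedListSketch() {
        snapshots.put(0, List.of());
    }

    // Appends a messageId by creating a new snapshot; older snapshots are never touched.
    int add(String messageId) {
        List<String> next = new ArrayList<>(snapshots.get(currentState));
        next.add(messageId);
        currentState++;
        snapshots.put(currentState, List.copyOf(next));
        return currentState;
    }

    // Reads one page from the snapshot identified by `state`, ignoring later changes.
    List<String> read(int state, int position, int limit) {
        List<String> snapshot = snapshots.get(state);
        int from = Math.min(position, snapshot.size());
        int to = Math.min(from + limit, snapshot.size());
        return snapshot.subList(from, to);
    }
}
```

A client that started paging at a given state keeps seeing the same list even if messages arrive in between, which is the property the comment attributes to MVCC and to JMAP state. Copying the whole list on each change is only acceptable in a sketch; a real design would share structure between states.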




[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r523292206



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       No. That is one of the more important points here.
   
   It is opt-in: you do not necessarily have to consider ES non-critical.
   
   But if you want, you can.
   
   What is the problem with allowing diverging philosophies?






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522867624



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance

Review comment:
       Thanks for the remark, that's right.






[GitHub] [james-project] MichaelBailly commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
MichaelBailly commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522129443



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.

Review comment:
       W3 RuL3Z






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r523292206



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.

Review comment:
       That is one of the more important points here.
   
   It is opt-in: you do not necessarily have to consider ES non-critical.
   
   But if you want, you can.
   
   What is the problem with allowing diverging philosophies?






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522794738



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,148 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+James powers the JMAP protocol.
+
+A user willing to use a webmail will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ElasticSearch.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored on Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.
+
+## Example of optimized JMAP requests
+
+### A: Email list sorted by sentAt, with limit
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+
+### B: Email list sorted by sentAt, with limit, after a given receivedAt date
+
+RFC-8621:
+
+```
+["Email/query",
+ {
+   "accountId": "29883977c13473ae7cb7678ef767cbfbaffc8a44a6e463d971d23a65c1dc4af6",
+   "filter": {
+       "inMailbox":"abcd",
+       "after": "aDate"
+   },
+   "comparator": [{
+     "property":"sentAt",
+     "isAscending": false
+   }]
+ },
+ "c1"]
+```
+
+### C: Email list sorted by sentAt, with limit, after a given sentAt date
+
+Draft:
+
+```
+[["getMessageList", {"filter":{"after":"aDate", "inMailboxes": ["abcd"]}, "sort": ["date desc"]}, "#0"]]
+```
+

Review comment:
       Well, both solutions need to gather position+limit records, then filter them.
   
   Ok to add a line.
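
One possible reading of the remark above, sketched under assumptions: the backend returns an already sorted list of messageIds in which the same messageId may appear more than once (for example when a message sits in several mailboxes), and the response must keep messageIds unique while honouring `position` and `limit`. The class `PaginationSketch` and its `page` method are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: de-duplicate first, then apply position and limit.
final class PaginationSketch {
    static List<String> page(List<String> sortedMessageIds, int position, int limit) {
        return sortedMessageIds.stream()
            .distinct()        // keeps the first occurrence, preserving sort order
            .skip(position)    // drop everything before the requested position
            .limit(limit)      // keep at most `limit` entries
            .collect(Collectors.toList());
    }
}
```

Because duplicates are removed before the window is applied, the caller has to read at least `position + limit` unique records from the store, whichever store answers the query.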






[GitHub] [james-project] mbaechler commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
mbaechler commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r523110670



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending on requested properties, this is either
+ retrieved from Cassandra alone or from ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction for listing emails. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more services for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.
+
+Regarding Distributed James data-stores responsibilities:
+ - Cassandra is the source of truth for metadata, its storage needs to be adapted to known access patterns.
+ - ElasticSearch allows resolution of arbitrary queries, and performs full text search.
+
+## Decision
+
+Provide an optional view for most common `Email/query` requests both on Draft and RFC-8621 implementations.
+This includes filters and sorts on 'sentAt'.
+
+This view will be stored into Cassandra, and updated asynchronously via a MailboxListener.
+
+## Consequences
+
+A migration task will be provided for new adopters.
+
+Administrators would be offered a configuration option to turn this view on and off as needed.
+
+If enabled administrators would no longer need to ensure high availability and good performances for ElasticSearch.
+We thus expect a decrease in overall ElasticSearch load, allowing savings compared to actual deployments.
+Furthermore, we expect better performances by resolving such queries against Cassandra.
+
+The expected added load to Cassandra is low, as the search is a simple Cassandra read. As we only store messageId,
+Cassandra dataset size will only grow of a few percents if enabled.
+
+## Alternatives
+
+Those not willing to adopt this view will not be affected. By disabling the listener and the view usage, they will keep
+resolving all `Email/query` against ElasticSearch.

Review comment:
       > Can you write it down on your proposal so I can one-click apply it?
   
   Sorry no, I don't care enough to invest the time for that.






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522867185



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.

Review comment:
       If I lose ES, given that ADR content, I only lose advanced search.
   
   My customers will complain waaaay less about "not having search" than about "not being able to read their emails".
   
   > Why would we want to support unavailability of highly available services in the first place?
   
   I and the people I work with are human, we do software, there **will** be unavailability on some of those services.
   
   The question now is how we deal with it.






[GitHub] [james-project] chibenwa commented on a change in pull request #259: [ADR] JMAP: Avoid ElasticSearch on critical reads

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #259:
URL: https://github.com/apache/james-project/pull/259#discussion_r522865150



##########
File path: src/adr/0043-avoid-elasticsearch-on-critical-reads.md
##########
@@ -0,0 +1,161 @@
+# 43. Avoid ElasticSearch on critical reads
+
+Date: 2020-11-11
+
+## Status
+
+Accepted (lazy consensus).
+
+Scope: Distributed James
+
+## Context
+
+A user willing to use a webmail powered by the JMAP protocol will end up doing the following operations:
+ - `Mailbox/get` to retrieve the mailboxes. This call is resolved against metadata stored in Cassandra.
+ - `Email/query` to retrieve the list of emails. This call is nowadays resolved on ElasticSearch for Email search after
+ a right resolution pass against Cassandra.
+ - `Email/get` to retrieve various levels of details. Depending of requested properties, this is either
+ resolved on Cassandra alone or on ObjectStorage.
+
+So, ElasticSearch is queried on every JMAP interaction. Administrators thus need to enforce availability and good performance
+for this component.
+
+Relying on more software for every read also harms our resiliency as ElasticSearch outages have major impacts.
+
+Also we should mention our ElasticSearch implementation in Distributed James suffers the following flaws:
+ - Updates of flags lead to updates of the all Email object, leading to sparse segments
+ - We currently rely on scrolling for JMAP (in order to ensure messageId uniqueness in the response while respecting limit & position)
+ - We noticed some very slow traces against ElasticSearch, even for simple queries.

Review comment:
       And clue why.
   
   But ElasticSearch's slow performance would likely require its own ADR. That's a lengthy topic.
   
   Paging is one, there are many others. I described scrolling & data mutability above.
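
For readers unfamiliar with the scrolling point in the quoted ADR text, here is a minimal sketch of why plain offset paging is not enough when the response must contain unique messageIds. It is not James code. It assumes (this is an assumption, not stated in the thread) that the raw search hits contain one entry per mailbox copy of a message, so the same messageId can appear several times. The names `unique_message_ids`, `query_page` and `raw_hits` are hypothetical, and a plain Python list stands in for the scroll over search results.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def unique_message_ids(hits: Iterable[str]) -> Iterator[str]:
    """Yield each messageId once, in first-seen order."""
    seen = set()
    for message_id in hits:
        if message_id not in seen:
            seen.add(message_id)
            yield message_id


def query_page(hits: Iterable[str], position: int, limit: int) -> List[str]:
    """Return up to `limit` distinct messageIds starting at `position`.

    `hits` stands in for a scroll over search results (hypothetical input):
    duplicates must be dropped *before* position and limit are applied,
    which is why the raw hits have to be streamed from the start.
    """
    return list(islice(unique_message_ids(hits), position, position + limit))


# Assumed shape: a message present in two mailboxes shows up twice.
raw_hits = ["m1", "m2", "m1", "m3", "m2", "m4"]
print(query_page(raw_hits, position=1, limit=2))  # ['m2', 'm3']
```

Because deduplication happens before `position` and `limit` are applied, the number of raw hits to read is not known in advance, which is one way to see why a streaming read such as a scroll ends up being used.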



