You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@james.apache.org by bt...@apache.org on 2022/09/22 02:50:28 UTC

[james-project] branch master updated: [BOYSCOUT] Write some missing ADRs regarding recent work (#1196)

This is an automated email from the ASF dual-hosted git repository.

btellier pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git


The following commit(s) were added to refs/heads/master by this push:
     new 65d7a3d9c3 [BOYSCOUT] Write some missing ADRs regarding recent work (#1196)
65d7a3d9c3 is described below

commit 65d7a3d9c389517c9c100bf3fc2c703d4eb33c34
Author: Benoit TELLIER <bt...@linagora.com>
AuthorDate: Thu Sep 22 09:50:23 2022 +0700

    [BOYSCOUT] Write some missing ADRs regarding recent work (#1196)
    
     - ADR 57: Reactive IMAP
     - ADR 58: Upgrade to Netty 4
     - ADR 59: Upgrade to Cassandra driver 4
     - ADR 60: Adopt bounded elastic
     - ADR 61: Delegation
     - ADR 62: OIDC token introspection
     - ADR 63: Prevent temporary file leaks
---
 src/adr/0051-oidc.md                         |   4 +-
 src/adr/0053-email-rate-limiting.md          |   2 +-
 src/adr/0057-reactive-imap.md                | 101 +++++++++++++++++++++++++++
 src/adr/0058-upgrade-to-netty-4.md           |  66 +++++++++++++++++
 src/adr/0059-upgrade-to-cassadra-driver-4.md |  52 ++++++++++++++
 src/adr/0060-adopt-bounded-elastic.md        |  60 ++++++++++++++++
 src/adr/0061-delegation.md                   |  61 ++++++++++++++++
 src/adr/0062-oidc-token-introspection.md     |  50 +++++++++++++
 src/adr/0063-temporary-file-leaks.md         |  75 ++++++++++++++++++++
 9 files changed, 469 insertions(+), 2 deletions(-)

diff --git a/src/adr/0051-oidc.md b/src/adr/0051-oidc.md
index ef3f76c837..533474e9b3 100644
--- a/src/adr/0051-oidc.md
+++ b/src/adr/0051-oidc.md
@@ -6,7 +6,9 @@ Date: 2021-12-06
 
 Accepted (lazy consensus).
 
-Not yet implemented.
+Implemented.
+
+Complemented by [ADR-62](0062-oidc-token-introspection.md).
 
 ## Context
 
diff --git a/src/adr/0053-email-rate-limiting.md b/src/adr/0053-email-rate-limiting.md
index e79276ade3..bb3e91e5af 100644
--- a/src/adr/0053-email-rate-limiting.md
+++ b/src/adr/0053-email-rate-limiting.md
@@ -6,7 +6,7 @@ Date: 2021-01-10
 
 Accepted (lazy consensus).
 
-Not yet implemented.
+Implemented.
 
 ## Context
 
diff --git a/src/adr/0057-reactive-imap.md b/src/adr/0057-reactive-imap.md
new file mode 100644
index 0000000000..e54cb28c1b
--- /dev/null
+++ b/src/adr/0057-reactive-imap.md
@@ -0,0 +1,101 @@
+# 57. Reactive IMAP
+
+Date: 2022-09-13
+
+## Status
+
+Accepted (lazy consensus).
+
+Implemented. 
+
+## Context
+
+Classic java programming involves doing some blocking calls. When performing some input-output, the
+thread performing the tasks at hand will hang until a response is received. This means that a server 
+performing IOs, which is the case of James, would end up with a high number of threads: for an IMAP
+server, one IMAP request in flight equals one thread.
+
+The widely documented issue with such an approach is that, in Java threads correspond to system threads,
+and are expensive. Typically, each thread takes up to ~1MB of memory. To cap the count of threads at 
+hand, the Thread Pool model is used to cap the count of threads. Process at most N IMAP requests with
+N threads, and queue requests that cannot yet be handled.
+
+The blocking thread pool model was the concurrency paradigm used by James IMAP implementation in 3.7.x.
+
+Reactive programming instead does not perform blocking operation, thus one thread can handle several
+IO tasks. This can be seen as callbacks executed once the request is received. Reactive programming model,
+amongst others, leads to efficient resource usage. It helps keeping the count of applicative threads 
+low, helps reduce context switches.
+
+James have been, among the past few years, transitioning to a reactive programming model using the
+Reactor library. Migrated components involves:
+ - Cassandra/Distributed storage
+ - JMAP protocol stacks
+ - Some (webadmin) tasks also relies on Reactor
+
+It is to be noted that our IMAP network stack uses Netty 4.x library that allows handling requests
+asynchronously.
+
+## Decision
+
+Migrate IMAP to a reactive model.
+
+Implement a way to limit IMAP concurrency to protect James server from bursts.
+
+## Consequences
+
+Significant rewrite of the IMAP stack.
+
+Going reactive on the IMAP stacks yields the following improvements:
+ - Reduced thread count. IMAP request handling can now be dealt with from the Netty IO threads.
+ - Improve latency/performance. Removing the need of an extra scheduling and of synchronisation
+ leads to improved tail latencies and better throughput.
+ 
+Diagnostic tools will be harder to use on top of the IMAP stack:
+ - Stack traces will be more complex to read
+ - Glowroot APM would not work anymore as asynchronous logic are notoriously badly handled
+ - Flame graphs will be messier.
+ 
+Associated risk:
+ - Badly managed blocking calls would now directly impact the Netty event loop thus strongly degrading the performance
+ of the server. Tools like blockhound can help to mitigate this risk.
+
+## Alternatives
+
+From Java 19 onward, similar concerns are addressed by project Loom. This project delivers "Virtual Threads" scheduled
+by the Java Virtual Machine on top of carrier system threads. This approach allow preserving the imperative programing
+style. Tooling also works better.
+
+Yet, this promissing feature:
+
+ - Would only be available for use in Java 21 (LTS), not to be delivered prior to September 2023. Upgrading the James
+ ecosystem to Java 21 Java Development Kit would be a controversial decision in itself and might raise significant debates
+ within the community.
+ - Operational feedback would likely be required prior adoption.
+ - Do not have yet performance comparison with asynchronous libraries
+
+## Follow up work
+
+Some other components have not yet been migrated to a reactive style. This includes:
+
+ - Mail processing within the mailet container
+ - Remote Delivery (outgoing SMTP)
+ - SMTP server (incoming SMTP / LMTP)
+ - WebAdmin
+ 
+With lower throughput, the benefits of migrating such components to a reactive style are lower thus might not match the
+associated rewrite costs and operational risk. As such, the effort had not yet been undertaken, but might be in the future.
+
+It is to be mentioned that APPEND command is buffered to a file, which, when happening is done on a dedicated thread. The
+associated code is complex as Netty 4 ByteToMessageDecoder is meant to be synchronous. Upcoming Netty 5 would enable us
+to simplify associated code.
+
+## References
+
+- [JIRA JAMES-3737](https://issues.apache.org/jira/browse/JAMES-3737)
+- [JIRA JAMES-3816](https://issues.apache.org/jira/browse/JAMES-3816)
+- [Project reactor](https://projectreactor.io/)
+- [Project loom](https://wiki.openjdk.org/display/loom/Main)
+- [Netty](https://netty.io/)
+- [Blockhound](https://github.com/reactor/BlockHound)
+- [Discussion on the mailing list](https://www.mail-archive.com/server-dev@james.apache.org/msg72113.html)
\ No newline at end of file
diff --git a/src/adr/0058-upgrade-to-netty-4.md b/src/adr/0058-upgrade-to-netty-4.md
new file mode 100644
index 0000000000..248991b460
--- /dev/null
+++ b/src/adr/0058-upgrade-to-netty-4.md
@@ -0,0 +1,66 @@
+# 58. Upgrade to netty 4
+
+Date: 2022-09-13
+
+## Status
+
+Accepted (lazy consensus).
+
+Implemented. 
+
+## Context
+
+James 3.7.0 uses Netty 3 for its TCP stacks (IMAP, SMTP, LMTP, MANAGESIEVE, POP3).
+
+Latests Netty 3 minor releases dates from 20XX. This brings concerns regarding security
+(unmaintained dependency).
+
+Netty 4 delivers many enhancements compared to Netty 3. 
+
+ - We can mention here the buffer APIs
+ - Netty 4 empowers the use of native SSL, native event loops
+ 
+
+## Decision
+
+Upgrade James to Netty 4.
+
+## Consequences
+
+Significant rewrite of the protocol stack. Though the count of lines of code involved is low, the code is
+tricky and would need several iteration to be stabilized. For this reason, James 3.7.0 was released prior
+the Netty 4 adoption.
+ 
+Associated risk:
+ - Buffer leaks. Manual buffer management can be error prone. Netty project includes relevant 
+ tooling.
+ - We encountered issues with write ordering (when execute on, or outside of, the event loop)
+
+Benefits: Netty 4 natively supports graceful shutdown, which enables easily implementing such a feature in 
+James.
+
+Netty 4 upgrades was also the opportunity to significantly improve test coverage for IMAP protocol stacks
+and IMAP 4 extensions, which allowed fixing numerous bugs. To be mentioned:
+ - COMPRESSION
+ - STARTTLS and SSL
+ - IMAP IDLE
+ - QRESYNC and CONDSTORE
+ 
+To be noted that Netty version used by James needs to be aligned with those of our dependencies:
+ - S3 driver
+ - Cassandra 4 driver
+ - Lettuce driver (Redis for rate limiting)
+ - Reactor HTTP
+ 
+Netty upgrades thus needs to be carefully done. So far we successfully aligned versions, so the overall
+compatibility looks good.
+
+## Follow up work
+
+We did not yet succeed to enable native epoll, native SSL.
+
+## References
+
+- [JIRA JAMES-3715](https://issues.apache.org/jira/browse/JAMES-3715)
+- [Netty](https://netty.io)
+- [Netty 4 migration](https://netty.io/wiki/new-and-noteworthy-in-4.0.html)
diff --git a/src/adr/0059-upgrade-to-cassadra-driver-4.md b/src/adr/0059-upgrade-to-cassadra-driver-4.md
new file mode 100644
index 0000000000..d7bf8e2436
--- /dev/null
+++ b/src/adr/0059-upgrade-to-cassadra-driver-4.md
@@ -0,0 +1,52 @@
+# 59. Upgrade to Cassandra driver 4
+
+Date: 2022-09-13
+
+## Status
+
+Accepted (lazy consensus).
+
+Implemented. 
+
+## Context
+
+Cassandra driver 3 latest release took pace in 201X. An outdated dependency pauses a security risk.
+
+Moreover Cassandra driver 4:
+ - Have prime reactive support and do not need to pass through an intermediate "CompletableFuture"
+ representation which is more optimised.
+ - In Cassandra driver 3, paging is blocking. Handling paging the reactive way means we do not 
+ need to aggressively switch the processing to an elastic thread, which helps keeping the count of 
+ threads low.
+ - Cassandra driver 4 has built in support for advanced driver configuration tuning without the 
+ need for the application to programmatically configure the driver. This means great flexibility without
+ the need for the application to proxy the driver configuration parameters.
+
+## Decision
+
+Upgrade James to Cassandra driver 4.
+
+Handle Cassandra requests callbacks on the Cassandra driver thread.
+
+## Consequences
+
+We expect a performance enhancement for native reactive support from the Cassandra driver, as well
+as a better thread handling which translate to improved latencies/throughput.
+
+This code change needs a rewrite of all the Cassandra data access layer.
+
+Associated risk:
+ - Blocking on the Cassandra driver thread is prohibited. Tools like Blockhound can be adapted to 
+ mitigate this.
+ - Blocking a Cassandra request from within a Cassandra driver thread result in a request timeout.
+ - Long running computation should be switched to other threads as it can prevent other Cassandra 
+ requests from being executed.
+ 
+Mechanisms related to Cassandra driver configuration will need to be reworked. This leads to a breaking
+change but our Cassandra driver configuration will be simpler and more standard.
+
+## References
+
+- [JIRA JAMES-3774](https://issues.apache.org/jira/browse/JAMES-3774)
+- [Cassandra driver upgrade instructions](https://docs.datastax.com/en/developer/java-driver/4.0/upgrade_guide/)
+- [BlockHound](https://github.com/reactor/BlockHound)
diff --git a/src/adr/0060-adopt-bounded-elastic.md b/src/adr/0060-adopt-bounded-elastic.md
new file mode 100644
index 0000000000..6dcea13171
--- /dev/null
+++ b/src/adr/0060-adopt-bounded-elastic.md
@@ -0,0 +1,60 @@
+# 60. Adopt bounded elastic
+
+Date: 2022-09-13
+
+## Status
+
+Accepted (lazy consensus).
+
+Implemented. 
+
+## Context
+
+See description of blocking vs reactive programming odel described in [ADR-57](0057-reactive-imap.md).
+
+Sometimes reactive code needs to execute blocking tasks. 
+
+ - This is the case when inter-operating with blocking libraries
+ - James reactive adoption is progressive, some parts of our application
+ are thus not reactive yet, which can result in reactive code calling
+ blocking code calling reactive code.
+ 
+Historically James used the elastic scheduler for scheduling such blocking calls. This scheduler
+starts a thread for each task submitted (and attempt to reuse threads when possible) and results
+in high thread count.
+
+That is why project reactor deprecated the elastic scheduler in favor of bounded-elastic (similar to
+a thread pool).
+
+## Decision
+
+Migrate from elastic scheduler to bounded-elastic scheduler.
+
+## Consequences
+
+We expect a reduction in used threads under load from such a change.
+
+Also, getting rid of elastic scheduler might be a requirement to upgrade to reactor 3.5 upward.
+
+Associated risk:
+ - In some places, James protocol code is reactive, calls blocking intermediates API, to 
+ end up calling reactive data access code. This results in nested blocking calls. Nested blocking
+ calls, when using the same scheduler with a bounded thread cap, can result in a dead lock.
+
+To prevent such a dead-lock code executed on bounded-elastic should not depend on scheduling nested
+blocking call on bounded-elastic scheduler for its completion. We can thus avoid such a situation by 
+using a scheduler dedicated by nested blocking calls.
+
+## Alternatives
+
+Alternatives to the "blocking call wrapper" scheduler described above includes a full reactive 
+migration for Apache James (ie no blocking calls meaning no nested blocking calls).
+
+While this clearly is the target such a work is not realistically done within a limited time frame. 
+As such we need a transitional model allowing reactive code to inter-operate with legacy blocking 
+James code.
+
+## References
+
+- [JIRA JAMES-3773](https://issues.apache.org/jira/browse/JAMES-3773)
+- [Reactor deprecating elastic scheduler](https://github.com/reactor/reactor-core/issues/1893)
diff --git a/src/adr/0061-delegation.md b/src/adr/0061-delegation.md
new file mode 100644
index 0000000000..f18b757288
--- /dev/null
+++ b/src/adr/0061-delegation.md
@@ -0,0 +1,61 @@
+# 61. Delegation
+
+Date: 2022-09-13
+
+## Status
+
+Accepted (lazy consensus).
+
+Implemented. 
+
+## Context
+
+Delegation is a common feature for email servers:
+
+```
+As user A I want to access mailbox of user B.
+```
+
+James currently supports a similar feature called impersonation:
+
+```
+As an administrator I want to acces mailbox of user B.
+```
+
+Impersonation can for instance be used to perform migrations with tools like IMAP-Sync.
+
+## Decision
+
+Implement delegation in Apache James (opt in).
+
+Reuse APIs used for impersonation to also back delegation up. Technically if user B delegates his
+account to user A then user A can impersonate user B.
+
+Stored delegated access in a Cassandra database and expose it through webadmin.
+
+Support delegation while logging in IMAP/SMTP. Both LOGIN/PLAIN authentication and OIDC
+authentication are supported.
+
+## Consequences
+
+Logging traces belongs to the target user and not the user that really authenticated
+though an intermediate log upon logging should allow a correlation.
+
+Associated risk:
+ - Special care needs to be paid to delegation as it can fall into the `improper authorization` 
+ attack class and might result in data leaks / modification / deletion. However, as delegation 
+ is performed upon logging the attack surface is limited, and typically simpler than traditional
+ right management systems.
+
+## Follow-up work
+
+JMAP integration. We can expose delegated accounts through the JMAP session object and support
+using non-default JMAP accounts.
+
+This might come at the price of one Cassandra read per JMAP request when interacting with delegated 
+accounts.
+
+## References
+
+- [JIRA JAMES-3756](https://issues.apache.org/jira/browse/JAMES-3756)
+- [IMAP Sync](https://imapsync.lamiral.info/)
diff --git a/src/adr/0062-oidc-token-introspection.md b/src/adr/0062-oidc-token-introspection.md
new file mode 100644
index 0000000000..c9533dde37
--- /dev/null
+++ b/src/adr/0062-oidc-token-introspection.md
@@ -0,0 +1,50 @@
+# 62. OIDC token introspection
+
+Date: 2022-09-13
+
+## Status
+
+Accepted (lazy consensus).
+
+Implemented. 
+
+Complements [ADR 51](0051-oidc.md).
+
+## Context
+
+[ADR 51](0051-oidc.md) describes work required for OIDC adoption within James.
+
+This work enables the uses of an OIDC access token to authenticate using IMAP and SMTP.
+It validates the signature of the token using cryptographic materials exposes by the 
+Identity Provider server through the mean of a JWKS endpoint. Yet no effort is made to
+see if the access token in question was revoked or not, which can pause a security threat.
+
+OIDC ecosystem can support the following mechanisms to determine if an access token had been 
+revoked:
+
+ - Use of an introspection endpoint: the application asks the OIDC server to validate the token
+ through an HTTP call. This result in load on the identity provider, which becomes central to the
+ authentication process. This can be assimilated to a 'synchronous' mode.
+ - Use of back-channel upon token revocation. The OIDC provider is then responsible to call an 
+ applicative endpoint to invalidate a token. Invalidated tokens are then stored by the application
+ and access token are challenged against that storage. This approach scales better yet is harder 
+ to implement and can be seen as less secure (a network incident can prevent revoked token 
+ propagation to applications for instance). This can be seen as an 'asynchronous' mode.
+ 
+Also, we need to keep in mind that OIDC validation is only done upon establishing the connection in 
+IMAP/SMTP (as they are connected protocols) which defers from stateless protocols like HTTP. Performance
+impact of token introspection is thus lower for connected protocols.
+
+## Decision
+
+Allow opt-in use of an introspection endpoint to further secure IMAP/SMTP OIDC implementation.
+
+## Consequences
+
+Security gains for the IMAP/SMTP OIDC implementation in James.
+
+## References
+
+- [JIRA JAMES-3755](https://issues.apache.org/jira/browse/JAMES-3755)
+- [RFC-7628](https://www.rfc-editor.org/rfc/rfc7628.html) SASL OATH mechanism for SMTP and IMAP: https://datatracker.ietf.org/doc/html/rfc7628
+- [RFC-7662](https://datatracker.ietf.org/doc/html/rfc7662) OAUTH token introspection
\ No newline at end of file
diff --git a/src/adr/0063-temporary-file-leaks.md b/src/adr/0063-temporary-file-leaks.md
new file mode 100644
index 0000000000..6faa3b1bd2
--- /dev/null
+++ b/src/adr/0063-temporary-file-leaks.md
@@ -0,0 +1,75 @@
+# 63. Prevent temporary file leaks
+
+Date: 2022-09-13
+
+## Status
+
+Accepted (lazy consensus).
+
+Implemented. 
+
+## Context
+
+In order to reduce memory allocation, Apache James buffers big emails into temporary files, in both SMTP
+and during the email processing.
+
+This means that all James entities relying on `Mail` object needs to have resource management in place
+to prevent temporary file leaks. If not this can result in both unreasonable disk occupation and file
+descriptor leaks.
+
+Core components were found to badly manage those resources. Some mailets (bounce), mail queue APIs, mail 
+repository APIs were found to be causing temporary file leaks.
+
+James allows users to define custom mailets, that them too can badly manage emails. If insiders tend to
+commit such errors, then we should expect our users to commit them too.
+
+This points toward the need of a systematic approach to detect and mitigate temporary file leaks.
+
+Similar leak detection is performed by other libraries. One can mention here Netty buffer libraries,
+which relies on phantom references to detect leaks. Phantom references allows the Java garbage collector
+to report recently GCed object. Upon phantom reference allocation, a check can then be done to check recently 
+GCed object, and release related resources if need be.
+
+## Decision
+
+Implement a leak detector "a la Netty" using phantom references.
+
+Allow several modes:
+ - Off: does nothing which is James 3.6.x behaviour.
+ - Simple: logs leaks and deletes associated files
+ - Advanced: Same as simple but with the allocation stack trace which comes at a performance cost.
+ Useful for diagnostic.
+ - Testing: Assertion errors upon leaks. Can be used to streangthen test suite and detect leaks as part 
+ of a build.
+
+## Consequences
+
+Makes James core components "leak proofed".
+
+Allows user to safely write extensions involving emails (eg duplication / sending).
+
+Performance impact of turning on "simple" leak detection (default behaviour) is expected not 
+to be significant.
+
+## Alternatives
+
+Similar functionalities can be implemented in a simpler way by relying on Java finalize method, called 
+upon garbage collection.
+
+Yet, such a move should not be done as many operational problems come through the use of 'finalize':
+ - Longer GC pauses as finalise runs
+ - GC is significantly slower when finalize method exists
+ - During finalizing stage objects are still live which ease GC promotion to old generation and can result
+ in so called 'kill the world GCs'.
+
+## Follow up work
+
+Mail management in James test suite is poor which leads to many false positive and prevents
+us from leveraging leaks detector benefits as part of our test suite. Significant work would
+be needed to do so.
+
+## References
+
+- [JIRA JAMES-3724](https://issues.apache.org/jira/browse/JAMES-3724)
+- [Leak detection in Netty](https://netty.io/wiki/reference-counted-objects.html)
+- [Phantom references](https://www.baeldung.com/java-phantom-reference)


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org