Posted to dev@commons.apache.org by Josh <an...@protonmail.com.INVALID> on 2022/01/23 18:44:22 UTC

[net] IMAP Memory considerations with large ‘FETCH’ sizes.

The following comments concern classes in the [org.apache.commons.net.imap](https://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/imap/package-summary.html) package.

Consider the following IMAP ‘FETCH’ exchange between a client (>) and server (<):

> A654 FETCH 1:2 (BODY[TEXT])

< * 1 FETCH (BODY[TEXT] {80000000}\r\n

…

< * 2 FETCH …

< A654 OK FETCH completed

The first untagged response (* 1 FETCH …) contains a literal {80000000}, indicating that the data to come will be 80MB in size.
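To make the literal syntax concrete, here is a minimal sketch of how the announced size could be pulled out of such a response line. The class and method names (LiteralSize, parse) are hypothetical and only for illustration; this is not the Commons Net implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Commons Net: extracts the size announced
// by a trailing IMAP literal marker such as "{80000000}".
final class LiteralSize {
    // A literal marker is "{" digits "}" at the very end of the response line
    // (the CRLF is assumed to have been stripped already).
    private static final Pattern LITERAL = Pattern.compile("\\{(\\d+)\\}$");

    private LiteralSize() {
    }

    /** Returns the announced literal size, or -1 if the line carries no literal marker. */
    static long parse(String line) {
        Matcher m = LITERAL.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }
}
```

For the first untagged response above, LiteralSize.parse("* 1 FETCH (BODY[TEXT] {80000000}") returns 80000000, while the tagged completion line returns -1.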

After reviewing the [source](https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l298), it is my understanding that the entire 80MB sequence of data will be read into Java memory even when using ‘[IMAPChunkListener](https://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/imap/IMAP.IMAPChunkListener.html)’. According to the documentation:

Implement this interface and register it via [IMAP.setChunkListener(IMAPChunkListener)](https://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/imap/IMAP.html#setChunkListener-org.apache.commons.net.imap.IMAP.IMAPChunkListener-) in order to get access to multi-line partial command responses. Useful when processing large FETCH responses.

It is apparent the partial fetch response is read in full (80MB) before invoking the ‘IMAPChunkListener’ and then discarding the read lines (freeing up memory).

Back to the example:

> A654 FETCH 1:2 (BODY[TEXT])

< * 1 FETCH (BODY[TEXT] {80000000}\r\n

…. <— read in full into memory then discarded after calling IMAPChunkListener

< * 2 FETCH (BODY[TEXT] {250}\r\n

…. <— read in full into memory then discarded after calling IMAPChunkListener

< A654 OK FETCH completed

Above, you can see that the chunk listener is invoked once for each individual partial fetch response, but that does not prevent a single large partial from being loaded into memory.

Let’s review the [code](https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l298):

    int literalCount = IMAPReply.literalCount(line);

The call above returns the size of the literal, in our case 80000000 or 80MB (for the first partial fetch response).

    final boolean isMultiLine = literalCount >= 0;
    while (literalCount >= 0) {
        line = _reader.readLine();
        if (line == null) {
            throw new EOFException("Connection closed without indication.");
        }
        replyLines.add(line);
        literalCount -= line.length() + 2; // Allow for CRLF
    }

The literal count above starts at 80000000 and is decremented by the length of each line read until it becomes negative. At that point all 80MB has been read in full and is held in replyLines, and the while loop exits.

    if (isMultiLine) {
        final IMAPChunkListener il = chunkListener;
        if (il != null) {
            final boolean clear = il.chunkReceived(this);
            if (clear) {
                fireReplyReceived(IMAPReply.PARTIAL, getReplyString());
                replyLines.clear();

Only now, after all 80MB has been loaded into memory in full, is the IMAPChunkListener invoked. If it returns true, the lines are thrown away, freeing the memory, and processing moves on to the next response.

            }
        }
    }
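The effect can be reproduced outside the library. The following is a simplified stand-in for the loop above, not the Commons Net code itself: the Listener interface and the class and method names are assumptions made for illustration. It measures how much data is already buffered at the moment the callback first runs.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the getReply() loop; all names here are hypothetical.
final class BufferedLiteralDemo {
    /** Stand-in for IMAPChunkListener: called only once the whole literal is buffered. */
    interface Listener {
        void literalComplete(List<String> bufferedLines);
    }

    private BufferedLiteralDemo() {
    }

    static void readLiteral(BufferedReader reader, int literalCount, Listener listener)
            throws IOException {
        List<String> replyLines = new ArrayList<>();
        // Same shape as the original loop: every line is kept until the count goes negative.
        while (literalCount >= 0) {
            String line = reader.readLine();
            if (line == null) {
                throw new EOFException("Connection closed without indication.");
            }
            replyLines.add(line);
            literalCount -= line.length() + 2; // Allow for CRLF
        }
        listener.literalComplete(replyLines); // first point at which the caller sees any data
        replyLines.clear();
    }

    /** Returns the number of characters (including CRLFs) buffered when the listener runs. */
    static int bufferedCharsAtCallback(String serverData, int literalCount) {
        int[] buffered = {0};
        try {
            readLiteral(new BufferedReader(new StringReader(serverData)), literalCount, lines -> {
                for (String l : lines) {
                    buffered[0] += l.length() + 2;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffered[0];
    }
}
```

With an 18-character literal made of three lines followed by a closing ")" line, bufferedCharsAtCallback reports 21: the whole literal plus the closing line is in memory before the listener sees anything. Scaled up to {80000000}, that is the full 80MB.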

I’m considering modifying the getReply() method, shown above, to break a large literal into chunks so that the entire 80MB value is never loaded into memory at once.

This would be configurable so as not to break existing users of the API. Something like .setBreakLargeLiteralSize(true): when breakUpLargeLiteralSize is true, a maxLiteralBuffer value is used to chunk the literal, so that instead of loading all 80MB in full, it is loaded in chunks and IMAPChunkListener is called sooner rather than later. Implementations of IMAPChunkListener would need to handle this behavior if it is turned on. Chunking would be disabled by default so as not to break existing users. Essentially, it is an opt-in feature, which reduces the risk.
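As a rough sketch of the idea, assuming the literal is read from the raw stream rather than line by line, the chunked read could look like the following. ChunkedLiteralReader, ChunkHandler and chunkStats are hypothetical names; the existing IMAPChunkListener.chunkReceived takes the IMAP instance instead, so the real change would need to fit that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch only: reads a literal in bounded chunks instead of buffering it whole.
final class ChunkedLiteralReader {
    /** Hypothetical callback; receives at most maxLiteralBuffer bytes per call. */
    interface ChunkHandler {
        void onChunk(byte[] buffer, int length, long remaining);
    }

    private ChunkedLiteralReader() {
    }

    /** Reads exactly literalSize bytes and returns the number of chunks delivered. */
    static int readLiteral(InputStream in, long literalSize, int maxLiteralBuffer,
            ChunkHandler handler) throws IOException {
        if (maxLiteralBuffer <= 0) {
            throw new IllegalArgumentException("maxLiteralBuffer must be positive");
        }
        byte[] buffer = new byte[maxLiteralBuffer]; // the only buffer ever allocated
        long remaining = literalSize;
        int chunks = 0;
        while (remaining > 0) {
            int want = (int) Math.min(remaining, buffer.length);
            int read = in.read(buffer, 0, want); // may return fewer bytes than requested
            if (read < 0) {
                throw new EOFException("Connection closed without indication.");
            }
            remaining -= read;
            handler.onChunk(buffer, read, remaining);
            chunks++;
        }
        return chunks;
    }

    /** Demo helper: returns {number of chunks, largest chunk size} for in-memory data. */
    static int[] chunkStats(byte[] data, long literalSize, int maxLiteralBuffer) {
        int[] stats = {0, 0};
        try {
            stats[0] = readLiteral(new ByteArrayInputStream(data), literalSize, maxLiteralBuffer,
                    (buffer, length, remaining) -> stats[1] = Math.max(stats[1], length));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stats;
    }
}
```

With a 10-byte literal and a maxLiteralBuffer of 4, the handler is called three times and never sees more than 4 bytes at once. Peak memory is then bounded by maxLiteralBuffer rather than by the size the server announces.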

What are your thoughts or concerns with this? Do you agree?

Re: [net] IMAP Memory considerations with large ‘FETCH’ sizes.

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

[Cc to the "dev" ML.]

On Mon, Jan 24, 2022 at 06:05, Josh <an...@protonmail.com> wrote:
>
> I tried that, to reach the [net] mailing list but it didn’t work

Your message did reach the ML; see it in the archive:
   https://markmail.org/message/xerlcemds4ro43nn

> so I opened an enhancement req on Jira.

Good.
You can, of course, reply to your own message on the ML
in order to provide the link to the corresponding JIRA report.

>
> Could you kindly get this over to net maintainers on my behalf?

Anyone subscribed to the "dev" ML can read and comment
on your proposal.
Please note that nowadays several maintainers are more
responsive to GitHub pull requests.  However, if your
enhancement implies a lot of work, you should indeed
get some prior acknowledgment that your contribution
will be accepted (provided that it complies with the code
requirements).

Regards,
Gilles

On Mon, Jan 24, 2022 at 02:40, Gilles Sadowski <gi...@gmail.com> wrote:
>
> Hi.
>
> I assume that you wanted to send the message
> below to the "dev" ML...
>
> On Mon, Jan 24, 2022 at 02:30, Josh <an...@protonmail.com> wrote:
> >
> > Gary,
> > Could you kindly direct me to who would be the best person to review the comments below? I’m seeking comment on the following matter:
> >
> > IMAP Memory considerations with large ‘FETCH’ sizes.
