You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Gary D. Gregory (Jira)" <ji...@apache.org> on 2022/12/02 18:17:00 UTC

[jira] [Comment Edited] (NET-709) IMAP Memory considerations with large ‘FETCH’ sizes.

    [ https://issues.apache.org/jira/browse/NET-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642642#comment-17642642 ] 

Gary D. Gregory edited comment on NET-709 at 12/2/22 6:16 PM:
--------------------------------------------------------------

I think the proper way to address this is many-fold, maybe:
 * Add a hard configurable limit and throw an exception if is met
 * Add a soft configurable limit, truncate and ignore anything beyond that

I would reject the above two based on what the route JRE NIO has taken: Provide an API that returns a Stream.

This gives you full compatibility using the old API without complications and new bugs AND a new API where memory usage will be as low as the client wants it to be.

BUT... all of this does not really apply to IMAP, more to FTP and such. So maybe solution 1 is needed.


was (Author: garydgregory):
I think the proper way to address this is many-fold, maybe:
 * Add a hard configurable limit and throw an exception if is met
 * Add a soft configurable limit, truncate and ignore anything beyond that

I would reject the above two based on what the route JRE NIO has taken: Provide an API that returns a Stream.

This gives you full compatibility using the old API without complications and new bugs AND a new API where memory usage will be as low as the client wants it to be.

> IMAP Memory considerations with large ‘FETCH’ sizes.
> ----------------------------------------------------
>
>                 Key: NET-709
>                 URL: https://issues.apache.org/jira/browse/NET-709
>             Project: Commons Net
>          Issue Type: Improvement
>          Components: IMAP
>    Affects Versions: 3.8.0
>            Reporter: Anders
>            Priority: Minor
>              Labels: IMAP, buffer, chunking, large, literal, memory, partial
>             Fix For: 3.10.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> h2. *IMAP Memory considerations with large ‘FETCH’ sizes.*
>  
> The following comments concern classes in the [org.apache.common.net.imap|https://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/imap/package-summary.html] package.
>  
> Consider the following imap ‘fetch’ exchange between a client (>) and server (<):
> {{> A654 FETCH 1:2 (BODY[TEXT])}}
> {{{}< * 1 FETCH (BODY[TEXT] {*}{{*}{*}80000000{*}{*}}{*}\r\n{}}}{{{}…{}}}
> {{< * 2 FETCH …}}
> {{< A654 OK FETCH completed}}
>  
> The first untagged response (* 1 FETCH …) contains a literal \{80000000} or 80MB.
>  
> After reviewing the [source|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l298], it is my understanding, the entire 80MB sequence of data will be read into Java memory even when using  ‘[IMAPChunkListener|https://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/imap/IMAP.IMAPChunkListener.html]’. According the the documentation: 
>  
> {quote}Implement this interface and register it via [IMAP.setChunkListener(IMAPChunkListener)|https://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/imap/IMAP.html#setChunkListener-org.apache.commons.net.imap.IMAP.IMAPChunkListener-] in order to get access to multi-line partial command responses. Useful when processing large FETCH responses.
> {quote}
>  
> It is apparent the partial fetch response is read in full (80MB) before invoking the ‘IMAPChunkListener’ and then discarding the read lines (freeing up memory).
>  
> Back to the example:
> > A654 FETCH 1:2 (BODY[TEXT])
> < * 1 FETCH (BODY[TEXT] \{80000000}\r\n
> *{color:#ff0000}…. <— read in full into memory then discarded after calling IMAPChunkListener{color}*
> < * 2 FETCH (BODY[TEXT] \{250}\r\n
> {color:#ff0000}*…. <— read in full into memory then discarded after calling IMAPChunkListener*{color}
> < A654 OK FETCH completed
>  
> Above, you can see the chunk listener is good for each individual partial fetch response but does not prevent a large partial from being loaded into memory.
>  
> Let’s review the [code|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l298]:
>  
> [ 296|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l296]                 int literalCount = IMAPReply.literalCount(line);
> {color:#ff0000}Above counts the size of the literal, in our case 80000000 or 80MB (for the first partial fetch response).{color}
>  
>  
> [ 297|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l297]                 final boolean isMultiLine = literalCount >= 0;
> [ 298|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l298]                 while (literalCount >= 0) {
> [ 299|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l299]                     line=_reader.readLine();
> [ 300|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l300]                     if (line == null)  
> {                                  throw new EOFException("Connection closed without indication.");   }
> [ 303|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l303]                     replyLines.add(line);
> [ 304|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l304]                     literalCount -= line.length() + 2; // Allow for CRLF
> [ 305|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l305]                 }
> {color:#ff0000}The literal count above starts at 80000000 and is decremented until reaching a negative non zero value where 80MB is read in full and while loop returns.{color}
>  
> [ 306|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l306]                 if (isMultiLine) {
> [ 307|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l307]                     final IMAPChunkListener il = chunkListener;
> [ 308|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l308]                     if (il != null) {
> [ 309|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l309]                         final boolean clear = il.chunkReceived(this);
> [ 310|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l310]                         if (clear) {
> [ 311|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l311]                             fireReplyReceived(IMAPReply.PARTIAL, getReplyString());
> [ 312|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l312]                             replyLines.clear();
> {color:#ff0000}Now, after all 80MB is loaded into memory in full, invoke the IMAPChunkListener and throw away the lines freeing memory.{color}
>  
> [ 313|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l313]                         }
> [ 314|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l314]                     }
> [ 315|https://gitbox.apache.org/repos/asf?p=commons-net.git;a=blob;f=src/main/java/org/apache/commons/net/imap/IMAP.java;h=d97f1073d8b97545d0a063c6832fe55c116166e2;hb=HEAD#l315]                 }
>  
> I’m considering modifying the getReply() method, shown above, to chunk the partial responses breaking up the literal so that it’s not loaded into memory in full. This is to prevent the entire 80MB literal value from being loaded into memory. 
>  
> This would be configurable as not to break the existing users of the API. Something like .setBreakLargeLiteralSize(true), when breakUpLargeLiteralSize is true, a maxLiteralBuffer value is used to chunk the literal preventing all 80MB from being loaded in full, instead loading chunks of it. This would require implementations of IMAPChunkListener to handle this behavior if it was turned on. The default behavior will see this chunking disabled as to not break the existing users. Essentially an opt-in feature reducing the risk.
>  
> *{color:#403294}What are you thoughts or concerns with this? Do you agree?{color}*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)