You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@jclouds.apache.org by "Andrea Turli (JIRA)" <ji...@apache.org> on 2016/11/23 15:55:58 UTC
[jira] [Updated] (JCLOUDS-1187) Avoid excessive memory usage when processing massive http response message

     [ https://issues.apache.org/jira/browse/JCLOUDS-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrea Turli updated JCLOUDS-1187:
----------------------------------
    Fix Version/s: 1.9.3

> Avoid excessive memory usage when processing massive http response message
> --------------------------------------------------------------------------
>
>                 Key: JCLOUDS-1187
>                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1187
>             Project: jclouds
>          Issue Type: Improvement
>          Components: jclouds-core
>    Affects Versions: 1.9.2
>            Reporter: Aled Sage
>             Fix For: 1.9.3
>
>
> With jclouds 1.9.3-SNAPSHOT and 2.0.0-SNAPSHOT (and all GA versions)...
> If jclouds receives a crazily big http response (e.g. see https://github.com/jclouds/jclouds/pull/1020), then jclouds consumes a huge amount of memory while processing that response. It holds multiple copies of the response in-memory at the same time.
> As reported in https://issues.apache.org/jira/browse/BROOKLYN-364, the memory usage (according to {{jmap -histo:live <pid>}}) is most char arrays (which is what backs the StringBuilder and String). When processing a 154MB response payload, the memory used by char arrays goes something like this: 66MB -> 104MB -> 323MB -> 625MB -> 939MB -> 461MB -> 633MB -> 327MB -> 21MB.
> (I don't know how much to believe the {{:live}}: is that definitely the size of char arrays that could not be GC'ed?)
> There are two main areas where this memory is consumed.
> *Wire Log*
> In the jclouds wire log, the following happens (approximately?!):
> * We read the response input stream, writing it to a {{FileBackedOutputStream}} - nice!
> * In {{Wire.wire(String, InputStream)}}, we read the {{FileBackedOutputStream}} into a {{StringBuilder}}, and then call {{getWireLog().debug(buffer.toString())}}:  
>   The StringBuilder holds the {{char[]}}; the toString() duplicates it - so two copies in memory.  
>   Unfortunately in the Softlayer example, it's all one huge line, so we logged it all in one go.  
>   I think (but need to dig into it more) that the logging framework (slf4j -> log4j in my case) ends up with multiple copies as well, while processing the call to {{log.debug(String)}}. (Hence peaking at 939MB of char arrays in memory).
>   When the method returns, all these copies can be GC'ed.
> * The response payload has now been switched to the {{FileBackedOutputStream}}, so that will be used subsequently.
> *Json Parsing*
> To parse the HTTP response:
> * The response is passed to {{org.jclouds.http.functions.ParseJson.apply(HttpResponse)}}
> * This calls {{json.fromJson(Strings2.toStringAndClose(stream), type)}}.  
> * The {{Strings2.toStringAndClose}} calls {{CharStreams.toString(new InputStreamReader(input, Charsets.UTF_8))}}.  
>   This reads the stream into a StringBuilder, then calls toString - so holds two copies in memory.  
>   This explains the second spike in memory usage (though I'm surprised it went as high as 633MB of char arrays in memory).  
>   When the method returns, we have our one String.  
> *Possible Improvements to Wire Log*
> {{org.jclouds.logging.internal.Wire}} could be configurable to only log the first X bytes of a response (so crazily long messages would be truncated in the log).
> Alternatively/additionally, {{Wire.wire}} could force splitting a huge line into multiple log messages when calling {{getWireLog().debug(buffer.toString())}}. Again this could be configurable.
> In production usage, I personally would always configure it to truncate: better to miss the end of the response rather than risk an {{OutOfMemoryError}}. Note this particular part isn't an issue if jclouds.wire is set to INFO or higher.
> *Possible Improvements to Json Parsing*
> I think {{ParseJson.apply(InputStream stream, Type type)}} should pass the {{new InputStreamReader(inputStream)}} to {{org.jclouds.json.Json.fromJson()}} (in a try-finally to close the stream, obviously).
> This would require adding a new method to the interface {{org.jclouds.json.Json}}.
> The implementation of {{GsonWrapper}} can handle that easily: it can call {{gson.fromJson(Reader, Type)}} instead of {{gson.fromJson(String, Type)}}. It looks to me like that will parse the stream as it reads it, rather than having to hold the whole string in memory at once.
> If we do these improvements, I don't think we'll ever hold the full 154MB char array in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)