Posted to dev@uima.apache.org by "Richard Eckart de Castilho (JIRA)" <de...@uima.apache.org> on 2014/04/16 15:24:15 UTC

[jira] [Comment Edited] (UIMA-3747) Memory management problem with compressed binary deserialization

    [ https://issues.apache.org/jira/browse/UIMA-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971394#comment-13971394 ] 

Richard Eckart de Castilho edited comment on UIMA-3747 at 4/16/14 1:23 PM:
---------------------------------------------------------------------------

Not that I am aware of. 

You can reproduce the problem with the following simple test. Before running the test, remove the "private" modifier from TypeSystemImpl.typeSystemMappers.

{noformat}
  public void testCasReuseWithDifferentTypeSystems() throws Exception
  {
      // Create a CAS
      CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
      cas.setDocumentLanguage("latin");
      cas.setDocumentText("test");

      // Serialize it
      ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
      Serialization.serializeWithCompression(cas, baos, cas.getTypeSystem());

      // Create a new CAS
      long min = Long.MAX_VALUE;
      long max = 0;
      CAS cas2 = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
      for (int i = 0; i < 100000; i++) {
          // Simulate us reinitializing the CAS with a new type system.
          TypeSystemImpl tgt = new TypeSystemImpl();
          for (int t = 0; t < 1000; t++) {
              tgt.addType("random"+t, tgt.getTopType());
          }
          tgt.commit();
          
          // Deserialize into the new type system
          ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
          Serialization.deserializeCAS(cas2, bais, tgt, null); 
          
          long cur = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
          max = Math.max(cur, max);
          min = Math.min(cur, min);
          if (i % 100 == 0) {
            System.out.printf("Cached: %d   Max: %d   Room left: %d   %n",
                  ((TypeSystemImpl) cas2.getTypeSystem()).typeSystemMappers.size(), max,
                  Runtime.getRuntime().maxMemory() - max);
          }
      }
  }
{noformat}

Eventually, the output screeches to a halt:

{noformat}
...
Cached: 2301   Max: 1466865472   Room left: 442067136   
Cached: 2401   Max: 1529083736   Room left: 379848872   
Cached: 2501   Max: 1583309160   Room left: 325623448   
Cached: 2601   Max: 1618738616   Room left: 290193992   
Cached: 2701   Max: 1661499672   Room left: 247432936   
Cached: 2801   Max: 1717535904   Room left: 191396704   
Cached: 2901   Max: 1717535904   Room left: 191396704
<hanging>
{noformat}
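
As an alternative to removing the "private" modifier for the test above, the size of the cache could be read via reflection. The following is only a minimal sketch: PrivateMapInspector and FakeTypeSystem are hypothetical names introduced for illustration, and it assumes that the field is declared directly on the object's runtime class and holds a java.util.Map, which I have not verified for every UIMA version.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: reads the size of a private Map field via reflection. */
final class PrivateMapInspector {

    private PrivateMapInspector() { }

    /**
     * Returns the number of entries in the Map stored in the named field.
     * Only fields declared on the target's own class are searched.
     */
    static int mapFieldSize(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypass "private" for inspection only
            Object value = field.get(target);
            if (!(value instanceof Map)) {
                throw new IllegalArgumentException(fieldName + " is not a Map");
            }
            return ((Map<?, ?>) value).size();
        } catch (ReflectiveOperationException e) {
            // Field missing or not accessible: report as an unchecked error.
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }
}

/** Stand-in with a private cache field; this is NOT the real TypeSystemImpl. */
final class FakeTypeSystem {
    private final Map<Object, Object> typeSystemMappers = new HashMap<>();

    void remember(Object key) {
        typeSystemMappers.put(key, new Object());
    }
}
```

Under those assumptions, the printf in the test could call PrivateMapInspector.mapFieldSize(cas2.getTypeSystem(), "typeSystemMappers") instead of accessing the field directly, so the UIMA sources would not need to be patched.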



> Memory management problem with compressed binary deserialization
> ----------------------------------------------------------------
>
>                 Key: UIMA-3747
>                 URL: https://issues.apache.org/jira/browse/UIMA-3747
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>    Affects Versions: 2.4.2SDK
>            Reporter: Richard Eckart de Castilho
>            Assignee: Marshall Schor
>             Fix For: 2.6.0SDK
>
>
> We think we stumbled across a memory management problem with the new compressed binary serialization when a CAS is reset/reused in a loop, e.g. in the uimaFIT SimplePipeline. When we use form 6, we consistently run into out-of-memory situations. Finally, we took the time to do a heap dump analysis.
> We found a huge TypeSystemImpl instance in the heap (~450MB). What makes it huge is the field "typeSystemMappers",
> which in our case contains 1000+ entries, each of them apparently using a TypeSystemImpl as key.
> It looks like typeSystemMappers is never reset when a CAS is reused. My current theory is that it should be reset when CAS.reset() is called; otherwise, type systems accumulate there when binary deserialization is used to repeatedly load data into a CAS in a loop that resets and reuses the CAS.
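
The accumulation described in the issue above can be illustrated without UIMA. The following is a rough sketch under stated assumptions: ReusableContainer is a hypothetical stand-in, not UIMA code, and its cache only mimics a map that is keyed by target objects and holds them strongly. It contrasts a reset that keeps the cache with one that clears it, as the theory above proposes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a reusable container with a per-target cache. */
final class ReusableContainer {
    // One entry per distinct target ever seen; keys are strongly referenced,
    // so neither the keys nor the cached values can be garbage collected.
    private final Map<Object, int[]> mappers = new HashMap<>();

    /** Simulates loading data for a target: builds and caches a mapping for it. */
    void load(Object target) {
        mappers.computeIfAbsent(target, t -> new int[1000]);
    }

    /** Reset that leaves the cache untouched (the behaviour suspected in the issue). */
    void resetKeepingCache() {
        // Intentionally empty: cached mappings survive the reset.
    }

    /** Reset that also drops all cached mappings (the behaviour proposed in the issue). */
    void resetClearingCache() {
        mappers.clear();
    }

    /** Number of mappings currently held by the cache. */
    int cachedMappers() {
        return mappers.size();
    }
}
```

Calling load() with a fresh target object in every iteration and then resetKeepingCache() lets cachedMappers() grow by one per iteration, which matches the steadily rising "Cached" counter in the test output. With resetClearingCache(), the cache is empty again after each iteration.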



--
This message was sent by Atlassian JIRA
(v6.2#6252)