You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/02/09 13:21:34 UTC

[GitHub] [incubator-daffodil] stevedlawrence opened a new pull request #489: Allow garbage collection of UStateForSuspension's and DirectOrBufferedDataOutputStream's

stevedlawrence opened a new pull request #489:
URL: https://github.com/apache/incubator-daffodil/pull/489


   The UStateForSuspensions and DirectOrBufferedDataOutputStream classes
   have members that effectively create linked lists. In each of these
   cases, we unknowingly hold onto the head of these linked lists, which
   prevents garbage collection of all UStateForSuspensions and
   DirectOrBufferedDataOutputStream instances. This means we essentially
   hold on to all unparse state, which quickly leads to out of memory
   errors for large format that require many suspensions.
   
   - The first issue is the "prior" member of UStateMain/UStateForSuspensions.
     This member is set so that each UState points to the previous
     UStateForSuspension that has been created, essentially creating a
     linked list of all UStateForSuspensions, with the head in UStateMain.
     This prevents all UStateForSuspensions from being garbage collected,
     as well all the state they point to (it's a lot).
   
     Fortunately, this member isn't used anywhere anymore. Presumably it
     was once used for debugging suspensions, but is no longer used or
     needed. So we can simply remove this member so these
     UStateForSuspensions can be garbage collected once the Suspensions
     that use them are finished and garbage collected.
   
   - The second issue is related to the "following" member in
     DirectOrBufferedDataOutputStream's. This member is used too keep track
     of the buffered DOS that follows this DOS (and iteratively, all
     following DOS's). As the Direct DOS is finished, we make the following
     DOS direct update pointers correctly. However, we create the very
     first direct DOS in the "unparse" function, which means it lives on
     the stack and cannot be garbage collected until unparse finished. And
     because this DOS iteratively points to all following DOS's via the
     "following" member, it means we can never free any DOS's (and all the
     buffered data associated with those DOS's) until the end of unparse.
   
     The solution in this case is to not create the initial direct DOS in
     the unparse function on the stack, but instead to create it as part of
     the UState creation when we initialize the "dataOutputStream" var.
     This way there is no pointer to the initial DOS except for those held
     in UState or Suspensions. As the UState mutates or Suspensions
     resolve, we will complete lose a reference to earlier DOS's and they
     can be garbage collected.
   
   Fixing these two issues allows unparsing very large infosets that
   require buffering, without running into out of memory errors.
   
   DAFFODIL-2468


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #489: Allow garbage collection of UStateForSuspension's and DirectOrBufferedDataOutputStream's

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #489:
URL: https://github.com/apache/incubator-daffodil/pull/489#discussion_r572935732



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/unparsers/UState.scala
##########
@@ -366,16 +364,15 @@ abstract class UState(
  */
 final class UStateForSuspension(
   val mainUState: UStateMain,
-  dos: DirectOrBufferedDataOutputStream,
+  override var dataOutputStream: DirectOrBufferedDataOutputStream,

Review comment:
       This patch changes things so the UState abstract class requires that the implementations have a dataOutputStream var.
   
   For this UStateForSuspension, this var is initially set when the UStateMain is cloned to create the UStateForSuspension. I think this var in UStateForSuspension should never actually change, but it needs to be a var because UState abstract class require it be a var.
   
   For the dataOutputStream in UStateMain, it is set initially when the UStateMain is created as part of this patch.
   
   This dataOutputStream actually changes to point to different DOS's in just a few places in existing code (none of which needed to change, the logic for removing references if needed was already there). Those places are essentially anytime we create a new DOS and it updates the dataOutputStream in UStateMain to point to that new DOS. 
   
   For example, in the UState abstract class we have the ``splitOnUknownByteAlignmentBitOrderChange`` function which can create a split and change this dataOutputStream to the new DOS. Likewise the ``splitDOS`` function in ``Suspensions.scala`` will create a new DOS and change the UState dataOutputStream. BlobLengthUnparser and LayeredSequnceUnparser do similar things.
   
   So the code that actually changes the dataOutputStream is a bit spread out, but only in a couple of places. But also was correct and didn't need changes. These changes are really just so the initial DOS isn't on the stack in the unparse() function.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #489: Allow garbage collection of UStateForSuspension's and DirectOrBufferedDataOutputStream's

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #489:
URL: https://github.com/apache/incubator-daffodil/pull/489#discussion_r572920290



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/unparsers/UState.scala
##########
@@ -366,16 +364,15 @@ abstract class UState(
  */
 final class UStateForSuspension(
   val mainUState: UStateMain,
-  dos: DirectOrBufferedDataOutputStream,
+  override var dataOutputStream: DirectOrBufferedDataOutputStream,

Review comment:
       I'm looking for where there is an assignment to this var, and not finding it in this change set. 
   Presumably scala would tell you if this had no assignments, so I'm assuming the place where this is reset to null is elsewhere?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-daffodil] stevedlawrence merged pull request #489: Allow garbage collection of UStateForSuspension's and DirectOrBufferedDataOutputStream's

Posted by GitBox <gi...@apache.org>.
stevedlawrence merged pull request #489:
URL: https://github.com/apache/incubator-daffodil/pull/489


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-daffodil] stevedlawrence commented on pull request #489: Allow garbage collection of UStateForSuspension's and DirectOrBufferedDataOutputStream's

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on pull request #489:
URL: https://github.com/apache/incubator-daffodil/pull/489#issuecomment-775990558


   > That's another reason why we need to fix the scala compiler warning settings. In principle it should have told us there were no reads of prior pointer right?
   
   I'm not sure. There was technically a read here in this code that was removed:
   ```scala
     private var _prior: UStateForSuspension = null
     override def prior = _prior
   ```
   And although the ``def prior`` is never called anywhere in Daffodil, it is a public function so theoretically could be used by library using Daffodil. I think because of that scalac wouldn't warn about that. We probably need to get much better at marking things as public/private/protected before we get unused warnings.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org