You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/11/15 18:46:19 UTC

[GitHub] [daffodil] stevedlawrence opened a new pull request #682: Improve how we capture and restore variable maps in points of uncertainty

stevedlawrence opened a new pull request #682:
URL: https://github.com/apache/daffodil/pull/682


   Currently variable instances maintain their own prior state with special
   "prior" variables. When resetting back to a PoU, we would use this prior
   value to figure out what state to reset to. But this is fragile, and
   really only keeps track of one level of change. This mean multiple
   changes to variables within a single point of uncertainly can lead to
   unexpected behavior or assertion failures.
   
   This changes how we keep track of variable state. Rather than attempting
   to roll back individual state changes, when needed to simply create a
   deep copy of the entire variable map. But to avoid extra overhead, we
   only make a copy right before state is about to change, essentially
   implementing copy-on-write. This greatly simplifies the code while still
   maintain good performance.
   
   DAFFODIL-2580
   
   See PR #674 for discussions that led to this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #682: Improve how we capture and restore variable maps in points of uncertainty

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #682:
URL: https://github.com/apache/daffodil/pull/682#discussion_r752642500



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala
##########
@@ -328,24 +308,51 @@ final class PState private (
     this.infoset = newParent
   }
 
+  /**
+   * When we take marks of this PState when we enter a PoU, we intentionally do
+   * not take a deep copy of the variable map because that is too expensive of
+   * an operation for a data structure that rarely changes. So taking a mark
+   * just makes a shallow copy of the PState variable map.
+   *
+   * But this means modifications of the variable map could potentially affect
+   * PoU marks, which means resetting back to a mark does not work as intended.
+   * To resolve this issue, anytime a function is called that will change the
+   * state of a variable, we must first call this function. This will take a
+   * deep copy of the variable map and set that copy as the variable map for
+   * this PState. This way, the state already captured in the mark will not be
+   * modified and we can correctly reset back to that state when resetting the
+   * PoU. Note that we only do this if we have not already done a deep copy in
+   * this PoU--if we've already done a deep copy, we don't need to do it again
+   * since the PoU copy can't be modified.
+   */
+  @inline
+  private def changingVariable(): Unit = {
+    if (!pointsOfUncertainty.isEmpty) {
+      val curPoU = pointsOfUncertainty.top
+      if (curPoU.variableMap eq this.variableMap) {
+        this.setVariableMap(this.variableMap.copy())
+      }
+    }
+  }
+
   override def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE): Unit = {
+    changingVariable()
     variableMap.setVariable(vrd, newValue, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
   }
 
   override def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive = {
-    val res = variableMap.readVariable(vrd, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
-    res
+    changingVariable()

Review comment:
       After some testing, it doesn't seem to make a noticable difference. My guess is the schema/data I used just doesn't read enough variables or have enough PoUs for any differences to show up. But avoiding the deep copy seems like and obvious optimization, and the extra check doesn't seem to make things worse. I've added it to this change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #682: Improve how we capture and restore variable maps in points of uncertainty

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #682:
URL: https://github.com/apache/daffodil/pull/682#discussion_r752593023



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala
##########
@@ -328,24 +308,51 @@ final class PState private (
     this.infoset = newParent
   }
 
+  /**
+   * When we take marks of this PState when we enter a PoU, we intentionally do
+   * not take a deep copy of the variable map because that is too expensive of
+   * an operation for a data structure that rarely changes. So taking a mark
+   * just makes a shallow copy of the PState variable map.
+   *
+   * But this means modifications of the variable map could potentially affect
+   * PoU marks, which means resetting back to a mark does not work as intended.
+   * To resolve this issue, anytime a function is called that will change the
+   * state of a variable, we must first call this function. This will take a
+   * deep copy of the variable map and set that copy as the variable map for
+   * this PState. This way, the state already captured in the mark will not be
+   * modified and we can correctly reset back to that state when resetting the
+   * PoU. Note that we only do this if we have not already done a deep copy in
+   * this PoU--if we've already done a deep copy, we don't need to do it again
+   * since the PoU copy can't be modified.
+   */
+  @inline
+  private def changingVariable(): Unit = {
+    if (!pointsOfUncertainty.isEmpty) {
+      val curPoU = pointsOfUncertainty.top
+      if (curPoU.variableMap eq this.variableMap) {
+        this.setVariableMap(this.variableMap.copy())
+      }
+    }
+  }
+
   override def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE): Unit = {
+    changingVariable()
     variableMap.setVariable(vrd, newValue, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
   }
 
   override def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive = {
-    val res = variableMap.readVariable(vrd, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
-    res
+    changingVariable()
+    variableMap.readVariable(vrd, referringContext, this)
   }
 
   def newVariableInstance(vrd: VariableRuntimeData): VariableInstance = {
-    val v = variableMap.newVariableInstance(vrd)
-    changedVariablesStack.top += vrd.globalQName
-    v
+    changingVariable()

Review comment:
       newVariableInstance does modify the variable map, it's code looks like this:
   ```scala
       val varQName = vrd.globalQName
       val stack = vTable.get(varQName)
       Assert.invariant(stack.isDefined)
       stack.get.push(vrd.createVariableInstance)
   ```
   So it gets the stack of instances associated with the variable qname and then pushes a new instance. The new instances state is likely empty until it's set/read, but it has changed the variable map in a way that needs to be undone if we backtrack.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence merged pull request #682: Improve how we capture and restore variable maps in points of uncertainty

Posted by GitBox <gi...@apache.org>.
stevedlawrence merged pull request #682:
URL: https://github.com/apache/daffodil/pull/682


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on a change in pull request #682: Improve how we capture and restore variable maps in points of uncertainty

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #682:
URL: https://github.com/apache/daffodil/pull/682#discussion_r752580798



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala
##########
@@ -328,24 +308,51 @@ final class PState private (
     this.infoset = newParent
   }
 
+  /**
+   * When we take marks of this PState when we enter a PoU, we intentionally do
+   * not take a deep copy of the variable map because that is too expensive of
+   * an operation for a data structure that rarely changes. So taking a mark
+   * just makes a shallow copy of the PState variable map.
+   *
+   * But this means modifications of the variable map could potentially affect
+   * PoU marks, which means resetting back to a mark does not work as intended.
+   * To resolve this issue, anytime a function is called that will change the
+   * state of a variable, we must first call this function. This will take a
+   * deep copy of the variable map and set that copy as the variable map for
+   * this PState. This way, the state already captured in the mark will not be
+   * modified and we can correctly reset back to that state when resetting the
+   * PoU. Note that we only do this if we have not already done a deep copy in
+   * this PoU--if we've already done a deep copy, we don't need to do it again
+   * since the PoU copy can't be modified.
+   */
+  @inline
+  private def changingVariable(): Unit = {
+    if (!pointsOfUncertainty.isEmpty) {
+      val curPoU = pointsOfUncertainty.top
+      if (curPoU.variableMap eq this.variableMap) {
+        this.setVariableMap(this.variableMap.copy())
+      }
+    }
+  }
+
   override def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE): Unit = {
+    changingVariable()
     variableMap.setVariable(vrd, newValue, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
   }
 
   override def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive = {
-    val res = variableMap.readVariable(vrd, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
-    res
+    changingVariable()

Review comment:
       Can this be optimized so that reading a variable that has already been read is not a changingVariable()? Or is that not necessary because the overhead is minimal to call this because it will detect it has already happened and just return?

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala
##########
@@ -328,24 +308,51 @@ final class PState private (
     this.infoset = newParent
   }
 
+  /**
+   * When we take marks of this PState when we enter a PoU, we intentionally do
+   * not take a deep copy of the variable map because that is too expensive of
+   * an operation for a data structure that rarely changes. So taking a mark
+   * just makes a shallow copy of the PState variable map.
+   *
+   * But this means modifications of the variable map could potentially affect
+   * PoU marks, which means resetting back to a mark does not work as intended.
+   * To resolve this issue, anytime a function is called that will change the
+   * state of a variable, we must first call this function. This will take a
+   * deep copy of the variable map and set that copy as the variable map for
+   * this PState. This way, the state already captured in the mark will not be
+   * modified and we can correctly reset back to that state when resetting the
+   * PoU. Note that we only do this if we have not already done a deep copy in
+   * this PoU--if we've already done a deep copy, we don't need to do it again
+   * since the PoU copy can't be modified.
+   */
+  @inline
+  private def changingVariable(): Unit = {
+    if (!pointsOfUncertainty.isEmpty) {
+      val curPoU = pointsOfUncertainty.top
+      if (curPoU.variableMap eq this.variableMap) {
+        this.setVariableMap(this.variableMap.copy())
+      }
+    }
+  }
+
   override def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE): Unit = {
+    changingVariable()
     variableMap.setVariable(vrd, newValue, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
   }
 
   override def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive = {
-    val res = variableMap.readVariable(vrd, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
-    res
+    changingVariable()
+    variableMap.readVariable(vrd, referringContext, this)
   }
 
   def newVariableInstance(vrd: VariableRuntimeData): VariableInstance = {
-    val v = variableMap.newVariableInstance(vrd)
-    changedVariablesStack.top += vrd.globalQName
-    v
+    changingVariable()

Review comment:
       This seems odd to me. Creating a new variable instance doesn't change any existing variable. Once the new variable instance exists, then yes, that instance's value can be defaulted or set, and then read, but why does creating a new instance "change" anything?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #682: Improve how we capture and restore variable maps in points of uncertainty

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #682:
URL: https://github.com/apache/daffodil/pull/682#discussion_r752605859



##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala
##########
@@ -328,24 +308,51 @@ final class PState private (
     this.infoset = newParent
   }
 
+  /**
+   * When we take marks of this PState when we enter a PoU, we intentionally do
+   * not take a deep copy of the variable map because that is too expensive of
+   * an operation for a data structure that rarely changes. So taking a mark
+   * just makes a shallow copy of the PState variable map.
+   *
+   * But this means modifications of the variable map could potentially affect
+   * PoU marks, which means resetting back to a mark does not work as intended.
+   * To resolve this issue, anytime a function is called that will change the
+   * state of a variable, we must first call this function. This will take a
+   * deep copy of the variable map and set that copy as the variable map for
+   * this PState. This way, the state already captured in the mark will not be
+   * modified and we can correctly reset back to that state when resetting the
+   * PoU. Note that we only do this if we have not already done a deep copy in
+   * this PoU--if we've already done a deep copy, we don't need to do it again
+   * since the PoU copy can't be modified.
+   */
+  @inline
+  private def changingVariable(): Unit = {
+    if (!pointsOfUncertainty.isEmpty) {
+      val curPoU = pointsOfUncertainty.top
+      if (curPoU.variableMap eq this.variableMap) {
+        this.setVariableMap(this.variableMap.copy())
+      }
+    }
+  }
+
   override def setVariable(vrd: VariableRuntimeData, newValue: DataValuePrimitive, referringContext: ThrowsSDE): Unit = {
+    changingVariable()
     variableMap.setVariable(vrd, newValue, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
   }
 
   override def getVariable(vrd: VariableRuntimeData, referringContext: ThrowsSDE): DataValuePrimitive = {
-    val res = variableMap.readVariable(vrd, referringContext, this)
-    changedVariablesStack.top += vrd.globalQName
-    res
+    changingVariable()

Review comment:
       Good question. Definitely seems like avoiding the deep copy could be beneficial, even with extra overhead of checking if a variable has been read before. But no way to really know without running some tests. Should be a small enough change, I'll give it a shot and compare the numbers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org