You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2022/02/08 13:29:41 UTC

[GitHub] [daffodil] stevedlawrence opened a new pull request #748: Update alignment to improve correctness and prevent deadlocks

stevedlawrence opened a new pull request #748:
URL: https://github.com/apache/daffodil/pull/748


   Three changes are made to how alignment works:
   
   1. When statically determining if alignment is needed or not, we
      previously did not implement prefixed length elements and just
      assumed no alignment information. This prevented optimizations when
      prefixed elements existed. This implements the approximate length of
      prefixed elements to be a multiple of their lengthUnits, which should
      allow many alignment parsers/unparsers to optimize out and avoid
      deadlocks.
   
   2. When asking about the alignment of prior elements (e.g. parents,
      previous siblings) we can sometimes get an answer of no information.
      This most commonly happens when asking about the prior alignment of
      model group that is the first child in a global declaration. In these
      cases, we do not know how or where the global declaration will be
      referenced, and so it could have any alignment. And because the
      alignment algorithm only looks at lexical scope, it cannot make any
      decisions based on those references. So, in this kinds of cases where
      we have no information about prior alignment, we must assume we have
      no information. This means we will now have some alignment
      parsers/unparsers that we simply cannot optimize out, but this also
      means we fix a bug where there was required alignment that was
      incorrectly optimized out because it assumed known alignment.
   
   3. Be less pessimistic about assigning lengths of the
      SimpleTypeRetryUnparser. This unparser previously assumed no length
      information about the suspension, which meant that we lost bit
      position information about suspensions and the locations of buffered
      data output streams. This could lead to AlignmentFillUnparsers
      getting deadlocked. This changes the logic so that when we can
      statically determine that there will be no padding/fill bytes/etc.
      and that the actual unparsed length will be exactly the same as the
      target unparse length, then we can use that length and give more
      information to the DataOutputStreams to avoid deadlocks.
   
   DAFFODIL-2626


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on a change in pull request #748:
URL: https://github.com/apache/daffodil/pull/748#discussion_r801985256



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/dsom/SchemaComponent.scala
##########
@@ -141,6 +141,17 @@ trait SchemaComponent
     res.distinct
   }
 
+  /**
+   * The lexically enclosing term, if one exists
+   */
+  final lazy val optEnclosingLexicalTerm: Option[Term] = {
+    optLexicalParent match {
+      case Some(t: Term) => Some(t)
+      case Some(sc: SchemaComponent) => sc.optEnclosingLexicalTerm
+      case _ => None

Review comment:
       Schemas are represented in memory by the DSOM, which has what I believe is a pretty up to date UML diagram:
   
   https://cwiki.apache.org/confluence/display/DAFFODIL/DFDL+Schema+Object+Model+%28DSOM%29+with+UML
   
   It's not super easy to follow, but if you follow the lines enough you can see the relationships. It's definitely helpful to see what is a Term and what isn't.
   
   For example, a local `<xs:complex>` is modeled by a `LocalComplexTypeDef`. This is not a `Term`, but has a lexical parent that could be, depending if the parent is a `LocalElementDecl` (a `Term) or a `GlobalElementDecl` (not a `Term`).
   
   Another example, a local `<xs:sequence>` is modeled by a `LocalSequence` (a Term), which may have a lexical parent that is a `LocalComplexTypeDef` (not a `Term`), which may have a lexical parent that is a `LocalElementDef` (a `Term`), etc.
   
   So this `optEnclsoingLexicalTerm` essentially skips over lexical parents like `LocalComplexTypeDef` since they really don't provide any information--you can't put any DFDL properties on them. `Term`s are usually the more interesting things, and what the alignment stuff cares about.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #748:
URL: https://github.com/apache/daffodil/pull/748#discussion_r802043696



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/AlignedMixin.scala
##########
@@ -202,9 +202,15 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
       val parentAlignmentApprox =

Review comment:
       Not in this PR, but for users I'm quite sure we're still going to need the switches to "just shut the d*mn alignment stuff off" for them. 
   
   It is beyond state-of-the-art in data format description to help people debug and give sensible easily followed diagnostics, in these sorts of circular or near-circular alignment relationships. "Just shut off alignment" is a perfectly acceptable response from users to our currently minimal attempts to provide diagnostics. 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence merged pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

stevedlawrence merged pull request #748:
URL: https://github.com/apache/daffodil/pull/748


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on a change in pull request #748:
URL: https://github.com/apache/daffodil/pull/748#discussion_r802633170



##########
File path: daffodil-io/src/test/scala/org/apache/daffodil/io/TestDataOutputStream4.scala
##########
@@ -41,10 +41,10 @@ class TestDataOutputStream4 {
 
     val out = direct.addBuffered
     if (setAbs)
-      out.setAbsStartingBitPos0b(ULong(20))
+      out.setAbsStartingBitPos0b(ULong(19))

Review comment:
       These tests definitely run, that's how I found that they had this off-by-one bug. I think the issue was that these tests just didn't do anything that required a correct absolution starting position. When this tests makes the buffers final and we deliver the buffered content, we just concatenate the buffered content to the direct buffer--we don't actually care where the buffer content starts. And these tests only verify that the content is correct--it doesn't look at any other state of the DOS.
   
   I think the only things that really require correct abs bit positions is things like alignment unparsers, length calculations, etc. none of which exist in this small unit test.
   
   And I think all our other DOS logic in Daffodil proper is correct and so we never ran into issues where the starting absolute bit position was actually used (e.g. alignment, length calculations, etc.). This new assertion I added verifies that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] tuxji commented on a change in pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

tuxji commented on a change in pull request #748:
URL: https://github.com/apache/daffodil/pull/748#discussion_r801849377



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/AlignedMixin.scala
##########
@@ -202,9 +202,15 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
       val parentAlignmentApprox =

Review comment:
       GitHub won't let me attach a comment to an earlier line, but if you expand your view up, you will see that this section of code is inside a lazy value's definition (`priorAlignmentApprox`) and there's a comment at line 133 referencing another bug, [DAFFODIL-2295](https://issues.apache.org/jira/projects/DAFFODIL/issues/DAFFODIL-2295?filter=allopenissues).  I read the bug and it's unrelated (handling a text separator preceding this term), but I thought I would mention that another open issue also exists although it has minor priority and I didn't find any other occurrence of "2995" such as a test for checking it's fixed.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/dsom/SchemaComponent.scala
##########
@@ -141,6 +141,17 @@ trait SchemaComponent
     res.distinct
   }
 
+  /**
+   * The lexically enclosing term, if one exists
+   */
+  final lazy val optEnclosingLexicalTerm: Option[Term] = {
+    optLexicalParent match {
+      case Some(t: Term) => Some(t)
+      case Some(sc: SchemaComponent) => sc.optEnclosingLexicalTerm
+      case _ => None

Review comment:
       I don't see any problem with this function even though it calls itself recursively on SchemaComponent (eventually it must hit either a Term or a Root and terminate).  I'm simply curious how schemas are represented in memory.  A schema always has an optically enclosing Root, right, then the Root has lexically enclosed children with children of their own, and are the first level children a mix of different types (SchemaComponent, Term, and what other types fall into the _ case?) or all one type (GlobalDecl)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence merged pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

stevedlawrence merged pull request #748:
URL: https://github.com/apache/daffodil/pull/748


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] mbeckerle commented on a change in pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

mbeckerle commented on a change in pull request #748:
URL: https://github.com/apache/daffodil/pull/748#discussion_r802048084



##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/io/DirectOrBufferedDataOutputStream.scala
##########
@@ -966,6 +966,12 @@ object DirectOrBufferedDataOutputStream {
     Assert.invariant(bufDOS.isBuffering)
     Assert.invariant(!directDOS.isBuffering)
 
+    // ensure that if we know the absoluste starting position of the buffered
+    // DOS that it matches the ending bit position of the direct DOS we are
+    // about to deliver it to
+    val maybeBufStartPos = bufDOS.maybeAbsStartingBitPos0b
+    Assert.invariant(maybeBufStartPos.isEmpty || maybeBufStartPos.get == directDOS.maybeAbsBitPos0b.get)

Review comment:
       Hard to imagine it worked at all before without this. 

##########
File path: daffodil-io/src/test/scala/org/apache/daffodil/io/TestDataOutputStream4.scala
##########
@@ -41,10 +41,10 @@ class TestDataOutputStream4 {
 
     val out = direct.addBuffered
     if (setAbs)
-      out.setAbsStartingBitPos0b(ULong(20))
+      out.setAbsStartingBitPos0b(ULong(19))

Review comment:
       How could this possibly have worked before? Were these tests not called?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on pull request #748:
URL: https://github.com/apache/daffodil/pull/748#issuecomment-1033735069


   > I have a schema that is substantially simplified by using prefix lengths, so I am anxious to try this on it.
   
   Should we hold off on merging this until you verify this? It works for the small tests you've added, but might have edge cases for a larger schema that still need fixing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on a change in pull request #748:
URL: https://github.com/apache/daffodil/pull/748#discussion_r802626326



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/AlignedMixin.scala
##########
@@ -202,9 +202,15 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
       val parentAlignmentApprox =

Review comment:
       Agreed, I've added [DAFODIL-2652](https://issues.apache.org/jira/browse/DAFFODIL-2652) to add the ability to disable alignment checking/parsers/unparsers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #748: Update alignment to improve correctness and prevent deadlocks

Posted by GitBox <gi...@apache.org>.

stevedlawrence commented on a change in pull request #748:
URL: https://github.com/apache/daffodil/pull/748#discussion_r801990109



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/AlignedMixin.scala
##########
@@ -202,9 +202,15 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
       val parentAlignmentApprox =

Review comment:
       Yeah, I did not attempt to fix that issue. I don't think we've come across any formats that run into this issue, especially since most formats actually have really simple alignment--either all bits or all bytes. My guess is that's why it's marked as minor. 
   
   My goal with with this PR was to just get the issues with alignment deadlocks resolved that were affecting real world schemas that Mike found. This alignment algorithm is complex enough that I'm sure there are some edge cases we just don't handle correctly, some we probably don't even know about yet.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org