You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/03/24 13:49:59 UTC

[GitHub] [daffodil] stevedlawrence opened a new pull request #517: Fix hang caused by a zero-length first separated infix element

stevedlawrence opened a new pull request #517:
URL: https://github.com/apache/daffodil/pull/517


   With infix separators, we only consume a separator after we have parsed
   the first field, i.e. when groupPos > 1. The sequence logic has a
   condition that prevents groupPos from being incremented when a
   zero-length field is parsed when the field was positional. But this
   means that if the first field is zero-length, and this condition was
   hit, we do not increment groupPos, we never consume the first infix
   separator, and thus we parse the same zero-length data infinitely,
   causing a hang.
   
   To fix this, we need to remove the isPositional condition that prevented
   the groupPos from being incremented when zero length separated field was
   parsed. Whether or not the field is positional should not matter--if a
   separated field was successfully parsed, even if it was zero length, we
   must still increment groupPos for the following separators to be parsed
   correctly.
   
   DAFFODIL-2487


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r600836769



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       I think this is a different bug. 
   
   Super thorny area of the DFDL spec. A whole tutorial needs to be written on this, and lots of tests written that we don't have so as to accurately test if we're in conformance. 
   
   The emptyValueDelimiterPolicy is not applicable. So section 9.4.2.3 says that since this is an optional occurrence of record (only the first instance is required, as minOccurs defaults to 1, not 0), nothing should be added to the infoset. 
   
   The property emptyElementParsePolicy (late addition to the DFDL v1.0 spec) is also potentially relevant here. Daffodil's implementation of this is off slightly. Our enum uses "treatAsMissing", but the DFDL spec calls this "treatAsAbsent", but regardless, this newly added property defaults to 'treatAsEmpty'. I believe it wouldn't matter either way though as the record is in optional position. 
   
   The record element is complexType and empty (no children) - which is allowed because the only child element, "item", has minOccurs="0", so none of them are required. 
   
   I'm guessing that's where the bug is. The runtime isn't properly recognizing this is an empty complex element and optional and so shouldn't be added to the infoset. 
   
   I think this is tied up in the lack of implementing defaulting for complex elements. We're not bothering to properly determine if a complex element is empty or not, because we didn't bother implementing defaulting. So the assumption is that the complex element isn't empty, so we're creating an infoset instance when we shouldn't. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r601740690



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       ``emptyElementParsePolicy="treatAsMissing"`` does change Daffodil's behavior, but it is still slightly different from IBM's default behavior. With that property set, Daffodil stops parsing when it hits that comma, considers the parse a success, and outputs an infoset, just with left over data starting at that comma. So it does seem like we both think that comma is an error, but Daffodil interprets it as the end of the ``record`` array, whereas IBM DFDL interprets it as a fatal error.
   
   We both have the same behavior when ``minOccurs="0"``. So it seems there's only one discrepency about how this parse error should be handled. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] tuxji commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
tuxji commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r600588277



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       Interesting question.  I looked at the DFDL 1.0 spec and then searched for dfdl:emptyValueDelimiterPolicy in the codebase.  Lines 90-94 in SequenceChildBases.scala say:
   
   > An element of complex type can have EmptyRep, but dfdl:emptyValueDelimiterPolicy does not apply.  TBD: CONFIRM THIS. When a complex type element is parsed, and zero data is consumed, but the parse is successful, then any infoset created is the "empty value" for this complex type element. When the element is required, this infoset is retained. When the element is optional, this infoset is discarded, any side-effects that occurred in its creation are backtracked.
   
   The DFDL 1.0 spec says:
   
   > 9.4.2.3      Complex element
   > Required occurrence: An item is added to the Infoset.
   > 
   > Optional occurrence: if dfdl:emptyValueDelimiterPolicy is applicable and is not 'none' [29], then an item is added to the Infoset, otherwise nothing is added to the Infoset.
   
   This TDML file inherits DFDLGeneralFormat.dfdl.xsd without changing dfdl:emptyValueDelimiterPolicy from its original value which is "both".  This implies the empty record item should be added to the infoset.  However, the comment in SequenceChildBases.scala is either out of date or it implies Daffodil does not check dfdl:emptyValueDelimiterPolicy for complex elements, which would seem to be contrary to the DFDL 1.0 specification.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r600715502



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       9.4.2.4 has an interesting example and seems related to what's going on here. The last line of that section says:
   > Upon this successful parse of E1, it is therefore known-to-exist. However, because the position in the data has not changed, E1 therefore has the empty representation. Because E1 is empty and optional (it has XSD minOccurs='0') and dfdl:emptyValueDelimiterPolicy does not apply, it is not added to the Infoset, and the temporary Infoset item for E1 containing E2 is discarded.
   
   I think this is very similar in this case, where E1 is record, and E2 is item. However, there's one differnce related to the wording
   
   > However, because the position in the data has not changed, ...
   
   In our case, the empty ``<ex:record />`` element comes from the line that is all commas. So when parsing record, the position in the data does change while parsing the ``item`` elements, those elements just happened to not contribute anything to the infoset.
   
   I also noticed that if I change the line of all commas to be an empty line, then the empty record element doesn't show up in the infoset. So it seems this position changing or not changing while parsing the complex record element might be the difference. In which case, this empty record element is maybe correct?
   
   I guess the question is if you have a optional complex element and all it's children are empty rep and contribute nothign to the infoset, but they consume separator data, should that complex element end up in the infoset?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r601726676



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       IBM DFDL differs from Daffodil in this new emptyElementParsePolicy property. They only implement the treatAsAbsent semantics. 
   
   I'm thinking the minOccurs of this vector is 1. That means the first element is actually required, not optional. IBM DFDL treats this as absent, and causes the failure since an absent required element causes an error. 
   
   I think. 
   
   We can try specifying dfdlx:emptyElementParsePolicy='treatAsMissing' (which is Daffodil's current realization of this property, and that behavior should then be the same.
   
   Or add a minOccurs="0" to the array, then both IBM and Daffodil should behave the same. 
   
   That's my theory anyway. 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r601742727



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       I want to capture this for DFDL workgroup discussion on what the "right" behavior is. We get a bug ticket or IBM does. 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] tuxji commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
tuxji commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r600747459



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       If I were you, I also would run an experiment.  Edit SequenceGroupNestedArray.tdml, define `emptyValueDelimiterPolicy="none"` after inheriting DFDLGeneralFormat.dfdl.xsd, and rerun the test.  Does Daffodil still add the empty record item to the infoset?  The spec says:
   
   > Optional occurrence: if dfdl:emptyValueDelimiterPolicy is applicable and is not 'none' [29], then an item is added to the Infoset, otherwise nothing is added to the Infoset.
   
   I know that there is a caveat "if dfdl:emptyValueDelimiterPolicy is applicable" and that policy applies only when an element defines an initiator and/or terminator, otherwise it does not apply.  Example 9.4.2.4 also says emptyValueDelimiterPolicy does not apply to it.  However, this experiment might reveal a bug in Daffodil if Daffodil suddenly reverses its position and stops adding the empty record item to the infoset.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] mbeckerle commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r600842760



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       Ah. Re-reading the thread above. I think SL found that the caveat "but no data is consumed" is the rub here. So this record element with no children is NOT considered empty nor absent. It consumed data, just that consumption of data didn't result in any item children. 
   
   So I think we're correct now. 
   
   This example is really good, and I'd like to try it on IBM DFDL to see what they do with it. 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence merged pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
stevedlawrence merged pull request #517:
URL: https://github.com/apache/daffodil/pull/517


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r600495783



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       The line of all commas resulted in an empty ``record`` element. Is that that expected, or does this mean there is some other bug that is preventing this empty record element from being removed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r601708496



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       I managed to get the ibm DFDL crosstester working, and it has the same behavior as Daffodil--there is an empty ``<ex:record />`` element when a line is all commas. Likewise, empty lines that contain no commas do not create a record element at all. So I think we're are consistent with IBM DFDL and are interpreting the spec the same. The other ``test_csv_hang_*`` tests pass as well.
   
   Unfortunately, ``test_csv_nohang_1`` does not pass in IBM DFDL. It fails with a parse error with message
   > Unexpected data found at offset '94' after parsing completed. Data: '0x2c...'
   
   The 0x2c byte at offset 94 is the first comma on the line
   ```
   ,preceded by an empty field,
   ```
   So it seems IBM does not like a line starting with a zero-length field in the nohang test that has ``minOccurs="1"``. It feels like Daffodil has the correct behavior, but this area of the spec is complicated, so I can't say for sure.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r600767476



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       Good idea. I've tried setting emptyValueDelimiterPolicy to all possible values to be sure, and it has no affect on the result. Daffodil always adds the empty record element. 
   
   Looking at the emptyValueDelimiterPolicy section it says the property is
   
   > Ignored if both dfdl:initiator and dfdl:terminator are "" (empty string).
   
   So I think this property is just going to be ignored, and isn't applicable in this case, as you suggest.
   
   So it's probably a good thing that the behavior is the same regardless of the value of that property (assuming that behavior is correct).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [daffodil] stevedlawrence commented on a change in pull request #517: Fix hang caused by a zero-length first separated infix element

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #517:
URL: https://github.com/apache/daffodil/pull/517#discussion_r601757574



##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/section14/sequence_groups/SequenceGroupNestedArray.tdml
##########
@@ -219,6 +219,7 @@ notice blank lines are skipped
           <ex:record>
             <ex:item>notice lines of all commas are skipped</ex:item>
           </ex:record>
+          <ex:record />

Review comment:
       Sounds good. I'll merge this as is then since it fixes a pretty serious issue. If we determine daffodil's behavior is wrong, we can create a new ticket.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org