You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@daffodil.apache.org by "Carlson, Ian" <ic...@owlcyberdefense.com> on 2020/10/21 22:04:15 UTC

2399 - Changing error code for missing separators

All,

I’ve done a great deal of digging to find the cause of the bug DAFFODIL-2399. It appears to come down to
SeparatedSequenceChildParseResultHelper.scala line 282. In this case, we have found an out of scope delimiter %NL; before finding any elements, and as a result we abort parsing the sequence. We sensibly set the return status to ParseAttemptStatus.MissingSeparator. However, since the requiredOptional field is true, we have a status of success.

So far so good – everything works out until we roll back up to SequenceParserBases.scala, line 270. The MissingSeparator status again sets a status to success and terminates parsing of the sequence, but does not reset to a PoU. This is the reason our errors aren’t being discarded.

My suggested solution is simply to replace ParseAttemptStatus.MissingSeparator with ParseAttemptStatus.AbsentRep.

I bring this up to the mailing list because while this does seem to solve the problem without breaking any other tests, this is a very deep behavior in our parsing routine, and I want to see if anyone sees any potential dangers with the change.

I’ve created a pull request: https://github.com/apache/incubator-daffodil/pull/444 with this change for review.

[A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
[Owl Cyber Defense]
W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
Connect with us!
Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>

[Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>

The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
If you have received this transmission in error, please notify the sender immediately



Re: 2399 - Changing error code for missing separators

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
Feathers going Ruffle Ruffle Ruffle. 🙂

setSuccess is a narrow little thing. It's used in the interactive debugger,.... Maybe you can change it by adding an optional argument.

Backtracking in the parser involves

  *   the success/failure flag/object
  *   diagnostics accumulated since the PoU started.
  *   variable bindings - new instances, assignments to them, default value reads of them
  *   stacks of various things in the PState (array index stack, group index stack, etc.)
  *   things that get unwound by finally clauses of try-catch-finally (current parser, format-info caches, delimiters stack, infoset size,...)

I see how the backtracking in sequences seems faulty. Things like this from SequenceParserBases.scala:

             // So we mask the failure, and exit the sequence successfully
              pstate.setSuccess()
              isDone = true

This is clearly not doing everything needed to "mask the failure".
At minimum this needs to say what part of this it is doing, and what it is depending on the calling context to handle for it based on what it returns to the caller.

Of the 8 different locations where setSuccess() is called, 4 are related to the interactive debugger. 1 is the definition of the method.

That leaves 3 others. Two are in SequenceParserBases.scala  like the above where the intention, expressed in comments, is "mask the failures". The other is similar I think, in SeparatedSequenceChildParseResultHelper.scala.

It seems common to these 3 is that they're setting Success, and assuming something else is doing the rest of the backtracking such as cleaning up the diagnostics, based on other information such as the returned status value.

That cleaning up should be happening when the calling context somewhere up the stack calls pstate.resetToPointOfUncertainty or alternatively pstate.discarPointOfUncertainty if the intention is to move on, discarding the backtrack point.  Ultimately this modifies the diagnostics by way of the PState.restoreInto(ps) call, which overwrites one diagnostics set with another.

So we need to know why we're not ending up in PState.restoreInto(ps) call, or if we are, why the diagnostics we're restoring aren't the right ones.

It is important to keep in mind that as we backtrack through the branches of a choice, we could fall off the end and not have a successful parse at all, in which case, the aggregate "reason" for the failure of the whole choice, is that none of the branches was able to succeed, and the reasons why. Hence, we can't just clear the prior diagnostics. We have to save them up (in the case of a choice). The various choice parser combinators are responsible for this.








________________________________
From: Carlson, Ian <ic...@owlcyberdefense.com>
Sent: Monday, October 26, 2020 4:35 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: RE: 2399 - Changing error code for missing separators


The Short Version:

I believe that we should change PState.setSuccess to take as an argument a PState.Mark and revert the diagnostics that have occurred since that Mark. The current uses of setSuccess leave diagnostics in the PState that should be discarded. Does this ruffle anyone’s feathers?



Long Version:



Pursuing Steve Lawrence’s comment in the pull request (#444) – I attempted to add an assert to the TDMLRunner’s doParseExpectSuccess that requires a successful test case to produce no error level diagnostics if it returns a success and an infoset.



e.g.

val actual = processor.parse(…)

val diagObjs = actual.getDiagnostics



if (actual.isProcessingError) {

  …

 // unrelated failure stuff

  …

} else {

  // If we think we've succeeded, verify there are no errors

  // captured in the diagnostics. Otherwise there's probably

  // an internal bug causing us to miss setting isProcessingError

  val hasErrorDiags = diagObjs.exists { diag =>

    diag.isError && !diag.isValidation

    }

    Assert.invariant(!hasErrorDiags)

}



This revealed that at least 35 test cases produce errors even though they succeed and provide an infoset. At first glance, I thought we just had more occurrences of AbsentRep and MissingSeparator being misused, but it appears the actual cause might be the use of

PState.setSuccess() to mask errors.



setSuccess appears to reset only the overall error/success state, but not affect diagnostics. This seems like an oversight. I would like to modify setSuccess:



final def setSuccess(pou: PState.Mark): Unit = {

    _processorStatus = Success

    this.diagnostics = pou.diagnostics

  }



Modifying the state without fully reverting to a PoU seems dangerous, but the other option seems to be rethinking our 7 uses of setSuccess entirely. Thoughts on this change?





[A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer

[Owl Cyber Defense]

W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>

Connect with us!

Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>



[Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>



The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.

If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.

If you have received this transmission in error, please notify the sender immediately





From: Beckerle, Mike<ma...@owlcyberdefense.com>
Sent: Monday, October 26, 2020 11:27 AM
To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
Subject: Re: 2399 - Changing error code for missing separators



There's some discussion in the PR, but for this mailing list I wanted to comment that this is indeed a deep area of algorithmic complexity in the Daffodil runtime.

Whether the code is actually factored adequately such that it enables implementing the right DFDL behavior is not entirely clear, but it is encouraging that this fix seems to be a small change without requiring refactoring.

I am sure there are other bugs in this general area of the code and getting everything exactly right so that every nuance of the DFDL spec is properly implemented is quite difficult and will require development of an extensive collection of tests that cover all these various combinations of features of DFDL.

I think right now, the important thing is no regression on any known existing DFDL schemas.

________________________________
From: Carlson, Ian <ic...@owlcyberdefense.com>
Sent: Wednesday, October 21, 2020 6:04 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: 2399 - Changing error code for missing separators


All,



I’ve done a great deal of digging to find the cause of the bug DAFFODIL-2399. It appears to come down to

SeparatedSequenceChildParseResultHelper.scala line 282. In this case, we have found an out of scope delimiter %NL; before finding any elements, and as a result we abort parsing the sequence. We sensibly set the return status to ParseAttemptStatus.MissingSeparator. However, since the requiredOptional field is true, we have a status of success.



So far so good – everything works out until we roll back up to SequenceParserBases.scala, line 270. The MissingSeparator status again sets a status to success and terminates parsing of the sequence, but does not reset to a PoU. This is the reason our errors aren’t being discarded.



My suggested solution is simply to replace ParseAttemptStatus.MissingSeparator with ParseAttemptStatus.AbsentRep.



I bring this up to the mailing list because while this does seem to solve the problem without breaking any other tests, this is a very deep behavior in our parsing routine, and I want to see if anyone sees any potential dangers with the change.



I’ve created a pull request: https://github.com/apache/incubator-daffodil/pull/444 with this change for review.



[A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer

[Owl Cyber Defense]

W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>

Connect with us!

Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>



[Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>



The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.

If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.

If you have received this transmission in error, please notify the sender immediately






RE: 2399 - Changing error code for missing separators

Posted by "Carlson, Ian" <ic...@owlcyberdefense.com>.
The Short Version:
I believe that we should change PState.setSuccess to take as an argument a PState.Mark and revert the diagnostics that have occurred since that Mark. The current uses of setSuccess leave diagnostics in the PState that should be discarded. Does this ruffle anyone’s feathers?

Long Version:

Pursuing Steve Lawrence’s comment in the pull request (#444) – I attempted to add an assert to the TDMLRunner’s doParseExpectSuccess that requires a successful test case to produce no error level diagnostics if it returns a success and an infoset.

e.g.
val actual = processor.parse(…)
val diagObjs = actual.getDiagnostics

if (actual.isProcessingError) {
  …
 // unrelated failure stuff
  …
} else {
  // If we think we've succeeded, verify there are no errors
  // captured in the diagnostics. Otherwise there's probably
  // an internal bug causing us to miss setting isProcessingError
  val hasErrorDiags = diagObjs.exists { diag =>
    diag.isError && !diag.isValidation
    }
    Assert.invariant(!hasErrorDiags)
}

This revealed that at least 35 test cases produce errors even though they succeed and provide an infoset. At first glance, I thought we just had more occurrences of AbsentRep and MissingSeparator being misused, but it appears the actual cause might be the use of
PState.setSuccess() to mask errors.

setSuccess appears to reset only the overall error/success state, but not affect diagnostics. This seems like an oversight. I would like to modify setSuccess:

final def setSuccess(pou: PState.Mark): Unit = {
    _processorStatus = Success
    this.diagnostics = pou.diagnostics
  }

Modifying the state without fully reverting to a PoU seems dangerous, but the other option seems to be rethinking our 7 uses of setSuccess entirely. Thoughts on this change?


[A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
[Owl Cyber Defense]
W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
Connect with us!
Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>

[Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>

The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
If you have received this transmission in error, please notify the sender immediately


From: Beckerle, Mike<ma...@owlcyberdefense.com>
Sent: Monday, October 26, 2020 11:27 AM
To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
Subject: Re: 2399 - Changing error code for missing separators

There's some discussion in the PR, but for this mailing list I wanted to comment that this is indeed a deep area of algorithmic complexity in the Daffodil runtime.

Whether the code is actually factored adequately such that it enables implementing the right DFDL behavior is not entirely clear, but it is encouraging that this fix seems to be a small change without requiring refactoring.

I am sure there are other bugs in this general area of the code and getting everything exactly right so that every nuance of the DFDL spec is properly implemented is quite difficult and will require development of an extensive collection of tests that cover all these various combinations of features of DFDL.

I think right now, the important thing is no regression on any known existing DFDL schemas.

________________________________
From: Carlson, Ian <ic...@owlcyberdefense.com>
Sent: Wednesday, October 21, 2020 6:04 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: 2399 - Changing error code for missing separators


All,



I’ve done a great deal of digging to find the cause of the bug DAFFODIL-2399. It appears to come down to

SeparatedSequenceChildParseResultHelper.scala line 282. In this case, we have found an out of scope delimiter %NL; before finding any elements, and as a result we abort parsing the sequence. We sensibly set the return status to ParseAttemptStatus.MissingSeparator. However, since the requiredOptional field is true, we have a status of success.



So far so good – everything works out until we roll back up to SequenceParserBases.scala, line 270. The MissingSeparator status again sets a status to success and terminates parsing of the sequence, but does not reset to a PoU. This is the reason our errors aren’t being discarded.



My suggested solution is simply to replace ParseAttemptStatus.MissingSeparator with ParseAttemptStatus.AbsentRep.



I bring this up to the mailing list because while this does seem to solve the problem without breaking any other tests, this is a very deep behavior in our parsing routine, and I want to see if anyone sees any potential dangers with the change.



I’ve created a pull request: https://github.com/apache/incubator-daffodil/pull/444 with this change for review.



[A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer

[Owl Cyber Defense]

W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>

Connect with us!

Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>



[Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>



The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.

If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.

If you have received this transmission in error, please notify the sender immediately





Re: 2399 - Changing error code for missing separators

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
There's some discussion in the PR, but for this mailing list I wanted to comment that this is indeed a deep area of algorithmic complexity in the Daffodil runtime.

Whether the code is actually factored adequately such that it enables implementing the right DFDL behavior is not entirely clear, but it is encouraging that this fix seems to be a small change without requiring refactoring.

I am sure there are other bugs in this general area of the code and getting everything exactly right so that every nuance of the DFDL spec is properly implemented is quite difficult and will require development of an extensive collection of tests that cover all these various combinations of features of DFDL.

I think right now, the important thing is no regression on any known existing DFDL schemas.

________________________________
From: Carlson, Ian <ic...@owlcyberdefense.com>
Sent: Wednesday, October 21, 2020 6:04 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: 2399 - Changing error code for missing separators


All,



I’ve done a great deal of digging to find the cause of the bug DAFFODIL-2399. It appears to come down to

SeparatedSequenceChildParseResultHelper.scala line 282. In this case, we have found an out of scope delimiter %NL; before finding any elements, and as a result we abort parsing the sequence. We sensibly set the return status to ParseAttemptStatus.MissingSeparator. However, since the requiredOptional field is true, we have a status of success.



So far so good – everything works out until we roll back up to SequenceParserBases.scala, line 270. The MissingSeparator status again sets a status to success and terminates parsing of the sequence, but does not reset to a PoU. This is the reason our errors aren’t being discarded.



My suggested solution is simply to replace ParseAttemptStatus.MissingSeparator with ParseAttemptStatus.AbsentRep.



I bring this up to the mailing list because while this does seem to solve the problem without breaking any other tests, this is a very deep behavior in our parsing routine, and I want to see if anyone sees any potential dangers with the change.



I’ve created a pull request: https://github.com/apache/incubator-daffodil/pull/444 with this change for review.



[A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer

[Owl Cyber Defense]

W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>

Connect with us!

Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>



[Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>



The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.

If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.

If you have received this transmission in error, please notify the sender immediately