Posted to dev@daffodil.apache.org by Larry Barber <la...@nteligen.com> on 2020/12/18 18:49:29 UTC

How to add warnings that are not lost due to backtracking

I'm hoping someone could give me some pointers on adding a warning message to the Daffodil io code.
I'm looking at Daffodil-412 and want to generate a warning message when the regex search gets expanded and another if it exceeds the tunable for maximum length.

I've located the code that does these expansions in io/InputSourceDataInputStream.scala, but I'm unsure how to generate the warning messages.
I don't see any other warning messages being generated in the io code. I've seen several instances in core that just use SDW(...) and others in DSOM that use context.SDW(...), but I'm confused about this - I'm afraid that this method buffers warnings and throws them away in the case of backtracking. Since the regex search may be the cause of backtracking, I think these warnings always need to be presented.

I'm just not sure of the proper way to access SDW in this situation and need to make sure that the messages will not be discarded.

Re: How to add warnings that are not lost due to backtracking

Posted by Steve Lawrence <sl...@apache.org>.
Another option might be to just have a separate structure in the PState
for storing just SDWs. This way you don't have to worry about dealing
with backtracking. At the end of the parse it could be combined with the
existing diagnostics so that the functionality stays the same. This
also might make it easier to implement an efficient mechanism for
determining if an SDW has already been emitted so it could be ignored.
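
A minimal sketch of the separate-structure idea, assuming a much simplified state object. The names SdwDiag and SketchPState are hypothetical and are not Daffodil's real classes; the point is only that the warning store is never touched by a reset, and that a set makes the "already emitted" check cheap.

```scala
import scala.collection.mutable

// Hypothetical types for illustration only; not Daffodil's API.
final case class SdwDiag(isError: Boolean, message: String)

final class SketchPState {
  // Ordinary diagnostics: rolled back when the parser backtracks.
  private var diagnostics: List[SdwDiag] = Nil
  // Runtime warnings are kept apart. A LinkedHashSet drops duplicates
  // and remembers insertion order.
  private val runtimeWarnings = mutable.LinkedHashSet.empty[SdwDiag]

  def addDiagnostic(d: SdwDiag): Unit = diagnostics = d :: diagnostics

  def warn(message: String): Unit =
    runtimeWarnings += SdwDiag(isError = false, message = message)

  // A "mark" is simply the diagnostics list as it was at the point of uncertainty.
  def mark: List[SdwDiag] = diagnostics

  // Backtracking restores the ordinary diagnostics and leaves the warnings alone.
  def reset(saved: List[SdwDiag]): Unit = diagnostics = saved

  // At the end of the parse, both stores are combined into one result.
  def allDiagnostics: List[SdwDiag] = diagnostics.reverse ++ runtimeWarnings.toList
}
```

With this shape, a warning issued on a branch that is later abandoned still shows up in allDiagnostics, and issuing the same warning twice records it once.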

On 1/4/21 8:05 AM, Beckerle, Mike wrote:
> PState.scala has methods used for backtracking like resetToPointOfUncertainty and discardPointOfUncertainty. Those ultimately lead to a call that clears the diagnostics, I believe.
> 
> It should be a simple fix, when backtracking, to remove only the diagnostics for which isError() is true from the list of accumulated diagnostic objects. This would leave the warnings behind to accumulate.
> 
> When adding warnings to the diagnostics we probably need to be careful not to add duplicates, so that this list doesn't become longer and longer over a run.


Re: How to add warnings that are not lost due to backtracking

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
PState.scala has methods used for backtracking like resetToPointOfUncertainty and discardPointOfUncertainty. Those ultimately lead to a call that clears the diagnostics, I believe.

It should be a simple fix, when backtracking, to remove only the diagnostics for which isError() is true from the list of accumulated diagnostic objects. This would leave the warnings behind to accumulate.

When adding warnings to the diagnostics we probably need to be careful not to add duplicates, so that this list doesn't become longer and longer over a run.
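
The filtering approach described above could look roughly like the following. This is a sketch under stated assumptions: Diagnostic and DiagnosticLog are hypothetical stand-ins, and the real PState methods (resetToPointOfUncertainty and friends) are not reproduced here. A mark is modeled as the length of the list at the point of uncertainty.

```scala
// Hypothetical types for illustration only; not Daffodil's API.
final case class Diagnostic(isError: Boolean, message: String)

final class DiagnosticLog {
  private var entries: Vector[Diagnostic] = Vector.empty

  // Skip exact duplicates so repeated warnings do not grow the list over a run.
  def add(d: Diagnostic): Unit =
    if (!entries.contains(d)) entries = entries :+ d

  // Remember how many diagnostics existed at the point of uncertainty.
  def mark: Int = entries.length

  // Backtrack: keep everything recorded before the mark, and of the
  // entries recorded after it keep only the warnings (isError == false).
  def resetTo(savedMark: Int): Unit = {
    val (before, after) = entries.splitAt(savedMark)
    entries = before ++ after.filterNot(_.isError)
  }

  def all: Vector[Diagnostic] = entries
}
```

The linear contains check is fine for a sketch; if the list could get long, a companion set of already-seen diagnostics would avoid the repeated scan.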



RE: How to add warnings that are not lost due to backtracking

Posted by Larry Barber <la...@nteligen.com> on Thursday, December 31, 2020 12:47 PM.
Thanks, Mike.
So, it would seem that the backtracking code would need to be updated not to remove SDWs.
It seems like the backtracking code would only be active during parsing (runtime) - not compiling - so we could probably just alter the backtrack code to remove everything except SDWs from the PState. I'm not familiar with that - would there be a structural issue with discarding everything except SDWs?
Or would you recommend using another method of reporting the regex limit message?

-----Original Message-----
From: Beckerle, Mike [mailto:mbeckerle@owlcyberdefense.com] 
Sent: Monday, December 28, 2020 12:15 PM
To: dev@daffodil.apache.org
Subject: Re: How to add warnings that are not lost due to backtracking

So, I wanted to clarify a few things. Then I think I agree we want runtime-issued SDWs to not be lost when backtracking.

An SDE, or schema definition error, is most commonly the Daffodil schema compiler telling you your schema isn't meaningful, so parsing/unparsing cannot even be started. We divide Daffodil into compile time ("compiling the schema") and runtime (parse/unparse time).

Some SDEs cannot be detected until runtime, but SDEs are always fatal. I.e., there is never any backtracking from them, because they mean there is something wrong with your DFDL schema.

Processing errors (parse errors or unparse errors) are errors where your schema is meaningful but the data doesn't match the schema. Some parse errors are a normal part of parsing, as they are suppressed by backtracking to try other alternatives.

Schema definition warnings (SDWs) are not parse errors but the warning version of an SDE. I.e., they suggest a possible error in the schema. SDWs detected at compile time are always output by the compiler. If an SDW is issued at runtime, there is an interesting question of whether it should be suppressed by backtracking.
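
The three categories above can be summarized as a small model. This is only an illustrative sketch: the type names and the ProblemPolicy object are hypothetical and do not mirror Daffodil's class hierarchy, and the answer given for warnings (not discarded on backtracking) is the policy this message argues for, not a statement of current behavior.

```scala
// Hypothetical model of the categories described above; not Daffodil's types.
sealed trait Problem { def message: String }
final case class SchemaDefinitionError(message: String) extends Problem
final case class ProcessingError(message: String) extends Problem
final case class SchemaDefinitionWarning(message: String) extends Problem

object ProblemPolicy {
  // SDEs are always fatal: there is never any backtracking from them.
  def isFatal(p: Problem): Boolean = p match {
    case _: SchemaDefinitionError => true
    case _                        => false
  }

  // Only processing errors are a normal part of parsing and are suppressed
  // by backtracking. Keeping warnings is the proposed policy (an assumption).
  def discardedOnBacktrack(p: Problem): Boolean = p match {
    case _: ProcessingError => true
    case _                  => false
  }
}
```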

I don't know of any runtime SDWs offhand. I searched the source for them and found only one place where an SDW could be issued at runtime, which is this code in DState.scala:


private def isAnArray(): Boolean = {
  if (!currentNode.isInstanceOf[DIArray]) {
    Assert.invariant(errorOrWarn.isDefined)
    if (currentNode.isInstanceOf[DIElement]) {
      errorOrWarn.get.SDW(WarnID.PathNotToArray, "The specified path to element %s is not to an array. Suggest using fn:exists instead.", currentElement.name)
    } else {
      errorOrWarn.get.SDW(WarnID.PathNotToArray, "The specified path is not to an array. Suggest using fn:exists instead.")
    }
    false
  } else {
    true
  }
}

This does get called at runtime. I would just expect path expressions to be compiled and this to have been checked already at compilation time, which should render this runtime check unnecessary, I think. I did not find a test that produces this warning message.

I think an SDW that warns about an implementation limit being reached, like the regex match length limit, should not be suppressed by backtracking. As you pointed out, such a warning could be telling you about the reason for the backtracking, and suppressing the warning means you would not be able to diagnose why the backtracking is occurring.

Calling these implementation limit hits "schema definition" warnings is ok with me, because the schema goes along with the tunables like the max regex size limit. Both are static things that the data must comply with for parse/unparse to be successful.

I imagine that if you just add an SDW call at runtime, it will put the warning onto the diagnostics in the PState, and they will be discarded on backtracking, but probably that should not happen for runtime SDWs, only for parse errors.

-mikeb




________________________________
From: Larry Barber <la...@nteligen.com>
Sent: Friday, December 18, 2020 2:59 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: RE: How to add warnings that are not lost due to backtracking

I actually ran into this problem when parsing a large JPEG file. I thought that I had uncovered a bug because the file was not being parsed correctly. Once the cause was pointed out to me, the problem was solved by changing the tunable to increase the regex search length, and the file parsed as expected. The regex search failure caused (erroneous) backtracking, so I need to see the information about the search failing.
This is part of Daffodil-412, which required a two-part solution. The tunable was implemented for the first part, but the second part, the warning message, was not.
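
To make the failure mode concrete, here is a rough sketch of a regex search whose visible window grows until a configured maximum, recording one warning per expansion and another when the limit is hit. Everything here is an assumption for illustration: the ExpandingRegexSearch object, the doubling policy, and the warning text are hypothetical and are not taken from the Daffodil io code.

```scala
import scala.annotation.tailrec
import scala.util.matching.Regex

// Hypothetical sketch; names and the doubling policy are assumptions.
object ExpandingRegexSearch {
  final case class Result(matched: Option[String], warnings: List[String])

  def search(pattern: Regex, data: String, initialWindow: Int, maxWindow: Int): Result = {
    require(initialWindow > 0 && maxWindow >= initialWindow, "invalid window sizes")

    @tailrec
    def loop(window: Int, warnings: List[String]): Result = {
      // Only the first `window` characters are visible to the matcher.
      pattern.findFirstIn(data.take(window)) match {
        case Some(m) =>
          Result(Some(m), warnings.reverse)
        case None if window >= data.length =>
          // All of the data was visible: a genuine non-match, no limit involved.
          Result(None, warnings.reverse)
        case None if window >= maxWindow =>
          // The tunable limit stopped the search before the data ran out.
          val w = s"regex search stopped at the limit of $maxWindow characters"
          Result(None, (w :: warnings).reverse)
        case None =>
          // Double the window, capped at the limit and guarded against overflow.
          val next = if (window > maxWindow / 2) maxWindow else window * 2
          loop(next, s"regex search expanded from $window to $next characters" :: warnings)
      }
    }

    loop(initialWindow, Nil)
  }
}
```

For example, searching "xxxxxxxxxxEND" for "END" with a limit of 8 yields no match plus a limit warning, while a limit of 16 finds the match. Without the warning, the first outcome is indistinguishable from the data simply not matching. The sketch ignores the case where a match found in a partial window could have extended further; a real implementation would need to handle that.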

If SDW is not the way to go, I'd be happy to work with another suggestion.


From: Carlson, Ian [mailto:icarlson@owlcyberdefense.com]
Sent: Friday, December 18, 2020 2:37 PM
To: dev@daffodil.apache.org
Subject: RE: How to add warnings that are not lost due to backtracking

I'm still new at this - but I've found a great way to learn is to invite people to tell me I'm wrong, so here's my two cents.

SDE in particular is generally used to tell the parser that something has gone wrong. This invites the parser to either back up to the most recent point of uncertainty and try another path or fail completely if none exists. That's how we select one branch over another in the cases where there are multiple possible paths.

If we do select a path that turns out to be invalid, we generally don't want those errors to propagate back up the chain, since they are for a "path not taken" and failing in a way that leads us to the correct path is both expected and desired behavior. By extension, warnings encountered on our "path not taken" also get discarded. For instance, if we have a regex failure looking for the length of a discriminator that ultimately doesn't exist because this is an invalid path - that isn't really a failure at the top level.
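
The "path not taken" behavior described above can be sketched as a choice over branches. This is an illustrative model only: ChoiceSketch and Attempt are hypothetical names, and real points of uncertainty also restore the data position and infoset state, which is omitted here.

```scala
// Hypothetical sketch of choice resolution; not Daffodil's implementation.
object ChoiceSketch {
  // A branch either fails with an error message or succeeds with a value.
  // Either way it may have produced warnings along the way.
  final case class Attempt[A](result: Either[String, A], warnings: List[String])

  // Try each branch in order from the same starting point. The iterator is
  // lazy, so branches after the first success are never run. Diagnostics
  // from failed branches are dropped because they belong to a "path not taken".
  def choose[A](branches: List[() => Attempt[A]]): Attempt[A] =
    branches.iterator
      .map(branch => branch())
      .find(_.result.isRight)
      .getOrElse(Attempt[A](Left("all branches failed"), Nil))
}
```

If the first branch fails only because a regex search hit its limit, its warning disappears along with its error, which is exactly the situation the original question is worried about.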

So using SDW for a global "something weird you might want to examine" sort of warning is somewhat at odds with the way SDE and SDW are usually used.

Our runtime does generate quite a bit of text - so simply printing to console for a warning is likely to be missed. If we want to have a sort of global log that doesn't get cleared, but also isn't mingled with the runtime console output - we may need a new facility for that.

Side note - there are certain classes of diagnostics around choice branches that don't get discarded currently, which may cause some warnings and errors to escape even though we output a successful infoset. Ticket 2399 discusses this issue, and a partial attempt at a fix is languishing WIP on https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-daffodil%2Fpull%2F444&amp;data=04%7C01%7Clarry.barber%40nteligen.com%7C625e388582344f35954308d8ab541f83%7C379c214c5c944e86a6062d047675f02a%7C0%7C0%7C637447725751199918%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=O7PvPW3%2B%2B7geK4FnFZbE%2Bv8fPcDpqsUPvk%2FsHPpJeHw%3D&amp;reserved=0<https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-daffodil%2Fpull%2F444&amp;data=04%7C01%7Clarry.barber%40nteligen.com%7C625e388582344f35954308d8ab541f83%7C379c214c5c944e86a6062d047675f02a%7C0%7C0%7C637447725751209914%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=QwMTtKw%2FwRyUjjTLeRLGNNevVcuxrYgq1oy26O56vHc%3D&amp;reserved=0>. The short version being that I wouldn't want to rely on any information from SDW or SDE escaping a "path not taken" once that fix is in place.

[A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
[Owl Cyber Defense]
W  icarlson@owlcyberdefense.com<https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fowlcyberdefense.com%2F&amp;data=04%7C01%7Clarry.barber%40nteligen.com%7C625e388582344f35954308d8ab541f83%7C379c214c5c944e86a6062d047675f02a%7C0%7C0%7C637447725751209914%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=vmzsHrJvQHchI%2BT%2Fpdc650Hy4t6bsRCUEAWYjw7%2BZuA%3D&amp;reserved=0>
Connect with us!
Facebook<https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fwww.facebook.com%2Fowlcyberdefense%2F&amp;data=04%7C01%7Clarry.barber%40nteligen.com%7C625e388582344f35954308d8ab541f83%7C379c214c5c944e86a6062d047675f02a%7C0%7C0%7C637447725751209914%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ybumIVSPrM78R6H2xb9zDrEtGRIpjfNNgUR1sE%2FMUqo%3D&amp;reserved=0> | LinkedIn<https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2Fowlcyberdefense%2F&amp;data=04%7C01%7Clarry.barber%40nteligen.com%7C625e388582344f35954308d8ab541f83%7C379c214c5c944e86a6062d047675f02a%7C0%7C0%7C637447725751209914%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=P%2FxXmQfAK9dR8TRObM%2FDIxvUgIZbgEpsPnidsUEuMKI%3D&amp;reserved=0> | Twitter<https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Ftwitter.com%2Fowlcyberdefense&amp;data=04%7C01%7Clarry.barber%40nteligen.com%7C625e388582344f35954308d8ab541f83%7C379c214c5c944e86a6062d047675f02a%7C0%7C0%7C637447725751209914%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=elegVzECF0zrulnfq%2Fiq%2B%2B0VWR8wGzwLLdaj9XJQdo8%3D&amp;reserved=0>

[Find us at our next event. Click Here.]<https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fowlcyberdefense.com%2Fresources%2Fevents%2F%3Futm_source%3Dowl-cyber-defense%26utm_medium%3Demail%26utm_content%3Dbanner%26utm_campaign%3D2020-events&amp;data=04%7C01%7Clarry.barber%40nteligen.com%7C625e388582344f35954308d8ab541f83%7C379c214c5c944e86a6062d047675f02a%7C0%7C0%7C637447725751219915%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=QsycqteQLntlaDfUDc9Cr7DhftxxjVgnZI0y%2BBpVWPM%3D&amp;reserved=0>

The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
If you have received this transmission in error, please notify the sender immediately


From: Larry Barber<ma...@nteligen.com>
Sent: Friday, December 18, 2020 12:49 PM
To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
Subject: How to add warnings that are not lost due to backtracking

I hoping someone could give me some pointers on adding a warning message to the Daffodil io code.
I'm looking at Daffodil-412 and want to generate a warning message when the REGex search gets expanded and another if it exceeds the tunable for maximum length.

I've located the code that does these expansions in io/InputSourceDaraInputStream.scala, but I'm unsure how to generate the warning messages.
I don't see any other warning messages being generated in the io code. I've seen several instances in core that just use SDW(...) and others in DSOM that use context.SDW(...), but I'm confused about this - I'm afraid that this method buffers warnings and throws them away in the case of backtracking. Since the REGex search may be the cause of backtracking, I think these warnings need to be presented always.

I'm just not sure of the proper way to access SDW in this situation and need to make sure that the messages will not be discarded.


Re: How to add warnings that are not lost due to backtracking

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
So, I wanted to clarify a few things. Then I think I agree we want runtime-issued SDW to not be lost when backtracking.

An SDE, or schema definition error, is most commonly the Daffodil schema compiler telling you your schema isn't meaningful, so parsing/unparsing cannot even be started. We divide up Daffodil into "compiling the schema" or "compile time" and runtime (parse/unparse time).

Some SDEs cannot be detected until runtime, but SDEs are always fatal. I.e., there is never any backtracking from them, because they mean there is something wrong with your DFDL schema.

Processing errors (parse error or unparse error) are errors where your schema is meaningful but the data doesn't match the schema. Some parse errors are a normal part of parsing as they are suppressed by backtracking to try other alternatives.

Schema-definition Warnings (SDW) are not parse errors but the warning version of an SDE. I.e., they suggest a possible error in the schema. SDWs detected at compile time are always output by the compiler. If an SDW is issued at runtime, there is an interesting question: should it be suppressed by backtracking?
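
The three kinds of diagnostic described above can be sketched as a small Scala type hierarchy. The type and member names here are hypothetical and are not Daffodil's actual diagnostic classes. Marking the warning as not suppressible is an assumption that follows the position argued in this thread, not a statement about current behavior.

```scala
// Hypothetical types for illustration only; not Daffodil's real diagnostic classes.
sealed trait Diag {
  def msg: String
  def isError: Boolean
  // True when backtracking to a point of uncertainty may drop this diagnostic.
  def suppressibleByBacktracking: Boolean
}

// Schema definition error: always fatal, so there is no backtracking from it.
final case class SchemaDefinitionError(msg: String) extends Diag {
  val isError = true
  val suppressibleByBacktracking = false
}

// Parse/unparse error: the schema is meaningful but the data does not match it.
// Backtracking to try another alternative can suppress it.
final case class ProcessingError(msg: String) extends Diag {
  val isError = true
  val suppressibleByBacktracking = true
}

// Schema definition warning: suggests a possible problem in the schema.
// Assumption: a runtime SDW should survive backtracking.
final case class SchemaDefinitionWarning(msg: String) extends Diag {
  val isError = false
  val suppressibleByBacktracking = false
}
```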

I don't know of any runtime SDWs offhand. I searched the source for them and found only one place where an SDW could be issued at runtime, which is this code in DState.scala:


private def isAnArray(): Boolean = {
  if (!currentNode.isInstanceOf[DIArray]) {
    Assert.invariant(errorOrWarn.isDefined)
    if (currentNode.isInstanceOf[DIElement]) {
      errorOrWarn.get.SDW(WarnID.PathNotToArray, "The specified path to element %s is not to an array. Suggest using fn:exists instead.", currentElement.name)
    } else {
      errorOrWarn.get.SDW(WarnID.PathNotToArray, "The specified path is not to an array. Suggest using fn:exists instead.")
    }
    false
  } else {
    true
  }
}

This does get called at runtime. I would just expect path expressions to be compiled and this to have been checked already at compilation time, which should render this runtime check unnecessary, I think. I did not find a test that produces this warning message.

I think an SDW that warns about an implementation limit being reached, such as the regex match length limit, should not be suppressed by backtracking. As you pointed out, such a warning could be telling you about the reason for the backtracking, and suppressing the warning means you would not be able to diagnose why the backtracking is occurring.

Calling these implementation limit hits "schema definition" warnings is ok with me, because the schema goes along with the tunables like the max regex size limit. Both are static things that the data must comply with for parse/unparse to be successful.

I imagine that if you just add an SDW call at runtime, it will put the warning onto the diagnostics in the PState, where it will be discarded on backtracking. That probably should not happen for runtime SDWs, only for parse errors.
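
A minimal sketch of that behavior, assuming diagnostics are accumulated in a list on the parse state. `ParseState`, `Diagnostic`, `mark`, and `resetTo` are hypothetical stand-ins for illustration; Daffodil's real PState and its backtracking methods are structured differently.

```scala
// Hypothetical stand-ins; Daffodil's real PState and Diagnostic classes differ.
final case class Diagnostic(message: String, isError: Boolean)

final class ParseState {
  private var diagnostics: Vector[Diagnostic] = Vector.empty

  def addDiagnostic(d: Diagnostic): Unit = diagnostics = diagnostics :+ d

  def allDiagnostics: Vector[Diagnostic] = diagnostics

  // A mark records how many diagnostics existed at the point of uncertainty.
  def mark(): Int = diagnostics.length

  // On backtracking, drop only the errors added since the mark.
  // Warnings added since the mark are kept so they still reach the user.
  def resetTo(mark: Int): Unit = {
    val (before, after) = diagnostics.splitAt(mark)
    diagnostics = before ++ after.filterNot(_.isError)
  }
}
```

With this shape, a warning and a parse error added after a mark are treated differently by `resetTo`: the error is removed and the warning remains in `allDiagnostics`.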

-mikeb






RE: How to add warnings that are not lost due to backtracking

Posted by Larry Barber <la...@nteligen.com>.
I actually ran into this problem while parsing a large JPEG file. I thought that I had uncovered a bug because the file was not being parsed correctly. Once the cause was pointed out to me, I solved the problem by changing the tunable to increase the regex search length, and the file then parsed as expected. The regex search failure caused (erroneous) backtracking, so I need to see the information about the search failing.
This is part of Daffodil-412, which required a two-part solution. The tunable was implemented for the first part, but the second part, the warning message, was not.
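
A rough sketch of where the two warnings could be raised, assuming a search that scans a bounded window and grows it when the matcher runs off the end of that window. `RegexWindowSearch`, `search`, and the `warn` callback are hypothetical names for illustration; this is not the code in Daffodil's io module. The doubling strategy is also an assumption, and `maxLength` only stands in for the tunable.

```scala
import java.util.regex.Pattern
import scala.annotation.tailrec

// Hypothetical helper for illustration; not Daffodil's actual io code.
object RegexWindowSearch {
  // Tries to match `pattern` at the start of `data`, looking at only a window
  // of characters. The window doubles whenever the matcher hit the end of the
  // window, up to `maxLength` (a stand-in for the tunable).
  def search(data: String, pattern: Pattern, initialLength: Int, maxLength: Int)(
    warn: String => Unit
  ): Option[String] = {
    require(initialLength > 0 && maxLength > 0, "lengths must be positive")

    @tailrec
    def loop(window: Int): Option[String] = {
      val end = math.min(window, data.length)
      val m = pattern.matcher(data).region(0, end)
      val found = m.lookingAt()
      // hitEnd is true when more input could have changed the match result.
      val needMore = m.hitEnd() && end < data.length
      if (!needMore) {
        if (found) Some(m.group()) else None
      } else if (window >= maxLength) {
        // Second warning: the limit was reached, so the result may be incomplete.
        warn(s"regex search reached the maximum length of $maxLength characters")
        if (found) Some(m.group()) else None
      } else {
        val next = math.min(window.toLong * 2, maxLength.toLong).toInt
        // First warning: the search window had to be expanded.
        warn(s"regex search window expanded from $window to $next characters")
        loop(next)
      }
    }

    loop(math.min(initialLength, maxLength))
  }
}
```

The `warn` callback is left abstract on purpose: the open question in this thread is where those messages should go so that backtracking does not discard them.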

If SDW is not the way to go, I'd be happy to work with another suggestion.




RE: How to add warnings that are not lost due to backtracking

Posted by "Carlson, Ian" <ic...@owlcyberdefense.com>.
I’m still new at this – but I’ve found a great way to learn is to invite people to tell me I’m wrong, so here’s my two cents.

SDE in particular is generally used to tell the parser that something has gone wrong. This invites the parser to either back up to the most recent point of uncertainty and try another path or fail completely if none exists. That’s how we select one branch over another in the cases where there are multiple possible paths.

If we do select a path that turns out to be invalid, we generally don’t want those errors to propagate back up the chain, since they are for a “path not taken” and failing in a way that leads us to the correct path is both expected and desired behavior. By extension, warnings encountered on our “path not taken” also get discarded. For instance, if we have a regex failure looking for the length of a discriminator that ultimately doesn’t exist because this is an invalid path - that isn’t really a failure at the top level.

So using SDW for a global “something weird you might want to examine” sort of warning is somewhat at odds with the way SDE and SDW are usually used.

Our runtime does generate quite a bit of text – so simply printing to console for a warning is likely to be missed. If we want to have a sort of global log that doesn’t get cleared, but also isn’t mingled with the runtime console output – we may need a new facility for that.
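
One possible shape for such a facility, sketched under the assumption that it lives outside the state that backtracking resets. `WarningLog` and `BacktrackingState` are hypothetical names for illustration and do not correspond to existing Daffodil classes.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical facility: a log that is separate from console output and is
// never cleared by backtracking.
final class WarningLog {
  private val entries = ArrayBuffer.empty[String]

  def warn(message: String): Unit = entries += message

  def messages: List[String] = entries.toList
}

// Hypothetical parse state that holds per-path diagnostics plus a reference
// to the shared log.
final class BacktrackingState(val log: WarningLog) {
  private var diagnostics: List[String] = Nil

  def error(message: String): Unit = diagnostics = message :: diagnostics

  def diagnosticCount: Int = diagnostics.length

  // Backtracking clears the diagnostics for the path not taken,
  // but it never touches the log.
  def backtrack(): Unit = diagnostics = Nil
}
```

The trade-off is that anything written to the log is reported even for a path not taken, so it would suit only messages that are meaningful regardless of which branch finally succeeds.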

Side note – there are certain classes of diagnostics around choice branches that don’t get discarded currently, which may cause some warnings and errors to escape even though we output a successful infoset. Ticket 2399 discusses this issue, and a partial attempt at a fix is languishing WIP on https://github.com/apache/incubator-daffodil/pull/444. The short version being that I wouldn’t want to rely on any information from SDW or SDE escaping a “path not taken” once that fix is in place.


