You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/11/12 18:21:53 UTC

[GitHub] [daffodil] stevedlawrence commented on pull request #674: Work around issue resetting predefined vars

stevedlawrence commented on pull request #674:
URL: https://github.com/apache/daffodil/pull/674#issuecomment-967322823


   I dug into this issue a bit and think I've track down the cause, still not sure of the best fix. Here's a pretty minimal schema that reproduces the issue:
   
   ```xml
   <xs:schema
     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
     xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
     <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
   
     <xs:annotation>
       <xs:appinfo source="http://www.ogf.org/dfdl/">
         <dfdl:defineVariable defaultValue="x" name="var1" type="xs:string"/> 
         <dfdl:format ref="GeneralFormat" />
       </xs:appinfo>
     </xs:annotation>
     
     <xs:element name="root">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="opt" minOccurs="0">
             <xs:complexType>
               <xs:sequence>
                 <xs:sequence>
                   <xs:annotation>
                     <xs:appinfo source="http://www.ogf.org/dfdl/">
                       <dfdl:assert test="{ $var1 eq 'x' }" />
                     </xs:appinfo>
                   </xs:annotation>
                 </xs:sequence>
                 <xs:sequence>
                   <xs:annotation>
                     <xs:appinfo source="http://www.ogf.org/dfdl/">
                       <dfdl:assert test="{ $var1 eq 'y' }" />
                     </xs:appinfo>
                   </xs:annotation>
                 </xs:sequence>
               </xs:sequence>
             </xs:complexType>
           </xs:element>
         </xs:sequence>
       </xs:complexType>
     </xs:element>
     
   </xs:schema>
   ```
   So we define a variable with a default value, in this case `x`. We have an optional element which creates a point of uncertainty. Within this PoU we read the variable twice, once in the first assert (which passes) and again in the second assert (which fails). The second assert fails and so we need to back track to the PoU and undo the changes to the variable state.
   
   How we undo this state is broken. For our variables, the variables themselves keep track of the current state and the prior state. When we need to reset back to the PoU, sometimes we use this prior state, but sometimes we just make assumptions about what the prior state should have been. These assumptions aren't always correct. This test finds a case where this isn't correct.
   
   What happens here is we read a variable twice, so we mark it as needing to be reset to the prior states twice to get back to the correct state. But the second read didn't actually change the state. So when we reset once we reset to VariableDefined. And when we reset again we hit this case. Note that in some cases we do actually need to reset twice, so it's not just a matter of ignoring duplicate resets.
   
   I think the core issue here is that variables are trying to keep track of their own state, and only really keep track of their prior state, and then we make assumptions from there. I think the *right* fix is probably to handle variables like we do other state and PoU's and just take a snapshot. When we reset back to the PoU we just restore that snapshot. Unfortunately, variables are a bit more complicated and are mutable, so taking a snapshot may add some complication and overhead.
   
   Maybe another option is instead of PState snapshotting the entire variable map, we instead just snapshot the state of each individual value prior to the change. So it's a sort of copy-on-write behavior. Then when we reset back to the PoU, we just need to reset back to the state of things that were changed. That might be a better approach with less overhead.
   
   Note that both EDIFACT and NACHA hit this double-read issue but in different ways. NACHA reads an `encoding` variable multiple times in a PoU, and EDIFACT reads variables related to escape schemas multiple times in a PoU.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org