Posted to dev@daffodil.apache.org by "Adams, Joshua" <ja...@owlcyberdefense.com> on 2020/12/21 14:54:22 UTC

Implementing variable "direction" property

I've been working on DAFFODIL-2429, which is adding a "direction" property to defineVariable. I believe I have most of the implementation in place. It is primarily implemented as follows:

dfdl:defineVariable has a property "dfdlx:direction", an enumeration with the values "parseOnly", "unparseOnly", and "both" (the default).

When we are compiling the schema and are about to generate a SetVariable or NewVariableInstance parser/unparser, we check the "direction" property. If the direction does not match (e.g., creating a SetVariable parser when the variable in question is "unparseOnly"), we create a NadaParser instead.
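The compile-time check described above can be sketched roughly as follows. This is only an illustration in Python, not the Daffodil (Scala) code: `Direction`, `Processor`, `NO_OP`, `applies`, and `build_processor` are hypothetical names, and `NO_OP` merely stands in for the NadaParser mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    # The three values of the proposed dfdlx:direction property.
    PARSE_ONLY = "parseOnly"
    UNPARSE_ONLY = "unparseOnly"
    BOTH = "both"


@dataclass(frozen=True)
class Processor:
    # Hypothetical stand-in for a generated parser or unparser.
    kind: str


# Hypothetical stand-in for the do-nothing processor (NadaParser in the text).
NO_OP = Processor("Nada")


def applies(direction: Direction, parsing: bool) -> bool:
    """Return True if a variable with this direction is active in this pass."""
    if direction is Direction.BOTH:
        return True
    return direction is (Direction.PARSE_ONLY if parsing else Direction.UNPARSE_ONLY)


def build_processor(statement: str, direction: Direction, parsing: bool) -> Processor:
    """Pick the real processor, or the no-op one when the direction does not match."""
    if not applies(direction, parsing):
        return NO_OP
    return Processor(statement + ("Parser" if parsing else "Unparser"))
```

With this sketch, a setVariable on an "unparseOnly" variable yields the no-op processor for the parse pass and a real processor for the unparse pass.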

I'm not 100% sure that this is necessarily the correct approach, but it doesn't break any existing tests. Speaking of tests, I am attempting to create a test to demonstrate this feature based off the pull request mentioned in the bug ticket: https://github.com/DFDLSchemas/PCAP/pull/10

The schema for my test is as follows:

    <dfdl:defineVariable name="remainingAddr" type="xs:string" dfdlx:direction="unparseOnly" />
    <xs:element name="root">
      <xs:complexType>
        <xs:sequence>
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:newVariableInstance ref="ex:remainingAddr" defaultValue="{ ex:IPsrc }" />
            </xs:appinfo>
          </xs:annotation>
          <xs:element name="byte1" type="xs:unsignedByte" dfdl:outputValueCalc="{ xs:unsignedByte(fn:substring-before($ex:remainingAddr, '.')) }" />
          <xs:sequence>
            <xs:annotation>
              <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:newVariableInstance ref="ex:remainingAddr" defaultValue="{ fn:substring-after($ex:remainingAddr, '.') }" />
              </xs:appinfo>
            </xs:annotation>
            <xs:element name="byte2" type="xs:unsignedByte" dfdl:outputValueCalc="{ xs:unsignedByte(fn:substring-before($ex:remainingAddr, '.')) }" />
            <xs:sequence>
              <xs:annotation>
                <xs:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:newVariableInstance ref="ex:remainingAddr" defaultValue="{ fn:substring-after($ex:remainingAddr, '.') }" />
                </xs:appinfo>
              </xs:annotation>
              <xs:element name="byte3" type="xs:unsignedByte" dfdl:outputValueCalc="{ xs:unsignedByte(fn:substring-before($ex:remainingAddr, '.')) }" />
              <xs:sequence>
                <xs:annotation>
                  <xs:appinfo source="http://www.ogf.org/dfdl/">
                    <dfdl:newVariableInstance ref="ex:remainingAddr" defaultValue="{ fn:substring-after($ex:remainingAddr, '.') }" />
                  </xs:appinfo>
                </xs:annotation>
                <xs:element name="byte4" type="xs:unsignedByte" dfdl:outputValueCalc="{ xs:unsignedByte(fn:substring-before($ex:remainingAddr, '.')) }" />
              </xs:sequence>
            </xs:sequence>
          </xs:sequence>
          <xs:element name="IPsrc" type="xs:string" dfdl:lengthKind="explicit" dfdl:length="7"
            dfdl:inputValueCalc="{ fn:concat(../ex:byte1, '.', ../ex:byte2, '.', ../ex:byte3, '.', ../ex:byte4) }" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

This is pretty much a direct copy of the schema in the pull request. On parse, it parses 4 bytes and then uses an inputValueCalc to combine them into an IP address. On unparse, outputValueCalc expressions are used to pull the combined address apart into the individual bytes again. During unparse, the byte* elements make a forward reference to IPsrc (through the remainingAddr variable), and this seems to be causing a problem in my test. I'm getting an "Unparse Error: Expression Evaluation Error: Child element {http://example.com}IPsrc does not exist".
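To make the unparse-side string handling concrete, here is a small Python sketch of the peeling that the nested newVariableInstance/outputValueCalc pairs perform. The helper names are hypothetical; `substring_before` and `substring_after` are written to mirror the XPath definitions of fn:substring-before and fn:substring-after for a non-empty separator, and the sketch assumes Daffodil's functions behave the same way.

```python
def substring_before(s: str, sep: str) -> str:
    # XPath fn:substring-before: text before the first sep, or "" if sep is absent.
    # (sep is assumed to be non-empty in this sketch.)
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s: str, sep: str) -> str:
    # XPath fn:substring-after: text after the first sep, or "" if sep is absent.
    _, found, tail = s.partition(sep)
    return tail if found else ""


def peel(address: str, count: int = 4) -> list:
    """Mimic the schema: each level takes substring-before of the current
    variable value, and the next level's instance gets substring-after."""
    remaining = address
    fields = []
    for _ in range(count):
        fields.append(substring_before(remaining, "."))
        remaining = substring_after(remaining, ".")
    return fields
```

Under these definitions, `peel("1.2.3.4")` gives "1", "2", "3" for the first three levels, but the fourth level sees "4" with no remaining "." and so gets an empty string. If Daffodil follows the XPath definitions here, the byte4 expression may need separate attention once the forward reference itself is working.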

So my question regarding the test is: should this work, or am I missing something that is preventing this forward reference from working during unparsing?

Josh Adams

Re: Implementing variable "direction" property

Posted by "Adams, Joshua" <ja...@owlcyberdefense.com>.

Good catch!  I had not implemented the SuspendableExpression stuff for NewVariableInstance.  It's going to take a little re-architecture as I had been treating expressions in newVariableInstance default values a little differently than expressions in setVariable, but I expect it will start working or at least uncover the next hurdle in getting this working.

As Steve said, this suspending can definitely cause some interesting things to happen with newVariableInstance, so I'll need to make sure I add proper test coverage.  If I'm lucky, I'll get this finished before I start Christmas vacation and forget everything, although my daycare situation implies I'm running low on luck, so we will see.

Josh

Re: Implementing variable "direction" property

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

Are you missing the use of a SuspendableExpression for NewVariableInstance?

I see SetVariableSuspendableExpression used in the Daffodil code base for SetVariable.

This suggests that the mechanism for suspending on a variable reference, when that reference hits a variable whose assignment is itself suspended, will work.

(This seems to provide at least part of what Steve L. was pointing out as missing.)

I see no equivalent for NewVariableInstance in ExpressionEvaluatingUnparsers.scala

Re: Implementing variable "direction" property

Posted by Steve Lawrence <sl...@apache.org>.

This is interesting. Normally, newVariableInstance/setVariable are not
allowed to have forward referencing expressions. This makes sense
because expressions evaluated during parse must only be backwards
referencing. And normally the same newVariableInstance/setVariable
expressions are used during parse and unparse.

But with this new direction concept, unparseOnly expressions could
theoretically be allowed to be forward referencing, similar to how
outputValueCalc expressions are allowed to be forward referencing. Both
are unparse only, so this is fine.

But this means an unparseOnly NVI/SV expression evaluation must be
suspendable. And this means variables must have a concept of "I've been
set, but don't have a value yet because I'm waiting for some infoset
element to show up".

And this leads to a cascade of interesting behaviors. For example,
because our newVariableInstance must suspend, the OVCs that access that
variable must suspend when they try to use it but it has no value yet.
Fortunately, OVCs can already suspend, so I suspect it shouldn't be
too hard to add logic to suspend on "variable set but no value yet".
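A rough Python sketch of the "variable set but no value yet" idea follows. It is not how Daffodil's suspensions are implemented; `NotReady`, `VariableInstance`, and `run_with_suspensions` are hypothetical names, and the retry loop is just one simple way to model suspended expressions being resumed later.

```python
class NotReady(Exception):
    """Raised when an expression needs something that is not available yet."""


class VariableInstance:
    """A variable instance that can exist before its value is known."""

    def __init__(self):
        self._has_value = False
        self._value = None

    def set(self, value):
        self._value = value
        self._has_value = True

    def get(self):
        if not self._has_value:
            raise NotReady("variable set but no value yet")
        return self._value


def run_with_suspensions(tasks):
    """Run each task once; retry the ones that raised NotReady until all finish.
    Raises RuntimeError if a full pass makes no progress."""
    pending = list(tasks)
    while pending:
        still_pending = []
        for task in pending:
            try:
                task()
            except NotReady:
                still_pending.append(task)
        if len(still_pending) == len(pending):
            raise RuntimeError("deadlock: no suspended task can make progress")
        pending = still_pending


# Usage: the newVariableInstance and the OVC both run before IPsrc arrives.
infoset = {}
remaining = VariableInstance()
output = {}


def new_variable_instance():
    if "IPsrc" not in infoset:
        raise NotReady("IPsrc has not arrived yet")
    remaining.set(infoset["IPsrc"])


def byte1_ovc():
    output["byte1"] = int(remaining.get().split(".")[0])


def deliver_ipsrc():
    infoset["IPsrc"] = "1.2.3.4"


run_with_suspensions([new_variable_instance, byte1_ovc, deliver_ipsrc])
```

In the first pass both the variable default and the OVC suspend; once IPsrc has been delivered, the second pass resolves them in order.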

Things might also get tricky because variable state is mutable. When we
suspend an NVI/SV, we have to remember which instance of that variable
we suspended at. And accesses to those (such as OVCs) must also remember
which instance was accessed. I suspect this can all be made to work with
the suspension and UState clones, but it certainly complicates the logic,
and at the very least it might be worth considering whether there is a
different approach to "direction" of variables.
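The instance-tracking concern can be illustrated with a short Python sketch. The names (`Instance`, `Variable`, `suspended_read`) are hypothetical and this says nothing about how UState clones actually work; it only shows why a suspended access has to capture the instance that was current when it suspended, rather than looking it up again on resume.

```python
class Instance:
    """One instance of a variable; its value may arrive later."""

    def __init__(self):
        self.value = None
        self.ready = False

    def resolve(self, value):
        self.value = value
        self.ready = True


class Variable:
    """A variable with a stack of instances, as newVariableInstance would create."""

    def __init__(self):
        self.instances = [Instance()]

    def new_instance(self):
        inst = Instance()
        self.instances.append(inst)
        return inst

    def current(self):
        return self.instances[-1]


def suspended_read(variable):
    """Capture the instance that is current *now* and return a resume function."""
    captured = variable.current()

    def resume():
        if not captured.ready:
            raise LookupError("captured instance still has no value")
        return captured.value

    return resume


# Usage: two nested instances, each read by an expression that suspends.
remaining_addr = Variable()
first = remaining_addr.new_instance()
read_for_byte1 = suspended_read(remaining_addr)
second = remaining_addr.new_instance()
read_for_byte2 = suspended_read(remaining_addr)

# Values arrive later, after the variable state has moved on.
first.resolve("1.2.3.4")
second.resolve("2.3.4")
```

Here `read_for_byte1()` still sees "1.2.3.4" even though a newer instance exists by the time it resumes, while `read_for_byte2()` sees "2.3.4".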
