Posted to commits@daffodil.apache.org by "John Interrante (Jira)" <ji...@apache.org> on 2021/03/02 19:45:00 UTC

[jira] [Updated] (DAFFODIL-2202) Code Gen Framework

     [ https://issues.apache.org/jira/browse/DAFFODIL-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Interrante updated DAFFODIL-2202:
--------------------------------------
    Description: 
We have built an initial C code generator backend for Apache Daffodil. Currently the C code generator can support binary boolean, integer, and real numbers, arrays of simple and complex elements, choice groups using dispatch/branch keys, validation of "fixed" values, and padding of explicit length complex elements with fill bytes. We plan to continue building out the C code generator until it supports a minimal subset of the DFDL 1.0 specification for embedded devices.

Here are some changes requested by collaborators or reviewers, listed here so we don't forget them. If you want to help (which would be appreciated), please add a comment to this issue or let the dev list know, so that we avoid duplicated effort.
h3. C struct/field name collisions

To avoid possible name collisions, we should prepend struct names and field names with namespace prefixes if their infoset elements have non-null namespace prefixes.
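
A minimal sketch of the prefixing idea follows. The helper name make_c_name and the underscore separator are assumptions made for illustration, not part of the existing code generator, and a real version would also have to replace characters that are not legal in C identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper (not an existing runtime2 function): writes
// "prefix_local" into buf when prefix is non-null and non-empty, otherwise
// just "local". Returns the length of the full name, as snprintf does, so
// a caller can detect truncation by comparing the result with size.
int
make_c_name(char *buf, size_t size, const char *prefix, const char *local)
{
    assert(buf != NULL && local != NULL);
    if (prefix != NULL && prefix[0] != '\0')
    {
        return snprintf(buf, size, "%s_%s", prefix, local);
    }
    return snprintf(buf, size, "%s", local);
}
```

With this sketch, an element FooType with prefix idl would become idl_FooType, while an element without a prefix keeps its plain name.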
h3. Error reporting 

To make runtime2 error messages easier to format and translate for internationalization, we should change the way runtime2 functions report errors to callers. Currently runtime2 functions report errors by returning a non-null pointer to a constant char array (that is, a pointer to a string literal). It would be better to report errors by returning a non-null pointer to an error struct object with member fields initialized to report an error. Only the runtime2 function which prints error messages would need to perform formatting and translation - all the other functions only need to fill in some member fields and return a pointer.
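
One possible shape for such an error struct is sketched below. Every name here (ErrorCode, Error, check_fixed_int32, format_error) is hypothetical and chosen only for illustration. The sketch also assumes a single static error object, which is simple but not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical error codes; the real set would be defined by runtime2.
enum ErrorCode
{
    ERR_CHOICE_KEY,  // no match between dispatch key and branch keys
    ERR_FIXED_VALUE, // value does not match its "fixed" attribute
    ERR_STREAM_EOF   // unexpected end of data stream
};

// Hypothetical error struct: a code plus optional details, no formatted text.
typedef struct Error
{
    enum ErrorCode code;
    const char *   name;  // element name, may be NULL
    int64_t        value; // offending value, if any
} Error;

// An ordinary function only fills in member fields and returns a pointer:
// NULL means success, non-null means an error occurred.
const Error *
check_fixed_int32(const char *name, int32_t actual, int32_t fixed)
{
    static Error error; // static storage keeps the pointer valid on return

    if (actual == fixed) return NULL;
    error.code = ERR_FIXED_VALUE;
    error.name = name;
    error.value = actual;
    return &error;
}

// Only this function turns an Error into text, so formatting and
// translation would be confined to one place.
int
format_error(const Error *error, char *buf, size_t size)
{
    assert(error != NULL && buf != NULL);
    const char *name = error->name ? error->name : "?";
    long long   value = (long long)error->value;

    switch (error->code)
    {
    case ERR_CHOICE_KEY:
        return snprintf(buf, size, "%s: no match for choice dispatch key %lld", name, value);
    case ERR_FIXED_VALUE:
        return snprintf(buf, size, "%s: value %lld does not match fixed value", name, value);
    case ERR_STREAM_EOF:
        return snprintf(buf, size, "%s: unexpected end of stream", name);
    }
    return snprintf(buf, size, "%s: unknown error", name);
}
```

Callers would keep testing the returned pointer against NULL exactly as they do today; only the pointed-to type changes.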

h3. Anonymous/multiple choice groups

In addition to handling elements with xs:choice complex types, we should detect anonymous choice groups and refine the choice runtime structure in order to allow multiple choice groups to be inlined into parent elements. Example schema and corresponding C code:
{code:xml}
  <xs:complexType name="NestedUnionType">
    <xs:sequence>
      <xs:element name="first_tag" type="idl:int32"/>
      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
      </xs:choice>
      <xs:element name="second_tag" type="idl:int32"/>
      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
{code}
{code:c}
typedef struct NestedUnion
{
    InfosetBase _base;
    int32_t     first_tag;
    size_t      _choice_1; // choice of which union field to use
    union
    {
        foo foo;
        bar bar;
    };
    int32_t     second_tag;
    size_t      _choice_2; // choice of which union field to use
    union
    {
        fie fie;
        fum fum;
    };
} NestedUnion;
{code}
h3. Choice dispatch key expressions

We currently support only a very restricted and simple subset of choice dispatch key expressions. We would like to refactor the DPath expression compiler and make it generate C code in order to support more kinds of choice dispatch key expressions.
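
To make the goal concrete, here is a sketch of the kind of C code that could be generated for the dispatch key {xs:string(./first_tag)} with branch keys "1 2" and "3 4" from the example schema. The struct is a cut-down stand-in for NestedUnion, and the function name init_choice_1 is an assumption; the actual generated code may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Simplified stand-in for the NestedUnion struct; only the fields needed to
// resolve the first choice are kept.
typedef struct NestedUnionTags
{
    int32_t first_tag;
    size_t  _choice_1; // index of the selected branch (0 = foo, 1 = bar)
} NestedUnionTags;

// Hypothetical generated function: evaluates the dispatch key and stores
// the index of the matching branch. It follows the current convention of
// returning NULL on success or a pointer to a string literal on error.
const char *
init_choice_1(NestedUnionTags *instance)
{
    assert(instance != NULL);
    switch (instance->first_tag)
    {
    case 1:
    case 2:
        instance->_choice_1 = 0; // branch keys "1 2" select foo
        return NULL;
    case 3:
    case 4:
        instance->_choice_1 = 1; // branch keys "3 4" select bar
        return NULL;
    default:
        instance->_choice_1 = (size_t)-1; // no branch selected
        return "no match between choice dispatch key and choice branch keys";
    }
}
```

A switch on the integer value works here only because the key is a simple integer field; richer DPath expressions would need correspondingly richer generated code.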
h3. No match between choice dispatch key and choice branch keys

Right now c-daffodil is stricter than scala-daffodil when unparsing infoset XML files that have no match (or a mismatch) between choice dispatch keys and branch keys. Perhaps c-daffodil should load such an XML file and unparse the infoset to a binary data file without raising a no-match processing error in either step. We would have to write and call a choice branch resolver in C which peeks at the next XML element, figures out which branch of the choice group that element indicates, and initializes the choice and element runtime data (_choice and childNode->erd member fields) accordingly. We would probably replace the initChoice() call in walkInfosetNode() with a call to that choice branch resolver, and we might not need to call initChoice() in unparseSelf(). When I called initChoice() in all these parse, walk, and unparse places, I was pondering removing the _choice member field and calling initChoice() as a function to tell us which element to visit next, but we probably should have a mutable choice runtime data structure.
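
A rough sketch of such a resolver is shown below. The ERD struct is reduced to a name, and resolve_choice_branch with its parameters is hypothetical; how the next XML element's name is obtained from the XML reader is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical, heavily simplified ERD: only the element's name is kept.
typedef struct ERD
{
    const char *name;
} ERD;

// Hypothetical resolver: given the name of the next XML element, find the
// branch of the choice group with that name, then record its index in
// *choice and its ERD in *erd. Returns NULL on success or a pointer to a
// string literal when no branch matches (outputs are left unchanged).
const char *
resolve_choice_branch(const ERD *branches, size_t num_branches,
                      const char *next_name, size_t *choice, const ERD **erd)
{
    assert(branches != NULL && next_name != NULL);
    assert(choice != NULL && erd != NULL);

    for (size_t i = 0; i < num_branches; i++)
    {
        if (strcmp(branches[i].name, next_name) == 0)
        {
            *choice = i;
            *erd = &branches[i];
            return NULL;
        }
    }
    return "next XML element is not a branch of this choice group";
}
```

Because the branch is chosen from the element that is actually present, the dispatch key value in the XML file would no longer need to agree with the branch keys during unparsing.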
h3. Floating point numbers

Right now runtime2 prints floating point numbers in XML infoset files slightly differently than runtime1 does. This means TDML tests may need to use different XML infoset files for different runtimes. We should be able to make the TDML Runner compare floating point numbers numerically, not textually, so that TDML tests won't have to use two different XML infoset files.
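
The comparison itself is simple; the sketch below shows it in C only to stay in the same language as the other examples in this issue (the TDML Runner is not written in C). The helper name same_float_text is hypothetical, and the sketch compares parsed doubles for exact equality rather than within a tolerance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper: returns true when both texts parse completely as
// floating point numbers and the parsed values are equal (two NaNs are
// treated as equal here). Texts which do not parse are never equal.
bool
same_float_text(const char *expected, const char *actual)
{
    assert(expected != NULL && actual != NULL);

    char * end1 = NULL;
    char * end2 = NULL;
    double x = strtod(expected, &end1);
    double y = strtod(actual, &end2);

    // Reject empty input and trailing characters after the number
    if (end1 == expected || *end1 != '\0') return false;
    if (end2 == actual || *end2 != '\0') return false;

    // NaN is the only value which is not equal to itself
    if (x != x && y != y) return true;
    return x == y;
}
```

With this kind of check, "1.0E0" and "1" would count as the same value even though they differ textually.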
h3. Arrays

Instead of expanding arrays inline within childrenERDs, we may want to store a single entry for each array in childrenERDs giving the array's offset and the size of all its elements. We would have to write code to treat array member fields differently from scalar member fields, but we could save memory in childrenERDs for use cases with very large arrays. An array element's ERD should have minOccurs and maxOccurs, where minOccurs is unsigned and maxOccurs is signed with -1 meaning "unbounded". The actual number of children in an array instance would have to be stored in the array instance object (exactly where, perhaps in the C struct, is still an open question). An array node therefore has to be a different kind of infoset node with a place to store this number of actual children. Probably all ERDs should just get minOccurs and maxOccurs: a scalar has 1, 1 as those values, an optional element has 0, 1, and an array has any other legal combination, that is, N, -1 or N, M with N <= M. A restriction that minOccurs is 0, 1, or equal to maxOccurs (which is not -1) is acceptable. A restriction that maxOccurs is 1, -1, or equal to minOccurs is also fine (it means variable-length arrays always have an unbounded number of elements).
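
The minOccurs/maxOccurs classification described above could be sketched as follows. The names Occurs, OccursKind, and classify_occurs are hypothetical, and treating a maxOccurs of 0 as invalid is an assumption of this sketch rather than something decided in this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNBOUNDED (-1) // maxOccurs value meaning "unbounded"

// Hypothetical pair of occurrence bounds which every ERD would carry.
typedef struct Occurs
{
    uint32_t minOccurs; // unsigned
    int32_t  maxOccurs; // signed, UNBOUNDED means no upper limit
} Occurs;

enum OccursKind
{
    OCCURS_INVALID,  // not a legal combination (assumption: includes max 0)
    OCCURS_SCALAR,   // 1, 1
    OCCURS_OPTIONAL, // 0, 1
    OCCURS_ARRAY     // N, -1 or N, M with N <= M (other than the above)
};

// Classify an element by its occurrence bounds.
enum OccursKind
classify_occurs(const Occurs *occurs)
{
    assert(occurs != NULL);

    if (occurs->maxOccurs == UNBOUNDED) return OCCURS_ARRAY;
    if (occurs->maxOccurs < 1) return OCCURS_INVALID;
    if (occurs->minOccurs > (uint32_t)occurs->maxOccurs) return OCCURS_INVALID;
    if (occurs->maxOccurs == 1)
    {
        return occurs->minOccurs == 1 ? OCCURS_SCALAR : OCCURS_OPTIONAL;
    }
    return OCCURS_ARRAY;
}
```

Code which walks childrenERDs could then branch on the returned kind to decide between scalar and array handling.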
h3. Daffodil module/subdirectory names

When Daffodil is ready to move from a 3.x to a 4.x release, rename the modules to have shorter and easier to understand names as discussed in DAFFODIL-2406.


> Code Gen Framework
> ------------------
>
>                 Key: DAFFODIL-2202
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2202
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Back End
>    Affects Versions: 2.4.0
>            Reporter: Mike Beckerle
>            Assignee: John Interrante
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)