Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2022/11/12 20:58:46 UTC

[GitHub] [daffodil-site] tuxji opened a new pull request, #98: Update cli and runtime2-todos

tuxji opened a new pull request, #98:
URL: https://github.com/apache/daffodil-site/pull/98

   Daffodil 3.4.0 has added a new CLI test option and fixed some runtime2 todos, so it's time to update these website pages.
   
   site/cli.md: Add the new CLI test -I daffodilC option.
   
   site/dev/design-notes/runtime2-todos.adoc: Remove todos fixed in 3.4.0 (Improve TDML Runner, C struct/field name collisions, Floating point numbers).  Reorder remaining todos (they were not sorted by any criterion before; they are now sorted by usefulness to DFDL developers).
   
   DAFFODIL-2748


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil-site] mbeckerle commented on a diff in pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on code in PR #98:
URL: https://github.com/apache/daffodil-site/pull/98#discussion_r1035005523


##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs
+for use cases with very large arrays.
+An array element's ERD should have minOccurs and maxOccurs
+where minOccurs is unsigned
+and maxOccurs is signed with -1 meaning "unbounded".
+The actual number of children in an array instance
+would have to be stored with the array instance
+in the C struct or the ERD.
+An array node has to be a different kind of infoset node
+with a place for this number of actual children to be stored.
+Probably all ERDs should just get minOccurs and maxOccurs
+and a scalar is just one with 1, 1 as those values,
+an optional element is 0, 1,
+and an array is all other legal combinations
+like N, -1 and N, and M with N<=M.
+A restriction that minOccurs is 0, 1,
+or equal to maxOccurs (which is not -1)
+is acceptable.
+A restriction that maxOccurs is 1, -1,
+or equal to minOccurs
+is also fine
+(means variable-length arrays always have unbounded number of elements).

Review Comment:
   
   For dfdl:occursCountKind 'expression', should we check minOccurs/maxOccurs? I think that is purely a validation question. Data is well-formed for occursCountKind 'expression' regardless of the count, so I see no reason to enforce minOccurs/maxOccurs.
   
   I think validation should always be optional. It is valuable to be able to parse well-formed but invalid data.
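
A minimal sketch of the point above, assuming hypothetical names (`Occurs`, `occurs_valid`, `check_count`, and the `validate` flag are illustrations, not existing runtime2 code): the count produced by an occursCount expression is always accepted as well-formed, and the minOccurs/maxOccurs bounds are consulted only when validation is requested.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical occurrence bounds; maxOccurs of -1 means "unbounded".
typedef struct Occurs
{
    uint32_t minOccurs;
    int32_t  maxOccurs;
} Occurs;

// Hypothetical outcome of checking an array's actual count.
typedef enum
{
    COUNT_OK,
    COUNT_INVALID // well-formed data that fails validation
} CountOutcome;

// Return true when count lies within [minOccurs, maxOccurs].
static bool occurs_valid(const Occurs *o, uint32_t count)
{
    if (count < o->minOccurs) return false;
    if (o->maxOccurs >= 0 && count > (uint32_t)o->maxOccurs) return false;
    return true;
}

// With validation off, any count is accepted; the bounds matter only
// when the caller asks for validation.
static CountOutcome check_count(const Occurs *o, uint32_t count, bool validate)
{
    if (validate && !occurs_valid(o, count)) return COUNT_INVALID;
    return COUNT_OK;
}
```

With bounds of 1 and 3, a count of 5 passes when `validate` is false and is flagged only when `validate` is true, so well-formed but invalid data can still be parsed.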





[GitHub] [daffodil-site] mbeckerle commented on a diff in pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on code in PR #98:
URL: https://github.com/apache/daffodil-site/pull/98#discussion_r1035006393


##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -109,141 +211,26 @@ and try the parse call again as an attempt
 to resynchronize with a correct data stream
 after a bunch of failures.
 
-Note that we actually run the generated code in an embedded processor
+Note that we sometimes run the generated code in an embedded processor
 and call our own fread/frwrite functions
 which replace the stdio fread/fwrite functions
 since the C code runs bare metal without OS functions.
-We can implement fseek but we should have a good use case.
-
-=== Javadoc-like tool for C code
-
-We should consider adopting one of the javadoc-like tools for C code
-and structuring our comments that way.
+We can implement the fseek function on the embedded processor too
+but we would need a good use case requiring recovering after errors.
 
 === Validate "fixed" values in runtime1 too
 
 If we change runtime1 to validate "fixed" values
 like runtime2 does, then we can resolve 
 https://issues.apache.org/jira/browse/DAFFODIL-117[DAFFODIL-117].
 
-=== Improve TDML Runner
-
-We want to improve the TDML Runner
-to make it easier to run TDML tests
-with both runtime1 and runtime2.
-We want to eliminate the need
-to configure a `daf:tdmlImplementation` tunable
-in the TDML test using 12 lines of code.
-
-I had an initial idea which was that
-the TDML Runner could run both runtime1 and runtime2 
-automatically (in parallel or serially)
-if it sees a TDML root attribute
-saying `defaultImplementations="daffodil daffodil-runtime2"`
-or a parser/unparseTestCase attribute
-saying `implementations="daffodil daffodil-runtime2"`.
-To make running the same test on runtime1/runtime2 easier
-we also could add an implementation attribute
-to tdml:errors/warnings elements
-saying which implementation they are for
-and tell the TDML Runner to check errors/warnings
-for runtime2 as well as runtime1.
-
-Then I had another idea which might be easier to implement.
-If we could find a way to set Daffodil's tdmlImplementation tunable
-using a command line option or environment variable
-or some other way to change TDML Runner's behavior
-when running both "sbt test" and "daffodil test"
-then we could simply run "sbt test" or "daffodil test" twice
-(first using runtime1 and then using runtime2)
-in order to verify all the cross tests work on both.
-I think this way would be easier than making TDML Runner
-automatically run all the implementations it can find
-in parallel or serially when running cross tests.
-
-If the second idea works as I hope it does,
-then we can start the process of adding "daffodil-runtime2"
-to some of the cross tests we have for daffodil and ibm.
-We also chould change ibm's ProcessFactory class
-to have a different name than daffodil's ProcessFactory class
-and update TDML Runner's match expression to use the new class name.
-Then some developers could add the ibmDFDLCrossTester plugin
-to their daffodil checkout permanently
-instead of having to do & undo that change
-each time they want to run daffodil/ibm cross tests.
-
-=== C struct/field name collisions
-
-To avoid possible name collisions,
-we should prepend struct names and field names with namespace prefixes
-if their infoset elements have non-null namespace prefixes.
-Alternatively, we may need to use enclosing elements' names
-as prefixes to avoid name collisions without namespaces.
-
-=== Anonymous/multiple choice groups
-
-We already handle elements having xs:choice complex types.
-In addition, we should support anonymous/multiple choice groups.
-We may need to refine the choice runtime structure
-in order to allow multiple choice groups
-to be inlined into parent elements.
-Here is an example schema
-and corresponding C code to demonstrate:
-
-[source,xml]
-----
-  <xs:complexType name="NestedUnionType">
-    <xs:sequence>
-      <xs:element name="first_tag" type="idl:int32"/>
-      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
-        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
-        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
-      </xs:choice>
-      <xs:element name="second_tag" type="idl:int32"/>
-      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
-        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
-        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
-      </xs:choice>
-    </xs:sequence>
-  </xs:complexType>
-----
-
-[source,c]
-----
-typedef struct NestedUnion
-{
-    InfosetBase _base;
-    int32_t     first_tag;
-    size_t      _choice_1; // choice of which union field to use
-    union
-    {
-        foo foo;
-        bar bar;
-    };
-    int32_t     second_tag;
-    size_t      _choice_2; // choice of which union field to use
-    union
-    {
-        fie fie;
-        fum fum;
-    };
-} NestedUnion;
-----
-
-=== Choice dispatch key expressions
-
-We currently support only a very restricted
-and simple subset of choice dispatch key expressions.
-We would like to refactor the DPath expression compiler
-and make it generate C code
-in order to support arbitrary choice dispatch key expressions.
-
 === No match between choice dispatch key and choice branch keys
 
-Right now c-daffodil is more strict than scala-daffodil
+Right now c/daffodil is more strict than daffodil
 when unparsing infoset XML files with no matches (or mismatches)
 between choice dispatch keys and branch keys.
-Perhaps c-daffodil should load such an XML file
+Such a situation always makes c/daffodil exit with an error.
+Perhaps c/daffodil should load such an XML file

Review Comment:
   Agree. 





[GitHub] [daffodil-site] mbeckerle commented on a diff in pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on code in PR #98:
URL: https://github.com/apache/daffodil-site/pull/98#discussion_r1035000296


##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs
+for use cases with very large arrays.
+An array element's ERD should have minOccurs and maxOccurs
+where minOccurs is unsigned
+and maxOccurs is signed with -1 meaning "unbounded".
+The actual number of children in an array instance
+would have to be stored with the array instance
+in the C struct or the ERD.
+An array node has to be a different kind of infoset node
+with a place for this number of actual children to be stored.
+Probably all ERDs should just get minOccurs and maxOccurs
+and a scalar is just one with 1, 1 as those values,

Review Comment:
   Right now, our infoset objects store an ERD pointer in each infoset item. This makes it possible to point at any one item and interpret it correctly (e.g., convert that one item to XML for visualization/debug) because you can indirect over to the ERD to get all the static info.
   
   But this takes up space. In most cases, which ERD a child item needs is static information of the enclosing parent, so it could be stored only in the parent item's ERD. Inductively, most infoset items would then not have ERD pointers stored in them; rather, the ERD "nest" would be all static info.
   
   That's the idea I was exploring. Can we get rid of infoset items carrying ERD pointers?
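
One way to picture this idea is the sketch below. It is only an illustration under stated assumptions: apart from `childrenERDs`, which the design note mentions, every name here (`ERD` fields `name`, `numChildren`, `offsets`, the `Point` struct, and `child_erd_at`) is hypothetical. The infoset struct carries no ERD pointers at all, and a child's ERD is recovered from the parent's static ERD using the child's byte offset.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical static runtime data for one element.
typedef struct ERD ERD;
struct ERD
{
    const char        *name;
    size_t             numChildren;
    const size_t      *offsets;      // child offsets within the parent's C struct
    const ERD *const  *childrenERDs; // one static ERD per child
};

// Infoset item with no per-item ERD pointer: only the data fields.
typedef struct Point
{
    int32_t x;
    int32_t y;
} Point;

static const ERD x_erd = {"x", 0, NULL, NULL};
static const ERD y_erd = {"y", 0, NULL, NULL};
static const size_t point_offsets[] = {offsetof(Point, x), offsetof(Point, y)};
static const ERD *const point_children[] = {&x_erd, &y_erd};
static const ERD point_erd = {"Point", 2, point_offsets, point_children};

// Look up a child's ERD from the parent's ERD and the child's byte offset.
// Returns NULL when no child starts at that offset.
static const ERD *child_erd_at(const ERD *parent, size_t offset)
{
    for (size_t i = 0; i < parent->numChildren; i++)
    {
        if (parent->offsets[i] == offset) return parent->childrenERDs[i];
    }
    return NULL;
}
```

The trade-off in this sketch is that interpreting a single item in isolation now requires knowing its parent's ERD and its offset, in exchange for removing one pointer from every infoset item.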





[GitHub] [daffodil-site] tuxji commented on a diff in pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
tuxji commented on code in PR #98:
URL: https://github.com/apache/daffodil-site/pull/98#discussion_r1036508892


##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs

Review Comment:
   I've revised this section with a better understanding of how to handle dynamically sized arrays with dfdl:occursCountKind = "expression" and dfdl:occursCount = { ../count }.



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs
+for use cases with very large arrays.
+An array element's ERD should have minOccurs and maxOccurs
+where minOccurs is unsigned
+and maxOccurs is signed with -1 meaning "unbounded".
+The actual number of children in an array instance
+would have to be stored with the array instance
+in the C struct or the ERD.
+An array node has to be a different kind of infoset node
+with a place for this number of actual children to be stored.
+Probably all ERDs should just get minOccurs and maxOccurs
+and a scalar is just one with 1, 1 as those values,

Review Comment:
   I understand what you mean better, and I've added a new "Making infosets more efficient" section with the idea you expressed.  



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs
+for use cases with very large arrays.
+An array element's ERD should have minOccurs and maxOccurs
+where minOccurs is unsigned
+and maxOccurs is signed with -1 meaning "unbounded".
+The actual number of children in an array instance
+would have to be stored with the array instance
+in the C struct or the ERD.
+An array node has to be a different kind of infoset node
+with a place for this number of actual children to be stored.
+Probably all ERDs should just get minOccurs and maxOccurs
+and a scalar is just one with 1, 1 as those values,
+an optional element is 0, 1,
+and an array is all other legal combinations
+like N, -1 and N, and M with N<=M.
+A restriction that minOccurs is 0, 1,
+or equal to maxOccurs (which is not -1)
+is acceptable.
+A restriction that maxOccurs is 1, -1,
+or equal to minOccurs
+is also fine
+(means variable-length arrays always have unbounded number of elements).

Review Comment:
   I agree, and I've made the "Arrays" section say the normal case should be no validation since Daffodil must not enforce min/maxOccurs if the user wants to parse and unparse well-formed but invalid data.





[GitHub] [daffodil-site] tuxji commented on a diff in pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
tuxji commented on code in PR #98:
URL: https://github.com/apache/daffodil-site/pull/98#discussion_r1029304721


##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs

Review Comment:
   Do you think array ERDs might also help us implement dynamically sized arrays with dfdl:occursCountKind = "expression" and dfdl:occursCount = { ../count } in generated C code?





[GitHub] [daffodil-site] tuxji merged pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
tuxji merged PR #98:
URL: https://github.com/apache/daffodil-site/pull/98




[GitHub] [daffodil-site] mbeckerle commented on a diff in pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on code in PR #98:
URL: https://github.com/apache/daffodil-site/pull/98#discussion_r1024559443


##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use

Review Comment:
   Why are these size_t? It is not at all obvious that a type intended to hold sizes is the right type for these.
   
   union_tag_t maybe?
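
To make the suggestion concrete, here is a minimal sketch assuming a hypothetical `union_tag_t` typedef. Its name and width, the stand-in `foo`/`bar` branch types, and the `choose_branch` helper are all assumptions for illustration, not existing runtime2 code. The branch keys follow the example schema (keys 1 and 2 select `foo`, keys 3 and 4 select `bar`), and the anonymous union requires C11.

```c
#include <stdint.h>

// Hypothetical dedicated tag type; the width chosen here is an assumption.
typedef uint8_t union_tag_t;

// Stand-in branch types for illustration only.
typedef struct { int32_t a; } foo;
typedef struct { int32_t b; } bar;

typedef struct NestedUnion
{
    int32_t     first_tag;
    union_tag_t _choice_1; // which union field is in use: 0 = foo, 1 = bar
    union
    {
        foo foo;
        bar bar;
    };
} NestedUnion;

// Map a dispatch key to a branch tag.
// Returns 0 on success, or -1 when no branch key matches.
static int choose_branch(int32_t key, union_tag_t *tag)
{
    switch (key)
    {
    case 1:
    case 2:
        *tag = 0; // foo
        return 0;
    case 3:
    case 4:
        *tag = 1; // bar
        return 0;
    default:
        return -1;
    }
}
```

A dedicated typedef documents intent at each use site and lets the tag's width be changed in one place.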



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.

Review Comment:
   By "size of all its elements", do you mean the total size or the size of each element? I think you mean the total size, since each child element could be a different size (if they are variable length).
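
The distinction can be sketched as follows, assuming a hypothetical `ArrayEntry` record and a simple `Parent` struct with fixed-size elements (none of these names come from the design note). For fixed-size elements both readings are recoverable from one entry. For variable-length elements only a total would be meaningful, which this sketch does not cover.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical single entry describing a whole array in the parent's ERD.
typedef struct ArrayEntry
{
    size_t offset;    // byte offset of the first element in the parent struct
    size_t elem_size; // size of one element (fixed-size elements only)
    size_t count;     // number of elements
} ArrayEntry;

// Example parent struct with an inline array of four fixed-size items.
typedef struct Parent
{
    int32_t count;
    int16_t items[4];
} Parent;

static const ArrayEntry items_entry = {offsetof(Parent, items), sizeof(int16_t), 4};

// Total size in bytes of all elements of the array.
static size_t array_total_size(const ArrayEntry *e)
{
    return e->elem_size * e->count;
}

// Byte offset of element i within the parent struct.
static size_t array_elem_offset(const ArrayEntry *e, size_t i)
{
    return e->offset + i * e->elem_size;
}
```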



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -109,141 +211,26 @@ and try the parse call again as an attempt
 to resynchronize with a correct data stream
 after a bunch of failures.
 
-Note that we actually run the generated code in an embedded processor
+Note that we sometimes run the generated code in an embedded processor
 and call our own fread/frwrite functions
 which replace the stdio fread/fwrite functions
 since the C code runs bare metal without OS functions.
-We can implement fseek but we should have a good use case.
-
-=== Javadoc-like tool for C code
-
-We should consider adopting one of the javadoc-like tools for C code
-and structuring our comments that way.
+We can implement the fseek function on the embedded processor too
+but we would need a good use case requiring recovering after errors.
 
 === Validate "fixed" values in runtime1 too
 
 If we change runtime1 to validate "fixed" values
 like runtime2 does, then we can resolve 
 https://issues.apache.org/jira/browse/DAFFODIL-117[DAFFODIL-117].
 
-=== Improve TDML Runner
-
-We want to improve the TDML Runner
-to make it easier to run TDML tests
-with both runtime1 and runtime2.
-We want to eliminate the need
-to configure a `daf:tdmlImplementation` tunable
-in the TDML test using 12 lines of code.
-
-I had an initial idea which was that
-the TDML Runner could run both runtime1 and runtime2 
-automatically (in parallel or serially)
-if it sees a TDML root attribute
-saying `defaultImplementations="daffodil daffodil-runtime2"`
-or a parser/unparseTestCase attribute
-saying `implementations="daffodil daffodil-runtime2"`.
-To make running the same test on runtime1/runtime2 easier
-we also could add an implementation attribute
-to tdml:errors/warnings elements
-saying which implementation they are for
-and tell the TDML Runner to check errors/warnings
-for runtime2 as well as runtime1.
-
-Then I had another idea which might be easier to implement.
-If we could find a way to set Daffodil's tdmlImplementation tunable
-using a command line option or environment variable
-or some other way to change TDML Runner's behavior
-when running both "sbt test" and "daffodil test"
-then we could simply run "sbt test" or "daffodil test" twice
-(first using runtime1 and then using runtime2)
-in order to verify all the cross tests work on both.
-I think this way would be easier than making TDML Runner
-automatically run all the implementations it can find
-in parallel or serially when running cross tests.
-
-If the second idea works as I hope it does,
-then we can start the process of adding "daffodil-runtime2"
-to some of the cross tests we have for daffodil and ibm.
-We also chould change ibm's ProcessFactory class
-to have a different name than daffodil's ProcessFactory class
-and update TDML Runner's match expression to use the new class name.
-Then some developers could add the ibmDFDLCrossTester plugin
-to their daffodil checkout permanently
-instead of having to do & undo that change
-each time they want to run daffodil/ibm cross tests.
-
-=== C struct/field name collisions
-
-To avoid possible name collisions,
-we should prepend struct names and field names with namespace prefixes
-if their infoset elements have non-null namespace prefixes.
-Alternatively, we may need to use enclosing elements' names
-as prefixes to avoid name collisions without namespaces.
-
-=== Anonymous/multiple choice groups
-
-We already handle elements having xs:choice complex types.
-In addition, we should support anonymous/multiple choice groups.
-We may need to refine the choice runtime structure
-in order to allow multiple choice groups
-to be inlined into parent elements.
-Here is an example schema
-and corresponding C code to demonstrate:
-
-[source,xml]
-----
-  <xs:complexType name="NestedUnionType">
-    <xs:sequence>
-      <xs:element name="first_tag" type="idl:int32"/>
-      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
-        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
-        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
-      </xs:choice>
-      <xs:element name="second_tag" type="idl:int32"/>
-      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
-        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
-        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
-      </xs:choice>
-    </xs:sequence>
-  </xs:complexType>
-----
-
-[source,c]
-----
-typedef struct NestedUnion
-{
-    InfosetBase _base;
-    int32_t     first_tag;
-    size_t      _choice_1; // choice of which union field to use
-    union
-    {
-        foo foo;
-        bar bar;
-    };
-    int32_t     second_tag;
-    size_t      _choice_2; // choice of which union field to use
-    union
-    {
-        fie fie;
-        fum fum;
-    };
-} NestedUnion;
-----
-
-=== Choice dispatch key expressions
-
-We currently support only a very restricted
-and simple subset of choice dispatch key expressions.
-We would like to refactor the DPath expression compiler
-and make it generate C code
-in order to support arbitrary choice dispatch key expressions.
-
 === No match between choice dispatch key and choice branch keys
 
-Right now c-daffodil is more strict than scala-daffodil
+Right now c/daffodil is more strict than daffodil
 when unparsing infoset XML files with no matches (or mismatches)
 between choice dispatch keys and branch keys.
-Perhaps c-daffodil should load such an XML file
+Such a situation always makes c/daffodil exit with an error.
+Perhaps c/daffodil should load such an XML file

Review Comment:
   Agreed. The choiceDispatchKey should not be evaluated at unparse time. Only at parse time. 
   
   dfdl:outputValueCalc is needed to update elements such that on a re-parse, the choiceDispatchKey value *will be* the right one matching the unparsed choice branch. But that's not automatic. Schema authors must write the dfdl:outputValueCalcs to ensure this. 



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -52,7 +153,8 @@ coursier picks up the new directories version,
 sbt picks up the new coursier version,
 and daffodil picks up the new sbt version,
 before we can remove the "echo >> $GITHUB_ENV" lines
-from .github/workflows/main.yml.
+from .github/workflows/main.yml
+which prevent the sbt hanging problem.

Review Comment:
   Below is a discussion of backtracking. I didn't think we were going to allow backtracking or forward speculation, since the target is a deterministic subset of DFDL. This means parse errors are fatal.
   
   Is this something you are thinking we should revisit?
   



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.

Review Comment:
   Not so sure. 
   
   I'm perfectly fine with disallowing anonymous choices. 
   @stevedlawrence and I were looking at DFDL integration for other systems like Apache Drill, NiFi, Avro, etc. and they generally do not allow anonymous choices. Hence, any DFDL schema that has anonymous choices doesn't integrate well with any of those unless we generate a child element with a generated name, and that makes paths awkward, etc. 
   
   Maybe we just say the Runtime2 DFDL subset doesn't allow anonymous choices?



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs

Review Comment:
   Agreed. We really want arrays to be efficient. No per-element overhead for ERDs. 
   
   This means an "array child" (child within an array) has a distinct representation from an element child. 
   
   Or maybe the general case is arrays, and scalar children are treated as array children with min/maxOccurs 1. 
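
   A minimal sketch of that second idea, where every ERD carries occurrence bounds and a scalar is just the (1, 1) case. The names `OccursERD`, `OccursKind`, and `occurs_kind` are hypothetical and are not taken from the runtime2 code; the signed maxOccurs with -1 meaning "unbounded" follows the todo text above.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical ERD fragment: one entry per element, array or not.
typedef struct OccursERD
{
    size_t  offset;    // offset of the first occurrence within the parent struct
    size_t  elemSize;  // size in bytes of one occurrence
    size_t  minOccurs; // unsigned lower bound
    int64_t maxOccurs; // signed upper bound; -1 means "unbounded"
} OccursERD;

typedef enum OccursKind
{
    OCCURS_SCALAR,   // minOccurs == 1 and maxOccurs == 1
    OCCURS_OPTIONAL, // minOccurs == 0 and maxOccurs == 1
    OCCURS_ARRAY     // every other legal combination
} OccursKind;

// Classify an element purely from its static bounds.
static OccursKind occurs_kind(const OccursERD *erd)
{
    if (erd->minOccurs == 1 && erd->maxOccurs == 1)
    {
        return OCCURS_SCALAR;
    }
    if (erd->minOccurs == 0 && erd->maxOccurs == 1)
    {
        return OCCURS_OPTIONAL;
    }
    return OCCURS_ARRAY;
}
```

   With this shape the parent's childrenERDs needs only one entry per array, and the per-instance data only has to hold the actual count.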



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs
+for use cases with very large arrays.
+An array element's ERD should have minOccurs and maxOccurs
+where minOccurs is unsigned
+and maxOccurs is signed with -1 meaning "unbounded".
+The actual number of children in an array instance
+would have to be stored with the array instance
+in the C struct or the ERD.
+An array node has to be a different kind of infoset node
+with a place for this number of actual children to be stored.
+Probably all ERDs should just get minOccurs and maxOccurs
+and a scalar is just one with 1, 1 as those values,
+an optional element is 0, 1,
+and an array is all other legal combinations
+like N, -1 and N, and M with N<=M.
+A restriction that minOccurs is 0, 1,
+or equal to maxOccurs (which is not -1)
+is acceptable.
+A restriction that maxOccurs is 1, -1,
+or equal to minOccurs
+is also fine
+(means variable-length arrays always have unbounded number of elements).

Review Comment:
   I'm fine with this restriction as well. 



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs
+for use cases with very large arrays.
+An array element's ERD should have minOccurs and maxOccurs
+where minOccurs is unsigned
+and maxOccurs is signed with -1 meaning "unbounded".
+The actual number of children in an array instance
+would have to be stored with the array instance
+in the C struct or the ERD.
+An array node has to be a different kind of infoset node
+with a place for this number of actual children to be stored.
+Probably all ERDs should just get minOccurs and maxOccurs
+and a scalar is just one with 1, 1 as those values,

Review Comment:
   Ah. Same thought. 
   
   What crosses my mind now is that all these ERDs, even though there is one per array, are mostly statically known in the metadata (ERD) of the enclosing parent element. 
   
   I feel like we should be storing more ERD pointers in parent ERDs, so that instance info needs them in fewer cases. 
   





[GitHub] [daffodil-site] tuxji commented on a diff in pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
tuxji commented on code in PR #98:
URL: https://github.com/apache/daffodil-site/pull/98#discussion_r1029321583


##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs
+for use cases with very large arrays.
+An array element's ERD should have minOccurs and maxOccurs
+where minOccurs is unsigned
+and maxOccurs is signed with -1 meaning "unbounded".
+The actual number of children in an array instance
+would have to be stored with the array instance
+in the C struct or the ERD.
+An array node has to be a different kind of infoset node
+with a place for this number of actual children to be stored.
+Probably all ERDs should just get minOccurs and maxOccurs
+and a scalar is just one with 1, 1 as those values,
+an optional element is 0, 1,
+and an array is all other legal combinations
+like N, -1 and N, and M with N<=M.
+A restriction that minOccurs is 0, 1,
+or equal to maxOccurs (which is not -1)
+is acceptable.
+A restriction that maxOccurs is 1, -1,
+or equal to minOccurs
+is also fine
+(means variable-length arrays always have unbounded number of elements).

Review Comment:
   Hmm, I must have written that down with the intention of simplifying the C code. However, I now think that, for security, all Daffodil implementations should enforce minOccurs <= count <= maxOccurs if the schema author picks a non-zero maxOccurs. If a network protocol wants to say this element array can hold from 0 to 16 integers, we should check these bounds if the runtime2 DFDL subset supports variable length arrays at all, yes?
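
   The bounds check itself would be small. Here is a sketch, assuming the representation described in the todo (unsigned minOccurs, signed maxOccurs with -1 meaning "unbounded"); the function name `occurs_count_ok` is made up for illustration and does not exist in runtime2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Returns true when minOccurs <= count <= maxOccurs.
// A negative maxOccurs (-1) is treated as "unbounded",
// so only the lower bound is checked in that case.
static bool occurs_count_ok(size_t minOccurs, int64_t maxOccurs, size_t count)
{
    if (count < minOccurs)
    {
        return false;
    }
    if (maxOccurs < 0)
    {
        return true;
    }
    return (uint64_t)count <= (uint64_t)maxOccurs;
}
```

   For the 0 to 16 integers case above, the generated parser would call this with (0, 16, count) right after reading the count and report a parse error when it returns false.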



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.

Review Comment:
   Yes, total size of all its elements makes more sense.
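
   To make the "offset and total size" reading concrete, here is a rough sketch for a fixed-size array. The `Example` struct and the `ArrayEntry` type are hypothetical stand-ins, not the current childrenERDs layout.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical generated struct containing one array field.
typedef struct Example
{
    int32_t count;
    int32_t values[16];
    int32_t trailer;
} Example;

// Hypothetical single entry describing a whole array.
typedef struct ArrayEntry
{
    size_t offset;    // offset of the array's first element within the parent
    size_t totalSize; // size in bytes of all elements taken together
} ArrayEntry;

// One entry covers all 16 elements instead of 16 separate entries.
static const ArrayEntry values_entry = {
    offsetof(Example, values),
    sizeof(((Example *)0)->values),
};

// Offset of the first byte after the array within the parent struct.
static size_t array_end(const ArrayEntry *entry)
{
    return entry->offset + entry->totalSize;
}
```

   For variable-length elements the total size would only be known per instance, so it could not live in a static entry like this one.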



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.

Review Comment:
   I'm fine with that interpretation too. I've revised this todo to say that schemas written with anonymous choices won't be allowed, and that you will have to replace your anonymous choices with named elements holding choice groups instead.



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -52,7 +153,8 @@ coursier picks up the new directories version,
 sbt picks up the new coursier version,
 and daffodil picks up the new sbt version,
 before we can remove the "echo >> $GITHUB_ENV" lines
-from .github/workflows/main.yml.
+from .github/workflows/main.yml
+which prevent the sbt hanging problem.

Review Comment:
   No, I believe you're right that a deterministic subset of DFDL is still the target for runtime2.  I don't think we are ever going to need backtracking or forward speculation for typical binary network protocols, since these protocols are designed for programs to read them easily.  It seems only certain text and binary file formats will need DFDL's full power (especially text since text can be more ambiguous than binary).



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs

Review Comment:
   Do you think array ERDs might also help us implement dynamically sized arrays with dfdl:occursCountKind="expression" and dfdl:occursCount="{ ../count }" in generated C code?



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -109,141 +211,26 @@ and try the parse call again as an attempt
 to resynchronize with a correct data stream
 after a bunch of failures.
 
-Note that we actually run the generated code in an embedded processor
+Note that we sometimes run the generated code in an embedded processor
 and call our own fread/frwrite functions
 which replace the stdio fread/fwrite functions
 since the C code runs bare metal without OS functions.
-We can implement fseek but we should have a good use case.
-
-=== Javadoc-like tool for C code
-
-We should consider adopting one of the javadoc-like tools for C code
-and structuring our comments that way.
+We can implement the fseek function on the embedded processor too
+but we would need a good use case requiring recovering after errors.
 
 === Validate "fixed" values in runtime1 too
 
 If we change runtime1 to validate "fixed" values
 like runtime2 does, then we can resolve 
 https://issues.apache.org/jira/browse/DAFFODIL-117[DAFFODIL-117].
 
-=== Improve TDML Runner
-
-We want to improve the TDML Runner
-to make it easier to run TDML tests
-with both runtime1 and runtime2.
-We want to eliminate the need
-to configure a `daf:tdmlImplementation` tunable
-in the TDML test using 12 lines of code.
-
-I had an initial idea which was that
-the TDML Runner could run both runtime1 and runtime2 
-automatically (in parallel or serially)
-if it sees a TDML root attribute
-saying `defaultImplementations="daffodil daffodil-runtime2"`
-or a parser/unparseTestCase attribute
-saying `implementations="daffodil daffodil-runtime2"`.
-To make running the same test on runtime1/runtime2 easier
-we also could add an implementation attribute
-to tdml:errors/warnings elements
-saying which implementation they are for
-and tell the TDML Runner to check errors/warnings
-for runtime2 as well as runtime1.
-
-Then I had another idea which might be easier to implement.
-If we could find a way to set Daffodil's tdmlImplementation tunable
-using a command line option or environment variable
-or some other way to change TDML Runner's behavior
-when running both "sbt test" and "daffodil test"
-then we could simply run "sbt test" or "daffodil test" twice
-(first using runtime1 and then using runtime2)
-in order to verify all the cross tests work on both.
-I think this way would be easier than making TDML Runner
-automatically run all the implementations it can find
-in parallel or serially when running cross tests.
-
-If the second idea works as I hope it does,
-then we can start the process of adding "daffodil-runtime2"
-to some of the cross tests we have for daffodil and ibm.
-We also chould change ibm's ProcessFactory class
-to have a different name than daffodil's ProcessFactory class
-and update TDML Runner's match expression to use the new class name.
-Then some developers could add the ibmDFDLCrossTester plugin
-to their daffodil checkout permanently
-instead of having to do & undo that change
-each time they want to run daffodil/ibm cross tests.
-
-=== C struct/field name collisions
-
-To avoid possible name collisions,
-we should prepend struct names and field names with namespace prefixes
-if their infoset elements have non-null namespace prefixes.
-Alternatively, we may need to use enclosing elements' names
-as prefixes to avoid name collisions without namespaces.
-
-=== Anonymous/multiple choice groups
-
-We already handle elements having xs:choice complex types.
-In addition, we should support anonymous/multiple choice groups.
-We may need to refine the choice runtime structure
-in order to allow multiple choice groups
-to be inlined into parent elements.
-Here is an example schema
-and corresponding C code to demonstrate:
-
-[source,xml]
-----
-  <xs:complexType name="NestedUnionType">
-    <xs:sequence>
-      <xs:element name="first_tag" type="idl:int32"/>
-      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
-        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
-        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
-      </xs:choice>
-      <xs:element name="second_tag" type="idl:int32"/>
-      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
-        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
-        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
-      </xs:choice>
-    </xs:sequence>
-  </xs:complexType>
-----
-
-[source,c]
-----
-typedef struct NestedUnion
-{
-    InfosetBase _base;
-    int32_t     first_tag;
-    size_t      _choice_1; // choice of which union field to use
-    union
-    {
-        foo foo;
-        bar bar;
-    };
-    int32_t     second_tag;
-    size_t      _choice_2; // choice of which union field to use
-    union
-    {
-        fie fie;
-        fum fum;
-    };
-} NestedUnion;
-----
-
-=== Choice dispatch key expressions
-
-We currently support only a very restricted
-and simple subset of choice dispatch key expressions.
-We would like to refactor the DPath expression compiler
-and make it generate C code
-in order to support arbitrary choice dispatch key expressions.
-
 === No match between choice dispatch key and choice branch keys
 
-Right now c-daffodil is more strict than scala-daffodil
+Right now c/daffodil is more strict than daffodil
 when unparsing infoset XML files with no matches (or mismatches)
 between choice dispatch keys and branch keys.
-Perhaps c-daffodil should load such an XML file
+Such a situation always makes c/daffodil exit with an error.
+Perhaps c/daffodil should load such an XML file

Review Comment:
   I take it that you agree the C unparse code should process an XML infoset without raising any "no match" errors, even if the choiceDispatchKey is invalid.  I also take it that the C unparse code should not modify the choiceDispatchKey either.  If the schema writer wants to enforce that the choiceDispatchKey matches the unparsed choice branch, the writer must use dfdl:outputValueCalc to ensure this.
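
   To make sure we mean the same thing, here is a minimal sketch of that unparse behavior.  The `Sketch` struct and `sketch_unparse` function are hypothetical names invented for illustration, not the generated runtime2 code, and the anonymous union assumes a C11 compiler.  The tag is written exactly as loaded and is never compared with the branch keys.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified infoset struct (not the generated code). */
typedef struct
{
    int32_t tag;     /* element the schema's choiceDispatchKey refers to */
    size_t  _choice; /* branch actually present in the XML infoset */
    union
    {
        int32_t foo; /* branch with choiceBranchKey="1 2" */
        int32_t bar; /* branch with choiceBranchKey="3 4" */
    };
} Sketch;

/* Copy the tag and the selected branch into out[] without checking the
 * tag against the branch keys and without modifying the tag.
 * Returns the number of values written, or 0 if _choice is out of range. */
static size_t sketch_unparse(const Sketch *s, int32_t out[2])
{
    out[0] = s->tag; /* written as-is, even if it matches no branch key */
    switch (s->_choice)
    {
    case 0:
        out[1] = s->foo;
        return 2;
    case 1:
        out[1] = s->bar;
        return 2;
    default:
        return 0; /* no branch selected at all */
    }
}
```

   With this shape, a tag of 9 combined with the `bar` branch unparses without error, and only a dfdl:outputValueCalc on the tag would force the two to agree.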



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use

Review Comment:
   Yes, you're right that it's not obvious whether an index (which is what _choice represents) should be signed or unsigned.  I had thought _choice should be unsigned to avoid cutting the usable range in half, and that it should be size_t because size_t can represent the size of any C array.  However, I've searched and found that people have equally compelling reasons why indices should be signed instead (<https://www.quora.com/Why-is-size_t-sometimes-used-instead-of-int-for-declaring-an-array-index-in-C-Is-there-any-difference>).  There appears to be no One Right Answer for what type _choice should have, and using `choice_t` would allow us to change the definition in one place if we needed to.  I've added a todo about using `choice_t` instead of `size_t`.
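
   As a rough sketch of what the `choice_t` todo could look like (only the name `choice_t` comes from this discussion; `NO_CHOICE` and `choice_is_valid` are hypothetical helpers I made up for illustration):

```c
#include <stdbool.h>
#include <stddef.h>

/* The one place to change if we later prefer a signed index type. */
typedef size_t choice_t;

/* Hypothetical sentinel meaning "no branch selected yet".
 * The cast yields the maximum value for an unsigned choice_t
 * and -1 for a signed one, so it works either way. */
#define NO_CHOICE ((choice_t)-1)

/* True if c selects one of num_branches union fields.
 * Converting to size_t makes a negative signed value huge,
 * so the same comparison rejects it for either definition. */
static bool choice_is_valid(choice_t c, size_t num_branches)
{
    return (size_t)c < num_branches;
}
```

   Generated structs would then declare `choice_t _choice_1;` instead of `size_t _choice_1;`, so a later switch to a signed type would touch only the typedef.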



##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs
+for use cases with very large arrays.
+An array element's ERD should have minOccurs and maxOccurs
+where minOccurs is unsigned
+and maxOccurs is signed with -1 meaning "unbounded".
+The actual number of children in an array instance
+would have to be stored with the array instance
+in the C struct or the ERD.
+An array node has to be a different kind of infoset node
+with a place for this number of actual children to be stored.
+Probably all ERDs should just get minOccurs and maxOccurs
+and a scalar is just one with 1, 1 as those values,

Review Comment:
   What do you mean by storing more ERD pointers in parent ERDs?  I've been thinking of minOccurs, maxOccurs, and count, but pointers to where?  Instance info for dynamically sized arrays, if there are more than a handful of elements, won't fit into C structs and would have to be dynamically allocated, but that's instance info, not ERD metadata.
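
   For concreteness, here is a minimal sketch of the ERD metadata I have in mind.  `SketchERD`, `UNBOUNDED`, `erd_is_scalar`, and `erd_count_ok` are hypothetical names for illustration only, and the real ERD struct has more fields.  The actual element count is passed in separately because it is instance info rather than ERD metadata.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UNBOUNDED (-1) /* maxOccurs value meaning "unbounded" */

/* Hypothetical, trimmed-down ERD carrying only occurrence metadata. */
typedef struct
{
    const char *name;
    uint32_t    minOccurs; /* unsigned: never negative */
    int32_t     maxOccurs; /* signed: -1 means unbounded */
} SketchERD;

/* A scalar is simply an element with minOccurs = maxOccurs = 1. */
static bool erd_is_scalar(const SketchERD *erd)
{
    return erd->minOccurs == 1 && erd->maxOccurs == 1;
}

/* Check an instance's actual element count against the ERD's bounds. */
static bool erd_count_ok(const SketchERD *erd, size_t count)
{
    if (count < erd->minOccurs)
    {
        return false;
    }
    return erd->maxOccurs == UNBOUNDED || count <= (size_t)erd->maxOccurs;
}
```

   The count itself would live with the array instance (in the C struct or in dynamically allocated storage), which is the part I'm unsure how to connect to "more ERD pointers".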





[GitHub] [daffodil-site] mbeckerle commented on a diff in pull request #98: Update cli and runtime2-todos

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on code in PR #98:
URL: https://github.com/apache/daffodil-site/pull/98#discussion_r1034994553


##########
site/dev/design-notes/runtime2-todos.adoc:
##########
@@ -36,7 +36,108 @@ If someone wants to help
 please let the mailto:dev@daffodil.apache.org[dev] list know
 in order to avoid duplication.
 
-=== Report hanging problem running sbt (really dev.dirs) from MSYS2 on Windows
+=== Anonymous/multiple choice groups
+
+We already handle elements having xs:choice complex types.
+In addition, we should support anonymous/multiple choice groups.
+We may need to refine the choice runtime structure
+in order to allow multiple choice groups
+to be inlined into parent elements.
+Here is an example schema
+and corresponding C code to demonstrate:
+
+[source,xml]
+----
+  <xs:complexType name="NestedUnionType">
+    <xs:sequence>
+      <xs:element name="first_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}">
+        <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/>
+        <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/>
+      </xs:choice>
+      <xs:element name="second_tag" type="idl:int32"/>
+      <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}">
+        <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/>
+        <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/>
+      </xs:choice>
+    </xs:sequence>
+  </xs:complexType>
+----
+
+[source,c]
+----
+typedef struct NestedUnion
+{
+    InfosetBase _base;
+    int32_t     first_tag;
+    size_t      _choice_1; // choice of which union field to use
+    union
+    {
+        foo foo;
+        bar bar;
+    };
+    int32_t     second_tag;
+    size_t      _choice_2; // choice of which union field to use
+    union
+    {
+        fie fie;
+        fum fum;
+    };
+} NestedUnion;
+----
+
+=== Arrays
+
+Instead of expanding arrays inline within childrenERDs,
+we may want to store a single entry
+for an array in childrenERDs
+giving the array's offset and size of all its elements.
+We would have to write code
+for special case treatment of array member fields
+versus scalar member fields
+but we could save space/memory in childrenERDs

Review Comment:
   I can't really comment on whether array ERDs help with that. I just think that for this runtime we really do need to make the implementation of arrays very efficient. 


